在阿里云的GPU服务器上安装cuda没有什么问题,在自己的服务器上安装遇到一些问题,记录如下
注意
硬盘空间 需要root用户安装
参考
CentOS 7 设置默认进入图形界面或文本界面
Linux安装Nvidia显卡驱动:禁用The Nouveau kernel driver的方法!
查看显卡是否存在
[cndaqiang@mom ~]$ lspci | grep -i vga
05:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)
09:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)
禁用Nouveau驱动
Centos之前安装了Nouveau的驱动
- 把驱动加入黑名单中: 在
/etc/modprobe.d/blacklist.conf
在后面加入:blacklist nouveau
- 使用 dracut重新建立 initramfs p_w_picpath file :
#备份 the initramfs file $ sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak #重新建立 the initramfs file $ sudo dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
开机默认文本模式
[cndaqiang@mom ~]$ systemctl get-default
multi-user.target
[cndaqiang@mom ~]$ sudo systemctl set-default multi-user.target
重启,检查nouveau driver确保没有被加载!
$ lsmod | grep nouveau
一些依赖
yum install epel-release
yum install --enablerepo=epel dkms
安装
从CUDA Toolkit 10.1 Download下载cuda_10.1.105_418.39_linux.run
./cuda_10.1.105_418.39_linux.run
安装即可,安装结果
[root@mom cndaqiang]# ./cuda_10.1.105_418.39_linux.run
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-10.1/
Samples: Not Selected
Please make sure that
- PATH includes /usr/local/cuda-10.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root
最后把安装所说的添加PATH和LD_LIBRARY_PATH
编译vasp所需的cublas
和cuda
库,在安装cuda时自定义的安装目录的lib目录下,或者在 /usr/lib
里面,编译时,注意修改makefile.include
查看运行状况
[cndaqiang@mom vasp.5.4.4-gpu]$ nvidia-smi
Mon Apr 1 21:37:29 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 00000000:05:00.0 Off | N/A |
| 30% 25C P0 1W / 38W | 0MiB / 2001MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
或实时显示watch -n 0 "nvidia-smi"
本文首发于我的博客@cndaqiang.
本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!