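On Ubuntu (16.04 LTS desktop is used here; 17.04 and later use a different display server, so details may differ), installing the NVIDIA graphics driver often leaves the machine stuck in a loop after boot, unable to reach the desktop. The methods recommended here install the latest driver (version 396) and the CUDA Toolkit, and were tested on the latest Titan V card.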
sudo apt install synaptic
sudo synaptic
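Drivers installed this way have been tested by Ubuntu, so this is the safer route. The version is somewhat older, though: on my Ubuntu 16.04 LTS the default NVIDIA driver is version 384.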
sudo apt install nvidia-384
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-396
# For development use
sudo apt install nvidia-396-dev
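After rebooting, it can help to confirm which driver version is actually loaded. The following is a minimal sketch, not part of the original steps: `version_ge` is a hypothetical helper name, the 384.00 threshold is the minimum that the CUDA 9.1 installer quotes in its summary later in this post, and `sort -V` assumes GNU coreutils as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: compare the loaded NVIDIA driver version against a minimum.
# version_ge is a hypothetical helper, not part of any NVIDIA tool.

version_ge() {
  # Succeeds when version $1 is greater than or equal to version $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

MIN_DRIVER="384.00"   # assumption: minimum quoted by the CUDA 9.1 installer

if command -v nvidia-smi >/dev/null 2>&1; then
  # Ask nvidia-smi for the driver version only, without a CSV header.
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "Driver $installed is new enough (>= $MIN_DRIVER)."
  else
    echo "Driver '$installed' is older than $MIN_DRIVER or could not be read."
  fi
else
  echo "nvidia-smi not found; the driver is probably not installed."
fi
```

If the script reports that `nvidia-smi` is missing after a reboot, the driver package most likely did not install cleanly.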
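Installing CUDA also installs the graphics driver automatically. It can be downloaded from https://developer.nvidia.com/cuda-downloads and already supports the latest Volta architecture (currently used only by the Titan V graphics card and the Tesla compute cards built on the GV100 chip).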
# Download the CUDA 9.1 installer and the patch dated 2018.5.5:
wget -c https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux
wget -c https://developer.nvidia.com/compute/cuda/9.1/Prod/patches/3/cuda_9.1.85.3_linux
# Then run sudo chmod +x ... on the files and execute them.
Ubuntu 18.04:
https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64
Install:
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
After installation, some further setup is needed before CUDA can be used. The installer prints the following:
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-9.1
Samples:  Installed in /home/openthings, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-9.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.1/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
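The two settings the installer asks for can be applied roughly as follows. This is a sketch under assumptions: `prepend_path` is a hypothetical helper, `CUDA_HOME` defaults to the /usr/local/cuda-9.1 prefix reported in the summary, and adding these lines to ~/.bashrc to make them permanent is a common convention rather than something the installer does for you.

```shell
#!/usr/bin/env bash
# Sketch: put the CUDA 9.1 tools and libraries on the search paths.
# CUDA_HOME is an assumed variable name; adjust it if CUDA lives elsewhere.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-9.1}"

prepend_path() {
  # $1 = directory to add, $2 = current colon-separated list.
  # Prints the list with $1 in front, unless $1 is already in it.
  case ":$2:" in
    *":$1:"*) printf '%s' "$2" ;;
    *)        printf '%s' "$1${2:+:$2}" ;;
  esac
}

# nvcc and the other tools live in bin; the shared libraries in lib64.
export PATH="$(prepend_path "$CUDA_HOME/bin" "$PATH")"
export LD_LIBRARY_PATH="$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
```

Checking for an existing entry first keeps the variables from growing each time the file is sourced. The alternative named in the summary, adding the lib64 directory to /etc/ld.so.conf and running ldconfig as root, applies system-wide instead of per user.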
To install a newer driver, download it from the NVIDIA website (http://www.nvidia.cn/Download/index.aspx?lang=cn).
The installer requires the X server to be shut down. Run:
sudo service lightdm stop
Press Ctrl+Alt+F1 to switch to a text console; press Ctrl+Alt+F7 to return to the graphical interface.
When the installer has finished, restart lightdm by running:
sudo service lightdm start
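The stop, install, start sequence above can be collected into one script. This is only a sketch: the `INSTALLER` file name is a placeholder for whichever .run file you downloaded, `run` and `install_driver` are hypothetical helper names, and the dry-run default is an added safety measure, not part of the NVIDIA installer. Run it from the text console (Ctrl+Alt+F1), because stopping lightdm closes any graphical terminal.

```shell
#!/usr/bin/env bash
# Sketch: stop lightdm, run the NVIDIA .run installer, start lightdm again.
INSTALLER="${INSTALLER:-./NVIDIA-driver-installer.run}"  # placeholder file name
DRY_RUN="${DRY_RUN:-1}"                                  # 1 = only print commands

run() {
  # Print the command; execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

install_driver() {
  # Give up early if the display manager cannot be stopped.
  run sudo service lightdm stop || return 1
  run sudo sh "$INSTALLER"
  status=$?
  # Bring the display manager back even if the installer failed.
  run sudo service lightdm start
  return "$status"
}

install_driver
```

With the default `DRY_RUN=1` the script only prints the three commands; set `DRY_RUN=0` and point `INSTALLER` at the real file to execute them.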
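However, this route is not well tested: besides being complicated to install, it can hang after reboot so that you cannot log in.
In that case, choose “Advanced - Recovery” mode at boot and reconfigure from the command line.
Run: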
dpkg-reconfigure lightdm
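For system repair steps, see:
If that still does not work, the only option left is to reinstall the system.
Installing the Docker engine with NVIDIA support makes the GPU usable inside containers. The steps are: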
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
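If the test container fails to start, one thing worth checking is whether Docker has picked up the new runtime, which shows on the `Runtimes:` line of `docker info`. A minimal sketch: `has_nvidia_runtime` is a hypothetical helper, and the exact layout of the `docker info` output is assumed and may differ between Docker versions.

```shell
#!/usr/bin/env bash
# Sketch: check whether the nvidia runtime is registered with the Docker daemon.

has_nvidia_runtime() {
  # Reads `docker info` text on stdin; succeeds if the Runtimes line lists nvidia.
  grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

if command -v docker >/dev/null 2>&1; then
  if docker info 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker."
  else
    echo "nvidia runtime not listed; check that nvidia-docker2 is installed and dockerd was reloaded."
  fi
else
  echo "docker not found on PATH."
fi
```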
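Note that Docker, run as shown above, now supports the GPU directly; there is no need to run a separate nvidia-docker command. This greatly improves compatibility with container orchestration systems, and Kubernetes can now also run GPU workloads in Docker containers.
The current version depends on Docker 18.03. If a different version is already installed, you can specify the version to install, as follows: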
sudo apt install docker-ce=18.03.1~ce-0~ubuntu
For details, see: