Updating the NVIDIA Driver and Docker GPU Support

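NVIDIA's Linux support has improved considerably of late: Docker and Kubernetes can now use GPUs directly. At the time of writing, the latest NVIDIA graphics driver is 440.31, the Ubuntu 18.04 repositories have reached version 430, and CUDA is at 10.1.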

1. Updating the NVIDIA Driver

Using GPUs in Docker used to require installing nvidia-docker2 (instructions are given further down); that is no longer necessary:

Containers in Kubernetes can now use GPUs directly as well. The Docker commands look like this:

# Test nvidia-smi with the latest official CUDA image
$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
$ docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
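The nested quoting in the device examples above is easy to get wrong. Docker splits the value of --gpus at commas, so a list of several devices has to reach Docker wrapped in literal double quotes. A minimal sketch, assuming a hypothetical helper named gpus_device_arg (it is not part of Docker) that builds that value:

```shell
# Hypothetical helper (not part of Docker): build the value for --gpus
# from a comma-separated list of GPU indices or UUIDs.
gpus_device_arg() {
    # The embedded double quotes keep Docker from splitting the list at the comma.
    printf '"device=%s"\n' "$1"
}

# Usage (needs Docker 19.03+ and an NVIDIA GPU on the host):
#   docker run --rm --gpus "$(gpus_device_arg 1,2)" nvidia/cuda:9.0-base nvidia-smi
```

Passing the result inside shell double quotes, as in the usage comment, hands Docker the same string as the hand-written '"device=1,2"' form.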

Known issue:

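  • After setting things up as described above, docker ps showed no containers at all; it worked once nvidia-docker2 was installed. This is probably a version-compatibility problem and needs further verification.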

1.1 Downloading the NVIDIA driver

Download it directly:

wget -c http://us.download.nvidia.com/XFree86/Linux-x86_64/440.31/NVIDIA-Linux-x86_64-440.31.run

If an NVIDIA driver was installed previously, uninstall it first and then install the new one:

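# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get --purge remove 'nvidia-*'
# If the old driver came from a .run file, uninstall it with that installer instead:
# sudo ./NVIDIA-Linux-x86_64-410.57.run --uninstall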

sudo update-initramfs -u
sudo reboot now
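The text mentions installing the new driver after the old one is removed but does not show that step. The following is a rough sketch rather than a verified procedure: it assumes the installer downloaded in 1.1 sits in the current directory, and installer_version is a hypothetical helper that reads the expected version from the file name so it can be compared with what nvidia-smi reports.

```shell
# installer_version FILE: extract the driver version from an installer file name,
# e.g. NVIDIA-Linux-x86_64-440.31.run -> 440.31
# (hypothetical helper; runs in a subshell so it leaves no variables behind)
installer_version() (
    name=${1##*/}          # drop any directory part
    name=${name%.run}      # drop the .run suffix
    echo "${name##*-}"     # keep what follows the last hyphen
)

# Run on the GPU host after the reboot (commented out: needs root and a GPU):
#   sudo sh NVIDIA-Linux-x86_64-440.31.run
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# The second command should report the same version as:
installer_version NVIDIA-Linux-x86_64-440.31.run
```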

1.2 Downloading CUDA

On Ubuntu, run:

wget -c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
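After the packages are installed, the toolkit binaries are usually not on PATH yet. A minimal sketch of the post-install setup, assuming the default install prefix /usr/local/cuda-10.1 (adjust CUDA_HOME if the toolkit landed elsewhere):

```shell
# Assumption: the toolkit was installed to the default prefix /usr/local/cuda-10.1.
CUDA_HOME=/usr/local/cuda-10.1
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report the compiler version if the toolkit is present.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

To make the setting permanent, the two export lines can be appended to ~/.bashrc.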

1.3 Testing GPU support in Docker

Using Docker (the runtime has to be specified):

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

The older --runtime=nvidia option still works (it requires nvidia-docker2), but recent Docker versions use the --gpus option instead (which does not require nvidia-docker2).
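Which option applies depends on the Docker version, with 19.03 as the cut-over named later in this post. A small sketch of that decision, assuming GNU sort with -V support; version_ge and gpu_flag are hypothetical helper names:

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# gpu_flag VERSION: print the GPU option suited to a Docker server version.
gpu_flag() {
    if version_ge "$1" 19.03; then
        echo "--gpus all"
    else
        echo "--runtime=nvidia"
    fi
}

# Usage on a host with Docker installed:
#   ver=$(docker version --format '{{.Server.Version}}')
#   docker run --rm $(gpu_flag "$ver") nvidia/cuda nvidia-smi
```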

2. Fixing Update Errors in Docker GPU Support

Running apt update on Ubuntu 18.04 produced the following error message:

"Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY xxx"

The public key from the earlier installation has probably expired. To fix it:

  • Debian-based Linux (such as Ubuntu):
DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update

After that, updates work normally again.
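The DIST value in the commands above is built by sourcing /etc/os-release and joining its ID and VERSION_ID fields; on Ubuntu 18.04 this gives ubuntu18.04, the same path component that appears in the failing URL. A small sketch that wraps this in a hypothetical dist_id function, so it can be tried on any os-release style file:

```shell
# dist_id FILE: print $ID$VERSION_ID from an os-release style file.
# The function body is a subshell, so the sourced variables do not leak
# into the calling shell.
dist_id() (
    . "$1"
    echo "$ID$VERSION_ID"
)

# Usage:
#   dist_id /etc/os-release
```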

3. Installing and Updating nvidia-docker2

See NVIDIA's project page (https://github.com/NVIDIA/nvidia-docker).

  • Starting with Docker 19.03, NVIDIA GPU support is built into Docker, and nvidia-docker2 is no longer needed.
  • The original note reads:
    • Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime. If you are an existing user of the nvidia-docker2 packages, review the instructions in the “Upgrading with nvidia-docker2” section.

For example:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

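Installing nvidia-docker2 (the commands below add the nvidia-docker package repository and install the nvidia-container-toolkit package):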

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
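Before testing a GPU container it can help to confirm that the pieces are in place. The following is only a sketch: check_cmd is a hypothetical helper, and the list assumes that nvidia-container-cli (shipped with the libnvidia-container tools) is the binary the toolkit pulls in.

```shell
# check_cmd NAME: report whether an executable is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Tools a GPU-enabled Docker host is expected to have.
for tool in docker nvidia-smi nvidia-container-cli; do
    check_cmd "$tool"
done
```

A "missing" line for nvidia-container-cli would be one thing to rule out when investigating the docker ps issue noted in section 1.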

 

For other operating systems, see the nvidia-docker project page linked at the start of this section.
