Prepare the installation packages
Install the base environment
# Check the graphics card
$ lspci | grep -i vga
04:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
# Check the system version and make sure it is supported (a 64-bit Linux system is required)
$ uname -m && cat /etc/*release
x86_64
CentOS Linux release 7.2.1511 (Core)
# Install GCC
$ yum install gcc gcc-c++
# Install the kernel headers packages
$ yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
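The architecture check and the kernel package names above can also be derived from a script. The following is a minimal sketch, not part of the original procedure; the function names `is_supported_arch` and `kernel_packages` are hypothetical helpers introduced only for illustration.

```python
import platform


def is_supported_arch(machine):
    """Return True for the 64-bit x86 architecture that `uname -m` reports as x86_64."""
    return machine == "x86_64"


def kernel_packages(release):
    """Build the yum package names that match a kernel release string (`uname -r`)."""
    return ["kernel-devel-" + release, "kernel-headers-" + release]


if __name__ == "__main__":
    # On Linux, platform.machine() and platform.release() report the same
    # values as `uname -m` and `uname -r`.
    print("64-bit x86:", is_supported_arch(platform.machine()))
    print("yum install " + " ".join(kernel_packages(platform.release())))
```

The point of pinning the packages to the running kernel release is that the driver installer builds a kernel module against exactly those headers.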
Install the graphics driver
$ sh NVIDIA-Linux-x86_64-381.22.run
Install CUDA
$ sh cuda_8.0.61_375.26_linux.run
# Enter accept
-------------------------------------------------------------
Do you accept the previously read EULA?
accept/decline/quit: accept
# Answer no: the 381.22 driver was already installed in the previous step
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n
-------------------------------------------------------------
# For all remaining prompts, answer yes or accept the default
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: 
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location [ default is /usr/local/cuda-8.0 ]: 
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location [ default is /root ]: 
Installing the NVIDIA display driver...
The driver installation has failed due to an unknown error. Please consult the driver installation log located at /var/log/nvidia-installer.log.
============ Summary ============
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /root, but missing recommended libraries
Please make sure that
 - PATH includes /usr/local/cuda-8.0/bin
 - LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver.
A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_192.log
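The installer warning states that CUDA 8.0 needs a driver of version 361.00 or later. The sketch below shows one way to compare a driver version string against that minimum. It is only an illustration: `parse_version` and `driver_is_sufficient` are hypothetical helper names, and the comparison assumes plain dotted numeric versions such as 381.22.

```python
def parse_version(text):
    """Turn a dotted version such as '381.22' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed, required="361.00"):
    """Return True if the installed driver is at least the required version.

    The default of 361.00 is taken from the CUDA 8.0 installer warning.
    """
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # 381.22 is the driver installed earlier in this article.
    print(driver_is_sufficient("381.22"))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "99.10" as newer than "361.00".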
Verify the installation
# Add environment variables
# Append the following two lines to the end of ~/.bashrc
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# Apply the changes
$ source ~/.bashrc
# Verify the installation
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:02:00.0     Off |                  N/A |
| 21%   50C    P8    33W / 265W |      8MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
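The two checks above (that the CUDA bin directory is on PATH, and that nvcc reports release 8.0) can be scripted. This is a rough sketch under stated assumptions: `cuda_release` and `path_contains` are hypothetical helpers, the regular expression is written against the `nvcc -V` output shown in this article, and other toolkit versions may format that line differently.

```python
import os
import re

# Output of `nvcc -V` as shown in this article.
NVCC_SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
"""

CUDA_BIN = "/usr/local/cuda-8.0/bin"


def cuda_release(nvcc_output):
    """Return (release, full_version) parsed from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V(\d+(?:\.\d+)*)", nvcc_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def path_contains(path_value, directory):
    """Return True if a colon-separated PATH-style value lists the directory."""
    wanted = directory.rstrip("/")
    return wanted in [entry.rstrip("/") for entry in path_value.split(":") if entry]


if __name__ == "__main__":
    print(cuda_release(NVCC_SAMPLE))
    print(path_contains(os.environ.get("PATH", ""), CUDA_BIN))
```

To check a live system, the captured output of `nvcc -V` can be passed to `cuda_release` in place of the sample text.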
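# Extract the cuDNN archive and copy its header and libraries into the CUDA directory
$ tar -xvzf cudnn-8.0-linux-x64-v6.0.tgz
$ cp -P cuda/include/cudnn.h /usr/local/cuda-8.0/include
$ cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64
# Make the copied files readable by all users
$ chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*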
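One way to confirm which cuDNN version was copied is to read the version macros from cudnn.h. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as cuDNN 6 headers do. The function name `cudnn_version` is hypothetical, and the sample text is an illustrative excerpt whose patch level is an assumed value, not something taken from this article.

```python
import re

# Illustrative excerpt in the style of cudnn.h; the patch level is assumed.
SAMPLE_HEADER = """
#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
"""


def cudnn_version(header_text):
    """Return (major, minor, patch) from the version macros, or None if any is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)\s*$"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    print(cudnn_version(SAMPLE_HEADER))
```

On the machine from this article, the text to pass in would be the contents of /usr/local/cuda-8.0/include/cudnn.h.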
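Download and install the GPU build of TensorFlow, then run the following code as a test. If it runs without errors, CUDA is installed correctly.
import tensorflow as tf
# Session.run() needs something to evaluate, so build a constant first
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))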
The following describes how to use CUDA inside Docker, mainly through nvidia-docker.
Install Docker
$ yum install docker
# Start the Docker service and enable it at boot
$ systemctl start docker.service
$ systemctl enable docker.service
Install nvidia-docker
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
sudo rpm -i /tmp/nvidia-docker*.rpm && rm /tmp/nvidia-docker*.rpm
sudo systemctl start nvidia-docker
Write a Dockerfile and add the labels
# Create a directory
$ mkdir dockerBuild
$ cd dockerBuild/
# Create the Dockerfile
$ vim Dockerfile
# Enter the following content
# FROM: the original Docker image
FROM {SOURCE_DOCKER_IMAGE_NAME}
# MAINTAINER: the author
MAINTAINER {your name}
# needed cuda
LABEL com.nvidia.volumes.needed="nvidia_driver"
# cuda version
LABEL com.nvidia.cuda.version="8.0"
# docker build: note the trailing dot, which sets the build context to the current directory, where the Dockerfile is
$ docker build -t DOCKER_IMAGE_NAME .
# Build output (I built from bamos/openface)
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM {DOCKER_IMAGE_NAME}
 ---> 62d1673065e8
Step 2 : MAINTAINER {your name}
 ---> Running in b4dc9a88db63
 ---> 7a77f65d0908
Removing intermediate container b4dc9a88db63
Step 3 : LABEL com.nvidia.volumes.needed "nvidia_driver"
 ---> Running in 126095cdc342
 ---> c9035ebe54f8
Removing intermediate container 126095cdc342
Step 4 : LABEL com.nvidia.cuda.version "8.0"
 ---> Running in 12a0e5298d1e
 ---> 682db15bd6ca
Removing intermediate container 12a0e5298d1e
Successfully built 682db15bd6ca
# Check that the labels were added
$ docker inspect DOCKER_IMAGE_NAME
......
"Labels": {
    "com.nvidia.cuda.version": "8.0",
    "com.nvidia.volumes.needed": "nvidia_driver"
}
......
# Use nvidia-docker run to start the built image
$ nvidia-docker run -it -d DOCKER_IMAGE_NAME /bin/bash
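Checking the labels by eye in the `docker inspect` output is easy to get wrong, so the check can be scripted. The sketch below is an assumption-laden illustration: it expects the JSON array that `docker inspect` prints for an image and reads the labels from `Config.Labels`. The helper name `missing_labels` is hypothetical, and the sample document is a trimmed stand-in rather than real output.

```python
import json

# Labels that the Dockerfile in this article adds.
REQUIRED_LABELS = {
    "com.nvidia.volumes.needed": "nvidia_driver",
    "com.nvidia.cuda.version": "8.0",
}

# Trimmed, hypothetical stand-in for `docker inspect DOCKER_IMAGE_NAME` output.
SAMPLE_INSPECT = json.dumps([{"Config": {"Labels": dict(REQUIRED_LABELS)}}])


def missing_labels(inspect_json, required=REQUIRED_LABELS):
    """Return the required labels that are absent or have a different value."""
    entries = json.loads(inspect_json)
    if not entries:
        return dict(required)
    # Config or Labels may be missing or null when an image has no labels.
    labels = (entries[0].get("Config") or {}).get("Labels") or {}
    return {key: value for key, value in required.items() if labels.get(key) != value}


if __name__ == "__main__":
    # An empty dict means both labels are present with the expected values.
    print(missing_labels(SAMPLE_INSPECT))
```

In practice the input would be the captured output of `docker inspect DOCKER_IMAGE_NAME`, and any non-empty result names the labels that still need to be added.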
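Install CUDA and cuDNN in the container
As mentioned above, nvidia-docker run makes the GPU devices and the graphics driver available inside the container. At this point, however, the container can only access the GPU; CUDA and cuDNN are not installed in it yet.
# Enter the container
$ docker attach CONTAINER_NAME/CONTAINER_ID
# Then follow the CUDA and cuDNN installation steps from this article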