TensorFlow from Beginner to Expert (Part 1): Installation and Usage
Installation
The most stable release at the time of writing is 0.12, which this article uses throughout. For other versions, readers should check whether the steps below need adjusting to their situation.
TensorFlow supports the following installation methods:
- PIP installation
- Building from source
- Docker image installation
PIP Installation
PIP is a package management system used to install and manage software packages written in Python. —— [ Python PIP ]
Installing PIP
# Ubuntu/Linux 64-bit
$ sudo apt-get install python-pip python-dev
# CentOS, Fedora, RHEL
$ sudo yum install python-pip python-devel
# Mac OS X
$ sudo easy_install pip
Installing TensorFlow
# Python 2
$ sudo pip install --upgrade $TF_BINARY_URL
# Python 3
$ sudo pip3 install --upgrade $TF_BINARY_URL
Set the environment variable TF_BINARY_URL according to your environment; typical options are:
# Ubuntu/Linux 64-bit, CPU only, Python 2.7
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled, Python 2.7
# Requires CUDA toolkit 8.0 and cuDNN v5. Other versions must be built from source
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
# Mac OS X, CPU only, Python 2.7:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl
# Mac OS X, GPU enabled, Python 2.7:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-0.12.1-py2-none-any.whl
# Ubuntu/Linux 64-bit, CPU only, Python 3.4
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp34-cp34m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
# Requires CUDA toolkit 8.0 and cuDNN v5. Other versions must be built from source
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp34-cp34m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, CPU only, Python 3.5
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp35-cp35m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled, Python 3.5
# Requires CUDA toolkit 8.0 and CuDNN v5. For other versions, see "Installing from sources" below.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp35-cp35m-linux_x86_64.whl
# Mac OS X, CPU only, Python 3.4 or 3.5:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py3-none-any.whl
# Mac OS X, GPU enabled, Python 3.4 or 3.5:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-0.12.1-py3-none-any.whl
My approach is to download the whl file to one machine first, then copy it to the other machines and install it there.
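That workflow can be sketched as follows. The hostname, user, and paths in the comments are placeholders; the example assumes a Linux GPU target with Python 2.7.

```shell
# Hypothetical sketch of the download-then-copy workflow.
# Wheel for Linux GPU / Python 2.7 (pick the URL matching your target):
TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
# The local filename is just the last path component of the URL:
WHL=${TF_BINARY_URL##*/}
echo "$WHL"
# On a machine with Internet access:
#   wget "$TF_BINARY_URL"
# Copy to the target machine and install there (user/host are placeholders):
#   scp "$WHL" user@target:/tmp/
#   ssh user@target "sudo pip install /tmp/$WHL"
```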
The run log looks like this:
# pip install tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
You are using pip version 7.1.0, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Processing ./tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
Collecting mock>=2.0.0 (from tensorflow-gpu==0.12.1)
Using cached mock-2.0.0-py2.py3-none-any.whl
Collecting protobuf>=3.1.0 (from tensorflow-gpu==0.12.1)
Downloading protobuf-3.1.0.post1-py2.py3-none-any.whl (347kB)
100% |████████████████████████████████| 348kB 164kB/s
Collecting numpy>=1.11.0 (from tensorflow-gpu==0.12.1)
Downloading numpy-1.11.3.zip (4.7MB)
100% |████████████████████████████████| 4.7MB 34kB/s
Collecting wheel (from tensorflow-gpu==0.12.1)
Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
100% |████████████████████████████████| 69kB 58kB/s
Collecting six>=1.10.0 (from tensorflow-gpu==0.12.1)
Downloading six-1.10.0-py2.py3-none-any.whl
Collecting funcsigs>=1 (from mock>=2.0.0->tensorflow-gpu==0.12.1)
Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Collecting pbr>=0.11 (from mock>=2.0.0->tensorflow-gpu==0.12.1)
Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
100% |████████████████████████████████| 98kB 93kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu==0.12.1)
Installing collected packages: funcsigs, six, pbr, mock, protobuf, numpy, wheel, tensorflow-gpu
Found existing installation: six 1.9.0
Uninstalling six-1.9.0:
Successfully uninstalled six-1.9.0
Running setup.py install for numpy
Successfully installed funcsigs-1.0.2 mock-2.0.0 numpy-1.11.3 pbr-1.10.0 protobuf-3.1.0.post1 six-1.10.0 tensorflow-gpu-0.12.1 wheel-0.29.0
As the log shows, pip automatically installs TensorFlow's dependencies. If you need to install TensorFlow on another machine without network access, you must bundle those dependencies and copy them over as well, or the installation will not work.
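If your pip is version 8 or newer, its `pip download` subcommand can collect the wheel together with all of its dependencies into one directory for exactly this offline scenario. A sketch, with the directory name as a placeholder and the network-touching commands shown as comments:

```shell
# Hypothetical sketch: bundling dependencies for an offline install.
PKG_DIR=/tmp/tf_pkgs_demo
mkdir -p "$PKG_DIR"
# On the machine WITH Internet access, fetch the wheel and its deps
# (numpy, protobuf, six, ...) into one directory:
#   pip download -d "$PKG_DIR" tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
# Copy PKG_DIR to the offline machine, then install without the network:
#   pip install --no-index --find-links="$PKG_DIR" tensorflow-gpu
echo "package dir ready: $PKG_DIR"
rm -rf "$PKG_DIR"
```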
Pros and Cons of PIP Installation
Pros: probably the fastest of the three methods
Cons: no room for customization; your operating system, GPU hardware, CUDA version, and cuDNN version must match exactly what the official binaries were built for
Building from Source
This method suits scenarios where you need to customize TensorFlow.
Downloading the Source
$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
Installing Bazel
See http://bazel.io/docs/install.html
Configuration
$ ./configure
Answer the series of questions according to your actual setup. Bazel then configures the build environment; the machine needs Internet access at this point to fetch build dependencies, some of which may only be reachable through a proxy.
Building
CPU only, no GPU support:
$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
With GPU support:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
- Generate the pip package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
- Install it with pip
$ pip install /tmp/tensorflow_pkg/tensorflow-x.x.x-py2-none-linux_x86_64.whl
Pros and Cons of Building from Source
Pros: fully customizable; new features can be added as needed
Cons: higher barrier to entry, and setting up the build environment takes a while
Docker Image Installation
Docker is an open-source application container engine that lets developers package an application together with its dependencies into a portable container, which can then be deployed on any popular Linux machine; it can also provide virtualization. —— [ Docker ]
When you install and run TensorFlow via Docker, it is completely isolated from the packages already installed on your machine.
Official Images
Four official Docker images are available:
- CPU only, without a development environment:
gcr.io/tensorflow/tensorflow
- CPU only, with a development environment:
gcr.io/tensorflow/tensorflow:latest-devel
- GPU support, without a development environment:
gcr.io/tensorflow/tensorflow:latest-gpu
- GPU support, with a development environment:
gcr.io/tensorflow/tensorflow:latest-devel-gpu
Images pinned to a specific release are also provided; just replace latest in the tags above with the release number. The detailed steps are as follows:
Creating a Docker Group
Adding yourself to the docker group lets you start containers without sudo (the usermod command itself needs root):
$ sudo usermod -a -G docker YOURNAME
Starting a Docker Container
Pick one of the four images above and create a container from it. The first run of this command downloads the image automatically; later runs reuse the local copy.
$ docker run -it gcr.io/tensorflow/tensorflow
If you use a GPU-enabled image, the command needs extra arguments that expose the host's GPU devices to the container. The script shipped with the TensorFlow source handles this:
$ cd $TENSORFLOW_ROOT/tensorflow/tools/docker/
$ ./docker_run_gpu.sh gcr.io/tensorflow/tensorflow:gpu
Out of curiosity, let's look at what the script actually does:
#!/usr/bin/env bash
set -e

export CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
if [ ! -d ${CUDA_HOME}/lib64 ]; then
  echo "Failed to locate CUDA libs at ${CUDA_HOME}/lib64."
  exit 1
fi

export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | \
  xargs -I{} echo '-v {}:{}')
export DEVICES=$(\ls /dev/nvidia* | \
  xargs -I{} echo '--device {}:{}')

if [[ "${DEVICES}" = "" ]]; then
  echo "Failed to locate NVidia device(s). Did you want the non-GPU container?"
  exit 1
fi

docker run -it $CUDA_SO $DEVICES "$@"
It does three main things:
- sets CUDA_HOME on the host (defaulting to /usr/local/cuda) and verifies that the CUDA libraries exist there;
- bind-mounts the host's libcuda.* shared libraries into the container;
- passes the host's /dev/nvidia* devices through to the container.
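The flag-building trick with `ls | xargs` is easy to reproduce on dummy files: each path found becomes one `-v src:dst` mount argument. A minimal sketch using a temporary directory instead of the real library path:

```shell
# Reproduce the -v flag construction from docker_run_gpu.sh on dummy files.
TMP=$(mktemp -d)
touch "$TMP/libcuda.so" "$TMP/libcuda.so.1"
# Each file becomes a "-v path:path" pair, one per line:
CUDA_SO=$(ls "$TMP"/libcuda.* | xargs -I{} echo "-v {}:{}")
echo "$CUDA_SO"
rm -rf "$TMP"
```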
Pros and Cons of Docker Image Installation
Pros: well suited to batch deployment across a cluster of identically configured machines
Cons: pulling images can be painful behind the Great Firewall, and Docker itself adds a learning curve
Usage
Assuming you have installed GPU-enabled TensorFlow 0.12 following the steps above, you can now run the classic MNIST example:
# python -m tensorflow.models.image.mnist.convolutional
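On the multi-GPU machines used below, TensorFlow claims every visible card by default. The standard CUDA environment variable can pin the example to a single GPU first; the device index here is just an example:

```shell
# Make only GPU 0 visible to the process (standard CUDA env var):
export CUDA_VISIBLE_DEVICES=0
echo "visible GPUs: $CUDA_VISIBLE_DEVICES"
# Then run the example as before:
#   python -m tensorflow.models.image.mnist.convolutional
```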
Output on a K40
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4c81230
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40m, pci bus id: 0000:03:00.0)
Initialized!
Step 0 (epoch 0.00), 67.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 14.2 ms
Minibatch loss: 3.303, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
Step 200 (epoch 0.23), 13.8 ms
Minibatch loss: 3.481, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 4.0%
……
Step 8500 (epoch 9.89), 13.9 ms
Minibatch loss: 1.618, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Test error: 0.8%
Output on an M40
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:06:00.0
Total memory: 11.18GiB
Free memory: 11.07GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x37b6440
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:87:00.0
Total memory: 11.18GiB
Free memory: 11.07GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla M40, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla M40, pci bus id: 0000:87:00.0)
Initialized!
Step 0 (epoch 0.00), 23.6 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 6.7 ms
Minibatch loss: 3.297, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 7.3%
Step 200 (epoch 0.23), 6.5 ms
Minibatch loss: 3.448, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 3.8%
……
Step 8500 (epoch 9.89), 6.4 ms
Minibatch loss: 1.605, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.8%
Output on a GTX 1080
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:118] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.48GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:138] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:148] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:868] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Initialized!
Step 0 (epoch 0.00), 8.2 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 5.1 ms
Minibatch loss: 3.289, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
Step 200 (epoch 0.23), 5.1 ms
Minibatch loss: 3.476, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.7%
……
Step 8500 (epoch 9.89), 4.9 ms
Minibatch loss: 1.612, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%
Output on a P40
$ python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:06:00.0
Total memory: 22.41GiB
Free memory: 22.24GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3dbfc90
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:87:00.0
Total memory: 22.41GiB
Free memory: 22.24GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P40, pci bus id: 0000:87:00.0)
Initialized!
Step 0 (epoch 0.00), 962.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 4.6 ms
Minibatch loss: 3.249, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.6%
Step 200 (epoch 0.23), 4.5 ms
Minibatch loss: 3.342, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 4.3%
……
Step 8500 (epoch 9.89), 4.5 ms
Minibatch loss: 1.614, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Test error: 0.8%
Notes
(1) If you need GPU support, install CUDA and cuDNN first.
(2) The GTX 1080 and P40 are Pascal-architecture GPUs and require CUDA 8.0 + cuDNN 5.1.5 to run properly.
(3) TensorFlow has supported distributed computation across multiple machines and multiple GPUs since 0.8.0rc; earlier versions are limited to a single compute node.
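A quick way to sanity-check note (1) before installing the GPU wheel is to look for the CUDA runtime library. The path below is the common default install location; adjust it if yours differs:

```shell
# Check for the CUDA runtime library in the default install location.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
if [ -e "$CUDA_HOME/lib64/libcudart.so" ]; then
  CUDA_OK=yes
else
  CUDA_OK=no
fi
echo "CUDA runtime under $CUDA_HOME: $CUDA_OK"
```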