apex tutorial: GitHub - NVIDIA/apex: A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

The official README gives these install steps:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The first two steps ran fine, but the last one failed with this error:

ERROR: Command errored out with exit status 1: /opt/c50017935/condaenv/ViT/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-j19rw84j/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-j19rw84j/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-wd3fgg7r/install-record.txt --single-version-externally-managed --compile --install-headers /opt/c50017935/condaenv/ViT/include/python3.6m/apex Check the logs for full command output.
 

Solution

python3 setup.py install

I activated the virtual environment my project uses, then ran python3 setup.py install from the apex directory, and the install succeeded.
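To confirm that apex really landed in the environment you activated, a quick check along these lines may help. This is a minimal sketch, not part of the apex instructions: the helper name environment_report is made up for illustration, and it only reports which interpreter is running and where Python would load apex from.

```python
import importlib.util
import sys


def environment_report():
    """Report the active interpreter and where `apex` would be loaded from.

    `environment_report` is a hypothetical helper name used for illustration.
    """
    try:
        spec = importlib.util.find_spec("apex")
    except (ImportError, ValueError):
        spec = None
    return {
        # Path of the interpreter that is actually running
        "python": sys.executable,
        # Root of the active environment (for example a conda env directory)
        "prefix": sys.prefix,
        # None means apex cannot be found from this interpreter
        "apex_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it from outside the cloned apex directory. Otherwise Python may pick up the local apex source folder instead of the installed package, and the reported location will be misleading.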

amp_C error

While running my code, this warning appeared, followed by an error:

multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")

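At first I assumed the failure was caused by the missing amp_C module, and I spent a long time searching online without fixing it. It turned out the warning was not the real problem: my error came from a torch version that did not match the GPU. I had reinstalled torch in the virtual environment using the version from the original paper. I then copied a senior labmate's environment, where torch matches the GPU. The warning still appears, most likely because python3 setup.py install was run without the --cpp_ext and --cuda_ext options, so apex falls back to its Python implementation. The code now runs to completion.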
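If you hit the same situation, it may be worth checking the torch and GPU pairing before chasing the amp_C warning. The sketch below is my own diagnostic, not something from the apex docs: the helper names has_module and diagnose are hypothetical, and it assumes the GPU of interest is device 0. It reports whether amp_C is present, which CUDA version torch was built for, and whether torch can actually see the GPU.

```python
import importlib.util


def has_module(name):
    """Return True if the module `name` can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def diagnose():
    """Collect facts about amp_C, torch, and the GPU (hypothetical helper)."""
    report = {"amp_C": has_module("amp_C"), "torch": has_module("torch")}
    if report["torch"]:
        import torch

        report["torch_version"] = torch.__version__
        # CUDA version torch was compiled against; None for CPU-only builds
        report["built_for_cuda"] = torch.version.cuda
        # False here usually points to a torch/driver/GPU mismatch
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # Assumption: the GPU in question is device 0
            report["gpu_name"] = torch.cuda.get_device_name(0)
            report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If amp_C is False but cuda_available is True, you are probably in the state described above: apex runs with its Python fallback, and the warning can be ignored for the purpose of getting the code to run.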
