Fixing bitsandbytes errors when training large models

Fixing the "The installed version of bitsandbytes was compiled without GPU support." warning when loading a model with bitsandbytes




Anycall201

Published 2023-04-03 15:55:45


Introduction

When fine-tuning large language models (LLaMA, ChatGLM, etc.), the model is often loaded as follows to reduce GPU memory usage:

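import torch
from transformers import AutoModel

model_path = "path/to/model"  # placeholder: local checkpoint directory or Hub model ID

model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,     # ChatGLM ships its own modeling code
    load_in_8bit=True,          # quantize weights to int8 via bitsandbytes
    torch_dtype=torch.float16,  # dtype for the modules that are not quantized
    device_map='auto',          # let accelerate place layers on the available devices
)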

This feature requires the bitsandbytes library, but at runtime it emits UserWarning: The installed version of bitsandbytes was compiled without GPU support.
Loading the model then fails with an error referring to "libsbitsandbytes_cpu.so".

Cause

The bitsandbytes source contains the following code:
[Figure: bitsandbytes code]

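This code runs when bitsandbytes is imported; its main job is to find the path of the CUDA runtime library (cuda_lib).
If the lookup returns an empty path, the warning above is emitted later on.
As the comments in the code show, it searches three places: the conda environment variables, LD_LIBRARY_PATH, and the remaining environment variables.
So to fix bitsandbytes not detecting the GPU, it is enough to set the relevant environment variables correctly, so that one of them points to the directory containing the CUDA runtime library.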
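To make the search order concrete, here is a minimal sketch of the lookup described above. It is a simplified re-implementation under stated assumptions, not bitsandbytes' actual code: `find_cuda_runtime_lib` is a hypothetical helper, and the target file name `libcudart.so` is assumed from the 0.37-era versions of the library. It checks `CONDA_PREFIX/lib` first, then each entry of `LD_LIBRARY_PATH`, then every other environment variable that holds absolute paths.

```python
import os
from pathlib import Path

# Assumption: the CUDA runtime file name the search looks for.
CUDA_RUNTIME_LIB = "libcudart.so"


def find_cuda_runtime_lib(env=None):
    """Hypothetical helper: return the path of the first libcudart.so found,
    or None if the search comes back empty (the case that triggers the warning)."""
    env = dict(os.environ) if env is None else env

    def candidates_from(value):
        # An env var may hold several ':'-separated paths; keep absolute ones only.
        for part in value.split(":"):
            if part and Path(part).is_absolute():
                yield Path(part)

    search_order = []
    # 1. conda environment: $CONDA_PREFIX/lib
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        search_order.append(Path(conda_prefix) / "lib")
    # 2. LD_LIBRARY_PATH
    search_order.extend(candidates_from(env.get("LD_LIBRARY_PATH", "")))
    # 3. every other environment variable
    for name, value in env.items():
        if name not in ("CONDA_PREFIX", "LD_LIBRARY_PATH"):
            search_order.extend(candidates_from(value))

    for directory in search_order:
        candidate = directory / CUDA_RUNTIME_LIB
        try:
            if candidate.is_file():
                return candidate
        except OSError:
            # Some env values are not usable paths (e.g. name too long); skip them.
            continue
    return None


if __name__ == "__main__":
    print(find_cuda_runtime_lib())
```

If this prints `None` on a machine with a GPU, adding the directory that contains `libcudart.so` to `LD_LIBRARY_PATH` (or activating the conda environment that ships it) is the kind of environment fix the paragraph above refers to.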

Solution

The problem can be worked around as follows:
~~1. Install the library normally with pip install bitsandbytes~~
~~2. Change to the cuda_setup directory of the installed bitsandbytes package, e.g. xxx/venv/lib/python3.9/site-packages/bitsandbytes/cuda_setup~~
~~3. Open main.py in vim or any other editor~~
4. Find the line if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None and replace it with if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.so', None, None, None, None. (It does not have to be cuda116; any version greater than or equal to your GPU's CUDA version will do.)
~~5. Find self.lib = ct.cdll.LoadLibrary(binary_path); it occurs twice, and both occurrences should be replaced with self.lib = ct.cdll.LoadLibrary(str(binary_path))~~
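Step 4 can also be scripted instead of edited by hand. The sketch below is hypothetical: `locate_cuda_setup_main` and `patch_cuda_setup` are illustrative names, and it assumes the `bitsandbytes/cuda_setup/main.py` layout from step 2, which may not exist in other bitsandbytes versions. It finds the package with `importlib.util.find_spec` (which does not import bitsandbytes, so the warning is not triggered), backs up `main.py`, and performs exactly the string replacement described in step 4.

```python
import importlib.util
from pathlib import Path

# The original line quoted in step 4 (including the 'libs...' spelling from the source).
CPU_FALLBACK = (
    "if not torch.cuda.is_available(): "
    "return 'libsbitsandbytes_cpu.so', None, None, None, None"
)


def locate_cuda_setup_main():
    """Hypothetical helper: return the path of bitsandbytes/cuda_setup/main.py,
    or None if bitsandbytes is not installed. Does not import bitsandbytes."""
    spec = importlib.util.find_spec("bitsandbytes")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "cuda_setup" / "main.py"


def patch_cuda_setup(main_py, cuda_version="116"):
    """Hypothetical helper: apply step 4 to main.py in place.
    Writes main.py.bak first; returns False if the line was not found."""
    path = Path(main_py)
    source = path.read_text()
    if CPU_FALLBACK not in source:
        return False  # already patched, or a different bitsandbytes version
    replacement = (
        "if torch.cuda.is_available(): "
        f"return 'libbitsandbytes_cuda{cuda_version}.so', None, None, None, None"
    )
    path.with_name(path.name + ".bak").write_text(source)  # backup for undo
    path.write_text(source.replace(CPU_FALLBACK, replacement, 1))
    return True
```

Usage: check that `locate_cuda_setup_main()` returns an existing file, then call `patch_cuda_setup(...)` on it once; restoring `main.py.bak` undoes the change. Keeping the backup matters because, as the analysis above notes, fixing the environment variables is the cleaner solution.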