1 Docker container environment setup

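At first I used the image Gemini recommended based on the title "qvlm", which turned out to be wrong. The correct approach is the one recommended in the project itself: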

pip install --upgrade pip  # enable PEP 660 support
pip install -e .

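But remember to also ask Gemini how to set up NVIDIA's nvcc and related tools. For example, the BNB (bitsandbytes) in the QVLM project only supports up to cuda120, whereas I use cuda121, and it still installs. Be sure to verify that nvcc runs inside the container.
For other libraries, such as the ones I need for SNN development, I also asked Gemini.
The project dependencies exported by pigar are faulty and cannot be used.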
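
The nvcc check mentioned above can be scripted so it is easy to rerun in every new container. This is a minimal sketch, not part of the QVLM project: the helper names parse_nvcc_release and nvcc_release are made up for illustration, and it assumes nvcc prints a line containing "release X.Y", as current CUDA toolkits do.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_output):
    """Return the CUDA release (e.g. "12.1") found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def nvcc_release():
    """Run nvcc if it is on PATH and return its CUDA release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # nvcc is not on PATH in this shell/container
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc CUDA release:", nvcc_release())
```

If this prints None, nvcc is either not installed or not on PATH in the current shell, which is worth fixing before building BNB.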

2 Checking further whether BNB is actually installed

In practice, after pip install -e, pip may report BNB as installed, yet running python -m bitsandbytes fails with output like the one in https://github.com/ChangyuanWang17/QVLM/issues/15:

False

===================================BUG REPORT===================================
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
================================================================================
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/suhail/miniconda3/envs/geochat_QVLM/lib/libcudart.so'), PosixPath('/home/suhail/miniconda3/envs/geochat_QVLM/lib/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: /home/suhail/miniconda3/envs/geochat_QVLM did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: :/home/suhail/miniconda3/envs/geochat_QVLM/lib/:/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/nvidia/cudnn/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('local/suhail'), PosixPath('@/tmp/.ICE-unix/2448,unix/suhail')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/65b47edc_3e94_4970_a5ed_fe3da9c78750')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=117, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda117.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1099, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/modeling_utils.py", line 38, in <module>
    from .deepspeed import deepspeed_config, is_deepspeed_zero3_enabled
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/deepspeed.py", line 37, in <module>
    from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
    from .bnb import has_4bit_bnb_layers, load_and_quantize_model
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
    import bitsandbytes as bnb
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research, quantization_utils
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/suhail/Desktop/GeoChat/geochat/eval/batch_geochatq_vqa.py", line 8, in <module>
    from geochat.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
  File "/home/suhail/Desktop/GeoChat/geochat/__init__.py", line 1, in <module>
    from .model import GeoChatLlamaForCausalLM
  File "/home/suhail/Desktop/GeoChat/geochat/model/__init__.py", line 1, in <module>
    from .language_model.geochat_llama import GeoChatLlamaForCausalLM, GeoChatConfig
  File "/home/suhail/Desktop/GeoChat/geochat/model/language_model/geochat_llama.py", line 22, in <module>
    from transformers import AutoConfig, AutoModelForCausalLM, \
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
    value = getattr(module, name)
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1101, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

At this point I recommend not following the QVLM project's installation method. Instead, go into the custom_bitsandbytes directory and follow its README.md:

CUDA_VERSION=121 make cuda12x
python setup.py install

If this build installs successfully, BNB has been installed correctly.
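
One way to double-check the build is to look for the compiled library file. The sketch below assumes the naming pattern visible in the log above (libbitsandbytes_cuda117.so for CUDA 11.7) also applies to the custom build; the helper names are hypothetical, and some bitsandbytes releases use other file names (for example a _nocublaslt suffix on older GPUs), which this sketch does not handle.

```python
from pathlib import Path


def expected_bnb_binary(cuda_version):
    """Map a CUDA version such as "12.1" to a library name like
    libbitsandbytes_cuda121.so (pattern assumed from the log above)."""
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"


def find_bnb_binary(package_dir, cuda_version):
    """Return the path of the expected binary inside package_dir, or None if absent."""
    candidate = Path(package_dir) / expected_bnb_binary(cuda_version)
    return candidate if candidate.is_file() else None
```

For example, find_bnb_binary("/QVLM/custom_bitsandbytes/bitsandbytes", "12.1") returning None would suggest the make step did not produce a library for cuda121 in that directory.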

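3 BNB fails again at runtime

BNB still cannot be found. Possible causes:
1 A new terminal does not have the nvcc-related paths configured, so BNB can no longer find the NVIDIA runtime environment;
2 The path to the compiled BNB library libbitsandbytes.so is not configured, so Python cannot find BNB at runtime.
The fix, using the VS Code launch.json debug configuration as an example: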

"env": {
    "CUDA_VISIBLE_DEVICES": "0",
    "LD_LIBRARY_PATH": "/usr/local/cuda-12.1/lib64:/QVLM/custom_bitsandbytes/bitsandbytes:${env:LD_LIBRARY_PATH}",
    "PYTHONPATH": "/QVLM/custom_bitsandbytes/bitsandbytes:${env:PYTHONPATH}"
}

Change the GPU index and the QVLM project path as needed.
On the command line, set the same variables with export.
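
The same three variables can also be set from Python when a script is launched as a child process. This is a rough sketch under assumptions: prepend_paths and run_with_bnb_env are hypothetical helpers, and the paths passed in are whatever matches your own setup. The child process matters here, because the dynamic loader reads LD_LIBRARY_PATH at process start, so changing it inside an already running interpreter is generally too late.

```python
import os
import subprocess
import sys


def prepend_paths(env, key, new_paths):
    """Return a copy of env with new_paths prepended to the path-list variable key."""
    updated = dict(env)
    existing = updated.get(key, "")
    parts = list(new_paths) + ([existing] if existing else [])
    updated[key] = os.pathsep.join(parts)
    return updated


def run_with_bnb_env(script_args, cuda_lib, bnb_dir, gpu="0"):
    """Run a Python script in a child process with CUDA and BNB paths configured."""
    env = prepend_paths(os.environ, "LD_LIBRARY_PATH", [cuda_lib, bnb_dir])
    env = prepend_paths(env, "PYTHONPATH", [bnb_dir])
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return subprocess.run([sys.executable, *script_args], env=env).returncode
```

A call such as run_with_bnb_env(["-m", "bitsandbytes"], "/usr/local/cuda-12.1/lib64", "/QVLM/custom_bitsandbytes/bitsandbytes") would rerun the BNB self-check with the paths from the launch.json example.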

That covers the BNB installation problems I ran into.

4 Notes on running the QVLM project

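The floating-point model the author provides on Hugging Face for validation actually contains only two parts: the LLM and the projector. The vision module has to be downloaded separately from OpenAI, because the author uses llava-v1.3, which does not fine-tune the vision module.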

import os
from huggingface_hub import snapshot_download

# Local folder where the model is saved for offline use
local_dir = "./clip-vit-large-patch14"
os.makedirs(local_dir, exist_ok=True)

print("Starting CLIP model download via proxy...")

snapshot_download(
    repo_id="openai/clip-vit-large-patch14",
    local_dir=local_dir,
    # Very important: must be False so that real files are downloaded;
    # otherwise the generated symlinks all break once copied to an offline server
    local_dir_use_symlinks=False,
    resume_download=True,
    # Skip weight formats that are not needed, to save space and download time
    ignore_patterns=["*.msgpack", "*.h5", "rust_model.ot", "*.safetensors"],
)

print("Download complete")

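Also note that the model to download is clip-vit-large-patch14, not the 336 variant. With the latter, IMGAcc is only 30%. Only with the correct model does the floating-point model reach the accuracy reported in the paper.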
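
A quick sanity check can catch the wrong CLIP variant before a long evaluation run. The sketch below assumes the downloaded config.json stores the input resolution under vision_config.image_size (224 for clip-vit-large-patch14, 336 for the 336 variant); that layout is an assumption about the Hugging Face config files, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def clip_image_size(config):
    """Return vision_config.image_size from a parsed CLIP config dict, or None."""
    return config.get("vision_config", {}).get("image_size")


def check_clip_dir(local_dir, expected_size=224):
    """Return True when the model in local_dir has the expected input resolution."""
    config_path = Path(local_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return clip_image_size(config) == expected_size
```

Calling check_clip_dir("./clip-vit-large-patch14") after the download and getting False would suggest the 336 variant (or some other model) ended up in that folder.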

Results

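1 Single GPU. A V100 32GB took 19 hours to run the test set, because it does not support BNB well; a 4090 took under 7 hours.
2 With the vision module also set up correctly, the average acc is 89.55%. The other scores are
89.48%, 95.50%, 84.91%, 88.66%, 87.51%, 87.94%, 90.49%, 87.87%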
