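Reproducing QVLM on an RTX 4090
1 Docker container environment setup
At first I used the image that Gemini recommended based on nothing but the name "qvlm", and it turned out to be the wrong one. The right approach is to follow the installation steps recommended in the project itself: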
pip install --upgrade pip # enable PEP 660 support
pip install -e .
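Still, remember to ask Gemini how to set up the NVIDIA tools such as nvcc. For example, the BNB (bitsandbytes) shipped with the QVLM project only supports up to CUDA 12.0 (cuda120), while I use CUDA 12.1 (cuda121), and it still installs. Be sure to verify that nvcc actually runs inside the container.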
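The nvcc check above can be scripted so it is easy to repeat in every new container or terminal. The following is a minimal sketch, not part of the QVLM project: the helper names parse_nvcc_release and nvcc_release are my own, and it assumes nvcc prints a line containing "release X.Y", as current CUDA toolkits do.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Extract the CUDA release (e.g. '12.1') from `nvcc --version` text."""
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def nvcc_release() -> Optional[str]:
    """Return the CUDA release reported by nvcc, or None if nvcc cannot run."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH in this shell or container
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # nvcc exists but failed to execute
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = nvcc_release()
    print("nvcc release:", release if release else "nvcc not found or failed")
```

If this prints "nvcc not found or failed", the CUDA toolkit paths are probably missing from the environment, and the BNB build in the next section is unlikely to work.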
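For other libraries, such as the ones I need for SNN development, I also asked Gemini.
The project dependencies exported by pigar were faulty and could not be used.
2 Checking whether BNB is really installed
After pip install -e . finishes, BNB may be reported as installed, yet running python -m bitsandbytes produces output like the one in https://github.com/ChangyuanWang17/QVLM/issues/15: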
False
===================================BUG REPORT===================================
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
warn(msg)
================================================================================
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/suhail/miniconda3/envs/geochat_QVLM/lib/libcudart.so'), PosixPath('/home/suhail/miniconda3/envs/geochat_QVLM/lib/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: /home/suhail/miniconda3/envs/geochat_QVLM did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: :/home/suhail/miniconda3/envs/geochat_QVLM/lib/:/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/nvidia/cudnn/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('local/suhail'), PosixPath('@/tmp/.ICE-unix/2448,unix/suhail')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/65b47edc_3e94_4970_a5ed_fe3da9c78750')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=117, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda117.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1099, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
from ...modeling_utils import PreTrainedModel
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/modeling_utils.py", line 38, in <module>
from .deepspeed import deepspeed_config, is_deepspeed_zero3_enabled
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/deepspeed.py", line 37, in <module>
from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
from .bnb import has_4bit_bnb_layers, load_and_quantize_model
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
import bitsandbytes as bnb
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research, quantization_utils
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/bitsandbytes-0.41.1-py3.10.egg/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/suhail/Desktop/GeoChat/geochat/eval/batch_geochatq_vqa.py", line 8, in <module>
from geochat.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
File "/home/suhail/Desktop/GeoChat/geochat/__init__.py", line 1, in <module>
from .model import GeoChatLlamaForCausalLM
File "/home/suhail/Desktop/GeoChat/geochat/model/__init__.py", line 1, in <module>
from .language_model.geochat_llama import GeoChatLlamaForCausalLM, GeoChatConfig
File "/home/suhail/Desktop/GeoChat/geochat/model/language_model/geochat_llama.py", line 22, in <module>
from transformers import AutoConfig, AutoModelForCausalLM, \
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
value = getattr(module, name)
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/suhail/miniconda3/envs/geochat_QVLM/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1101, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
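At this point I suggest not following the QVLM project's installation procedure. Instead, go into the custom_bitsandbytes directory, read its README.md, and build from source: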
CUDA_VERSION=121 make cuda12x
python setup.py install
If this build and install succeed, BNB is installed correctly.
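One quick way to confirm the build produced a CUDA library is to list the compiled shared objects in the package directory. This is a rough sketch under assumptions: find_bnb_libs is a hypothetical helper, the directory /QVLM/custom_bitsandbytes/bitsandbytes is only where the repo happens to live in my setup, and the libbitsandbytes_cuda<version>.so naming is inferred from the libbitsandbytes_cuda117.so name in the log above.

```python
from pathlib import Path
from typing import List


def find_bnb_libs(package_dir: str) -> List[str]:
    """List compiled bitsandbytes shared libraries found in package_dir."""
    return sorted(p.name for p in Path(package_dir).glob("libbitsandbytes*.so"))


if __name__ == "__main__":
    # Assumed location; adjust to where the QVLM repo is checked out.
    libs = find_bnb_libs("/QVLM/custom_bitsandbytes/bitsandbytes")
    print(libs if libs else "no libbitsandbytes*.so found")
```

If only a CPU library shows up, the CUDA build most likely did not run, and python -m bitsandbytes will keep falling back to libbitsandbytes_cpu.so.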
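3 BNB fails again at runtime
If BNB still cannot be found, the possible causes are:
1 The new terminal does not have the nvcc-related paths configured, so BNB once again cannot find the NVIDIA runtime environment;
2 The path of the compiled BNB library libbitsandbytes.so is not configured, so Python cannot find BNB at runtime.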
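The fix, using the VS Code launch.json debug configuration as an example (VS Code expands existing environment variables with the ${env:NAME} syntax, not the shell-style $NAME):
"env": {
    "CUDA_VISIBLE_DEVICES": "0",
    "LD_LIBRARY_PATH": "/usr/local/cuda-12.1/lib64:/QVLM/custom_bitsandbytes/bitsandbytes:${env:LD_LIBRARY_PATH}",
    "PYTHONPATH": "/QVLM/custom_bitsandbytes/bitsandbytes:${env:PYTHONPATH}"
}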
Change the GPU index and the QVLM project path as needed.
On the command line, set the same variables with export.
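The same variables can also be set from a Python launcher script, which helps when a job is started by another tool rather than an interactive shell. Below is a minimal sketch under assumptions: prepend_paths, build_env and run_with_env are hypothetical helpers of mine, and the paths are the ones from my setup above, so they need to be adjusted to the local CUDA and QVLM locations.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Paths from the setup described above; adjust to the local machine.
CUDA_LIB_DIR = "/usr/local/cuda-12.1/lib64"
BNB_DIR = "/QVLM/custom_bitsandbytes/bitsandbytes"


def prepend_paths(env: Dict[str, str], key: str, *paths: str) -> None:
    """Prepend paths to a colon-separated variable, keeping any old value."""
    old = env.get(key, "")
    parts = list(paths) + ([old] if old else [])
    env[key] = ":".join(parts)


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment and add the CUDA and BNB settings."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = "0"  # GPU index, change as needed
    prepend_paths(env, "LD_LIBRARY_PATH", CUDA_LIB_DIR, BNB_DIR)
    prepend_paths(env, "PYTHONPATH", BNB_DIR)
    return env


def run_with_env(cmd: List[str]) -> int:
    """Run a command with the environment above and return its exit code."""
    return subprocess.run(cmd, env=build_env()).returncode
```

For example, run_with_env(["python", "-m", "bitsandbytes"]) would repeat the BNB diagnostic with the paths in place. Prepending rather than overwriting keeps whatever LD_LIBRARY_PATH the container already defines.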
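That covers the BNB installation problems I ran into.
4 Notes on running the QVLM project
Downloading the floating-point model that the authors provide on Hugging Face for validation is not enough: it actually contains only two parts, the LLM and the projector. The vision module must be downloaded separately from OpenAI's model, because the authors use llava-v1.3 and did not fine-tune the vision module.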
import os
from huggingface_hub import snapshot_download

# Local folder where the model is saved for offline use
local_dir = "./clip-vit-large-patch14"
os.makedirs(local_dir, exist_ok=True)
print("Starting CLIP model download via proxy...")
snapshot_download(
    repo_id="openai/clip-vit-large-patch14",
    local_dir=local_dir,
    # Very important: must be False so that real files are downloaded;
    # otherwise the symlinks all break once copied to an offline server
    local_dir_use_symlinks=False,
    resume_download=True,
    # Skip weight formats that are not needed, to save space and download time
    ignore_patterns=["*.msgpack", "*.h5", "rust_model.ot", "*.safetensors"],
)
print("Download complete")
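Also note that the model to download is clip-vit-large-patch14, not the 336 variant. With the latter, IMG accuracy at runtime was only 30%. Only with the correct vision model does the floating-point model reach the accuracy reported in the paper.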
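Because the two CLIP variants are easy to mix up once they sit in local folders, it can help to check the input resolution recorded in the downloaded config.json before starting a long evaluation. This is only a sketch under assumptions: clip_image_size is a hypothetical helper, and it assumes the resolution is stored as image_size under a vision_config key (with a top-level fallback), which should be confirmed against the actual file.

```python
import json
from pathlib import Path
from typing import Optional


def clip_image_size(model_dir: str) -> Optional[int]:
    """Read the vision input resolution from a local CLIP config.json."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed layout: the value sits under "vision_config";
    # fall back to a top-level "image_size" if that key is absent.
    vision = config.get("vision_config") or {}
    return vision.get("image_size", config.get("image_size"))
```

Calling clip_image_size("./clip-vit-large-patch14") is expected to give 224 for the model used here, whereas the 336 variant should report 336. A None result means the config does not follow the assumed layout.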
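Results
1 Single GPU. A V100 32GB needed 19 hours to run the test set, because it does not support BNB well; the 4090 needed less than 7 hours.
2 With the vision module also set up correctly, the average acc is 89.55%, and the individual items are
89.48%, 95.50%, 84.91%, 88.66%, 87.51%, 87.94%, 90.49%, 87.87%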