xtuner安装及微调大模型

xtuner和llamaFactory不同，xtuner没有提供可视化的ui页面，对模型的配置参数全部在配置文件中进行，tips : deepspeed_zero2是将参数、梯度等在多个GPU上分片存储，如果没有设置deepspeed。新建python文件，写入以下内容，通过python 运行这个文件。指定第一、三两张显卡微调。则每个GPU存放模型的完整副本。2、下载并安装 xtuner。

骑士999111

725人浏览 · 2025-09-11 11:30:05

骑士999111 · 2025-09-11 11:30:05 发布

官方文档：https://xtuner.readthedocs.io/zh-cn/latest/index.html

注意：0.2.0 版本开始不再使用以下这种方式，0.2.0使用torchrun命令训练，在文章最后进行了说明，使用git clone https://github.com/InternLM/xtuner.git 下载的是最新的版本，如果要使用旧文档训练，请下载0.2.0之前的版本

1、新建虚拟环境

conda create --name xtuner python=3.10 -y

source activate xtuner

2、下载并安装 xtuner

git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]' 安装xtuner及所有依赖，时间稍长

3、下载魔塔大模型

新建python文件，写入以下内容，通过python 运行这个文件

from modelscope import snapshot_download
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-
1_8b',cache_dir='/root/llm/internlm2-1.8b-chat')

4、配置微调参数

xtuner和llamaFactory不同，xtuner没有提供可视化的ui页面，对模型的配置参数全部在配置文件中进行

5、微调

单卡微调
xtuner train /root/utils/xtuner/xtuner/configs/qwen/qwen1_5/qwen1_5_1_8b_chat/qwen1_5_1_8b_chat_qlora_alpaca_e3_new.py

多卡微调
NPROC_PER_NODE=2 xtuner train /root/utils/xtuner/xtuner/configs/qwen/qwen1_5/qwen1_5_1_8b_chat/qwen1_5_1_8b_chat_qlora_alpaca_e3_new.py --deepspeed deepspeed_zero2

指定第一、三两张显卡微调
CUDA_VISIBLE_DEVICES=0,2 NPROC_PER_NODE=2 xtuner train /root/utils/xtuner/xtuner/configs/qwen/qwen1_5/qwen1_5_1_8b_chat/qwen1_5_1_8b_chat_qlora_alpaca_e3_new.py --deepspeed deepspeed_zero2

tips : deepspeed_zero2是将参数、梯度等在多个GPU上分片存储，如果没有设置deepspeed 则每个GPU存放模型的完整副本

6、模型转换

xtuner训练出的模型后缀是 pth、如果使用了 deepspeed 训练出的模型是一个文件夹，将xtuner训练出的权重转成 huggingface 模型

xtuner convert pth_to_hf /root/utils/xtuner/xtuner/configs/qwen/qwen1_5/qwen1_5_1_8b_chat/qwen1_5_1_8b_chat_qlora_alpaca_e3_new.py ./iter_200.pth ./iter_200_

7、模型合并

xtuner convert merge <基础模型> <适配器路径-也就是训练出来的文件> <合并后文件保存路径> --max-shard-size 2GB \ # 分片大小 --device cuda:0 # 使用GPU加速

xtuner convert merge /root/models/modelscope /root/models/out/iter_500.pth /root/models/merge

xtuner和 llamafactory 的对比，建议使用xtuner

1、xtuner 数据集类型更加丰富

2、可以指定哪些显卡参与训练，见步骤5

3、xtuner 支持多模态训练，llamafactory 不支持

4、更灵活的配置和更完整的工具链

上面的这些安装步骤是旧版本的安装过程，在新版本中和旧版本差距比较大，按照操作手册操作

1、数据集的格式和文件，参考如下，注意文件后缀及格式

2、微调命令使用 torchrun，而不再是 xtuner train，也不需要建配置文件

模板类型 chat_template 只支持千问中的 qwen3，下载模型的时候要注意

torchrun --nproc-per-node 1 xtuner/v1/train/cli/sft.py --load-from /root/model/Qwen/Qwen3-0.6B --chat_template qwen3 --dataset /root/data/openai_sft.jsonl --total-step 100 --work-dir /root/model/xt