ChatGLM3

Introduction

ChatGLM3 is the latest generation of conversational pre-trained models jointly released by Zhipu AI and the KEG Lab of Tsinghua University. ChatGLM3-6B is the open-source model in the ChatGLM3 series. While retaining many of the strengths of the previous two generations, such as smooth dialogue and a low deployment threshold, ChatGLM3-6B introduces the following features:

  1. **A stronger base model:** ChatGLM3-6B-Base delivers the strongest performance among base models under 10B parameters.

    | Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
    |---|---|---|---|---|---|---|---|---|
    | ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
    | Best Baseline (under 10B) | 52.1 | 13.1 | 45.0 | 60.1 | 63.5 | 62.2 | 47.5 | 45.8 |
    | ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |

    On long-context tasks, the 32K variants compare as follows:

    | Model | Average | Summary | Single-Doc QA | Multi-Doc QA | Code | Few-shot | Synthetic |
    |---|---|---|---|---|---|---|---|
    | ChatGLM2-6B-32K | 41.5 | 24.8 | 37.6 | 34.7 | 52.8 | 51.3 | 47.7 |
    | ChatGLM3-6B-32K | 50.2 | 26.6 | 45.8 | 46.1 | 56.2 | 61.2 | 65 |
  2. **More complete feature support:** ChatGLM3-6B adopts a newly designed prompt format that, beyond ordinary multi-turn dialogue, natively supports more complex scenarios (a sketch of the tool-calling flow appears after this list):

    • Multi-turn dialogue
    • Tool calling (Function Call)
    • Code execution (Code Interpreter)
    • Agent tasks
  3. **A more comprehensive open-source lineup:**

    | Model | Seq Length | Download |
    |---|---|---|
    | ChatGLM3-6B | 8k | HuggingFace \| ModelScope |
    | ChatGLM3-6B-Base | 8k | HuggingFace \| ModelScope |
    | ChatGLM3-6B-32K | 32k | HuggingFace \| ModelScope |
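
As an illustration of the new prompt format, the sketch below follows the tool-calling flow described in the ChatGLM3 tool-usage demo: tools are declared in a system message, the model replies with a tool name and arguments, and the tool's result is fed back as an "observation" turn. The track tool, its parameters, and the queries are purely illustrative; treat this as a sketch of the interface rather than a verified, complete example.

import json
from modelscope import AutoTokenizer, AutoModel, snapshot_download

model_dir = snapshot_download("ZhipuAI/chatglm3-6b", revision="master")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda().eval()

# Declare the available tools in a system message (hypothetical "track" tool).
tools = [{
    "name": "track",
    "description": "Look up the real-time price of a given stock",
    "parameters": {
        "type": "object",
        "properties": {"symbol": {"description": "The stock symbol to look up"}},
        "required": ["symbol"],
    },
}]
system_info = {"role": "system",
               "content": "Answer the following questions as best as you can. "
                          "You have access to the following tools:",
               "tools": tools}

# The model is expected to reply with the tool name and arguments it wants to call.
response, history = model.chat(tokenizer, "What is the price of stock 10111?",
                               history=[system_info])
print(response)  # e.g. {'name': 'track', 'parameters': {'symbol': '10111'}}

# The caller runs the tool itself, then feeds the result back as an observation turn
# (role="observation", as described in the repo's tool-usage demo).
tool_result = json.dumps({"price": 12412})
response, history = model.chat(tokenizer, tool_result, history=history, role="observation")
print(response)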

Inference Code

from modelscope import AutoTokenizer, AutoModel, snapshot_download

# Download the weights from ModelScope and load the model in half precision on the GPU
model_dir = snapshot_download("ZhipuAI/chatglm3-6b", revision="master")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
model = model.eval()

# Multi-turn chat: pass the returned history back in to keep the conversational context
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
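
If the FP16 weights do not fit in GPU memory, the model can usually be loaded with a smaller footprint. The sketch below assumes the quantize() method exposed by ChatGLM's custom modeling code (loaded via trust_remote_code); exact memory savings vary by environment.

from modelscope import AutoTokenizer, AutoModel, snapshot_download

model_dir = snapshot_download("ZhipuAI/chatglm3-6b", revision="master")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# 4-bit weight quantization (assumes the custom ChatGLM code provides quantize());
# trades a little accuracy for a much smaller GPU memory footprint.
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).quantize(4).cuda().eval()

# Alternatively, run on CPU in float32 (slow, but needs no GPU):
# model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).float().eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)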

Command-Line Demo

import os
import platform
from transformers import AutoTokenizer, AutoModel

model_path = "model/chatglm3_32k/"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
# Multi-GPU support: use the following two lines instead of the line above,
# and change num_gpus to the actual number of GPUs on your machine
# from utils import load_model_on_gpus
# model = load_model_on_gpus(model_path, num_gpus=2)
model = model.eval()

os_name = platform.system()
clear_command = 'cls' if os_name == 'Windows' else 'clear'
stop_stream = False

welcome_prompt = "欢迎使用 ChatGLM3-6B 模型,输入内容即可进行对话,clear 清空对话历史,stop 终止程序"

def build_prompt(history):
    prompt = welcome_prompt
    for query, response in history:
        prompt += f"\n\n用户:{query}"
        prompt += f"\n\nChatGLM3-6B:{response}"
    return prompt

def main():
    past_key_values, history = None, []
    global stop_stream
    print(welcome_prompt)
    while True:
        query = input("\n用户:")
        if query.strip() == "stop":
            break
        if query.strip() == "clear":
            past_key_values, history = None, []
            os.system(clear_command)
            print(welcome_prompt)
            continue
        print("\nChatGLM:", end="")
        current_length = 0
        for response, history, past_key_values in model.stream_chat(tokenizer, query, history=history, temperature=1,
                                                                     past_key_values=past_key_values,
                                                                     return_past_key_values=True):
            if stop_stream:
                stop_stream = False
                break
            else:
                print(response[current_length:], end="", flush=True)
                current_length = len(response)
        # print(history)  # debug: uncomment to dump the full conversation history
        print("")
        # print(past_key_values)


if __name__ == "__main__":
    main()
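
The load_model_on_gpus helper referenced in the comment above ships with the ChatGLM demo repositories and is not part of transformers. When that utils module is not available, a common alternative is to let accelerate shard the model across GPUs via device_map, as in the minimal sketch below (assumes accelerate is installed; the model path is the same placeholder used above).

import torch
from transformers import AutoTokenizer, AutoModel

model_path = "model/chatglm3_32k/"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# device_map="auto" lets accelerate place the layers across all visible GPUs;
# torch_dtype=torch.float16 loads the weights in half precision to save memory.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  torch_dtype=torch.float16, device_map="auto").eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)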
