Running LLMs on AMD GPUs with ROCm to perform natural language processing tasks

_勇 · Published 2024-12-02 00:30:00

Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs — ROCm Blogs

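In this blog, you will learn how to use ROCm to run a range of popular and useful natural language processing (NLP) tasks on AMD Instinct GPUs, using different large language models (LLMs). The blog includes a simple, hands-on guide that shows you how to apply LLMs to core NLP applications, including text generation, sentiment analysis, extractive question answering (QA), and solving math problems.

General-purpose LLMs such as GPT and Llama can perform many different tasks with reasonable performance. Some tasks, however, call for fine-tuning or a different model architecture. To complement general-purpose models, the machine learning community has developed many models that are designed or fine-tuned for specific tasks. This blog covers both general-purpose and task-specific models and shows you how to use them for several common tasks on AMD GPUs running ROCm.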

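Introduction

Since OpenAI launched ChatGPT in late 2022, millions of people have experienced the power of generative AI. General-purpose LLMs perform reasonably well on many tasks, such as answering quick questions and solving problems, but they often fall short when a prompt is highly domain-specific or requires skills they were not specifically trained for. Prompt engineering can help mitigate this by supplying specific instructions or examples in the prompt. However, the skill required to craft good prompts and the limits on context length often prevent LLMs from reaching their full potential.

To address these issues, general-purpose LLMs keep getting larger (some models, such as Grok-1, have reached hundreds of billions of parameters) and more capable. Meanwhile, the machine learning community has developed many specialized models that perform very well on certain tasks at the cost of lower performance on others.

HuggingFace lists about a dozen NLP tasks that LLMs can perform, including text generation, question answering, and translation. This blog shows how to use a variety of general-purpose and specialized LLMs on ROCm running on AMD GPUs for the following NLP tasks:

Text generation
Extractive question answering
Solving math problems
Sentiment analysis
Summarization
Information retrieval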

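Prerequisites

To follow along with this blog, you need the following:

*AMD GPU*: an AMD Instinct GPU.
*Linux*: see the supported Linux distributions.
*ROCm 6.0+*: see the installation guide.
Some of the models used in this blog are gated. You must request access on Hugging Face and use your Hugging Face token to download the model weights. You must also agree to share your contact information on Hugging Face.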
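
Because gated models need a token, it can help to keep the token out of the source code. The sketch below is one possible way to do that, assuming the token has been exported in an environment variable named `HF_TOKEN`. The helper name `get_hf_token` is hypothetical and not part of any library.

```python
import os

def get_hf_token(env_var="HF_TOKEN", environ=None):
    """Return the Hugging Face token stored in an environment variable.

    `get_hf_token` is a hypothetical helper for this blog, not a library API.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to your Hugging Face user access token "
            "before downloading gated models."
        )
    return token
```

The returned string can be passed as the `token=` argument of `from_pretrained`, as in the C4AI Command-R example in this blog.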

Getting started

First, check that the GPUs are detected on the server.

rocm-smi

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp        Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Junction)  (Socket)  (Mem, Compute)
====================================================================================================================
0       [0x74a1 : 0x00]       35.0°C      140.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
1       [0x74a1 : 0x00]       37.0°C      138.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
2       [0x74a1 : 0x00]       40.0°C      141.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
3       [0x74a1 : 0x00]       36.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
4       [0x74a1 : 0x00]       38.0°C      143.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
5       [0x74a1 : 0x00]       35.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
6       [0x74a1 : 0x00]       39.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
7       [0x74a1 : 0x00]       37.0°C      137.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
=========================================================================
================== End of ROCm SMI Log ================================================

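All 8 MI300X GPUs in the system are available. Start a Docker container with ROCm and PyTorch support (the image tag below is based on ROCm 6.1.3), then install the required packages.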

docker run -it --ipc=host --network=host --device=/dev/kfd  --device=/dev/dri -v $HOME/dockerx:/dockerx --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --name=llm-tasks rocm/pytorch:rocm6.1.3_ubuntu22.04_py3.10_pytorch_release-2.1.2 /bin/bash

pip install --upgrade pip
pip install transformers accelerate einops
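
Once the container is running, a quick check from Python can confirm that PyTorch sees the GPUs. This is a minimal sketch rather than part of the original walkthrough, and the helper name `gpu_summary` is hypothetical. On ROCm builds of PyTorch the `torch.cuda` API is backed by HIP, so the same calls apply to AMD GPUs.

```python
import importlib.util

def gpu_summary():
    """Summarize the GPUs visible to PyTorch (hypothetical helper)."""
    summary = {"torch_installed": False, "hip_version": None, "devices": []}
    if importlib.util.find_spec("torch") is None:
        return summary  # PyTorch is not installed in this environment
    import torch

    summary["torch_installed"] = True
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    summary["hip_version"] = getattr(torch.version, "hip", None)
    if torch.cuda.is_available():
        summary["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return summary

print(gpu_summary())
```

On the server used in this blog, the `devices` list would be expected to contain one entry per GPU reported by `rocm-smi`.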

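The following sections demonstrate how to run LLMs on ROCm for various NLP tasks.

Text generation

Text generation is probably the first task most people associate with LLMs. Given a text prompt, the LLM generates text in response to it. Several ROCm blogs already cover how popular models perform this task, including Llama2, GPT-3, OLMo, and Mixtral. This blog covers four more high-end models.

C4AI Command-R

After publishing the groundbreaking paper "Attention Is All You Need" with his team at Google Brain, Aidan Gomez left Google and founded Cohere. Cohere has developed several state-of-the-art LLMs, including the C4AI Command-R and C4AI Command-R Plus families, and made them available on HuggingFace.

This test uses the mid-sized model c4ai-command-r-v01, which has 35 billion parameters, for text generation on ROCm.

Note

The c4ai-command-r-v01 model is gated, which means you must request access on HuggingFace before you can use it. In the code block below, replace the value of the variable `token` with your HuggingFace token to download the model.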

from transformers import AutoTokenizer, AutoModelForCausalLM

token = "your HuggingFace user access token here"
model_name = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, token=token)

prompt = "Write a poem about artificial intelligence in Shakespeare style."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Here is the response generated for the prompt:

In days of yore, when mortals' minds did roam,
A wondrous birth, a thought-borne gem,
From human intellect, a progeny did bloom,
AI, a brain-child, bright and new.

From bits and bytes, a creature formed, so keen,
To serve and aid, a helpful hand,
With algorithms, it thinks, and learns, and sees,
A clever clone, a mental clone.

It parses speech, solves problems hard,
With speed beyond compare,
It understands, assists, and guides,
A thoughtful, digital friend.

Here is another example of text generation with C4AI Command-R, this time answering a question:

prompt = "Which countries are the biggest rare earth metal producer?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

C4AI Command-R answers the question in detail.

As of 2022, the top three countries that are the biggest producers of rare earth metals are:
1. China: China is the world's largest producer of rare earth metals, accounting for over 58% of the global production. China's production share is even larger when it comes to the more valuable and technologically important rare earth oxides. The country has a strong hold on the supply chain, from mining to processing and manufacturing of rare earth metals and products.

2. Australia: Australia is the second-largest producer of rare earth metals. It has significant reserves and several operational mines producing rare earth elements. Lyn

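Qwen

While models such as Llama, GPT, and Mistral, developed by American and European companies, attract much of the media attention, there are also notable contenders from Chinese companies. The best known is the Qwen series from Alibaba Cloud. Qwen models are Transformer-based LLM AI assistants trained on a wide variety of website text, books, code samples, and other material, which makes them broadly useful.

At the time of writing, the latest release in the Qwen series is the Qwen2 family. All models in the Qwen2 family use grouped-query attention (GQA) to achieve lower latency and lower memory usage during inference. In terms of context length, the Qwen2-7B and Qwen2-72B models support up to 128k tokens. The first-generation Qwen models were trained only on English and Chinese text. Qwen2 adds 27 additional languages from different regions to its training data, which improves its performance on multilingual tasks.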
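
To see why GQA reduces memory use, consider the key-value (KV) cache that a decoder keeps during generation: its size scales with the number of key-value heads rather than the number of query heads. The sketch below estimates the cache size. The layer, head, and dimension values are illustrative assumptions chosen for the example, not figures taken from this blog.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    The factor 2 accounts for storing both keys and values;
    bytes_per_value=2 assumes a 16-bit dtype such as bfloat16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative (assumed) configuration: 28 layers, 28 query heads, head_dim 128.
layers, query_heads, head_dim, seq_len = 28, 28, 128, 32_768

# Multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(layers, num_kv_heads=query_heads, head_dim=head_dim, seq_len=seq_len)
# Grouped-query attention shares each KV head across several query heads (4 KV heads assumed).
gqa = kv_cache_bytes(layers, num_kv_heads=4, head_dim=head_dim, seq_len=seq_len)

print(f"multi-head attention KV cache:    {mha / 2**30:.2f} GiB")
print(f"grouped-query attention KV cache: {gqa / 2**30:.2f} GiB")
print(f"reduction factor: {mha / gqa:.0f}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key-value heads, which is where the memory saving comes from.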

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model_name = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Once the Qwen2 model and tokenizer are ready, you can ask the model a question.

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Here is Qwen2's response:

A Large Language Model (LLM) is a type of artificial intelligence model that has been trained on vast amounts of text data to understand and generate human-like language. These models are capable of performing various natural language processing tasks such as text translation, summarization, question answering, text generation, etc. 

LLMs typically use deep learning techniques, often involving transformer architectures, which allow the model to understand context and relationships between words in sentences. This makes them very powerful tools for generating coherent and contextually relevant responses, even when given complex or nuanced prompts.

One of the most famous examples of an LLM is the GPT series created by OpenAI, including GPT-2 and GPT-3. However, it's worth noting that these models can also be used for potentially harmful purposes if not handled responsibly due to their ability to create realistic but false information. Therefore, they need to be used ethically and with appropriate safeguards in place.

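OPT

OPT (Open Pre-trained Transformer) is a suite of pretrained Transformer language models, ranging from 125M to 175B parameters, that Meta introduced in the paper "OPT: Open Pre-trained Transformer Language Models". The goal of OPT is to give the research community a set of high-performing pretrained LLMs for further development and for reproducing research results.

This blog tests the 125M-parameter version of OPT, `opt-125m`, on ROCm. Its small size makes it one of the most popular versions. The test uses HuggingFace's `text-generation` pipeline to generate text from a prompt, and sets `do_sample=True` to enable top-k sampling, which makes the generated text more interesting.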
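
Top-k sampling restricts each sampling step to the k most likely tokens and renormalizes their probabilities before drawing one. The following is a minimal pure-Python sketch of that idea on a toy vocabulary. The logits and the function name `top_k_sample` are made up for illustration, and this is not the Transformers implementation.

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index in proportion to the renormalized probabilities.
    return rng.choices(top, weights=weights, k=1)[0]

vocab = ["hike", "picnic", "movie", "museum", "zzz"]
logits = [2.0, 1.5, 1.0, 0.5, -3.0]  # toy scores, assumed for the example
rng = random.Random(32)
samples = [vocab[top_k_sample(logits, k=3, rng=rng)] for _ in range(10)]
print(samples)  # only "hike", "picnic", or "movie" can appear
```

Lower values of k make the output more predictable, while higher values allow less likely tokens and more varied text.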

from transformers import pipeline, set_seed

set_seed(32)
text_generator = pipeline('text-generation', model="facebook/opt-125m", max_new_tokens=256, do_sample=True, device='cuda')

output = text_generator("Provide a few suggestions for family activities this weekend.")
print(output[0]['generated_text'])

Provide a few suggestions for family activities this weekend.

The summer schedule is a great opportunity to spend some time enjoying the summer with those who might otherwise be working from home or working from a remote location. You will discover new and interesting places to eat out and spend some time together. There are things you’ll do in different weathers (in particular you’ll learn what it’s like to enjoy a hot summer summer outside. For example you may see rainbows, waves crashing against a cliff, an iceberg exploding out of the sky, and a meteor shower rolling through the sky.

I’ve tried to share some ideas on how to spend all summer on our own rather than with a larger family. In addition to family activities, here are several ways to stay warm for the holidays during a time of national emergency.

...

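OPT tends to ramble rather than give a concise, relevant answer. Many fine-tuned versions of OPT are available on HuggingFace. Consider exploring those models or fine-tuning your own.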

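MPT

Generating task instructions, such as cooking recipes, is another common use case for LLMs. Although prompt engineering with a general-purpose LLM can guide the model to produce task instructions, the prompts must be crafted carefully to get the desired output.

The MPT collection from Mosaic Research (now part of Databricks) contains two base models, MPT-7B and MPT-30B. Both are decoder-style Transformer models. MPT-7B-Instruct is an LLM in this collection that was fine-tuned from MPT-7B on a dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless (HH-RLHF) datasets. The model works with HuggingFace's `text-generation` pipeline and is straightforward to use on ROCm.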

import torch
import transformers
from transformers import pipeline

model_name = 'mosaicml/mpt-7b-instruct'

config = transformers.AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.max_seq_len = 4096

model = transformers.AutoModelForCausalLM.from_pretrained(
  model_name,
  config=config,
  trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

prompt = "Here is the instruction to change the oil filter in your car:\n"
with torch.autocast('cuda', dtype=torch.bfloat16):
    instruction = text_generator(prompt,
                                 max_new_tokens=512,
                                 do_sample=True,
                                 use_cache=True)

print(instruction[0]['generated_text'])

Here is the text that MPT-7B-Instruct generated for the prompt "Here is the instruction to change the oil filter in your car:":

Here is the instruction to change the oil filter in your car:
1. Open the hood. 2. Find the oil filter. 3. Look to the right underneath the cap to find the oil filter. 4. Screw the oil filter cap off from the bottom.5. Pull oil filter out from the bottom of the engine.
What is the oil filter? The oil filter is a part that catches particles from your engine oil as it travels through your engine. It traps most of the particles and keeps them from passing straight into your engine. This keeps your engine from getting damaged because of those particles. How many oil filters are there?
There is one oil filter for the entire vehicle. However different types of vehicles have different requirements that can change the oil more often than others.
When should you change the oil filter? It is recommended to change oil filters between 30,000 to 60,000 miles. However some engine types are harder on filters and may require changing every 15,000 miles instead of 30,000.
What can you get at your local automotive store before changing your oil filter: 5-10 quarts 5-10 oil filter, a drain pan, and oil filter wrench.
Step 1. Drain the oil. 2. Check the oil filter to be sure that it is still in good shape. 3. Install the new oil filter. 4. Fill the reservoir with the proper amount of oil.

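Extractive question answering

When people think of how LLMs answer questions, they usually picture an oracle-like chatbot that can answer anything that comes to mind, as in the text generation examples above. Many LLMs, however, are trained specifically for extractive question answering. Here the input to the LLM consists of the question together with a context that contains the answer, and the model's answer must be a portion of that context. The main use case for extractive question answering is when users know that the answer lies within some known context, such as identifying a particular customer's preferences from their purchase history. Extracting the answer only from the context reduces the chance that the LLM hallucinates and makes up a false answer, even when the context is in its training data.

The following tests use two popular LLMs that are fine-tuned for extractive question answering.

DistilBERT

One challenge in deploying LLMs is that their size leads to high compute requirements, latency, and power consumption. An active area of research is to train a smaller model on the outputs of a large trained model while retaining most of its performance, a process known as knowledge distillation. A notable example is DistilBERT, which was introduced in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT". DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling the BERT base model, which means it was pretrained only on inputs and labels generated with the BERT base model. It has 40% fewer parameters than `bert-base-uncased` and runs 60% faster, while preserving over 95% of BERT's performance on the GLUE language understanding benchmark.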
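
In knowledge distillation, the student is commonly trained to match the teacher's softened output distribution. The sketch below computes such a soft-target loss for a single example in plain Python. The temperature value and the toy logits are assumptions for illustration, and the actual DistilBERT training objective combines several loss terms that are not shown here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]        # toy teacher logits (assumed)
close_student = [3.5, 1.2, 0.1]  # student that roughly agrees with the teacher
far_student = [0.1, 3.0, 2.0]    # student that disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal used to train the smaller model.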

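This example tests `distilbert-base-cased-distilled-squad`, a checkpoint of DistilBERT-base-cased fine-tuned with knowledge distillation on the SQuAD v1.1 dataset. The task is to find the birthplace of Dr. Marie Curie's doctoral advisor from a context of four facts, only one of which contains the answer.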

from transformers import pipeline
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. 
        Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
        Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. 
        Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

result = question_answerer(question=question, context=context)
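print(f"Answer: '{result['answer']}'\n Score: {round(result['score'], 4)},\n start char: {result['start']}, end char: {result['end']}")

DistilBERT finds the correct answer with high confidence. The `start` and `end` values returned by the pipeline are character offsets into the context, not token indices.

Answer: 'Bonnevoie, Luxembourg'
 Score: 0.9714,
 start char: 78, end char: 99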
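
The `start` and `end` values in the result index into the context string, so slicing the context with them recovers the answer. A small check using the first sentence of the context and the values from the output above:

```python
# First sentence of the context used above; the offsets come from the printed result.
context = (
    "Gabriel Lippmann, who supervised Marie Curie's doctoral research, "
    "was born in Bonnevoie, Luxembourg."
)
start, end = 78, 99

answer = context[start:end]  # character offsets, not token positions
print(answer)
```

This makes it straightforward to highlight the answer inside the original text.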

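Longformer

A major limitation of Transformer models is that the cost of self-attention grows quadratically with the input sequence length, which makes it hard to scale them to long inputs. The Longformer model from Allen AI, described in the paper "Longformer: The Long-Document Transformer" by Iz Beltagy, Matthew E. Peters, and Arman Cohan, addresses this by replacing full self-attention with local windowed attention combined with task-motivated global attention.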
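
A rough count of attention scores shows why the windowed pattern scales better. The sketch below compares full self-attention with a sliding window plus a few global tokens. The window size and number of global tokens are assumed values for illustration, and the count is an approximation that ignores edge effects and overlap.

```python
def full_attention_pairs(seq_len):
    """Every token attends to every token: quadratic in seq_len."""
    return seq_len * seq_len

def windowed_attention_pairs(seq_len, window, num_global=0):
    """Approximate score count for local windowed plus global attention.

    Each token attends to about `window` neighbors; each global token attends
    to, and is attended by, all tokens. Edge effects and overlap are ignored.
    """
    local = seq_len * window
    global_pairs = 2 * num_global * seq_len
    return local + global_pairs

for n in (512, 4096, 16384):
    full = full_attention_pairs(n)
    sparse = windowed_attention_pairs(n, window=512, num_global=2)  # assumed settings
    print(f"seq_len={n:>6}: full={full:>12,} windowed={sparse:>12,} ratio={full / sparse:.1f}")
```

With a fixed window the windowed count grows linearly with sequence length, so the gap to full attention widens as inputs get longer.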

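Allen AI has trained a number of Longformer-based models for various tasks. This example shows how the LongformerForQuestionAnswering model extracts the answer to a question from a given context.

The model takes the context and the question as input and outputs span start logits and span end logits for each token of the encoded input. The best answer to the question can then be extracted from the span logits.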

from transformers import AutoTokenizer, LongformerForQuestionAnswering
import torch

# The model takes the context and question as input and returns span start and span end logits for each token in the encoded input.
model_name = "allenai/longformer-large-4096-finetuned-triviaqa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongformerForQuestionAnswering.from_pretrained(model_name)

# Context and question
context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. 
        Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
        Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. 
        Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

# Encode the question and context
encoded_input = tokenizer(question, context, return_tensors="pt")
input_ids = encoded_input["input_ids"]

# Run the model to obtain the span logits
outputs = model(input_ids)
# Find the start and end indices of the answer in the encoded input
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits)

# Convert the input ids to tokens
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

# Extract the answer tokens and decode them
answer_tokens = all_tokens[start_idx : end_idx + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))

print(answer)

Longformer gives the correct answer, "Bonnevoie".

 Bonnevoie
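
Taking the argmax of the start and end logits independently, as in the code above, works for this example but can in principle return an end position before the start. A common alternative is to score every valid span jointly. The sketch below shows that idea on toy logits. The function name `best_span`, the length limit, and the numbers are assumptions for illustration.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

tokens = ["born", "in", "Bonn", "evo", "ie", ",", "Luxembourg"]  # toy tokens
start_logits = [0.1, 0.2, 5.0, 0.3, 0.1, 0.0, 1.0]
# The largest end logit sits at index 1, before the best start at index 2.
end_logits = [0.0, 6.0, 0.2, 0.5, 4.0, 0.1, 1.5]

start, end = best_span(start_logits, end_logits)
print(tokens[start : end + 1])
```

The joint search trades a little extra computation for a guarantee that the returned span is well formed.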

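Solving math problems

The ability to understand a problem and reason logically toward an answer has long been one of the main goals of artificial intelligence, and solving math problems is an excellent example. Even general-purpose LLMs such as GPT-4 have shown remarkable ability on simple math problems. This section explores using a fine-tuned version of the Phi-3 model to solve math problems on AMD GPUs.

Phi-3

The Phi-3 collection is the next generation of Microsoft's popular Phi-2 model. This example uses the fine-tuned version `Phi-3-Mini-4K-Instruct`, a 3.8-billion-parameter model trained on carefully curated high-quality educational data and code, together with synthetic textbook-like data covering topics such as math, coding, and common-sense reasoning.

First, set up the Phi-3 model with the `text-generation` pipeline.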

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model_name = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Then ask Phi-3 for the Taylor series expansion of the sum of two simple functions, `sin(x) + ln(x)`.

messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + ln(x)? about a point x=a"},
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

For the function sin(x) + ln(x), we need to find the derivatives and evaluate them at x=a.

First, let's find the derivatives of sin(x) and ln(x):

1. sin(x):
   f(x) = sin(x)
   f'(x) = cos(x)
   f''(x) = -sin(x)
   f'''(x) = -cos(x)
  ...

2. ln(x):
   f(x) = ln(x)
   f'(x) = 1/x
   f''(x) = -1/x^2
   f'''(x) = 2/x^3
  ...

Now, let's evaluate these derivatives at x=a:

1. sin(a):
   f(a) = sin(a)
   f'(a) = cos(a)
   f''(a) = -sin(a)
   f'''(a) = -cos(a)
  ...

2. ln(a):
   f(a) = ln(a)
   f'(a) = 1/a
   f''(a) = -1/a^2
   f'''(a) = 2/a^3
  ...

Now, we can write the Taylor series expansion of sin(x) + ln(x) about x=a:

sin(x) + ln(x) = (sin(a) + ln(a)) + (cos(a)(x-a) + (1/a)(x-a)) + (-sin(a)(x-a)^2/2! + (-1/a^2)(x-a)^2/2!) + (-cos(a)(x-a)^3/3! + (2/a^3)(x-a)^3/3!) +...

This is the Taylor series expansion of sin(x) + ln(x) about x=a.

Not bad. Next, ask Phi-3 to do the same for a slightly more complicated function, `sin(x) + 1/cos(x)`.

messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + 1/cos(x) about a point x=a?"},
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

First, let's find the Taylor series expansion of sin(x) and 1/cos(x) separately about x=a.

For sin(x), the derivatives are:
sin'(x) = cos(x)
sin''(x) = -sin(x)
sin'''(x) = -cos(x)
sin''''(x) = sin(x)
...

The Taylor series expansion of sin(x) about x=a is:
sin(x) = sin(a) + cos(a)(x-a) - sin(a)(x-a)^2/2! - cos(a)(x-a)^3/3! + sin(a)(x-a)^4/4! +...

For 1/cos(x), the derivatives are:
(1/cos(x))' = sin(x)/cos^2(x)
(1/cos(x))'' = (cos(x) + sin^2(x))/cos^3(x)
(1/cos(x))''' = (-2cos(x)sin(x) + 3sin^2(x))/cos^4(x)
...

The Taylor series expansion of 1/cos(x) about x=a is:
1/cos(x) = 1/cos(a) + (sin(a)/cos^2(a))(x-a) + (cos(a)(sin^2(a) - 1)/cos^3(a))(x-a)^2/2! + (2cos(a)(sin^3(a) - 3sin(a))/cos^4(a))(x-a)^3/3! +...

Now, we can find the Taylor series expansion of sin(x) + 1/cos(x) by adding the two series:

sin(x) + 1/cos(x) = (sin(a) + 1/cos(a)) + (cos(a) + sin(a)/cos^2(a))(x-a) - (sin(a)(x-a)^2/2! + 1/cos^3(a)(x-a)^2/2!) +...

This is the Taylor series expansion of sin(x) + 1/cos(x) about x=a.

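Phi-3 follows the standard procedure of finding the derivatives of each term and then summing the corresponding terms of the Taylor series, but it does not compute the higher-order derivatives of `1/cos(x)` correctly, nor does it add the terms correctly in the final step. For example, the second derivative of `1/cos(x)` should be `(1 + sin^2(x))/cos^3(x)`, not `(cos(x) + sin^2(x))/cos^3(x)`. This illustrates a limitation of LLMs in problem solving: at their core they are token-prediction machines rather than true reasoning engines.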
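
Claims like this are easy to check numerically. The sketch below compares a central finite-difference estimate of the second derivative of `1/cos(x)` with both candidate formulas at one sample point. The step size and the sample point are arbitrary choices for illustration.

```python
import math

def f(x):
    return 1.0 / math.cos(x)

def second_derivative(func, x, h=1e-4):
    """Central finite-difference estimate of the second derivative."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

x = 0.7  # arbitrary sample point where cos(x) != 0
numeric = second_derivative(f, x)
correct = (1.0 + math.sin(x) ** 2) / math.cos(x) ** 3
from_model = (math.cos(x) + math.sin(x) ** 2) / math.cos(x) ** 3

print(f"finite difference:           {numeric:.6f}")
print(f"(1 + sin^2 x) / cos^3 x:     {correct:.6f}")
print(f"(cos x + sin^2 x) / cos^3 x: {from_model:.6f}")
```

The finite-difference value agrees with the first formula and not with the one Phi-3 produced, which is a quick way to validate model-generated math before relying on it.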

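Sentiment analysis

Thanks to its wide range of applications, sentiment analysis has been an active research topic in the machine learning (ML) community for years. Transformer-based LLMs offer new opportunities to improve sentiment analysis models because they can take context into account across large amounts of relevant text, which enables more accurate analysis. Using LLMs to gauge the sentiment of financial news has drawn particular interest because of its value in investment decisions. The examples below test two well-known models fine-tuned for sentiment analysis. Both use the `sentiment-analysis` pipeline in HuggingFace.

DistilRoBERTa

The DistilRoberta-financial-sentiment model is a lightweight distilled version of the RoBERTa-base model with only 82 million parameters. Because of its smaller size, it runs twice as fast as RoBERTa-base. The model was trained on a polar sentiment dataset of sentences from financial news, annotated by five to eight human annotators.

Set up the model and use it to determine the sentiment of four financial statements.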

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]

for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")

Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'negative'
 Score: 0.666

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'positive'
 Score: 0.9996

Input sentence: "there are doubts about our finances"
Sentiment: 'neutral'
 Score: 0.6857

Input sentence: "profits are flat"
Sentiment: 'neutral'
 Score: 0.9999

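The sentiments the model assigns look reasonable. One could argue that the third sentence, "there are doubts about our finances", should be rated negative. On the other hand, the model's confidence in the "neutral" rating is only 0.6857, which suggests the rating could tip toward "negative" under a slightly different threshold.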
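
One way to act on this observation is to flag low-confidence predictions for review instead of accepting the top label outright. The sketch below applies a cutoff to the scores printed above. The 0.7 threshold and the helper name `flag_uncertain` are assumptions for illustration, not values from the model or the pipeline.

```python
def flag_uncertain(predictions, threshold=0.7):
    """Return predictions whose top-label score falls below the threshold."""
    return [p for p in predictions if p["score"] < threshold]

# Labels and scores copied from the DistilRoBERTa output above.
predictions = [
    {"text": "there is a shortage of capital, and we need extra financing", "label": "negative", "score": 0.666},
    {"text": "growth is strong and we have plenty of liquidity", "label": "positive", "score": 0.9996},
    {"text": "there are doubts about our finances", "label": "neutral", "score": 0.6857},
    {"text": "profits are flat", "label": "neutral", "score": 0.9999},
]

for p in flag_uncertain(predictions):
    print(f"review: {p['text']!r} -> {p['label']} ({p['score']})")
```

The right cutoff depends on the application and would normally be tuned on labeled validation data.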

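FinBERT

FinBERT was proposed by researchers at the Hong Kong University of Science and Technology in the paper "FinBERT: A Pretrained Language Model for Financial Communications". It is a BERT-based model pretrained on financial communication text. The training data consists of three financial communication corpora totaling 4.9 billion tokens.

The finbert-tone model used here is a FinBERT model fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports.

This example uses FinBERT to analyze the sentiment of the same financial statements that DistilRoBERTa analyzed above.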

from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

model_name = "yiyanghkust/finbert-tone"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
tokenizer = BertTokenizer.from_pretrained(model_name)

sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]

for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")

Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'Negative'
 Score: 0.9966

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'Positive'
 Score: 1.0

Input sentence: "there are doubts about our finances"
Sentiment: 'Negative'
 Score: 1.0

Input sentence: "profits are flat"
Sentiment: 'Neutral'
 Score: 0.9889

The only difference between the outputs of the DistilRoBERTa and FinBERT models is the third case, which FinBERT rates as negative rather than neutral.
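
Disagreements like this can be surfaced automatically when two classifiers run on the same inputs. The sketch below compares the labels reported above, normalizing case because the two models capitalize their labels differently. The helper name `disagreements` is hypothetical.

```python
def disagreements(sentences, labels_a, labels_b):
    """Return (sentence, label_a, label_b) for inputs where the two models differ."""
    return [
        (s, a, b)
        for s, a, b in zip(sentences, labels_a, labels_b)
        if a.lower() != b.lower()
    ]

sentences = [
    "there is a shortage of capital, and we need extra financing",
    "growth is strong and we have plenty of liquidity",
    "there are doubts about our finances",
    "profits are flat",
]
distilroberta = ["negative", "positive", "neutral", "neutral"]  # from the output above
finbert = ["Negative", "Positive", "Negative", "Neutral"]       # from the output above

for sentence, a, b in disagreements(sentences, distilroberta, finbert):
    print(f"{sentence!r}: DistilRoBERTa={a}, FinBERT={b}")
```

Inputs on which the models disagree are natural candidates for manual review.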

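Summarization

Early approaches to text summarization focused on extracting keywords or key phrases from the text and assembling them into a summary with hand-defined rules. LLMs changed how summarization is done because they can capture relationships between tokens across long text sequences. Many well-known LLMs are trained specifically for this task. This section demonstrates two of them.

BART

BART was introduced by Facebook in the paper "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". It is a denoising autoencoder built on a Transformer architecture that pairs a bidirectional encoder with a GPT-like autoregressive decoder. BART's pretraining involves two steps. First, it corrupts the training text with an arbitrary noising function. Then it trains the model to reconstruct the original text from the corrupted text. This approach allows great flexibility in generating training data, including changing the text length and word order.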
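
To make the corrupt-then-reconstruct idea concrete, here is a toy noising function in plain Python that replaces a random span of tokens with a single mask token, which also changes the text length. It is a simplified illustration with assumed names (`infill_noise`, and `<mask>` as a plain string) and is not BART's actual preprocessing code.

```python
import random

def infill_noise(tokens, span_len, rng, mask_token="<mask>"):
    """Replace one random span of `span_len` tokens with a single mask token."""
    if span_len <= 0 or span_len > len(tokens):
        return list(tokens)
    start = rng.randrange(len(tokens) - span_len + 1)
    return tokens[:start] + [mask_token] + tokens[start + span_len:]

original = "the quick brown fox jumps over the lazy dog".split()
rng = random.Random(0)
corrupted = infill_noise(original, span_len=3, rng=rng)

# A denoising model would be trained to map `corrupted` back to `original`.
print("input :", " ".join(corrupted))
print("target:", " ".join(original))
```

Because the model only sees the mask, it has to infer both the missing words and how many of them there were.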

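The BART base model can be used for text infilling but is not well suited to most other tasks. BART really shines when it is fine-tuned for a specific task such as summarization. This example uses a version of BART fine-tuned on CNN Daily Mail, a dataset of document-summary pairs.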

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device="cuda")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
"""

print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]['summary_text'])

Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.

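Pegasus

Another well-known language model noted for summarization is Google's Pegasus, introduced in the paper "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". Pegasus masks key sentences in the training documents and trains the model to generate the missing sentences. According to the authors, this approach is particularly well suited to abstractive summarization because it forces the model to understand the context of the whole document.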
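
The gap-sentence idea can be sketched in a few lines: pick the sentence that overlaps most with the rest of the document, replace it with a mask, and use it as the generation target. The word-overlap score below is a simple stand-in chosen for this example, not the selection metric from the paper, and the function names and mask string are hypothetical.

```python
def overlap_score(sentence, others):
    """Fraction of the sentence's words that also appear in the other sentences."""
    words = set(sentence.lower().split())
    rest = set(" ".join(others).lower().split())
    return len(words & rest) / len(words) if words else 0.0

def make_gap_example(sentences, mask="<mask_1>"):
    """Mask the highest-scoring sentence and return (masked_document, target)."""
    scores = [
        overlap_score(s, sentences[:i] + sentences[i + 1:])
        for i, s in enumerate(sentences)
    ]
    idx = scores.index(max(scores))
    masked = sentences[:idx] + [mask] + sentences[idx + 1:]
    return " ".join(masked), sentences[idx]

doc = [
    "The woman married ten times.",
    "Prosecutors said the woman married ten men in an immigration scam.",
    "She pleaded not guilty.",
]
source, target = make_gap_example(doc)
print("source:", source)
print("target:", target)
```

Training on pairs like this resembles summarization, since the target sentence has to be produced from the surrounding context.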

This example uses the Pegasus model to summarize the same text, `ARTICLE`, that the BART model processed above.

from transformers import AutoTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"
model = PegasusForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
 
inputs = tokenizer(ARTICLE, max_length=1024, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"])

print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])

The generated summary is shorter but still captures the key point of the text.

A New York woman who has been married 10 times has been charged with marriage fraud.

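Information retrieval

The rise of generative AI might seem to spell the end of information retrieval, since many people do not care about the original source as long as a model gives them what they are looking for. However, some use cases, such as fact checking and legal approval, still require retrieving specific documents from a corpus. One of the most prominent models is Contriever, developed by Meta, which takes advantage of recent advances in machine learning.

Contriever

There have been many attempts to train deep neural network models for information retrieval with supervised learning. However, most practical applications lack enough training samples for these methods, because they need large numbers of human-generated labels that identify the most relevant documents for each query in the training dataset. The main idea behind Contriever is to train the model without labeled data by using an auxiliary task that approximates retrieval. Specifically, for a given document in the training corpus, it generates a synthetic query for which that document is a perfect answer. These pairs are then used to train the model. In addition, contrastive learning improves the model's ability to tell relevant results from irrelevant ones. Details of the approach can be found in the paper "Unsupervised Dense Information Retrieval with Contrastive Learning".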
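
The contrastive objective can be illustrated with a small InfoNCE-style loss: the query should score higher against its matching (positive) document than against the other (negative) documents. The sketch below uses toy two-dimensional embeddings and an assumed temperature. It is a conceptual illustration, not Contriever's training code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Negative log-probability of the positive among all candidates."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

query = [1.0, 0.0]
positive = [0.9, 0.1]                  # embedding close to the query (toy values)
negatives = [[0.1, 0.9], [-0.5, 0.5]]  # embeddings far from the query (toy values)

print(info_nce_loss(query, positive, negatives))  # small: the positive ranks first
print(info_nce_loss(query, negatives[0], [positive, negatives[1]]))  # large: wrong pairing
```

Minimizing this loss pulls matching query and document embeddings together and pushes mismatched ones apart.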

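You can reuse the example from the extractive question answering section to illustrate how Contriever retrieves the most relevant document from a corpus. First, score the documents using the model's output.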

import tqdm
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "facebook/contriever"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

query = ["Where was Marie Curie born?"]

docs = [
    "Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg.",
    "Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

corpus = query + docs

# Apply tokenizer
inputs = tokenizer(corpus, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
outputs = model(**inputs)

# Mean pooling
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings
embeddings = mean_pooling(outputs[0], inputs['attention_mask'])

score = [0]*len(docs)
for i in range(len(docs)):
    score[i] = (embeddings[0] @ embeddings[i+1]).item()

print(score)

[0.9390654563903809, 1.1304867267608643, 1.0473244190216064, 1.0094892978668213]

Then print the query and the best-matching document to see whether Contriever found the right one.

print("Most relevant document to the query \"", query[0], "\" is")
docs[score.index(max(score))]

Most relevant document to the query " Where was Marie Curie born? " is
'Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire'

Contriever picks out the correct document even though the other three look very similar.
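
Beyond the single best match, the same scores give a full ranking of the corpus. A small sketch using the scores printed above (rounded to four decimals) and shortened document labels, both introduced here only for readability:

```python
# Scores from the output above, rounded; labels are shortened stand-ins for the documents.
scores = [0.9391, 1.1305, 1.0473, 1.0095]
labels = [
    "Lippmann born in Bonnevoie",
    "Marie Curie born in Warsaw",
    "Marie Curie born in 1867",
    "Pierre Curie born in Paris",
]

# Sort document indices by descending similarity score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
for rank, i in enumerate(ranking, start=1):
    print(f"{rank}. {labels[i]} (score {scores[i]:.4f})")
```

A ranked list like this is what a retrieval system would typically pass on to a reader or a downstream model.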

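Summary

In this blog, you learned how to use several popular LLMs with ROCm running on AMD GPUs to perform a variety of NLP tasks, such as text generation, summarization, and solving math problems. If you are interested in improving the performance of these models, check out the ROCm blogs on Llama and Starcoder for information about fine-tuning.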
