Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

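In this blog, you will learn how to use ROCm, running on AMD Instinct GPUs, for a range of popular and useful natural language processing (NLP) tasks with different large language models (LLMs). The blog includes a simple, hands-on guide that shows you how to implement core NLP applications, from text generation and sentiment analysis to extractive question answering (QA) and solving math problems.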

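General-purpose LLMs such as GPT and Llama can perform many different tasks and perform them well. Some tasks, however, need fine-tuning or a different model architecture to support the specific use case. The machine learning community has developed many models designed or fine-tuned for specific tasks to complement the general-purpose ones. This blog covers both general-purpose and task-specific LLMs and shows you how to use them on ROCm running on AMD GPUs for several common tasks.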




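Hugging Face lists about a dozen different NLP tasks that LLMs can perform, including text generation, question answering, and translation. This blog demonstrates how to use several general-purpose and task-specific LLMs on ROCm running on AMD GPUs for the following NLP tasks:

- Text generation
- Extractive question answering
- Solving math problems
- Sentiment analysis
- Summarization
- Information retrieval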


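- AMD GPU: [AMD Instinct GPU](AMD Instinct™ Accelerators)
- Linux: see the [supported Linux distributions](System requirements (Linux) — ROCm installation (Linux))
- ROCm 6.0 or later: see the [installation instructions](Quick start installation guide — ROCm installation (Linux))
- Some of the models used in this blog are gated. You must request access on Hugging Face and use your Hugging Face token to download the model weights. You must also agree to share your contact information on Hugging Face.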


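First, check that the GPUs on the server are detected. The output below comes from the ROCm System Management Interface (`rocm-smi`).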

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp        Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Junction)  (Socket)  (Mem, Compute)
0       [0x74a1 : 0x00]       35.0°C      140.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
1       [0x74a1 : 0x00]       37.0°C      138.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
2       [0x74a1 : 0x00]       40.0°C      141.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
3       [0x74a1 : 0x00]       36.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
4       [0x74a1 : 0x00]       38.0°C      143.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
5       [0x74a1 : 0x00]       35.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
6       [0x74a1 : 0x00]       39.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
7       [0x74a1 : 0x00]       37.0°C      137.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
=============================================== End of ROCm SMI Log ================================================

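All 8 GPUs on the MI300X system are available. Start a Docker container with ROCm and PyTorch support (the image tag used below is ROCm 6.1.3 with PyTorch 2.1.2), then install the required packages.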

docker run -it --ipc=host --network=host --device=/dev/kfd  --device=/dev/dri -v $HOME/dockerx:/dockerx --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --name=llm-tasks rocm/pytorch:rocm6.1.3_ubuntu22.04_py3.10_pytorch_release-2.1.2 /bin/bash
pip install --upgrade pip
pip install transformers accelerate einops
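
Once the packages are installed, it can be useful to confirm that PyTorch inside the container also sees the GPUs. The following is a minimal sketch rather than part of the original walkthrough; the helper name `visible_gpu_count` is made up for illustration. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, and it returns 0 when PyTorch is not installed.

```python
def visible_gpu_count():
    """Return the number of GPUs PyTorch can see, or 0 if PyTorch is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


gpu_count = visible_gpu_count()
print(f"GPUs visible to PyTorch: {gpu_count}")
```

On the 8-GPU system shown above, the expectation would be a count of 8, provided the container was started with the `--device` flags from the `docker run` command.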

The following sections demonstrate how to run LLMs on ROCm to perform a variety of natural language processing (NLP) tasks.



C4AI Command-R

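After publishing the groundbreaking paper "Attention Is All You Need" with his team at Google Brain, Aidan Gomez left Google and co-founded Cohere. Cohere has developed several state-of-the-art LLMs, including the C4AI Command-R and C4AI Command-R Plus model families, and released them on Hugging Face.
This test uses c4ai-command-r-v01, a mid-size model with 35 billion parameters, for text generation on ROCm.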


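The c4ai-command-r-v01 model is gated, which means you must request access on Hugging Face before you can use it. To download the model with your Hugging Face token, replace the value of the `token` variable in the code block with your own token.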

from transformers import AutoTokenizer, AutoModelForCausalLM

token = "your HuggingFace user access token here"
model_name = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, token=token)

prompt = "Write a poem about artificial intelligence in Shakespeare style."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
# Render the chat messages into the prompt format the model expects
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt")
generated_ids = model.generate(**model_inputs, max_new_tokens=128)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)


In days of yore, when mortals' minds did roam,
A wondrous birth, a thought-borne gem,
From human intellect, a progeny did bloom,
AI, a brain-child, bright and new.

From bits and bytes, a creature formed, so keen,
To serve and aid, a helpful hand,
With algorithms, it thinks, and learns, and sees,
A clever clone, a mental clone.

It parses speech, solves problems hard,
With speed beyond compare,
It understands, assists, and guides,
A thoughtful, digital friend.
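
The generation code slices each output with `output_ids[len(input_ids):]` because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. The toy sketch below illustrates that trimming step with plain Python lists; the token IDs are made up, and real batches are padded tensors rather than ragged lists.

```python
# Toy token IDs standing in for real tokenizer output (values are made up).
prompt_batch = [[101, 7, 8, 9], [101, 5, 6]]
# Each generated sequence starts with its prompt, followed by the new tokens.
generated_batch = [[101, 7, 8, 9, 42, 43, 2], [101, 5, 6, 77, 2]]

# Keep only the tokens that come after the prompt in each sequence.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_batch, generated_batch)
]
print(new_tokens)
```

Decoding only the trimmed IDs is what keeps the system and user messages out of the final `response` string.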

Here is another text generation example with C4AI Command-R, this time answering a question:

prompt = "Which countries are the biggest rare earth metal producer?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt")
generated_ids = model.generate(**model_inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

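C4AI Command-R gives a detailed answer to the question. The output stops mid-word because generation is capped at 128 new tokens.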

As of 2022, the top three countries that are the biggest producers of rare earth metals are:
1. China: China is the world's largest producer of rare earth metals, accounting for over 58% of the global production. China's production share is even larger when it comes to the more valuable and technologically important rare earth oxides. The country has a strong hold on the supply chain, from mining to processing and manufacturing of rare earth metals and products.
2. Australia: Australia is the second-largest producer of rare earth metals. It has significant reserves and several operational mines producing rare earth elements. Lyn



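The latest release in the Qwen series is the Qwen2 model family. All models in the Qwen2 family use Grouped Query Attention (GQA) for lower latency and lower memory use during inference. For context length, the Qwen2-7B and Qwen2-72B models support up to 128k tokens. The first-generation Qwen models were trained only on English and Chinese text. Qwen2 adds 27 more languages from different regions of the world to the training data, which improves its performance on multilingual tasks.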
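
To make the memory benefit of GQA concrete, the sketch below estimates the size of the key/value cache for one sequence with and without shared KV heads. It is back-of-the-envelope arithmetic, not a measurement: the layer count, head counts, head dimension, and sequence length are illustrative assumptions rather than values read from a model config, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative configuration (assumed, not read from a model config).
layers, query_heads, kv_heads, head_dim = 28, 28, 4, 128
seq_len = 32_768

# Multi-head attention: one KV head per query head.
mha = kv_cache_bytes(layers, query_heads, head_dim, seq_len)
# Grouped query attention: groups of query heads share a KV head.
gqa = kv_cache_bytes(layers, kv_heads, head_dim, seq_len)

print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, which is where the lower inference memory use comes from.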

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the model inputs onto
model_name = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)


A Large Language Model (LLM) is a type of artificial intelligence model that has been trained on vast amounts of text data to understand and generate human-like language. These models are capable of performing various natural language processing tasks such as text translation, summarization, question answering, text generation, etc. LLMs typically use deep learning techniques, often involving transformer architectures, which allow the model to understand context and relationships between words in sentences. This makes them very powerful tools for generating coherent and contextually relevant responses, even when given complex or nuanced prompts.

One of the most famous examples of an LLM is the GPT series created by OpenAI, including GPT-2 and GPT-3. However, it's worth noting that these models can also be used for potentially harmful purposes if not handled responsibly due to their ability to create realistic but false information. Therefore, they need to be used ethically and with appropriate safeguards in place.


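OPT (Open Pre-trained Transformer Language Models) is a suite of pre-trained transformer models, ranging from 125M to 175B parameters, that Meta introduced in the paper "OPT: Open Pre-trained Transformer Language Models". The goal of OPT is to give the research community a set of high-performing pre-trained LLMs for further development and for reproducing results produced by the community.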

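This example tests `opt-125m`, the 125M-parameter version of OPT, which is one of the most popular versions because of its small size. The test runs on ROCm and uses the Hugging Face text-generation pipeline to generate text from a prompt. It also sets do_sample=True to enable top-k sampling, which makes the generated text more interesting.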

from transformers import pipeline, set_seed

set_seed(32)  # fix the random seed so the sampled text is reproducible
text_generator = pipeline('text-generation', model="facebook/opt-125m", max_new_tokens=256, do_sample=True, device='cuda')
output = text_generator("Provide a few suggestions for family activities this weekend.")
print(output[0]['generated_text'])

Provide a few suggestions for family activities this weekend.

The summer schedule is a great opportunity to spend some time enjoying the summer with those who might otherwise be working from home or working from a remote location. You will discover new and interesting places to eat out and spend some time together. There are things you’ll do in different weathers (in particular you’ll learn what it’s like to enjoy a hot summer summer outside. For example you may see rainbows, waves crashing against a cliff, an iceberg exploding out of the sky, and a meteor shower rolling through the sky.

I’ve tried to share some ideas on how to spend all summer on our own rather than with a larger family. In addition to family activities, here are several ways to stay warm for the holidays during a time of national emergency....
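
For readers unfamiliar with top-k sampling, here is a minimal pure-Python sketch of the idea: keep only the `k` highest-scoring tokens, renormalize their probabilities, and draw one of them at random. It illustrates the technique only and is not the `transformers` implementation; the function `top_k_sample` and the toy logits are assumptions made for this example.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(32)
logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # toy scores for a 5-token vocabulary
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))
```

With `k=2`, only the two most likely tokens (indices 0 and 2 here) can ever be drawn, which is how top-k sampling adds variety while avoiding very unlikely tokens.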




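The MPT collection from Mosaic Research (now part of Databricks) is a series of decoder-style transformer models built on two base models: MPT-7B and MPT-30B. MPT-7B-Instruct is an LLM in this collection that was fine-tuned from MPT-7B on a dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless (HH-RLHF) datasets. The model is supported by the Hugging Face text-generation pipeline and is easy to use on ROCm.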

import torch
import transformers
from transformers import pipeline

model_name = 'mosaicml/mpt-7b-instruct'
config = transformers.AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.max_seq_len = 4096
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

prompt = "Here is the instruction to change the oil filter in your car:\n"
# Run generation under bfloat16 autocast
with torch.autocast('cuda', dtype=torch.bfloat16):
    instruction = text_generator(prompt, max_new_tokens=512, do_sample=True, use_cache=True)
print(instruction[0]['generated_text'])

The following is the text MPT-7B-Instruct generated for the prompt "Here is the instruction to change the oil filter in your car":

Here is the instruction to change the oil filter in your car:
1. Open the hood. 2. Find the oil filter. 3. Look to the right underneath the cap to find the oil filter. 4. Screw the oil filter cap off from the bottom.
5. Pull oil filter out from the bottom of the engine.
What is the oil filter? The oil filter is a part that catches particles from your engine oil as it travels through your engine. It traps most of the particles and keeps them from passing straight into your engine. This keeps your engine from getting damaged because of those particles. How many oil filters are there?
There is one oil filter for the entire vehicle. However different types of vehicles have different requirements that can change the oil more often than others.
When should you change the oil filter? It is recommended to change oil filters between 30,000 to 60,000 miles. However some engine types are harder on filters and may require changing every 15,000 miles instead of 30,000.
What can you get at your local automotive store before changing your oil filter: 5-10 quarts 5-10 oil filter, a drain pan, and oil filter wrench.
Step 1. Drain the oil. 2. Check the oil filter to be sure that it is still in good shape. 3. Install the new oil filter. 4. Fill the reservoir with the proper amount of oil.





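One challenge in deploying LLMs is that their size leads to high compute requirements, latency, and power consumption. An active research area is training smaller models on the outputs of larger trained models while retaining most of the performance, a process known as knowledge distillation. A well-known example is DistilBERT, proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT". DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. This means it was pretrained only on inputs and labels generated by the BERT base model. It has 40% fewer parameters than `bert-base-uncased` and runs 60% faster, while preserving over 95% of BERT's performance on the GLUE language understanding benchmark.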
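
The core of knowledge distillation is a loss that pulls the student's output distribution toward the teacher's. The sketch below shows one common form of that soft-target term, a KL divergence between temperature-softened distributions. It is an illustration under stated assumptions, not DistilBERT's exact training objective; the logits, the temperature of 2.0, and the helper names are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, 0.5]
loss_same = distillation_loss([4.0, 1.0, 0.5], teacher)  # student matches teacher
loss_diff = distillation_loss([1.0, 3.0, 0.5], teacher)  # student disagrees
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's distribution and grows as they diverge, so minimizing it transfers the teacher's behavior to the smaller model.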

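This example tests `distilbert-base-cased-distilled-squad`, a checkpoint of DistilBERT-base-cased fine-tuned with knowledge distillation on the SQuAD v1.1 dataset. The task is to find the birthplace of Marie Curie's doctoral advisor from a context of four facts, only one of which contains the answer.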

from transformers import pipeline

question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

result = question_answerer(question=question, context=context)
# 'start' and 'end' are the positions of the answer within the context string
print(f"Answer: '{result['answer']}'\n Score: {round(result['score'], 4)},\n start token: {result['start']}, end token: {result['end']}")


Answer: 'Bonnevoie, Luxembourg'
 Score: 0.9714,
 start token: 78, end token: 99


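A major limitation of Transformer models is that the cost of self-attention grows quadratically with the input sequence length, which makes it hard to scale them to long inputs. The Longformer model from Allen AI, proposed in "Longformer: The Long-Document Transformer" by Iz Beltagy, Matthew E. Peters, and Arman Cohan, tries to mitigate this by replacing full self-attention with local windowed attention combined with task-motivated global attention.
Allen AI has trained several models for different tasks based on the Longformer architecture. This example shows how the LongformerForQuestionAnswering model extracts the answer to a question from a context.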
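
To see why local windowed attention helps, the sketch below counts how many (query, key) pairs are scored by full self-attention versus a sliding window. It is a simplified illustration that ignores Longformer's global attention tokens; the sequence length of 4096, the window of 512, and both helper functions are assumptions chosen for this example.

```python
def full_attention_pairs(n):
    """Number of (query, key) pairs scored by full self-attention."""
    return n * n


def windowed_attention_pairs(n, window):
    """Pairs scored when each token attends only to neighbors within window // 2 positions."""
    half = window // 2
    # For token i, count the positions from max(0, i - half) to min(n - 1, i + half).
    return sum(min(n - 1, i + half) - max(0, i - half) + 1 for i in range(n))


n, window = 4096, 512
full_pairs = full_attention_pairs(n)
local_pairs = windowed_attention_pairs(n, window)
print(full_pairs, local_pairs, round(full_pairs / local_pairs, 1))
```

The windowed count grows roughly as `n * window` rather than `n * n`, so doubling the sequence length doubles the work instead of quadrupling it.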


from transformers import AutoTokenizer, LongformerForQuestionAnswering
import torch

# set up the tokenizer and the model
model_name = "allenai/longformer-large-4096-finetuned-triviaqa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongformerForQuestionAnswering.from_pretrained(model_name)

# context and question
context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

# encode the question and the context
encoded_input = tokenizer(question, context, return_tensors="pt")
input_ids = encoded_input["input_ids"]

# run the model to get start and end logits for every token
outputs = model(input_ids)

# find the beginning and end index of the answer in the encoded input
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits)

# convert the input ids to tokens
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

# extract the answer tokens and decode them
answer_tokens = all_tokens[start_idx : end_idx + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
print(answer)
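
Taking the `argmax` of the start and end logits independently, as the code above does, can occasionally yield an end index that falls before the start index. A common refinement is to search for the best-scoring pair with `start <= end`. The sketch below shows that idea on toy logits; `best_span`, its `max_answer_len` parameter, and the numbers are hypothetical and not part of the original example.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy logits where independent argmax would pick start=3 and end=1 (an invalid span).
start_logits = [0.1, 2.0, 0.3, 2.5, 0.0]
end_logits = [0.0, 3.0, 1.0, 0.2, 2.4]
span = best_span(start_logits, end_logits)
print(span)
```

With real model outputs, the lists would come from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`.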




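The ability to understand a problem and reason logically to an answer has long been one of the main goals of artificial intelligence. A typical use case is solving math problems. Even general-purpose LLMs such as GPT-4 perform remarkably well on simple math problems. This section explores using a fine-tuned version of the Phi-3 model on AMD GPUs to solve math problems.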


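The Phi-3 collection is the next generation of Microsoft's popular Phi-2 model. This example uses the fine-tuned Phi-3-Mini-4K-Instruct version, a 3.8-billion-parameter model trained on carefully curated high-quality educational data and code, plus synthetic, textbook-like data covering topics such as math, coding, and common-sense reasoning.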


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
model_name = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
# Greedy decoding: sampling is disabled, so the output is deterministic
generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Next, ask Phi-3 for the Taylor series of the sum of two simple functions, `sin(x) + ln(x)`.

messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + ln(x)? about a point x=a"},
]
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

For the function sin(x) + ln(x), we need to find the derivatives and evaluate them at x=a.

First, let's find the derivatives of sin(x) and ln(x):

1. sin(x):
f(x) = sin(x)
f'(x) = cos(x)
f''(x) = -sin(x)
f'''(x) = -cos(x)
...

2. ln(x):
f(x) = ln(x)
f'(x) = 1/x
f''(x) = -1/x^2
f'''(x) = 2/x^3
...

Now, let's evaluate these derivatives at x=a:

1. sin(a):
f(a) = sin(a)
f'(a) = cos(a)
f''(a) = -sin(a)
f'''(a) = -cos(a)
...

2. ln(a):
f(a) = ln(a)
f'(a) = 1/a
f''(a) = -1/a^2
f'''(a) = 2/a^3
...

Now, we can write the Taylor series expansion of sin(x) + ln(x) about x=a:

sin(x) + ln(x) = (sin(a) + ln(a)) + (cos(a)(x-a) + (1/a)(x-a)) + (-sin(a)(x-a)^2/2! + (-1/a^2)(x-a)^2/2!) + (-cos(a)(x-a)^3/3! + (2/a^3)(x-a)^3/3!) +...

This is the Taylor series expansion of sin(x) + ln(x) about x=a.

That is a good result. Next, ask Phi-3 to do the same for a slightly more complex function, `sin(x) + 1/cos(x)`.

messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + 1/cos(x) about a point x=a?"},
]
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

First, let's find the Taylor series expansion of sin(x) and 1/cos(x) separately about x=a.

For sin(x), the derivatives are:
sin'(x) = cos(x)
sin''(x) = -sin(x)
sin'''(x) = -cos(x)
sin''''(x) = sin(x)
...

The Taylor series expansion of sin(x) about x=a is:
sin(x) = sin(a) + cos(a)(x-a) - sin(a)(x-a)^2/2! - cos(a)(x-a)^3/3! + sin(a)(x-a)^4/4! +...

For 1/cos(x), the derivatives are:
(1/cos(x))' = sin(x)/cos^2(x)
(1/cos(x))'' = (cos(x) + sin^2(x))/cos^3(x)
(1/cos(x))''' = (-2cos(x)sin(x) + 3sin^2(x))/cos^4(x)
...

The Taylor series expansion of 1/cos(x) about x=a is:
1/cos(x) = 1/cos(a) + (sin(a)/cos^2(a))(x-a) + (cos(a)(sin^2(a) - 1)/cos^3(a))(x-a)^2/2! + (2cos(a)(sin^3(a) - 3sin(a))/cos^4(a))(x-a)^3/3! +...

Now, we can find the Taylor series expansion of sin(x) + 1/cos(x) by adding the two series:

sin(x) + 1/cos(x) = (sin(a) + 1/cos(a)) + (cos(a) + sin(a)/cos^2(a))(x-a) - (sin(a)(x-a)^2/2! + 1/cos^3(a)(x-a)^2/2!) +...

This is the Taylor series expansion of sin(x) + 1/cos(x) about x=a.

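Although Phi-3 follows the standard procedure of finding the derivatives of each term and adding the Taylor series of each term, it fails to find the higher-order derivatives of `1/cos(x)` correctly and to add the series correctly in the final step. For example, the second derivative of 1/cos(x) should be `(1 + sin^2(x))/cos^3(x)`, not `(cos(x) + sin^2(x))/cos^3(x)`. This shows the limits of LLMs at problem solving; LLMs are essentially…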
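
A claimed derivative like the two candidates above can be sanity-checked numerically. The sketch below, which is an illustration and not part of the original test, estimates the second derivative of `1/cos(x)` with a central finite difference and compares it with both expressions. The evaluation point `x = 0.7` and the step size are arbitrary choices.

```python
import math


def f(x):
    return 1.0 / math.cos(x)


def second_derivative(func, x, h=1e-4):
    """Central finite-difference estimate of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


x = 0.7
numeric = second_derivative(f, x)
correct = (1 + math.sin(x) ** 2) / math.cos(x) ** 3
from_model = (math.cos(x) + math.sin(x) ** 2) / math.cos(x) ** 3

print(f"finite difference:         {numeric:.4f}")
print(f"(1 + sin^2 x) / cos^3 x:   {correct:.4f}")
print(f"(cos x + sin^2 x)/cos^3 x: {from_model:.4f}")
```

The finite-difference estimate agrees with `(1 + sin^2(x))/cos^3(x)` and clearly disagrees with the expression in the model's response, which is a cheap way to catch this kind of error in generated math.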




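The DistilRoberta-financial-sentiment model is a lightweight, distilled version of the RoBERTa-base model with only 82 million parameters. Because of its smaller size, it runs twice as fast as RoBERTa-base. The model was trained on a polar sentiment dataset of sentences from financial news, annotated by 5 to 8 human annotators.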


import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = [
    "there is a shortage of capital, and we need extra financing",
    "growth is strong and we have plenty of liquidity",
    "there are doubts about our finances",
    "profits are flat",
]
# Classify each sentence and print the predicted label with its score
for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")

Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'negative'
 Score: 0.666

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'positive'
 Score: 0.9996

Input sentence: "there are doubts about our finances"
Sentiment: 'neutral'
 Score: 0.6857

Input sentence: "profits are flat"
Sentiment: 'neutral'
 Score: 0.9999
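
The label and score reported for each sentence come from the classifier's three output logits: for a single-label model like this one, the logits are passed through a softmax and the most probable class is reported. The sketch below reproduces that last step in plain Python. The logits, the label order, and the helper `classify` are made up for illustration and are not taken from the model.

```python
import math


def classify(logits, labels):
    """Softmax the logits and return the top label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], round(probs[best], 4)


labels = ["negative", "neutral", "positive"]  # assumed label order
label, score = classify([0.2, 3.1, -1.0], labels)
print(label, score)
```

A score near 1.0 means the model puts almost all probability on one class, while a score such as 0.666 signals that the remaining classes still hold a substantial share.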



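Researchers at the Hong Kong University of Science and Technology proposed FinBERT in the paper "FinBERT: A Pretrained Language Model for Financial Communications". It is a BERT-based model pre-trained on financial communication text. The training data consists of three financial communication corpora with a total of 4.9 billion tokens.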


from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

model_name = "yiyanghkust/finbert-tone"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
tokenizer = BertTokenizer.from_pretrained(model_name)
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = [
    "there is a shortage of capital, and we need extra financing",
    "growth is strong and we have plenty of liquidity",
    "there are doubts about our finances",
    "profits are flat",
]
# Classify each sentence and print the predicted label with its score
for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")

Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'Negative'
 Score: 0.9966

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'Positive'
 Score: 1.0

Input sentence: "there are doubts about our finances"
Sentiment: 'Negative'
 Score: 1.0

Input sentence: "profits are flat"
Sentiment: 'Neutral'
 Score: 0.9889





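BART, from Facebook, was introduced in the paper "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". BART uses a Transformer-based architecture that combines a denoising bidirectional autoencoder with a sequence-to-sequence, GPT-like autoregressive decoder. BART is pre-trained in two steps: it first corrupts the training text with an arbitrary noising function, then trains the model to reconstruct the original text from the corrupted version. This approach gives great flexibility in generating training data, including changing the text length and word order.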
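
The corrupt-then-reconstruct idea can be sketched in a few lines. The example below builds one training pair by replacing a span of tokens with a single mask token (text infilling) and also shows sentence shuffling. It is a simplified illustration: the fixed span length, the `<mask>` string, and both helper functions are assumptions for this sketch, not BART's actual preprocessing code.

```python
import random


def corrupt(tokens, rng, mask_token="<mask>", span_len=2):
    """Replace one random span of tokens with a single mask token (text infilling)."""
    start = rng.randrange(0, len(tokens) - span_len + 1)
    return tokens[:start] + [mask_token] + tokens[start + span_len:]


def shuffle_sentences(sentences, rng):
    """Return the sentences in a random order (sentence permutation)."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)
original = "the model learns to rebuild the original text".split()
noisy = corrupt(original, rng)
# Training pair: the model receives `noisy` and is trained to output `original`.
print(noisy)
print(shuffle_sentences(["First.", "Second.", "Third."], rng))
```

Because the span collapses into one mask token, the corrupted input is shorter than the target, so the model must also infer how many tokens are missing.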

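The base BART model can be used for text infilling but is not suited to most tasks of interest. BART performs best when fine-tuned for a specific task such as summarization. This example uses a version of BART fine-tuned on the CNN Daily Mail dataset of document-summary pairs to generate a summary.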

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device="cuda")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]['summary_text'])

Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.


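Another LLM known for summarization is Google's Pegasus, introduced in the paper "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". Pegasus masks key sentences in the training documents and trains the model to generate the missing sentences. According to the authors, this approach suits abstractive summarization particularly well because it forces the model to understand the context of the whole document.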


from transformers import AutoTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"
model = PegasusForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ARTICLE is the news text defined in the BART example above
inputs = tokenizer(ARTICLE, max_length=1024, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"])
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])

A New York woman who has been married 10 times has been charged with marriage fraud.
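
The gap-sentence objective that Pegasus is pre-trained with can be illustrated with a toy example: pick the sentence that best represents the rest of the document, mask it in the input, and use it as the generation target. The sketch below uses a crude word-overlap score as a stand-in for the sentence-importance measure; the helpers `words` and `select_gap_sentence`, the three-sentence document, and the choice of a single gap sentence are all assumptions for illustration.

```python
def words(sentence):
    """Lowercase word set with periods removed (a deliberately crude tokenizer)."""
    return set(sentence.lower().replace(".", "").split())


def select_gap_sentence(sentences):
    """Index of the sentence sharing the largest fraction of its words with the rest."""
    def overlap(i):
        own = words(sentences[i])
        rest = set()
        for j, other in enumerate(sentences):
            if j != i:
                rest |= words(other)
        return len(own & rest) / max(len(own), 1)
    return max(range(len(sentences)), key=overlap)


doc = [
    "The city council approved the new budget.",
    "The budget increases funding for the city parks.",
    "Rain is expected on Friday.",
]
gap = select_gap_sentence(doc)
# The selected sentence is masked in the input and becomes the target to generate.
model_input = [s if i != gap else "<mask_1>" for i, s in enumerate(doc)]
target = doc[gap]
print(model_input, "->", target)
```

Training on pairs like this resembles summarization itself: the model must produce a sentence that captures what the remaining text is about.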






import tqdm
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "facebook/contriever"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

query = ["Where was Marie Curie born?"]
docs = [
    "Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg.",
    "Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]
corpus = query + docs

# Apply tokenizer
inputs = tokenizer(corpus, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
outputs = model(**inputs)

# Mean pooling: average the token embeddings, ignoring padded positions
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

embeddings = mean_pooling(outputs[0], inputs['attention_mask'])

# Score each document by its dot product with the query embedding
score = [0]*len(docs)
for i in range(len(docs)):
    score[i] = (embeddings[0] @ embeddings[i+1]).item()
print(score)

[0.9390654563903809, 1.1304867267608643, 1.0473244190216064, 1.0094892978668213]


print("Most relevant document to the query \"", query[0], "\" is")
# Pick the document with the highest score
print(docs[score.index(max(score))])

Most relevant document to the query " Where was Marie Curie born? " is
Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire
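
The retrieval code above has two steps: mean pooling turns per-token embeddings into one vector per text, and a dot product ranks documents against the query. The sketch below replays both steps with plain Python lists so the arithmetic is easy to follow. The 2-dimensional vectors are made up, and `mean_pool` and `dot` are hypothetical helpers that mirror the tensor code rather than replace it.

```python
def mean_pool(token_embeddings, mask):
    """Average token vectors, ignoring padded positions (mask value 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, mask) if m]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-dimensional token embeddings; the last token of each document is padding.
query_vec = mean_pool([[1.0, 0.0], [1.0, 1.0]], [1, 1])
doc_vecs = [
    mean_pool([[0.0, 1.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0]),
    mean_pool([[1.0, 0.5], [1.0, 0.5], [9.0, 9.0]], [1, 1, 0]),
]
scores = [dot(query_vec, d) for d in doc_vecs]
best = scores.index(max(scores))
print(scores, best)
```

The padded vector `[9.0, 9.0]` has no effect on the result, which is the purpose of the attention mask in the pooling step.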



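In this blog, you learned how to use ROCm running on AMD GPUs to run several popular LLMs and easily perform a variety of NLP tasks, such as text generation, summarization, and solving math problems. If you are interested in improving the performance of these models, see the ROCm blogs on fine-tuning Llama2 and Starcoder.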


