⭐ About the author: a sophomore majoring in Network Engineering, continuously learning Java and working to publish quality articles
⭐ Author homepage: @逐梦苍穹
⭐ Column: Artificial Intelligence


Output Parsers (output_parsers)
Language models output text. But much of the time you may want more structured information than plain text. That is where output parsers come in.
Output parsers are classes that help structure language model responses. An output parser must implement two main methods:
"Get format instructions": a method that returns a string describing how the language model's output should be formatted.
"Parse": a method that takes a string (assumed to be the language model's response) and parses it into some structure.
There is also one optional method:
"Parse with prompt": a method that takes a string (assumed to be the language model's response) and a prompt (assumed to be the prompt that generated the response) and parses it into some structure. The prompt is mostly provided for cases where the parser needs information from it in order to retry or fix the output.
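
As a minimal sketch of this interface (an illustration added here, not part of the original example code; the class name and format string are assumptions), a custom parser can subclass LangChain's BaseOutputParser and implement these methods:

from langchain.schema import BaseOutputParser

# Hypothetical parser, shown only to illustrate the required methods.
class SemicolonListParser(BaseOutputParser):
    def get_format_instructions(self) -> str:
        # "Get format instructions": tell the model how to format its output.
        return "Return your answer as a semicolon separated list, e.g. `foo; bar; baz`."

    def parse(self, text: str) -> list:
        # "Parse": turn the raw model response into a structured value.
        return [item.strip() for item in text.strip().split(";")]

    def parse_with_prompt(self, completion: str, prompt) -> list:
        # "Parse with prompt" (optional): the prompt is available for retry/fix
        # logic; this sketch simply ignores it and reuses parse().
        return self.parse(completion)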

1. Quick Start

Below we go over the main type of output parser, the PydanticOutputParser.

from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if field[-1] != '?':
            raise ValueError("Badly formed question!")
        return field

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."
_input = prompt.format_prompt(query=joke_query)
output = model(_input.to_string())
parser.parse(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

2. comma_separated

List parser (comma_separated):
You can use this output parser when you want to return a list of comma-separated items.

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)

model = OpenAI(temperature=0)
_input = prompt.format(subject="ice cream flavors")
output = model(_input)
output_parser.parse(output)
['Vanilla',
 'Chocolate',
 'Strawberry',
 'Mint Chocolate Chip',
 'Cookies and Cream']
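
The parsing step itself does not depend on the model call. As a small added illustration (not from the original example), calling the parser directly on a hand-written string simply splits the raw text into a Python list:

# Hypothetical standalone call; the parser splits the raw text on commas.
output_parser.parse("foo, bar, baz")
# expected result: ['foo', 'bar', 'baz']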

3. output_fixing_parser

Auto-fixing parser (output_fixing_parser):
This output parser wraps another output parser and, if the first one fails, calls another LLM to fix any errors.
But instead of just throwing an error, we can do something else: specifically, we can pass the misformatted output, together with the formatting instructions, to the model and ask it to fix it.
For this example we will use the Pydantic output parser from above. Here is what happens if we pass it a result that does not conform to the schema:

from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
parser.parse(misformatted)
---------------------------------------------------------------------------

    JSONDecodeError                           Traceback (most recent call last)

    File ~/workplace/langchain/langchain/output_parsers/pydantic.py:23, in PydanticOutputParser.parse(self, text)
         22     json_str = match.group()
    ---> 23 json_object = json.loads(json_str)
         24 return self.pydantic_object.parse_obj(json_object)


    File ~/.pyenv/versions/3.9.1/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
        343 if (cls is None and object_hook is None and
        344         parse_int is None and parse_float is None and
        345         parse_constant is None and object_pairs_hook is None and not kw):
    --> 346     return _default_decoder.decode(s)
        347 if cls is None:


    File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
        333 """Return the Python representation of ``s`` (a ``str`` instance
        334 containing a JSON document).
        335 
        336 """
    --> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        338 end = _w(s, end).end()


    File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
        352 try:
    --> 353     obj, end = self.scan_once(s, idx)
        354 except StopIteration as err:


    JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

    
    During handling of the above exception, another exception occurred:


    OutputParserException                     Traceback (most recent call last)

    Cell In[6], line 1
    ----> 1 parser.parse(misformatted)


    File ~/workplace/langchain/langchain/output_parsers/pydantic.py:29, in PydanticOutputParser.parse(self, text)
         27 name = self.pydantic_object.__name__
         28 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
    ---> 29 raise OutputParserException(msg)


    OutputParserException: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Now we can construct and use an OutputFixingParser. This output parser takes another output parser as an argument, plus an LLM that is used to try to correct any formatting mistakes.

from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
new_parser.parse(misformatted)
Actor(name='Tom Hanks', film_names=['Forrest Gump'])

4. structured

Structured output parser (structured):
You can use this output parser when you want to return multiple fields. Although the Pydantic/JSON parser is more powerful, the data structure we start with here only has text fields.

from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

Here we define the response schemas we want to receive.

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

We now get a string that contains instructions for how the response should be formatted, and we then insert that into our prompt.

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

We can now use this to format a prompt, send it to the language model, and then parse the returned result.

model = OpenAI(temperature=0)
_input = prompt.format_prompt(question="what's the capital of france?")
output = model(_input.to_string())
output_parser.parse(output)
{'answer': 'Paris',
 'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}

And here is an example of using this with a chat model:

chat_model = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("answer the users question as best as possible.\n{format_instructions}\n{question}")  
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
_input = prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages())
output_parser.parse(output.content)
{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}