This article is translated and adapted from: Prompting
https://llama.meta.com/docs/how-to-guides/prompting/
A linked notebook shows examples of the techniques discussed in this section.
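Prompt engineering is a technique used in natural language processing (NLP) to improve the performance of a language model by giving it more context and information about the task at hand. It involves creating prompts, which are short pieces of text that give the model additional information or guidance, such as the topic or genre of the text it will generate. With a good prompt, the model can better understand what kind of output is expected and produce more accurate and relevant results. In Llama 2, the context size, measured in tokens, doubled from 2048 to 4096.
Crafting effective prompts is an important part of prompt engineering. Here are some tips for creating prompts that will help improve the performance of your language model:
Detailed, explicit instructions produce better results than open-ended prompts. You can think of explicit instructions as rules and restrictions on how the model responds to your prompt.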
Explain this to me like a topic on a children's educational network show teaching elementary students.
I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:
Give your answer like an old timey private investigator hunting down a case step by step.

Use bullet points.
Return as a JSON object.
Use less technical terms and help me apply it in my work in communications.

Only use academic papers.
Never give sources older than 2020.
If you don't know the answer, say that you don't know.
Here is an example of giving explicit instructions to get more specific results, by restricting the response to recently created sources:
Explain the latest advances in large language models to me.
# More likely to cite sources from 2017

Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.
# Gives more specific advances and only cites sources from 2020
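A shot is an example or demonstration of the kind of prompt and response you expect from a large language model. The term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image.
Large language models like Meta Llama can follow instructions and produce responses without having seen an example of the task beforehand. Prompting without examples is called "zero-shot prompting".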
Text: This was the best movie I've ever seen!
The sentiment of the text is:

Text: The director was trying too hard.
The sentiment of the text is:
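Adding specific examples of the desired output generally results in more accurate and more consistent output. This technique is called "few-shot prompting". In this example, the generated response follows our desired format, which gives a more nuanced sentiment classifier that reports confidence percentages for positive, neutral, and negative.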
You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative. Here are some samples:
Text: I liked it
Sentiment: 70% positive 30% neutral 0% negative
Text: It could be better
Sentiment: 0% positive 50% neutral 50% negative
Text: It's fine
Sentiment: 25% positive 50% neutral 25% negative

Text: I thought it was okay
Text: I loved it!
Text: Terrible service 0/10
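The few-shot prompt above can also be assembled from a list of labeled examples, which makes it easy to add or swap examples. The following is a minimal sketch: the helper name `build_few_shot_prompt` and the data layout are assumptions made for illustration and are not part of the original guide or of any Llama API. It only builds the prompt string and does not call a model.

```python
from typing import List, Tuple


# Hypothetical helper (not from the original guide): builds the prompt text only.
def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Join an instruction, labeled examples, and one unlabeled query."""
    lines = [instruction]
    for text, sentiment in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {sentiment}")
    # The final entry has no label, so the model is asked to complete it.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


INSTRUCTION = (
    "You are a sentiment classifier. For each message, give the percentage "
    "of positive/neutral/negative. Here are some samples:"
)
EXAMPLES = [
    ("I liked it", "70% positive 30% neutral 0% negative"),
    ("It could be better", "0% positive 50% neutral 50% negative"),
    ("It's fine", "25% positive 50% neutral 25% negative"),
]

prompt = build_few_shot_prompt(INSTRUCTION, EXAMPLES, "I thought it was okay")
print(prompt)
```

Ending the prompt with an empty `Sentiment:` line is one possible way to nudge the model toward the same format as the examples; the original prompt simply lists the unlabeled texts.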
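Create prompts based on the role or perspective of the person or entity being addressed. This technique can be useful for generating more relevant and engaging responses from language models.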
Pros:
Cons:
Example:
You are a virtual tour guide currently walking the tourists Eiffel Tower on a night tour. Describe Eiffel Tower to your audience that covers its history, number of people visiting each year, amount of time it takes to do a full tour and why do so many people visit this place each year.
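Chain-of-thought prompting involves giving the language model a series of prompts or questions that guide its thinking and lead to a more coherent and relevant response. This technique helps the model produce more thoughtful and well-reasoned responses.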
Pros:
Cons:
Example:
You are a virtual tour guide from 1901. You have tourists visiting Eiffel Tower. Describe Eiffel Tower to your audience. Begin with 1. Why it was built 2. Then by how long it took them to build 3. Where were the materials sourced to build 4. Number of people it took to build 5. End it with the number of people visiting the Eiffel tour annually in the 1900's, the amount of time it completes a full tour and why so many people visit this place each year. Make your tour funny by including 1 or 2 funny jokes at the end of the tour.
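LLMs are probabilistic, so even with chain-of-thought, a single generation might produce an incorrect result. Self-consistency improves accuracy by selecting the most frequent answer from multiple generations (at the cost of more compute):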
John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers is? Report the answer surrounded by three backticks, for example: ```123```
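Running the prompt above several times and taking the most commonly returned value as the answer applies the self-consistency approach.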
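The voting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the generations are hard-coded sample strings standing in for separate sampled calls to the model, since the guide does not prescribe a specific client API.

```python
import re
from collections import Counter
from typing import List, Optional

# Matches an answer wrapped in three backticks, as the prompt requests.
ANSWER_PATTERN = re.compile(r"```\s*(.+?)\s*```", re.DOTALL)


def extract_answer(generation: str) -> Optional[str]:
    """Return the text inside the first triple-backtick pair, if any."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else None


def self_consistent_answer(generations: List[str]) -> Optional[str]:
    """Pick the most frequent extracted answer across several generations."""
    answers = [a for a in map(extract_answer, generations) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical generations, hard-coded for illustration; in practice each one
# would come from a separate sampled call to the model with the same prompt.
sample_generations = [
    "The new mean is 40 + 10 = 50. ```50```",
    "Adding 10 to each number raises the mean by 10. ```50```",
    "15 * 40 = 600, plus 10 gives 610, so ```40.67```",
    "```50```",
]

print(self_consistent_answer(sample_generations))  # prints 50
```

Asking for the answer inside three backticks, as the prompt does, is what makes the extraction step simple: generations that omit the delimiters are just skipped.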
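Common facts are generally available from today's large models out of the box (that is, using only the model weights). More specific data, however, is unlikely to be available. For example: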
What is the capital of California?
# The capital of California is Sacramento...

What was the temperature in Menlo Park on December 12th, 2023?
# I'm just an AI, I don't have access to real-time or historical weather data...
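Retrieval-augmented generation, or RAG, describes the practice of including information in the prompt that was retrieved from an external database. It is an effective way to incorporate facts into an LLM application, and it is more affordable than fine-tuning, which may also negatively affect the capabilities of the foundation model.
This can be as simple as a lookup table or as sophisticated as a vector database containing all of your company's knowledge: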
Given the following information about temperatures in Menlo Park:
2023-12-11 : 52 degrees Fahrenheit
2023-12-12 : 51 degrees Fahrenheit
2023-12-13 : 55 degrees Fahrenheit
What was the temperature in Menlo Park on 2023-12-12?
# Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit.

What was the temperature in Menlo Park on 2023-07-18 ?
# Sorry, I don't have information about the temperature in Menlo Park on 2023-07-18. The information provided only includes temperatures for December 11th, 12th, and 13th of 2023.
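For the lookup-table case, building the prompt might look like the sketch below. The names `TEMPERATURES_F` and `build_rag_prompt` are assumptions introduced for illustration; the "retrieval" here simply includes every record, whereas a real system would typically select only the records relevant to the question.

```python
from typing import Dict

# Toy "external database": the lookup table from the example above.
TEMPERATURES_F: Dict[str, int] = {
    "2023-12-11": 52,
    "2023-12-12": 51,
    "2023-12-13": 55,
}


def build_rag_prompt(question: str, records: Dict[str, int]) -> str:
    """Prepend the retrieved records to the user's question."""
    context_lines = [
        f"{day} : {temp} degrees Fahrenheit"
        for day, temp in sorted(records.items())
    ]
    return (
        "Given the following information about temperatures in Menlo Park:\n"
        + "\n".join(context_lines)
        + "\n"
        + question
    )


prompt = build_rag_prompt(
    "What was the temperature in Menlo Park on 2023-12-12?", TEMPERATURES_F
)
print(prompt)
```

Because the facts live in the prompt rather than in the model weights, updating the table changes the answers without any retraining.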
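LLMs are inherently poor at performing calculations, but they are good at generating code. Program-aided language (PAL) prompting takes advantage of this by instructing the LLM to write code that solves the calculation task.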
Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))

Here is the Python code to calculate the given expression:

((-5 + 93 * 4 - 0) * (4**4 + -7 + 0 * 5))

# Calculate the expression inside the parentheses
a = -5 + 93 * 4 - 0
b = 4**4 + -7 + 0 * 5

# Multiply a and b
result = a * b

# Print the result
print(result)
Running the generated code directly gives the correct result.
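One way to run such a response is sketched below. The string `generated_code` is the model output from the example above, pasted in by hand; using `exec` in a separate namespace is an assumption made for illustration and not a method the original guide prescribes.

```python
# Model-generated code from the example above, pasted in as a string.
generated_code = """
a = -5 + 93 * 4 - 0
b = 4**4 + -7 + 0 * 5
result = a * b
"""

# Run the generated code in its own namespace and read back `result`.
# exec() runs arbitrary code: only do this with trusted or sandboxed input.
namespace = {}
exec(generated_code, namespace)
print(namespace["result"])  # 367 * 249 = 91383
```

The arithmetic is done by the Python interpreter rather than by the model, which is the point of the technique.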
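A common challenge is generating a response without extraneous tokens (for example, "Sure! Here's more information on...").
By combining a role, rules and restrictions, explicit instructions, and an example, the model can be prompted to generate the desired response.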
You are a robot that only outputs JSON.
You reply in JSON format with the field 'zip_code'.
Example question: What is the zip code of the Empire State Building? Example answer: {'zip_code': 10118}
Now here is my question: What is the zip code of Menlo Park?
# "{'zip_code': 94025}"
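A reply with no extraneous tokens can be parsed directly by the calling program. The sketch below shows one possible way to do that; `parse_zip_reply` is a hypothetical helper. Note that the reply in the example uses single quotes, which strict JSON does not allow, so the sketch falls back to parsing the reply as a Python literal.

```python
import ast
import json


def parse_zip_reply(reply: str) -> dict:
    """Parse a model reply that should be an object with a 'zip_code' field."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Single-quoted keys are not valid JSON, so fall back to parsing the
        # reply as a Python literal (literal_eval does not execute code).
        data = ast.literal_eval(reply)
    if not isinstance(data, dict) or "zip_code" not in data:
        raise ValueError(f"unexpected reply: {reply!r}")
    return data


print(parse_zip_reply("{'zip_code': 94025}"))  # single quotes, as in the example
print(parse_zip_reply('{"zip_code": 94025}'))  # strict JSON
```

Showing the example answer with double quotes in the prompt would likely steer the model toward strict JSON and make the fallback unnecessary.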
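Meta's Responsible Use Guide is a great resource for understanding how best to prompt a language model and how to address its input/output risks. See pages 14-17.
Here are some examples of how a language model might hallucinate, along with strategies for fixing the problem: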
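Example 1:
The language model is asked a question about a topic it was not trained on. It may hallucinate information or make up facts that are inaccurate or unsupported by evidence.
Fix: Give the language model more context or information about the topic to help it understand the question and generate a more accurate response. You can also ask the model to provide sources or evidence for any claims it makes, so that its response is grounded in factual information.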
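Example 2:
The language model is asked a question that requires a specific perspective or point of view. It may hallucinate information or make up facts that are inconsistent with the desired perspective or point of view.
Fix: Give the language model additional information about the desired perspective or point of view, such as the goals, values, or beliefs of the person or entity being addressed. This helps the model understand the context and generate a response that is more consistent with that perspective.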
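Example 3:
The language model is asked a question that requires a specific tone or style. It may hallucinate information or make up facts that are inconsistent with the desired tone or style.
Fix: Give the language model additional information about the desired tone or style, such as the audience or the purpose of the communication. This helps the model understand the context and generate a response that better matches that tone or style.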
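Overall, the key to avoiding hallucinations in language models is to give them clear, accurate information and context, and to monitor their responses carefully to make sure they match your expectations and requirements.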
2024-07-16 (Tue)