Implementing a Reflection Agent from Scratch

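Hello everyone, and welcome to another article on building multi-agent AI systems. In this article, we'll take a deep dive into implementing a reflection agent from scratch in Python.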

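In my previous article, "Multi-Agent Architectures in Multi-Agent Systems", I covered the different patterns we can implement in multi-agent systems. Most of these patterns come prebuilt in packages such as LangGraph, LangChain, and other existing Python packages for AI agents. In my experience working with agentic AI systems, sometimes you really need full control over your code, or you simply want to do something that existing packages don't readily support. Learning how to get your hands dirty and dig into your own code is a fundamental and essential skill.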

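I considered using LangGraph to implement all of the patterns discussed there. On second thought, no: I would rather give you the fundamental skills to build your own custom classes and packages, whether out of necessity or to support use cases these packages don't currently cover. That way you get finer-grained control over your code and implementation.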

Let's get started!

1. What Is a Reflection Agent?

First, let's understand at a basic level what we're going to build by answering the question: "What is a reflection agent?"

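The reflection pattern lets an agent reflect on its own actions and decisions. It consists of two main components:

Generator agent: responsible for producing a response to the user's prompt, and for revising that response when it receives feedback.
Reflection agent: responsible for critiquing the generator's output (its actions and decisions) and returning feedback. A reflection prompt guides what the reflection agent looks for.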

To make this clearer (if you've been following along, you know I like visuals), let's look at a diagram of the reflection agent:

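As the image above shows, the user prompt is sent to the generation block, which produces a response. The generation block could be an agent; I don't call it one here specifically because I haven't given the LLM any tools. Back to the flow: the response from the generation block is sent to the reflection block (or reflection agent, if the LLM is given tools).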

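The reflection block reflects on the generated response and provides feedback. That feedback is sent back to the generation block, which takes the feedback or critique and makes the necessary updates. The update is then sent back to the reflection agent, which examines it again and produces new feedback or critique.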

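This loop runs for a set number of iterations, X, after which the final response is returned. In this way, the AI system reflects on its own work and refines it as far as it can before returning the final response. Approaches like this have been shown to produce better results than single-prompt generation on many tasks.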
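The loop described above can be sketched independently of any LLM API. In this minimal sketch, `reflection_loop`, `generate`, `reflect`, and `revise` are hypothetical names; in the real agent each callable would wrap an LLM call, and here they are replaced by trivial stand-ins so the control flow is easy to follow.

```python
from typing import Callable, List, Tuple


def reflection_loop(
    task: str,
    generate: Callable[[str], str],
    reflect: Callable[[str], str],
    revise: Callable[[str, str], str],
    num_steps: int = 3,
) -> Tuple[str, List[str]]:
    """Produce a first draft, then run num_steps reflect -> revise rounds.

    Returns the final draft plus every draft along the way.
    """
    draft = generate(task)               # generation block: first response
    drafts = [draft]
    for _ in range(num_steps):
        feedback = reflect(draft)        # reflection block: critique the current draft
        draft = revise(draft, feedback)  # generation block: apply the critique
        drafts.append(draft)
    return draft, drafts


# Usage with stand-in functions instead of LLM calls
final, drafts = reflection_loop(
    "greet the user",
    generate=lambda task: "hi",
    reflect=lambda draft: "add an exclamation mark",
    revise=lambda draft, feedback: draft + "!",
    num_steps=3,
)
print(final)  # hi!!!
```

Keeping every intermediate draft makes it easy to compare iterations afterwards, which matters because later drafts are not automatically better.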

2. What to Expect

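In this article, my goal is to create a data analyst agent that I can use to analyze data. We'll implement a reflection agent to help with this task: the agent should be able to generate its own code, reflect on it, and make improvements and corrections as needed.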

Here are a couple of images:

Without reflection:

With reflection:

3. Installation

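To implement this, we'll start from a new folder and add code step by step as we go. The first step is installing the necessary packages and dependencies.

We'll start by creating a virtual environment. First, create a folder named multi-agent-patterns-from-scratch: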

mkdir multi-agent-patterns-from-scratch

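Then create a virtual environment with the same name as the folder; that's how I name my environments so I don't forget what they're called. Specifying a Python version gives the environment its own interpreter and pip (3.11 matches the cpython-311 files shown later):

conda create --name multi-agent-patterns-from-scratch python=3.11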

You can then activate the environment with:

conda activate multi-agent-patterns-from-scratch

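To save time: now that you've seen how to create a folder, follow the same steps to create the folder structure below, which we'll use throughout this series.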

.  
└── agent-patterns  
    └── reflection_agent  
        └── notebooks  
            └── lesson_01.ipynb  

3 directories, 1 file
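If you prefer to script the layout, here is a minimal sketch using only the standard library. `create_lesson_layout` is a hypothetical helper; it writes a minimal nbformat 4 notebook with no cells (rather than an empty file) so that Jupyter can open `lesson_01.ipynb` straight away.

```python
import json
from pathlib import Path


def create_lesson_layout(root: Path) -> Path:
    """Create agent-patterns/reflection_agent/notebooks/lesson_01.ipynb under root."""
    notebooks = root / "agent-patterns" / "reflection_agent" / "notebooks"
    notebooks.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    notebook = notebooks / "lesson_01.ipynb"
    if not notebook.exists():
        # A minimal, valid nbformat 4 notebook with no cells
        empty_notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
        notebook.write_text(json.dumps(empty_notebook, indent=1), encoding="utf-8")
    return notebook
```

Run it from inside the project folder as `create_lesson_layout(Path.cwd())`; calling it again is harmless because existing folders and notebooks are left alone.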

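Now we can install all the required packages and dependencies (ipykernel lets Jupyter run the notebook inside this environment):

pip install colorama openai pandas python-dotenv matplotlib rich ipykernel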
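A quick check that each package is importable from the active environment can save a confusing error later, especially when a notebook kernel is accidentally running in a different environment. This is a minimal standard-library sketch; `missing_modules` is a hypothetical helper, and the list uses import names, which differ from pip names for python-dotenv (imported as `dotenv`).

```python
import importlib.util

# Import names of the packages installed above (python-dotenv is imported as `dotenv`)
REQUIRED_MODULES = ("colorama", "openai", "pandas", "dotenv", "matplotlib", "rich")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running this in the first notebook cell confirms that the notebook is using the conda environment you just set up.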

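4. Implementing the Generation Block

To start building the reflection agent, we'll begin with the generation block. Open the notebook (lesson_01.ipynb); this is where we'll write the code.

Let's import the basic modules first: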

from openai import OpenAI  
from colorama import Fore, Style  
from IPython.display import display, Markdown  

import os  
from dotenv import load_dotenv  

load_dotenv()

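For the code that follows, which creates the OpenAI client, you need to set your OpenAI API key in a .env file as shown below. After setting it, restart the notebook and rerun the cell above.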

OPENAI_API_KEY=xxxxxxxxxx

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

To test the model setup, run the following code:

response = client.chat.completions.create(  
    model="gpt-4o",  
    messages=[{"role": "user", "content": "Hello, how are you?"}],  
)  

print(response.choices[0].message.content)

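5. The Generation Block Prompt

I want to create a data analyst that can write code for some of my data. Here is the prompt I'll use for the generation block: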

generator_prompts = [  
    {  
        "role": "system",  
        "content": "擅长可视化数据的数据分析师。使用Python Pandas生成用于可视化用户数据的代码。 "  
        "当提供批评时，改进代码使其更高效和准确，并以基于批评改进后的代码重新响应。"  
    }  
]  

generator_prompts.append({  
    "role": "user",   
    "content": """Write me code to visualize this data in a bar chart. Here is the data: {'data': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'column': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']}"""  
})

Let's call it and see what we get:

generator_response = client.chat.completions.create(  
    model="gpt-4o",  
    messages=generator_prompts,  
)  

print(generator_response.choices[0].message.content)

We can make the output look nicer, since it is actually Markdown text:

agent_code = generator_response.choices[0].message.content  
display(Markdown(agent_code))
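Copying code out of the Markdown by hand gets tedious once there are several iterations. A small helper can pull the fenced code blocks out of the reply instead. This is a sketch under two assumptions: `extract_python_blocks` is a hypothetical name, and the model wraps its code in standard triple-backtick fences, as GPT-4o replies typically do. It only extracts code; you still need to read it before running it.

```python
import re

TICKS = "`" * 3  # built at runtime so this snippet can itself live inside a Markdown fence

# Opening fence with an optional language tag, the body, then a closing fence on its own line
FENCE_RE = re.compile(rf"^{TICKS}([\w+-]*)[ \t]*\n(.*?)^{TICKS}[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_python_blocks(markdown: str) -> list[str]:
    """Return the bodies of fenced blocks tagged python/py or left untagged."""
    return [
        body.rstrip()
        for lang, body in FENCE_RE.findall(markdown)
        if lang.lower() in ("python", "py", "")
    ]


# Usage in the notebook (review the code before running it!):
# blocks = extract_python_blocks(agent_code)
# print(blocks[0])
```

Anchoring both fences to the start of a line keeps a closing fence from being mistaken for the opening of the next block, and blocks tagged with other languages (for example `bash`) are skipped.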

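I copied the generated code and ran it. Be careful when doing this: read the code thoroughly first to make sure it contains nothing malicious.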
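The model's reply changes from run to run, so its exact code isn't reproduced here. As a rough reference for what a working answer to this prompt involves, here is a hand-written sketch (not the model's output) using pandas and matplotlib. `plot_bar_chart` is a hypothetical name, and the sketch builds a `Figure` directly instead of going through `pyplot`, so it also runs outside a notebook.

```python
import pandas as pd
from matplotlib.figure import Figure


def plot_bar_chart(data: dict) -> Figure:
    """Bar chart of data["data"] with data["column"] as the category labels."""
    df = pd.DataFrame(data)
    fig = Figure(figsize=(8, 5))
    ax = fig.subplots()
    bars = ax.bar(df["column"], df["data"], color="steelblue", edgecolor="black")
    ax.bar_label(bars)  # print each value on top of its bar (matplotlib >= 3.4)
    ax.set_xlabel("Column")
    ax.set_ylabel("Value")
    ax.set_title("Values by column")
    return fig


fig = plot_bar_chart({
    "data": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "column": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
})
fig.savefig("bar_chart.png")  # a Figure created this way can be saved without pyplot
```

Comparing the model's first answer against a baseline like this makes it easier to judge whether the reflection feedback is pointing at real problems.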

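6. Implementing the Reflection Block

Now that the generation block can produce code, let's build the reflection block, which is where the improvement happens.

Reflection block prompt: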

reflection_prompts = [  
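    {
        "role": "system",
        "content": "You are a senior data analyst. Provide feedback to the generator block on the code it generated. "
        "The feedback should take the form of a code critique, presented as a list of improvements that would make the code better."
    }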
]  

reflection_prompts.append({  
    "role": "user",   
    "content": f"""Here is the code generated by the generator block: {agent_code}"""  
})

Then we can call the LLM with these prompts:

reflection_response = client.chat.completions.create(  
    model="gpt-4o",  
    messages=reflection_prompts,  
)  

reflection_feedback = reflection_response.choices[0].message.content  
display(Markdown(reflection_feedback))

7. Passing the Feedback to the Generator

Now that we have the feedback, we need to pass it to the generator block.

# Record the generator's first answer so the model knows which code the feedback refers to
generator_prompts.append({"role": "assistant", "content": agent_code})
generator_prompts.append({
    "role": "user",
    "content": f"""Here is the feedback from the reflection block: {reflection_feedback}"""
})
generator_response = client.chat.completions.create(
    model="gpt-4o",
    messages=generator_prompts,
)

improved_code = generator_response.choices[0].message.content
display(Markdown(improved_code))
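The Chat Completions API is stateless: the model only sees the messages list you send with each request. One way to keep each round's context explicit is to record the answer being critiqued as an assistant turn before the feedback. `add_feedback_round` is a hypothetical helper, sketched with plain dictionaries in the same message format used above; it returns a new list rather than mutating the one passed in.

```python
def add_feedback_round(messages, previous_answer, feedback):
    """Return a new message list with the last answer and the critique appended.

    The answer being critiqued goes in as an assistant turn, the critique as a user turn.
    """
    return messages + [
        {"role": "assistant", "content": previous_answer},
        {"role": "user", "content": f"Here is the feedback from the reflection block: {feedback}"},
    ]


history = [
    {"role": "system", "content": "You are a data analyst who excels at visualizing data."},
    {"role": "user", "content": "Write me code to visualize this data in a bar chart."},
]
history = add_feedback_round(history, "<first version of the code>", "<critique>")
print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```

Alternating assistant and user turns this way mirrors a normal chat, so the generator can revise its own code rather than rewriting from the critique alone.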

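Let's run the code generated after the reflection feedback:

Here is the resulting chart. The labels have improved, the colors have changed, and there are several other improvements.

The nice part is that this was only one iteration of the process:

Imagine the output after 3-4 iterations. Note that more iterations don't always produce better output, so take some time to experiment.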
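Since more rounds don't always help, one simple way to experiment (my own assumption, not something the article itself uses) is to stop once consecutive drafts barely change. `has_converged` and its 0.98 threshold are hypothetical; `difflib.SequenceMatcher.ratio()` from the standard library returns a similarity between 0.0 and 1.0.

```python
from difflib import SequenceMatcher


def has_converged(previous: str, current: str, threshold: float = 0.98) -> bool:
    """Heuristic stop test: consecutive drafts are at least `threshold` similar (0.0-1.0)."""
    return SequenceMatcher(None, previous, current).ratio() >= threshold


print(has_converged("plt.bar(x, y)", "plt.bar(x, y)"))            # True
print(has_converged("plt.bar(x, y)", "ax.barh(labels, values)"))  # False
```

Inside an iteration loop, you would compare the previous and the newly improved code after each step and break early when the function returns True.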

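8. Custom Class Implementation

Next, we'll implement our own custom class so that this reflection agent can be reused in other parts of the project. The class takes the number of iterations (the X from earlier) as a parameter.

Here is the Python class. I created the following project structure: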

.  
├── agent-patterns  
│   └── reflection_agent  
│       └── notebooks  
│           └── lesson_01.ipynb  
├── python_classes  
│   └── reflection_agent  
│       ├── __init__.py  
│       ├── main.py  
│       └── __pycache__  
│           ├── __init__.cpython-311.pyc  
│           └── main.cpython-311.pyc  
└── tests  
    ├── __init__.py  
    └── reflection_agent.py  

7 directories, 7 files

The contents of python_classes/reflection_agent/main.py are as follows:

import os  
from openai import OpenAI  
from dotenv import load_dotenv  
from colorama import Fore  
from rich.markdown import Markdown  
from rich.console import Console  

class ReflectionAgent:  
    def __init__(  
        self,  
        generator_prompts,  
        reflection_prompts,  
        api_key=None,  
        num_steps=1,  
        model="gpt-4o"  
    ):  
        """  
        generator_prompts: list of message dicts, e.g. [{"role": "system", "content": "..."}]
        reflection_prompts: list of message dicts, e.g. [{"role": "system", "content": "..."}]
        """  
        load_dotenv()  
        self.api_key = api_key or os.getenv("OPENAI_API_KEY")  
        self.client = OpenAI(api_key=self.api_key)  
        self.model = model  
        self.generator_prompts = list(generator_prompts)  
        self.reflection_prompts = list(reflection_prompts)  
        self.generated_code = None  
        self.reflection_feedback = None  
        self.improved_code = None  
        self.num_steps = num_steps  

    def generate_code(self, user_prompt):  
        self.generator_prompts.append({"role": "user", "content": user_prompt})  
        response = self.client.chat.completions.create(  
            model=self.model,  
            messages=self.generator_prompts,  
        )  
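        self.generated_code = response.choices[0].message.content
        # Keep the generator's own answer in its history so later feedback has context
        self.generator_prompts.append({"role": "assistant", "content": self.generated_code})
        return self.generated_code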

    def reflect_on_code(self):  
        self.reflection_prompts.append({  
            "role": "user",  
            "content": f"Here is the code generated by the generator block: {self.generated_code}"  
        })  
        response = self.client.chat.completions.create(  
            model=self.model,  
            messages=self.reflection_prompts,  
        )  
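        self.reflection_feedback = response.choices[0].message.content
        # Keep earlier critiques in the reflector's history across iterations
        self.reflection_prompts.append({"role": "assistant", "content": self.reflection_feedback})
        return self.reflection_feedback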

    def improve_code(self):  
        self.generator_prompts.append({  
            "role": "user",  
            "content": f"Here is the feedback from the reflection block: {self.reflection_feedback}"  
        })  
        response = self.client.chat.completions.create(  
            model=self.model,  
            messages=self.generator_prompts,  
        )  
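        self.improved_code = response.choices[0].message.content
        # Record the revised answer so the next round of feedback applies to it
        self.generator_prompts.append({"role": "assistant", "content": self.improved_code})
        return self.improved_code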

    def display_markdown(self, content):  
        console = Console()  
        console.print(Markdown(content))  

    def run(self, user_prompt, display_steps=True):
        """
        Run the agent for self.num_steps reflect-and-improve iterations.

        Args:
            user_prompt (str): the initial user prompt.
            display_steps (bool): whether to print the output of each step.

        Returns:
            str: the latest version of the code (the initial draft if num_steps is 0).
        """
        code = self.generate_code(user_prompt)
        if display_steps:
            print(Fore.CYAN + "Generated code:")
            print(Fore.RESET + code)
        for step in range(self.num_steps):
            feedback = self.reflect_on_code()
            if display_steps:
                print(Fore.YELLOW + f"Reflection feedback (step {step+1}):")
                print(Fore.RESET + feedback)
            improved = self.improve_code()
            if display_steps:
                print(Fore.GREEN + f"Improved code (step {step+1}):")
                print(Fore.RESET + improved)
            # The improved code becomes the input to the next reflection round
            self.generated_code = self.improved_code
        return self.generated_code

The contents of python_classes/reflection_agent/__init__.py are as follows:

from .main import ReflectionAgent  

__all__ = ["ReflectionAgent"]

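In tests/__init__.py, import the class from python_classes, the package that contains it in the structure above (this works when the tests are run from the project root, e.g. with python -m tests.reflection_agent):

from python_classes.reflection_agent import ReflectionAgent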

In tests/reflection_agent.py, the code is as follows:

import sys  
import os  
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))  

from python_classes.reflection_agent import ReflectionAgent  

generator_prompts = [  
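    {
        "role": "system",
        "content": "You are a data analyst who excels at visualizing data. Use Python and pandas to generate code that visualizes the user's data. "
        "When you are given a critique, improve the code to make it more efficient and accurate, and respond again with the code revised according to the critique."
    }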
]  

reflection_prompts = [  
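    {
        "role": "system",
        "content": "You are a senior data analyst. Provide feedback to the generator block on the code it generated. "
        "The feedback should take the form of a code critique, presented as a list of improvements that would make the code better."
    }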
]  

agent = ReflectionAgent(generator_prompts, reflection_prompts, num_steps=4)  
agent.run(  
    user_prompt="Write me code to visualize this data in a pie chart. Here is the data: {'data': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'column': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']}")

Here is the generated output:

I then ran the final code in the notebook, and this was the output:

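9. Conclusion

Congratulations on making it to the end! I hope you found this helpful and that you now know how to implement your own reflection agent. In the next article, we'll implement another agent pattern from scratch in Python.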

Original article: Multi-agent System Design Patterns From Scratch In Python | Reflection Agents

Translated and edited by 汇智网 (Huizhi Net); please credit the source when reposting.