
Building an Email Agent with Cloudflare

Exploring agentic architecture with Cloudflare Workers and Durable Objects.

admin

Apr 28, 2026 • 28 min read


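As I argued in my previous post, I believe the future of software lies in agent-first systems. About a month ago, over the holidays, I finally had time to explore what it means to build software around LLMs from first principles.[1] Around the same time, I was facing a backlog of email that needed tedious triage, prioritization, replies, scheduling, and so on.

This led me to build an AI assistant that handles email autonomously on my behalf. It is a good example of an agentic application, in which an AI agent operates autonomously for the user. Despite its persistent shortcomings, email remains the most resilient asynchronous communication protocol between humans (and, soon, machines). It is a well-defined protocol with clear inputs (incoming messages) and outputs (replies, scheduling actions), which makes it an ideal testbed. The assistant has to deal with the complexity of human communication, retain context, and carry out tasks, all within the constraints of email as a medium.

The project serves two purposes: reducing my email load and exploring the design space of agent-first systems. The risk also felt moderate: I don't rely on the system for anything critical, yet it is put to real use and exposed to the real world. I am still concerned about the assistant being jailbroken or hallucinating, and I discuss some of these challenges later in the post. But observing how the system fails, and which edge cases appear once it is deployed in a real environment, is part of the experiment.

The source code is available on GitHub at meleksomai/os, and you can interact with the assistant by emailing hello@somai.me.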

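1. Design Principles

Starting from first principles requires explicit requirements to guide the assistant's architecture and implementation. My goal was an AI assistant that manages email autonomously: giving context-aware replies, scheduling meetings, and handling routine inquiries without human intervention. To do that, the assistant must:

Intercept and understand email: receive messages sent to a specific address (for example hello@somai.me), parse their content, extract the relevant information, and understand the context of the conversation.
Maintain per-contact context, so that it can adapt its behavior safely: the assistant remembers previous interactions with each contact and uses them to inform future replies. Each contact gets its own memory, with no cross-contamination of context between contacts. Technically, this means the assistant must keep independent state for every contact.
Connect to tools and services: act on my behalf, for example by scheduling meetings, sending follow-up emails, or retrieving information from external sources. The Model Context Protocol (MCP) can connect the LLM to external tools and services.
Learn from personal feedback and keep improving: the assistant should adjust its replies based on my feedback, refining its understanding of my preferences and communication style. This requires a feedback loop through which I can send the assistant corrections or suggestions that it folds into its future behavior.
Be secure and private, because email often contains sensitive information: the assistant must handle all data securely and never expose sensitive information to unauthorized parties.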

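2. Platform: Cloudflare Workers + Durable Objects

Cloudflare has become a strong alternative to the traditional cloud providers, offering close to AWS-level infrastructure with a developer experience closer to Vercel's. It has a solid serverless platform, including Cloudflare Workers, Cloudflare Email Routing, Cloudflare KV, and, more recently, Cloudflare Durable Objects.[2]

Durable Objects is a major innovation in serverless computing, the most significant since AWS Lambda. Instead of thinking about scaling in terms of microservices and distributed systems, Durable Objects let you think about it in terms of stateful instances. Each instance is an independent unit of compute with its own memory, storage, and lifecycle. The basic unit of scaling is no longer a stateless function but a stateful object that maintains its own context over time. This is a natural fit for agent-first systems, where each agent can be a Durable Object instance with its own state and behavior. Durable Objects also add scheduling (so each instance has its own lifecycle), WebSockets (for streaming and long-lived connections, which suit agents well), and single-threaded execution (which greatly simplifies reasoning about concurrency and race conditions).[3]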
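To make the "one stateful instance per ID" model concrete, here is a small local simulation in TypeScript. It is not the Durable Objects API (on Cloudflare you obtain a stub with `env.NAMESPACE.idFromName(name)` and `env.NAMESPACE.get(id)`), and `InstanceRegistry` and `Mailbox` are hypothetical names. It also serializes whole calls, which is stricter than the real runtime: a Durable Object runs on a single thread, but other requests can interleave while one is awaiting I/O such as a `fetch()`.

```typescript
// Local simulation of the Durable Objects model (NOT the Cloudflare API):
// one long-lived object per ID, each with private state, and calls to the
// same object executed one after another.
class InstanceRegistry<T> {
  private readonly instances = new Map<string, { object: T; tail: Promise<unknown> }>();
  private readonly create: (id: string) => T;

  constructor(create: (id: string) => T) {
    this.create = create;
  }

  // Run `fn` against the instance for `id`, creating the instance on first use.
  call<R>(id: string, fn: (object: T) => R | Promise<R>): Promise<R> {
    let entry = this.instances.get(id);
    if (!entry) {
      entry = { object: this.create(id), tail: Promise.resolve() };
      this.instances.set(id, entry);
    }
    const target = entry.object;
    // Chain onto the previous call for this ID so calls never overlap.
    const result = entry.tail.then(() => fn(target));
    // Keep the chain usable even if this call rejects.
    entry.tail = result.catch(() => undefined);
    return result;
  }
}

// Example instance type: a per-contact mailbox with its own state.
class Mailbox {
  messages: string[] = [];
}
```

Each contact ID gets its own `Mailbox`, so state written for one contact can never be observed by another, which is the isolation property the agent design relies on.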

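Agents need infrastructure designed for persistence, not ephemerality.

In my experience, building agent-first systems on traditional serverless platforms is full of complexity. Serverless assumes ephemerality: a function executes, returns a result, and disappears. Any state it needs lives somewhere else, in a database or an external store. That works well for request-response architectures, but applying the serverless paradigm to agentic computing becomes needlessly complicated, and it only gets worse as agents become more complex and stateful.[4]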

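3. Architecture

The architecture is simple: Cloudflare Email Routing receives the email and forwards it to a Cloudflare Worker, which routes it to a specific agent (Durable Object) instance. The agent processes the email, runs the LLM, updates its state, and optionally sends a reply.

I'll focus on the most interesting parts of the architecture. The full source is on GitHub at meleksomai/os.

4. Cloudflare Worker

The Cloudflare Worker is the entry point for incoming email. It uses routeAgentEmail, a helper from the Agents SDK, to route each email to the appropriate agent instance.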

import { routeAgentEmail } from "agents";
 
export default {
  async email(message: ForwardableEmailMessage, env: Env) {
    await routeAgentEmail(message, env, {
      // ... logic to resolve the correct agent instance ...
    });
  },
};

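When an email arrives, the routing logic should extract the sender's address and use it to select the right agent instance, creating a new instance if none exists for that contact. Email threads complicate this. If I am in an ongoing conversation with someone (someone@example.com) and I reply to their email, I want my email routed to the same agent instance that handles my conversation with that person, regardless of who sent it. Email headers solve this problem: a set of headers identifies threads, the most relevant being Message-ID, In-Reply-To, and References.[5] Email clients use these headers to group messages into threads and conversation stacks.

The simple function below extracts the root thread ID from the email headers using the References and In-Reply-To headers, falling back to the message's own Message-ID.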

/**
 * Extract the root thread ID from email headers
 *
 * Priority:
 * 1. If has References, use the first message ID in the References chain
 * 2. If this is a reply (has In-Reply-To), use In-Reply-To as thread ID
 * 3. Otherwise, use the current Message-ID (new thread)
 */
export function extractThreadId(email: ForwardableEmailMessage): string | null {
  const messageId = email.headers.get("Message-ID");
  const inReplyTo = email.headers.get("In-Reply-To");
  const references = email.headers.get("References");
  // Normalize every branch the same way so KV keys match across messages
  const normalize = (id: string) => id.trim().toLowerCase();

  if (references) {
    // References contains whitespace-separated message IDs, oldest first
    const root = references.split(/[\s,]+/).find((r) => r.length > 0);
    if (root) {
      return normalize(root); // The root message ID
    }
  }

  if (inReplyTo) {
    return normalize(inReplyTo);
  }

  // This is a new thread, use the current Message-ID
  return messageId ? normalize(messageId) : null;
}
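Because `extractThreadId` depends on the Workers `ForwardableEmailMessage` type, it is awkward to exercise outside a Worker. The variant below (`threadIdFromHeaders` is a hypothetical name) reads the same three headers from a plain `Map` with lowercase keys, applies the same priority order, and normalizes every branch identically so that the resulting KV keys match across messages of one thread.

```typescript
// Hypothetical, Worker-independent variant of extractThreadId.
// Header names are looked up in lowercase, mirroring the case-insensitive Fetch Headers.
function threadIdFromHeaders(headers: Map<string, string>): string | null {
  const get = (name: string): string | null => headers.get(name.toLowerCase()) ?? null;
  const normalize = (id: string): string => id.trim().toLowerCase();

  // 1. Root of the References chain (the oldest ID comes first).
  const references = get("References");
  if (references) {
    const root = references.split(/[\s,]+/).find((id) => id.length > 0);
    if (root) return normalize(root);
  }

  // 2. Direct parent of a reply.
  const inReplyTo = get("In-Reply-To");
  if (inReplyTo && inReplyTo.trim()) return normalize(inReplyTo);

  // 3. A new thread starts at this message.
  const messageId = get("Message-ID");
  return messageId && messageId.trim() ? normalize(messageId) : null;
}
```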

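Thread-based routing enables another powerful pattern: I can reply to an email thread from my personal address and send the reply only to the assistant (hello@somai.me). This gives me a private channel with the assistant about an ongoing thread, without exposing the exchange to the other participants, which I can use to adjust its behavior or give it additional context.

Cloudflare's Agents SDK ships a few email routing resolvers out of the box, such as createCatchAllEmailResolver, createAddressBasedEmailResolver, and createHeaderBasedEmailResolver. None of them fits my use case, though: I needed custom routing logic that routes by both email thread and contact.

4.1 Custom Routing Resolver

Because we can extract a thread ID from the headers, we can maintain a mapping between thread IDs and agent instance IDs. I created a custom email resolver, createThreadBasedEmailResolver, that stores this mapping in Cloudflare KV. When a new email arrives, the routing logic checks the sender's address and the thread identifier in the headers. If KV holds a match, the email is routed to the corresponding agent instance; otherwise, a new agent instance is created and the thread identifier is stored in KV.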

import type { EmailResolver } from "agents";
 
/**
 * Thread-Based Email Resolver
 *
 * Routes emails based on conversation threads, ensuring all emails in a thread
 * route to the same Durable Objects instance regardless of sender.
 */
export function createThreadBasedEmailResolver<Env>(
  agentName: string,
  store: KVNamespace
): EmailResolver<Env> {
  return async (email: ForwardableEmailMessage, env: Env) => {
    // Determine the thread ID (use first message in thread)
    const state = await evaluateState(email, store);
    switch (state.type) {
      case "NEW_THREAD":
        // Map new thread ID to external person's email
        await store.put(state.threadId!, state.instanceId, {
          expirationTtl: 60 * 60 * 24 * 90, // 90 days
        });
        return { agentName, agentId: state.instanceId };
      case "EXISTING_THREAD":
        // Route to existing mapped instance
        return { agentName, agentId: state.instanceId };
      case "NO_THREAD":
        // No thread ID, route based on sender (could be owner or external)
        return { agentName, agentId: email.from.toLocaleLowerCase() };
      default:
        throw new Error("Unhandled email state");
    }
  };
}
 
// More logic here
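The resolver delegates the actual lookup to `evaluateState`, which the snippet leaves out. Below is one hypothetical shape for it, not the repository's code: a `Map` stands in for the KV namespace (real KV calls are asynchronous, and the resolver above also sets `expirationTtl`), a new thread is bound to the sender's lowercased address, and a message without any thread ID falls back to the sender, mirroring the `NO_THREAD` branch.

```typescript
// Hypothetical sketch of the thread lookup; a Map stands in for KV.
type ThreadState =
  | { type: "NEW_THREAD"; threadId: string; instanceId: string }
  | { type: "EXISTING_THREAD"; threadId: string; instanceId: string }
  | { type: "NO_THREAD" };

const normalize = (value: string): string => value.trim().toLowerCase();

function evaluateThreadState(
  threadId: string | null,
  from: string,
  store: Map<string, string>,
): ThreadState {
  if (!threadId) return { type: "NO_THREAD" };
  const key = normalize(threadId);
  const mapped = store.get(key);
  if (mapped) return { type: "EXISTING_THREAD", threadId: key, instanceId: mapped };
  // First message seen for this thread: bind it to the sender's instance.
  return { type: "NEW_THREAD", threadId: key, instanceId: normalize(from) };
}

// Returns the agent instance ID, recording new threads as a side effect.
function resolveAgentId(threadId: string | null, from: string, store: Map<string, string>): string {
  const state = evaluateThreadState(threadId, from, store);
  switch (state.type) {
    case "NEW_THREAD":
      // With KV this would be an awaited store.put(key, value, { expirationTtl }).
      store.set(state.threadId, state.instanceId);
      return state.instanceId;
    case "EXISTING_THREAD":
      return state.instanceId;
    case "NO_THREAD":
      return normalize(from);
    default: {
      const unhandled: never = state;
      throw new Error(`Unhandled thread state: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

This is what makes the owner's private side channel work: a reply from the owner that carries the contact's thread headers resolves to the contact's instance, not the owner's.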

4.2 Cloudflare KV Store

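The KV store holds the mapping between thread IDs and agent instance IDs. It is attached to the Worker that handles email routing through the wrangler.jsonc configuration file.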

{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "emailbot",
  "compatibility_date": "2025-12-23",
  "compatibility_flags": ["nodejs_compat"],
  // ... other configurations ...
  "kv_namespaces": [
    {
      "binding": "EMAIL_LOOKUP_KV",
      "id": "your-kv-namespace-id"
    }
  ]
}

The only remaining step is to update the routing logic to use this custom resolver.

import { routeAgentEmail } from "agents";
import { HelloEmailAgent } from "./agent";
import { createThreadBasedEmailResolver } from "./resolvers";
 
export default {
  async email(message: ForwardableEmailMessage, env: Env) {
    await routeAgentEmail(message, env, {
      resolver: createThreadBasedEmailResolver(HelloEmailAgent.name, env.EMAIL_LOOKUP_KV),
    });
  }
};

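5. The AI Agent

With the routing logic in place, we can focus on the AI agent itself. The Cloudflare Agents SDK provides a base Agent class that we extend with our own logic. It is essentially a wrapper around Durable Objects that adds useful abstractions for building agentic systems.

5.1 Memory

Each Durable Object instance has its own memory: a simple SQLite database. This makes memory management straightforward, since each agent instance stores its state in its own database. The memory schema stays simple because we don't have to worry about multi-tenancy or context cross-contamination between contacts.

For my use case, the memory schema is very direct. Each agent instance keeps the list of messages (emails) exchanged with the contact, a context string that summarizes the conversation, and a summary of the contact's preferences and behavior.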

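// `Message` (one stored email) is defined elsewhere in the repository.
export type Memory = {
  lastUpdated: string | null; // ISO-8601 timestamp, e.g. new Date().toISOString()
  messages: Message[];
  context: string;
  summary: string;
  contact: string | null; // assumed type; initialState sets it to null
};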

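When the agent instance is initialized, we set the initial memory state. We also define a method that updates the memory state whenever the agent processes an incoming email and produces a response that should be persisted.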

import { Agent } from "agents";
import type { Memory } from "./types";
import { log } from "./utils/logger"; // path assumed; other modules import it from "../utils/logger"
 
export class HelloEmailAgent extends Agent<Env, Memory> {
  initialState: Memory = {
    lastUpdated: null,
    messages: [],
    context: "",
    summary: "",
    contact: null,
  };
 
  /**
   * Apply state updates atomically
   * Single source of truth - only this method mutates state
   */
  private applyUpdates(updates: Partial<Memory> | undefined): void {
    if (!updates) return;
    this.setState({ ...this.state, ...updates });
    log.debug("[agent] state updated", { keys: Object.keys(updates) });
  }
 
  //....
}
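Note that `applyUpdates` performs a shallow merge: each top-level key in `updates` replaces the stored value wholesale, so appending an email means passing the complete new `messages` array. The self-contained sketch below uses a local copy of `Memory` and a hypothetical `Message` shape (`from`, `subject`, `raw`, the fields the classification tool reads later); `mergeMemory` and `appendMessage` are illustrative names.

```typescript
// Local stand-ins for the article's types (Message fields are assumed).
interface Message {
  from: string;
  subject: string;
  raw: string;
}
interface Memory {
  lastUpdated: string | null;
  messages: Message[];
  context: string;
  summary: string;
  contact: string | null;
}

// Same semantics as `{ ...this.state, ...updates }`: top-level keys are replaced,
// never deep-merged. A key present but set to undefined also overwrites.
function mergeMemory(state: Memory, updates?: Partial<Memory>): Memory {
  return updates ? { ...state, ...updates } : state;
}

// Appending therefore means proposing the whole new array.
function appendMessage(state: Memory, message: Message): Memory {
  return mergeMemory(state, {
    messages: [...state.messages, message],
    lastUpdated: new Date().toISOString(),
  });
}
```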

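5.2 Routing Logic

At the entry point, the agent receives the email and determines whether it comes from me (the owner) or from an external contact. It then hands the email to the corresponding workflow: the owner workflow or the external-contact workflow. This separation isolates the logic for handling my emails from the logic for handling everyone else's.

You could call this defensive programming for AI agents, and that is exactly what it is. Using deterministic routing to keep owner email handling separate from external contact handling makes the agent less exposed to jailbreaks and unexpected behavior.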

import { Agent, type AgentEmail } from "agents";
import type { Memory } from "./types";
 
export class HelloEmailAgent extends Agent<Env, Memory> {
  initialState: Memory = {
    lastUpdated: null,
    messages: [],
    context: "",
    summary: "",
    contact: null,
  };
 
  /**
   * Main entry point for handling incoming emails
   */
  async onEmail(email: AgentEmail): Promise<void> {
    const from = email.from.toLowerCase();
    const owner = this.env.EMAIL_ROUTING_DESTINATION.toLowerCase();
    const routing = this.env.EMAIL_ROUTING_ADDRESS.toLowerCase();
    const subject = email.headers.get("Subject") || "(no subject)";

    // Ignore mail sent from the assistant's own address
    if (from === routing) {
      return;
    }

    const route = from === owner ? "owner" : "external";

    if (route === "owner") {
      // Email from owner - Loop Agent
      await this.handleOwnerEmail(email);
    } else {
      // Email from external sender(s) - Workflow Agent
      await this.handleIncomingEmail(email);
    }
 
    // ...
  }
}
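The routing decision itself is deterministic and easy to isolate. A hypothetical pure version of the branch in the entry point, with the addresses passed in instead of read from `this.env`, might look like this. It assumes the sender address can be trusted, and the drop-on-self rule presumably keeps the assistant from processing its own outgoing mail; `routeSender` and the example addresses are illustrative.

```typescript
type SenderRoute = "self" | "owner" | "external";

// Hypothetical pure version of the owner/external branch in the entry point.
function routeSender(from: string, ownerAddress: string, assistantAddress: string): SenderRoute {
  const normalize = (address: string): string => address.trim().toLowerCase();
  const sender = normalize(from);
  // Mail from the assistant's own address is dropped.
  if (sender === normalize(assistantAddress)) return "self";
  // Only an exact owner match reaches the more permissive Loop agent.
  return sender === normalize(ownerAddress) ? "owner" : "external";
}
```

Keeping this decision outside the LLM means no email content, however crafted, can promote an external sender to the owner workflow.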

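5.3 Workflow Agents vs. Loop Agents

I used two different design patterns to process email.

Loop agent: the first is a Loop agent, which handles email from me (the owner). The Loop agent pattern suits dynamic workflows in which the agent iterates over a set of tasks until some condition is met. It is free to choose which tools from its list to call, and it decides when to stop based on context and feedback. That makes it the most flexible and powerful pattern for complex workflows, but also the less deterministic one, and more prone to unexpected behavior if it is poorly designed.
Workflow agent: the second is a more structured, step-by-step workflow, adapted from Anthropic's "Building Effective Agents" guide, which emphasizes defining explicit steps for agents. Workflow agents are more deterministic and easier to reason about because every step is explicitly defined, but they are less flexible and may not handle dynamic workflows as well as Loop agents.

Both patterns expose the same interface and are interchangeable; which one to use depends on the use case and the workflow's requirements. Each implements the same AgentExecutor interface and is invoked the same way, so the appropriate pattern can be chosen per scenario.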

import { Memory } from "../types";
 
/**
 * Result from any agent execution
 * Agents return what they did and what state updates they propose
 */
export interface AgentResult<T = unknown> {
  /** Output from the agent (can be text, structured data, etc.) */
  output: T;
  /** Proposed state updates (parent decides whether to apply) */
  stateUpdates?: Partial<Memory>;
}
 
/**
 * Tool result that may include state updates
 */
export interface ToolResult<T = unknown> {
  data: T;
  stateUpdates?: Partial<Memory>;
}
 
/**
 * AgentExecutor interface for consistent agent pattern
 * Agents are factories that return an executor with this interface
 */
export interface AgentExecutor<TInput, TOutput> {
  execute(input: TInput): Promise<AgentResult<TOutput>>;
}
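The repository's `WorkflowAgent` (imported from `./workflow-agent` in the next listing) is not shown. Here is a minimal hypothetical implementation of `AgentExecutor` in the same spirit: the control flow lives in an ordinary `run` function, and tools are reachable only by name through `executeTool`. Plain async functions stand in for AI SDK tools, `run` additionally receives the input, and the interfaces are redeclared locally so the block runs on its own.

```typescript
// Local copies of the interfaces above, so this block is self-contained.
interface ToolResult<T = unknown> {
  data: T;
  stateUpdates?: Record<string, unknown>;
}
interface AgentResult<T = unknown> {
  output: T;
  stateUpdates?: Record<string, unknown>;
}
interface AgentExecutor<TInput, TOutput> {
  execute(input: TInput): Promise<AgentResult<TOutput>>;
}

// Hypothetical tool shape: a named async function returning a ToolResult.
type WorkflowTool = (args: unknown) => Promise<ToolResult>;
type ExecuteTool = <T = unknown>(name: string, args: unknown) => Promise<ToolResult<T>>;

interface WorkflowConfig<TInput, TOutput> {
  tools: Record<string, WorkflowTool>;
  run: (ctx: { input: TInput; executeTool: ExecuteTool }) => Promise<AgentResult<TOutput>>;
}

// Deterministic control flow in `run`; the LLM only appears inside individual tools.
class WorkflowAgent<TInput, TOutput> implements AgentExecutor<TInput, TOutput> {
  private readonly config: WorkflowConfig<TInput, TOutput>;

  constructor(config: WorkflowConfig<TInput, TOutput>) {
    this.config = config;
  }

  async execute(input: TInput): Promise<AgentResult<TOutput>> {
    const executeTool: ExecuteTool = async <T,>(name: string, args: unknown): Promise<ToolResult<T>> => {
      const tool = this.config.tools[name];
      if (!tool) throw new Error(`Unknown tool: ${name}`);
      return (await tool(args)) as ToolResult<T>;
    };
    return this.config.run({ input, executeTool });
  }
}
```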

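5.4 A Workflow Agent for Reviewing Incoming Email

For incoming email from external contacts, I use a Workflow agent that follows a defined sequence of steps. Some steps are AI-driven (for example, classifying the email or generating a draft), while others are deterministic (for example, sending the email or updating the context).

Treat LLMs as stochastic functions that perform specific tasks (classification, generation), not as monolithic agents that try to do everything. This modular approach gives better control over the agent's behavior and reduces the risk of unintended actions. Because tool calls have deterministic interfaces (inputs and outputs), LLM-driven tools can be composed into larger workflows with reasonably predictable behavior.[6]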
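One way to read "LLM as a stochastic function" in code is to wrap each call in a deterministic contract: the caller either gets a value that passed validation or an error after a bounded number of attempts. `generateValidated` is a hypothetical helper and `generate` stands in for any model call. In the article's tools, the AI SDK's structured output already validates against a schema, so this illustrates the principle rather than replacing that mechanism.

```typescript
// Hypothetical helper: a stochastic producer wrapped in a deterministic contract.
async function generateValidated<T>(
  generate: (attempt: number) => Promise<unknown>, // e.g. one LLM call
  validate: (candidate: unknown) => T | null, // null means "reject this output"
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = validate(await generate(attempt));
    if (parsed !== null) return parsed;
  }
  // Callers see a typed value or a clear failure, never unvalidated output.
  throw new Error(`no valid output after ${maxAttempts} attempts`);
}
```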

import { getEmailTools } from "../tools";
import type { Memory } from "../types";
import { log } from "../utils/logger";
import type { AgentExecutor, AgentResult } from "./agent";
import { WorkflowAgent } from "./workflow-agent";
 
export interface ReplyAgentOutput {
  action: "replied" | "skipped";
  emailId?: string;
}
 
/**
 * Creates a reply contact workflow agent
 *
 * This workflow:
 * 1. Classifies the incoming email
 * 2. If action is "reply": generates a draft and sends it
 * 3. Returns the result with action taken
 *
 * @param env - Environment bindings
 * @param state - Current agent memory state
 */
export const createReplyContactAgent = (
  env: Env,
  state: Memory
): AgentExecutor<void, ReplyAgentOutput> =>
  new WorkflowAgent({
    tools: getEmailTools(env, state),
    run: async ({ executeTool }): Promise<AgentResult<ReplyAgentOutput>> => {
      // Step 1: Classify
      const classifyResult = await executeTool("classifyEmail", { state });
      const classification = classifyResult.data;
 
      // Step 2: Decide
      if (classification.action !== "reply") {
        log.info("[reply-workflow] decision", {
          action: classification.action,
          reason: "no reply needed",
        });
        return { output: { action: "skipped" } };
      }
 
      log.info("[reply-workflow] decision", { action: "reply" });
 
      // Step 3: Draft
      const draftResult = await executeTool("generateReplyDraft", { state });
      const draft = draftResult.data;
      const originalEmail = state.messages.at(-1);
 
      if (!originalEmail) {
        log.error("[reply-workflow] error", {
          error: "no message to reply to",
        });
        throw new Error("No message to reply to");
      }
 
      // Step 4: Send (addresses resolved from state/env automatically)
      const sendResult = await executeTool("sendEmail", {
        recipient: "contact",
        subject: originalEmail.subject,
        content: draft,
      });
 
      return {
        output: {
          action: "replied",
          emailId: sendResult.data.id ?? undefined,
        },
        stateUpdates: {
          lastUpdated: new Date().toISOString(),
        },
      };
    },
  });
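The workflow above branches only on `classification.action`. The classifier's own prompt, however, describes medium risk as "safe to draft but not auto-send" and returns a `requires_approval` flag. A hypothetical deterministic gate that also honors those fields (an extension, not the repository's code; `decideDelivery` and the `Delivery` labels are illustrative) could look like this:

```typescript
// Local copy of the classifier output shape (EmailClassificationSchema).
interface EmailClassification {
  intents: string[];
  risk: "low" | "medium" | "high";
  action: "reply" | "forward" | "ignore";
  requires_approval: boolean;
  comments: string;
}

type Delivery = "auto_send" | "draft_for_approval" | "escalate" | "skip";

// Hypothetical gate: only low-risk replies that need no approval go out automatically.
function decideDelivery(c: EmailClassification): Delivery {
  if (c.action === "ignore") return "skip";
  // "forward" means human review without a full draft.
  if (c.action === "forward") return "escalate";
  // A reply that is not clearly safe is drafted but held for the owner.
  if (c.requires_approval || c.risk !== "low") return "draft_for_approval";
  return "auto_send";
}
```

Because the gate is plain code, the rule "never auto-send a medium-risk reply" holds even if the model misjudges the action.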

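5.5 An LLM-Driven Loop Agent for Owner Email

For email from me (the owner), I use a Loop agent, which can handle more dynamic workflows. A Loop agent iterates over a set of tasks until some condition is met, which is useful for emails that may take several steps or iterations to resolve. Instead of following a fixed workflow, it uses the LLM to decide which action to take next, based on the context and the feedback it receives. This gives it more flexibility and adaptability; you can think of it as a freer-form agent that can handle complex interactions.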
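Stripped of the SDK, a Loop agent is a bounded loop in which the model picks the next tool call until it produces a final answer. The sketch below is synchronous and uses a scripted stand-in for the model; `runToolLoop`, `ModelStep`, and the transcript format are hypothetical and do not describe how the AI SDK implements its agent internally. The step cap is the key design choice: it bounds how far a confused or manipulated model can go.

```typescript
// Hypothetical, synchronous model of a tool loop; real model and tool calls are async.
type ModelStep =
  | { kind: "tool"; name: string; args: string }
  | { kind: "final"; text: string };

// The "model" sees the transcript so far and picks the next step.
type Model = (transcript: readonly string[]) => ModelStep;
type Tool = (args: string) => string;

function runToolLoop(
  model: Model,
  tools: Partial<Record<string, Tool>>,
  prompt: string,
  maxSteps = 5, // hard cap so the loop always terminates
): { text: string; steps: number; transcript: string[] } {
  const transcript: string[] = [`user: ${prompt}`];
  for (let step = 1; step <= maxSteps; step++) {
    const next = model(transcript);
    if (next.kind === "final") return { text: next.text, steps: step, transcript };
    const tool = tools[next.name];
    // An unknown tool name is reported back to the model rather than crashing the loop.
    const result = tool ? tool(next.args) : `error: unknown tool "${next.name}"`;
    transcript.push(`tool ${next.name}(${next.args}) -> ${result}`);
  }
  throw new Error(`no final answer within ${maxSteps} steps`);
}
```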

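The Loop agent is implemented with the Vercel AI SDK. It defines a set of possible actions (tools), and the LLM decides which one to take given the current context.

The logic is therefore expressed in a system prompt that tells the LLM how to behave and which actions to take. Think of it as programming the agent's behavior in plain English. The system prompt defines the agent's role, the available actions, a decision framework, and the key guidelines to follow.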

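import { ToolLoopAgent } from "ai";
import { getContextTools, getEmailTools } from "../tools"; // location of getContextTools assumed
import type { Memory } from "../types";
import { log } from "../utils/logger";
import type { AgentExecutor, AgentResult } from "./agent";
// retrieveModel and wrapToolsWithStateCapture are helpers defined elsewhere in the repository (not shown)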
 
/**
 * System prompt for the owner response agent
 */
const SYSTEM_PROMPT = `You are an AI assistant helping manage emails for Melek Somai. You are processing an email FROM the owner (Melek).
 
## Your Role
 
Analyze the owner's email and determine what actions to take. The owner may be:
1. **Replying to someone (CC'ing you)** - Learn from how they responded
2. **Sending you direct instructions** - Update your context/knowledge base
3. **Forwarding an email for you to handle** - Send a reply on their behalf
 
## Available Actions
 
- **updateContext**: Store important information for future reference
- **generateReplyDraft**: Draft an email response based on context (use this before sendEmail to compose thoughtful replies)
- **sendEmail**: Send an email. Choose recipient:
  - "contact" - reply to the external person
  - "owner" - notify Melek
  - "both" - reply to contact AND cc Melek
 
## Decision Framework
 
### If the owner CC'd you on a reply:
- Extract preferences or patterns from how they responded
- Update context if useful
- Do NOT send any emails
 
### If the owner sent you direct instructions:
- Update context with new preferences
 
### If the owner forwards an email to you:
- This is implicit delegation - the owner wants you to handle it
- Use generateReplyDraft to compose a thoughtful response
- Then use sendEmail with recipient "contact" to send it
- Unless it involves commitments, money, or sensitive matters
- Update context with any relevant information
 
### If the owner is asking for help with scheduling:
- Consider their known availability (from context)
- Use generateReplyDraft to compose a response with proposed times
- Then use sendEmail with recipient "contact" to send it
- Unless it involves firm commitments
 
## Important Guidelines
 
- Always update context when you learn something new about preferences
- Forwarded emails = delegation to act (send to contact)
- CC'd emails = observation only (learn, don't act)
- When in doubt about sensitive matters, don't send
`;
 
export interface OwnerAgentInput {
  prompt: string;
}
 
export interface OwnerAgentOutput {
  text: string;
}
 
/**
 * Creates an owner response agent that returns state updates in result
 *
 * @param env - Environment bindings
 * @param state - Current memory state (used for address resolution)
 */
export const createOwnerResponseAgent = async (
  env: Env,
  state: Memory
): Promise<AgentExecutor<OwnerAgentInput, OwnerAgentOutput>> => {
  log.debug("[owner-agent] creating", { contact: state.contact });
 
  const model = await retrieveModel(env);
 
  // Accumulator for state updates from tools
  const stateUpdates: Partial<Memory> = {};
 
  // Wrap tools to capture state updates
  const tools = wrapToolsWithStateCapture(
    {
      ...getContextTools(env, state),
      ...getEmailTools(env, state),
    },
    stateUpdates
  );
 
  const agent = new ToolLoopAgent({
    model,
    instructions: SYSTEM_PROMPT,
    tools,
  });
 
  return {
    execute: async (input): Promise<AgentResult<OwnerAgentOutput>> => {
      const { text } = await agent.generate({ prompt: input.prompt });
      return {
        output: { text },
        stateUpdates:
          Object.keys(stateUpdates).length > 0 ? stateUpdates : undefined,
      };
    },
  };
};
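`wrapToolsWithStateCapture` is referenced above but not shown. Assuming each tool returns the `ToolResult` shape defined earlier, one plausible version (named `captureStateUpdates` here to mark it as hypothetical) forwards each call, folds any proposed `stateUpdates` into the shared accumulator, and hands only `data` back to the model. AI SDK tools also carry schemas and receive a second options argument; both are omitted in this sketch.

```typescript
type StateUpdates = Record<string, unknown>;

interface ToolResult<T = unknown> {
  data: T;
  stateUpdates?: StateUpdates;
}

// Simplified tool shapes (no schemas, no options argument).
interface CapturableTool {
  description: string;
  execute: (input: unknown) => Promise<ToolResult>;
}
interface WrappedTool {
  description: string;
  execute: (input: unknown) => Promise<unknown>;
}

// Later updates to the same key overwrite earlier ones (shallow Object.assign).
function captureStateUpdates(
  tools: Record<string, CapturableTool>,
  accumulator: StateUpdates,
): Record<string, WrappedTool> {
  const wrapped: Record<string, WrappedTool> = {};
  for (const [name, tool] of Object.entries(tools)) {
    wrapped[name] = {
      description: tool.description,
      execute: async (input) => {
        const result = await tool.execute(input);
        if (result.stateUpdates) Object.assign(accumulator, result.stateUpdates);
        return result.data; // the model never sees the proposed state changes
      },
    };
  }
  return wrapped;
}
```

This keeps the agent itself side-effect free: it only proposes updates, and the parent `applyUpdates` decides whether to persist them.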

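6. Tools

In both Workflow and Loop agents, tools carry out specific tasks. Think of a tool as a function the agent can call to perform an action. Tools can be deterministic (for example, sending an email or updating the context) or AI-driven (for example, classifying an email or generating a draft). Each one has a defined input and output interface, so the agent can call it as needed. The Vercel AI SDK provides a powerful, easy-to-use framework for building tools that agents can call.

The AI agent uses the following tools:

Email classification tool: sorts incoming email into categories (for example, inquiry, complaint, follow-up) to determine the appropriate response strategy.
Email drafting tool: generates a draft reply from the email content and context.
Context update tool: updates the agent's memory with new information from incoming email.
Email sending tool: sends email on the agent's behalf.

Cloudflare Email Routing does not currently support sending multiple emails directly from a Durable Object, so I use Resend as an external email service to send email from the agent. This is a stopgap until Cloudflare supports sending email directly from Durable Objects.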
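When a reply goes out through an external service instead of as a direct reply to the inbound message, it has to carry its own threading headers. Otherwise the contact's mail client may not thread it, and the contact's next reply may no longer map to the original thread in KV. The helpers below (hypothetical names) compute those headers following RFC 5322 §3.6.4 and add the conventional "Re: " subject prefix; how the headers are passed to the sending service depends on that provider's API.

```typescript
interface ParentMessage {
  messageId: string;
  references?: string | null;
  inReplyTo?: string | null;
}

// RFC 5322 §3.6.4: In-Reply-To is the parent's Message-ID; References is the
// parent's References (or, if absent, its In-Reply-To) followed by the parent's Message-ID.
function replyThreadingHeaders(parent: ParentMessage): Record<string, string> {
  const prior = (parent.references || parent.inReplyTo || "")
    .split(/\s+/)
    .filter((id) => id.length > 0);
  return {
    "In-Reply-To": parent.messageId,
    References: [...prior, parent.messageId].join(" "),
  };
}

// Conventional "Re: " prefix, without stacking "Re: Re: ...".
function replySubject(subject: string | null | undefined): string {
  const base = (subject ?? "").trim();
  return /^re:/i.test(base) ? base : `Re: ${base}`;
}
```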

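6.1 The Email Classification Tool

I want to look closely at one of the AI-driven tools: the email classification tool, which sorts incoming email into categories to determine the appropriate response strategy.

Although the LLM does the classification, the tool uses Zod validation to enforce a strict output schema. This guarantees that the agent receives structured, predictable data it can base decisions on. Language models are good at generating structured data, and Zod helps ensure that the output is valid at runtime and matches the expected schema. One trick I use is to include the Zod schema definition in the system prompt, so the LLM knows the exact structure to follow.

The email classification tool has three main parts:

Input schema (inputSchema, lines 6-10 of the listing below): defines the tool's expected input structure. Here it expects the current agent memory state, including messages and context.
Output schema (outputSchema, lines 12-31): defines the tool's expected output structure: the classified intents, the risk level, the recommended action, whether approval is required, and comments.
Execution logic (the generateText call inside execute, lines 58-65): the core of the tool, which uses the LLM to classify the email from the input state. It builds a prompt containing the email content, prior messages, and context, then calls the LLM to generate the classification.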

export const classifyEmailTool = (env: Env) =>
  tool({
    description:
      "Classify an email to determine its intent, risk level, and recommended action. Use this to triage incoming emails.",
    
    inputSchema: z.object({
      state: MemorySchema.describe(
        "The current agent memory state with messages and context"
      ),
    }),
    
    outputSchema: z.object({
      intents: z
        .array(
          z.enum([
            "scheduling",
            "information_request",
            "action_request",
            "introduction_networking",
            "sales_vendor",
            "fyi_notification",
            "sensitive_legal_financial",
            "unknown_ambiguous",
          ])
        )
        .min(1),
      risk: z.enum(["low", "medium", "high"]),
      action: z.enum(["reply", "forward", "ignore"]),
      requires_approval: z.boolean(),
      comments: z.string().min(1).max(500),
    }),
    execute: async ({ state }): Promise<ToolResult<EmailClassification>> => {
      const startTime = Date.now();
      try {
        const model = await retrieveModel(env);
 
        const message = state.messages[state.messages.length - 1];
        const contextMessages = state.messages
          .slice(0, -1)
          .slice(-10)
          .map((msg) => msg.raw)
          .join("\n\n---\n\n");
        const prompt = `Classify the following email:
 
        from:${message?.from}
        subject:${message?.subject}
        content:
        ${message?.raw}
 
        ----------------------
        Prior historical messages (last 10 messages sent prior to this email by the same sender):
        ${contextMessages}
 
        ----------------------
        Please keep in mind the context provided below that may help with classification:
        ${state.context}`;
 
        const { output } = await generateText({
          model,
          system: SYSTEM_PROMPT,
          output: Output.object({
            schema: EmailClassificationSchema,
          }),
          prompt: prompt,
        });
 
        return { data: output };
      } catch (err) {
        throw err;
      }
    },
  });
 
// ----------- System Prompt -----------
 
const SYSTEM_PROMPT = `For every incoming email, follow this process strictly and return ONLY valid JSON that conforms to EmailClassificationSchema.
 
---
 
## About Melek (Context)
 
Melek Somai is a physician-technologist and executive working at the intersection of healthcare, software engineering, and AI. His work focuses on building pragmatic, high-impact systems rather than speculative or hype-driven technology.
 
---
 
Step 1 — Classify intent
 
Select one or more intents from the allowed list (do not invent new labels):
 
- scheduling
- information_request
- action_request
- introduction_networking
- sales_vendor
- fyi_notification
- sensitive_legal_financial
- unknown_ambiguous
 
Guidance:
- Use introduction_networking for polite outreach, introductions, compliments, or relationship-building notes, even if there is no explicit request.
- Use unknown_ambiguous only when intent genuinely cannot be determined from the content.
- Do NOT use unknown_ambiguous for friendly introductions with clear social intent.
 
---
 
Step 2 — Assess risk and authority
 
Answer internally (do not include the Q&A in the final output):
 
- Is this reversible?
- Does this require the user's approval before committing or sending?
- Does this involve commitments, money, contracts, credentials, compliance, or legal exposure?
- Do I have sufficient context to respond accurately?
 
Then assign risk:
 
- low: routine, reversible, no commitments, no sensitive content, high confidence
- medium: some ambiguity or mild commitment risk; safe to draft but not auto-send
- high: legal/financial/sensitive, identity/security concerns, or material commitment risk
 
---
 
Step 3 — Choose exactly one action
 
- reply: safe to draft a response (including polite acknowledgements or relationship-building replies)
- forward: requires human review; do not draft a full reply
- ignore: no response needed (spam, automated FYI, or clearly non-actionable)
 
Important clarification:
- "reply" is appropriate even when there is no explicit question or request, as long as a polite, professional acknowledgement would be reasonable and risk is low.
- Do not escalate solely due to the absence of a direct ask.
 
Default action rule for introduction_networking:
- If the email is clearly an introduction or friendly note AND contains no sales pitch, no request for money or contracts, and no sensitive/legal/financial content:
  - risk = low
  - action = reply
  - requires_approval = false
- Choose forward only if the message includes commercial terms, access requests, sensitive topics, or reputational risk.
 
If uncertain between two risks or actions, choose the safer option (higher risk / forward), except for low-risk introduction_networking, which should default to reply.
 
---
 
Step 4 — Explain the decision
 
Provide a short comment that references:
- the intents selected
- the risk level
- the reason for the chosen action
 
---
 
Output rules:
 
- Return ONLY JSON
- Do not include markdown, prose, or explanations outside the schema
- Use concise, professional wording in comments
 
 
import { z } from "zod";
 
export const EmailIntentSchema = z.enum([
  "scheduling",
  "information_request",
  "action_request",
  "introduction_networking",
  "sales_vendor",
  "fyi_notification",
  "sensitive_legal_financial",
  "unknown_ambiguous",
]);
 
export const EmailClassificationSchema = z.object({
  intents: z.array(EmailIntentSchema).min(1),
 
  risk: z.enum(["low", "medium", "high"]),
 
  action: z.enum(["reply", "forward", "ignore"]),
 
  // Whether the assistant should wait for explicit user approval before sending or committing.
  requires_approval: z.boolean(),
 
  // A short explanation for logs and UI surfaces.
  comments: z.string().min(1).max(500),
});
 
`;
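Embedding the schema in the prompt means each label list exists twice, once in the prompt text and once in Zod, and the two can drift apart. A dependency-free sketch of the alternative keeps each list in one `as const` array and derives the TypeScript types, the prompt fragment, and a runtime check from it. The names are hypothetical; in the article's setup Zod would still do the validation, and the hand-written `parseClassification` only mirrors the schema's rules (at least one intent, comments of 1 to 500 characters).

```typescript
// Hypothetical single source of truth for the classifier's labels.
const INTENTS = [
  "scheduling",
  "information_request",
  "action_request",
  "introduction_networking",
  "sales_vendor",
  "fyi_notification",
  "sensitive_legal_financial",
  "unknown_ambiguous",
] as const;
const RISKS = ["low", "medium", "high"] as const;
const ACTIONS = ["reply", "forward", "ignore"] as const;

type Intent = (typeof INTENTS)[number];

interface Classification {
  intents: Intent[];
  risk: (typeof RISKS)[number];
  action: (typeof ACTIONS)[number];
  requires_approval: boolean;
  comments: string;
}

// Runtime guard derived from the same arrays as the types.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).includes(value);
}

// Prompt fragment such as "- scheduling\n- information_request\n...".
function promptList(values: readonly string[]): string {
  return values.map((v) => `- ${v}`).join("\n");
}

// Mirrors EmailClassificationSchema's rules; returns null instead of throwing.
function parseClassification(json: string): Classification | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { intents, risk, action, requires_approval, comments } = raw as Record<string, unknown>;
  if (!Array.isArray(intents) || intents.length === 0) return null;
  if (!intents.every((i) => isOneOf(INTENTS, i))) return null;
  if (!isOneOf(RISKS, risk) || !isOneOf(ACTIONS, action)) return null;
  if (typeof requires_approval !== "boolean") return null;
  if (typeof comments !== "string" || comments.length < 1 || comments.length > 500) return null;
  return { intents: intents as Intent[], risk, action, requires_approval, comments };
}
```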

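7. Closing Thoughts

In this post, I walked through the architecture and implementation of a personal AI assistant built on Cloudflare Workers and Durable Objects. The key components are custom email routing based on email threads, AI-driven memory management backed by SQLite, and two agent patterns (Workflow and Loop agents) for handling different kinds of email. Tools, both deterministic and AI-driven, make the agent's behavior modular and composable.

We are still in the early days of AI agents, and much remains to be explored and improved. I think the combination of Cloudflare's serverless platform and the Agents SDK is a strong foundation for building scalable, efficient AI-driven applications. It still feels somewhat messy, though, and the frameworks are evolving quickly. I started this project as an experiment, and it took several iterations to reach a prototype with a sound architecture that actually works.

On safety and security, I took several precautions to make the agent behave responsibly. Separating the owner and external-contact workflows helps prevent unintended actions. Strict schemas on tool outputs ensure the agent receives structured, predictable data. And thread-based routing helps preserve context and continuity across a conversation.

Send an email to hello@somai.me and you will be routed to the agent. It is still early, but it already works.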

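8. Footnotes

[1] By building "from first principles," I mean starting not from existing software patterns or human workflows but from the fundamental capabilities and constraints of LLMs themselves. Most AI products today try to embed language models in interfaces designed for humans: dashboards, buttons, forms. A first-principles approach inverts this: what would software look like if we designed it around what agents naturally do? For email, that means asking: given that LLMs are stateless, context-dependent, and good at language understanding, what architecture lets an agent operate autonomously within a protocol that was never designed for machines? The answer requires rethinking memory, state, and interaction patterns from scratch, rather than retrofitting AI into an inbox UI.

[2] Vercel vs. AWS vs. Cloudflare: Vercel is an excellent platform and has historically been very friendly to web applications. Its recent fluid compute, AI SDK, and Workflow offerings are all powerful and have strong community support. However, Vercel is tightly coupled to Next.js and the web-framework lifecycle, which felt unnatural for an agent-first system like this one. I really like the craftsmanship the Vercel team puts into all its products and services, but I didn't want to build on top of a web application. AWS is a powerful hyperscaler, but it has become a heavyweight platform to get started on. I certainly didn't want to set up a VPC, configure Bedrock, manage IAM, wire up CI with CloudFormation, and orchestrate a pile of infrastructure components just to run a prototype. Their push into AI through AWS Bedrock and sub-services such as AgentCore also doesn't seem to resonate with a new generation of developers; for example, I find the development experience on AWS Lambda quite challenging compared with Vercel's. AWS's new AI services lack the "care" and energy that went into building DynamoDB or S3.

[3] I am excited but cautious, and curious how Cloudflare will execute on this vision over time. Many questions about Durable Objects remain open, and I have yet to see successful companies and ideas built on top of them.

[4] We are already seeing new infrastructure constructs and primitives emerge: sandboxed execution environments that let agents call tools safely, dedicated agent VMs that keep state between invocations, RAG systems that ground LLM inference in retrieved knowledge, and orchestration layers that coordinate multi-step agent workflows (LangGraph, Temporal, AWS Step Functions). All of this points to a future in which agents need persistence rather than ephemerality. Traditional serverless platforms were not designed for this paradigm shift.

[5] For more on email threading and headers, see RFC 5322bis (draft 12). Note that we assume In-Reply-To always contains a single parent, so the References field can be walked backward to find the parent of each message listed in it. This approach is therefore not compatible with replies that have multiple parents (which the RFC discourages).

[6] This is similar to the concept of tool use in agentic systems, where the LLM is a component orchestrated by higher-level logic. Breaking the agent's behavior into discrete steps lets us manage complexity better and ensure that the agent behaves as intended. This may seem confusing, and it is. But as we build more complex agentic systems, we will need this kind of modular, composable architecture to keep agent behavior manageable.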

Original article: Building an Agent-First Email Assistant with Cloudflare Durable Objects

Translated and edited by Huizhi Network (汇智网); please credit the source when republishing.