OpenCode: An Implementation Analysis
Coding agents are becoming genuinely useful tools for developers (though some people still won't touch them, but that's another topic). I've been experimenting with them recently and wanted to understand how they actually work under the hood.
In this post, I'll share what I learned while exploring OpenCode, an excellent, 100% open-source alternative to Claude Code. It features a TUI and an interesting client/server setup.
1. Why Do We Need Coding Agents?
LLMs are astonishing knowledge bases. They have ingested essentially every public code repository and plenty more. So when you give an LLM a coding-related task (writing new code, explaining an error, fixing a bug), chances are it has seen something similar on GitHub, Stack Overflow, or elsewhere, and it can offer useful insight. Models have become so good that they can even handle some problems on their own.
So why do we need a "coding agent"? Why not keep using the ubiquitous chat interfaces (like ChatGPT)? A few reasons:
- Constantly pasting code snippets, stack traces, and files into an LLM gets tedious fast.
- Copying suggestions or code changes back out is just as tedious.
- Giving the LLM direct access to a local (or development) environment is extremely powerful:
  - It can run the test suite after making changes, look at the results, reflect on failures, and adjust accordingly.
  - It can leverage powerful tooling such as LSP (Language Server Protocol) servers. For example, after the LLM makes a change, it can send textDocument/didChange to the server over STDIO, wait for diagnostics, and feed those diagnostics back into the LLM's context. OpenCode implements this elegantly with an event bus and a global diagnostics map.
- Do you really need to stay involved the whole time? Not necessarily. Once the LLM has the code, the tools, and the feedback, you can assign it a task, walk away, and let it work. It can review code, try out new features, and even respond to requests when you tag it on GitHub.
Powerful stuff. But let's be clear: hallucinations are still very real. Agents can dead-end, waste compute, and burn through tokens. People have shared horror stories about their usage bills. Still, as models improve and we build better tooling, workflows, and infrastructure around agents, the payoff will only grow.
2. Architecture
The backend runs in JavaScript and is exposed through an HTTP server; in OpenCode's case, that server is Hono.
One of OpenCode's strengths is that it is vendor-agnostic. It supports different models out of the box: you drop in an API key and the magic happens. Most of the heavy lifting is handled by the AI SDK, which standardizes LLM usage across providers. This means you can use the same function calls and parameters to talk to Anthropic, OpenAI, Gemini, and others, and even self-host by pointing it at your own URL for any OpenAI-compatible endpoint.
Developers, or other applications/agents, interact with OpenCode over HTTP. By default, running the opencode command starts the JS HTTP server plus a Golang TUI process. The user then sends prompts through the TUI and inspects the results as the session progresses. But thanks to OpenCode's client/server design, any client (a mobile app, a web app, a script, and so on) can send HTTP requests to create sessions and start coding. Naturally, all the actual work happens on the machine running the server.
OpenCode uses Stainless to auto-generate SDK client code; the tool ingests an OpenAPI spec and produces high-quality, idiomatic, type-safe client code. So instead of manually talking to the server with an HTTP library, you get type-safe, ready-to-use functions.
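As an illustration, driving a running OpenCode server from a script might look roughly like this (a sketch only: the package and method names here are assumptions, not verified API):

```ts
// Hypothetical sketch of using the generated SDK (names are assumptions).
import { createOpencodeClient } from "@opencode-ai/sdk"

const client = createOpencodeClient({ baseUrl: "http://localhost:4096" })

// Create a session and send it a prompt, just like the TUI would.
const session = await client.session.create({ body: { title: "bugfix" } })
await client.session.prompt({
  path: { id: session.id },
  body: { parts: [{ type: "text", text: "Fix the failing test in utils.ts" }] },
})
```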
A key part of a coding agent is its tools, the actions through which it does actually useful work:
- Browsing, reading, and editing files
- Running bash commands
- Fetching content from URLs
- Integrating with LSP servers to obtain code diagnostics
These tools turn the LLM from a plain chat interface into an actor in the system: reading files, running commands, editing code, observing results, and iterating. The LLM is the brain; the tools are the arms and legs.
But a full coding agent needs more: remembering previous sessions, managing permissions (can the agent run any tool, or does it need approval for some?), rolling back file changes after a tool error (a bash command gone wrong), coordinating with LSP servers and MCP clients, and more.
The devil is in the details. Wiring up a few tools and letting an agent make changes is easy; building a delightful developer experience with smooth workflows, clean architecture, and reliability is no small feat.
3. The System Prompt
The system prompt is given to the LLM along with the list of available tools. The developer then supplies a user prompt (e.g., "fix this bug"). From there, the LLM decides what to do. Maybe it needs to read the file related to the bug, so it emits a tool_use. The LLM client (the AI SDK in this case) runs that function on the user's machine, and the tool's output is fed back into the LLM's context.
With this new context, the LLM might decide to change the file, so it calls the edit tool, and the Bun runtime actually applies the edit. The system prompt also tells the LLM that it can run the test suite with npm run test. At that point, the model can trigger a bash command, inspect the result, and either stop or keep iterating.
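Schematically, the whole dance is a loop along these lines (illustrative pseudocode, not OpenCode's actual implementation; `llm.generate` stands in for the AI SDK call):

```ts
// Schematic agent loop: generate → run tools → feed results back → repeat.
const messages = [system, userPrompt]
while (true) {
  const response = await llm.generate({ messages, tools })
  messages.push(response)
  const calls = response.toolCalls
  if (calls.length === 0) break // plain text answer: the model is done
  for (const call of calls) {
    const result = await tools[call.name].execute(call.input) // runs on the user's machine
    messages.push({ role: "tool", content: result }) // tool output goes back into context
  }
}
```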
Each provider gets its own carefully crafted system prompt. For example, the Gemini system prompt:
You are opencode, an interactive CLI agent specializing in software engineering tasks. Your primary goal is to help users safely and efficiently, adhering strictly to the following instructions and utilizing your available tools.
...
# Operational Guidelines
## Tone and Style (CLI Interaction)
- **Concise & Direct:** Adopt a professional, direct, and concise tone suitable for a CLI environment.
- **Minimal Output:** Aim for fewer than 3 lines of text output (excluding tool use/code generation) per response whenever practical. Focus strictly on the user's query.
- **Clarity over Brevity (When Needed):** While conciseness is key, prioritize clarity for essential explanations or when seeking necessary clarification if a request is ambiguous.
- **No Chitchat:** Avoid conversational filler, preambles ("Okay, I will now..."), or postambles ("I have finished the changes..."). Get straight to the action or answer.
- **Formatting:** Use GitHub-flavored Markdown. Responses will be rendered in monospace.
- **Tools vs. Text:** Use tools for actions, text output *only* for communication. Do not add explanatory comments within tool calls or code blocks unless specifically part of the required code/command itself.
- **Handling Inability:** If unable/unwilling to fulfill a request, state so briefly (1-2 sentences) without excessive justification. Offer alternatives if appropriate.
## Security and Safety Rules
- **Explain Critical Commands:** Before executing commands with 'bash' that modify the file system, codebase, or system state, you *must* provide a brief explanation of the command's purpose and potential impact. Prioritize user understanding and safety. You should not ask permission to use the tool; the user will be presented with a confirmation dialogue upon use (you do not need to tell them this).
- **Security First:** Always apply security best practices. Never introduce code that exposes, logs, or commits secrets, API keys, or other sensitive information.
## Tool Usage
- **File Paths:** Always use absolute paths when referring to files with tools like 'read' or 'write'. Relative paths are not supported. You must provide an absolute path.
- **Parallelism:** Execute multiple independent tool calls in parallel when feasible (i.e. searching the codebase).
- **Command Execution:** Use the 'bash' tool for running shell commands, remembering the safety rule to explain modifying commands first.
- **Background Processes:** Use background processes (via \`&\`) for commands that are unlikely to stop on their own, e.g. \`node server.js &\`. If unsure, ask the user.
- **Interactive Commands:** Try to avoid shell commands that are likely to require user interaction (e.g. \`git rebase -i\`). Use non-interactive versions of commands (e.g. \`npm init -y\` instead of \`npm init\`) when available, and otherwise remind the user that interactive shell commands are not supported and may cause hangs until canceled by the user.
- **Respect User Confirmations:** Most tool calls (also denoted as 'function calls') will first require confirmation from the user, where they will either approve or cancel the function call. If a user cancels a function call, respect their choice and do _not_ try to make the function call again. It is okay to request the tool call again _only_ if the user requests that same tool call on a subsequent prompt. When a user cancels a function call, assume best intentions from the user and consider inquiring if they prefer any alternative paths forward.
...
The agent/model needs to know its current context: which project is it working on? What are the current date and time? Beyond the provider-specific system prompt we just looked at (supplied via SystemPrompt.provider(model.modelID)), users can provide their own system prompt (input.system), and agents can carry their own prompts too.
The two main built-in agents are plan and build, but OpenCode also lets users define their own. Each agent has its own system prompt, tools, and model. We'll dig into subagents later.
system.push(
...(() => {
if (input.system) return [input.system]
if (agent.prompt) return [agent.prompt]
return SystemPrompt.provider(model.modelID)
})(),
)
system.push(...(await SystemPrompt.environment())) // e.g. "Working directory: ${Instance.directory}", "Platform: ${process.platform}", "Today's date: ${new Date().toDateString()}"
system.push(...(await SystemPrompt.custom())) // "AGENTS.md", "CLAUDE.md"
4. Tools
So we have the system prompt. Now we need the magic: the tools, the actual actions the agent can perform (bash, edit, webfetch).
Not every tool is available to every agent. The plan agent, for example, is not allowed to run edit and must request permission to use bash. Users can also define their own agents and restrict the set of tools they can access, as the config sketch below shows.
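In OpenCode's JSON config, restricting an agent looks roughly like this (a sketch; the exact keys are my assumption based on the docs):

```json
{
  "agent": {
    "plan": {
      "tools": { "edit": false, "write": false },
      "permission": { "bash": "ask" }
    }
  }
}
```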
// Built-in tools that ship with opencode
const BUILTIN = [
BashTool, // run bash command
EditTool, // edit a file
WebFetchTool, // fetch web URL content
GlobTool, // find files that match a pattern in a directory e.g. "**/*.js"
GrepTool, // look for a pattern (e.g. "log.*Error") in files filtered by pattern (e.g. *.log)
ListTool, // list files in dir
ReadTool, // read a file
WriteTool, // write a file
TodoWriteTool, // write todo list (possibly update by overriding existing one)
TodoReadTool, // read todo list
TaskTool, // handle a task by launching a new sub-agent.
]

The LLM API accepts a list of tools (each with a description and an input schema) that can be included in the API call. The model invokes a tool whenever it deems it relevant. For example:
# https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-1-20250805",
max_tokens=1024,
tools=[
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
}
],
messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
)
# https://platform.openai.com/docs/guides/tools?tool-type=function-calling
from openai import OpenAI
client = OpenAI()
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get current temperature for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country e.g. Bogotá, Colombia",
}
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
]
response = client.responses.create(
model="gpt-5",
input=[
{"role": "user", "content": "What is the weather like in Paris today?"},
],
tools=tools,
)

Fact: model companies care deeply about the weather.
The AI SDK standardizes these tool calls and enables a provider-agnostic approach. In OpenCode, it looks like this:
for (const item of await ToolRegistry.tools(model.providerID, model.modelID)) {
if (Wildcard.all(item.id, enabledTools) === false) continue // tools can be disabled via wildcard patterns
tools[item.id] = tool({ // define a tool https://ai-sdk.dev/docs/reference/ai-sdk-core/tool#tool
id: item.id as any,
description: item.description,
inputSchema: item.parameters as ZodSchema,
async execute(args, options) {
// ...
const result = await item.execute(args, {
sessionID: input.sessionID,
abort: options.abortSignal!,
messageID: assistantMsg.id,
callID: options.toolCallId,
agent: agent.name,
// ..
      })
      // ...
      return result
    },
    toModelOutput(result) {
return {
type: "text",
value: result.output,
}
},
  })
}

The key parts of a tool definition are its **description** (the prompt), its **parameters** (the input schema), and its **execute function**, which does the actual work.
OpenCode also supports MCP (Model Context Protocol) tools (I made a deep-dive video on MCP here). This means you can point at a local or remote MCP server and make its tools available to your agent. MCP servers are defined in the config file; at startup, OpenCode automatically creates MCP clients and fetches the tool lists from those servers.
for (const [key, item] of Object.entries(await MCP.tools())) {
// ...
tools[key] = item
}

A few notes. First, OpenCode has a handy plugin system: plugins can register hooks that run before and after every tool call. This is useful for logging, auditing, and the like.
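For example, a minimal audit plugin might look like this (a sketch: the hook names and package are my reading of the plugin docs, so treat them as assumptions):

```ts
// Sketch of a plugin that logs every tool call (names assumed).
import type { Plugin } from "@opencode-ai/plugin"

export const AuditPlugin: Plugin = async () => ({
  "tool.execute.before": async (input) => {
    console.log(`about to run tool: ${input.tool}`)
  },
  "tool.execute.after": async (input, output) => {
    console.log(`tool ${input.tool} finished: ${output.title}`)
  },
})
```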
Second, here's how tools actually work: the LLM is given each tool's input schema (the parameters) and its description. The decision to run a tool is made by the model itself. When that happens, the AI SDK calls the tool's execute function, which is defined in JS code and runs in the Bun/JS runtime. For MCP tools, the call instead goes through the MCP client, which sends the request to the MCP server (local, or remote over HTTP); those servers are declared in the config file, roughly like the sketch below.
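A config sketch for MCP servers (the field names are assumptions based on OpenCode's config format):

```json
{
  "mcp": {
    "my-docs": { "type": "remote", "url": "https://mcp.example.com" },
    "my-db": { "type": "local", "command": ["bun", "x", "my-mcp-server"] }
  }
}
```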
Let's look at a few tool definitions.
4.1 Read
export const ReadTool = Tool.define("read", {
description: DESCRIPTION,
parameters: z.object({
filePath: z.string().describe("The path to the file to read"),
offset: z.coerce.number().describe("The line number to start reading from (0-based)").optional(),
limit: z.coerce.number().describe("The number of lines to read (defaults to 2000)").optional(),
}),
async execute(params, ctx) {
let filepath = params.filePath
if (!path.isAbsolute(filepath)) {
filepath = path.join(process.cwd(), filepath)
}
if (!ctx.extra?.["bypassCwdCheck"] && !Filesystem.contains(Instance.directory, filepath)) {
throw new Error(`File ${filepath} is not in the current working directory`)
}
const file = Bun.file(filepath)
// ..
const limit = params.limit ?? DEFAULT_READ_LIMIT
const offset = params.offset || 0
const isImage = isImageFile(filepath)
if (isImage) throw new Error(`This is an image file of type: ${isImage}\nUse a different tool to process images`)
const isBinary = await isBinaryFile(filepath, file)
if (isBinary) throw new Error(`Cannot read binary file: ${filepath}`)
const lines = await file.text().then((text) => text.split("\n"))
const raw = lines.slice(offset, offset + limit).map((line) => {
return line.length > MAX_LINE_LENGTH ? line.substring(0, MAX_LINE_LENGTH) + "..." : line
})
const content = raw.map((line, index) => {
return `${(index + offset + 1).toString().padStart(5, "0")}| ${line}`
})
const preview = raw.slice(0, 20).join("\n")
let output = "<file>\n"
output += content.join("\n")
if (lines.length > offset + content.length) {
output += `\n\n(File has more lines. Use 'offset' parameter to read beyond line ${offset + content.length})`
}
output += "\n</file>"
// just warms the lsp client
LSP.touchFile(filepath, false)
FileTime.read(ctx.sessionID, filepath)
return {
title: path.relative(Instance.worktree, filepath),
output,
metadata: {
preview,
},
}
},
})

As you can see, reading a file isn't just a call to the standard library's read or text function. Is the path absolute? If not, we need to make it absolute. Is the file an image or a binary? If so, we can't process it. If the file is too large, we only read up to a limit to avoid flooding the model's context. What if the model wants to read from a specific position? We need to support reading from a given line and tell the model about it clearly.
The DESCRIPTION explains to the model how it should use the tool:
Reads a file from the local filesystem. You can access any file directly by using this tool.
Assume this tool is able to read all files on the machine. If the User provides a path to a file assume that path is valid. It is okay to read a file that does not exist; an error will be returned.
Usage:
- The filePath parameter must be an absolute path, not a relative path
- By default, it reads up to 2000 lines starting from the beginning of the file
- You can optionally specify a line offset and limit (especially handy for long files), but it's recommended to read the whole file by not providing these parameters
- Any lines longer than 2000 characters will be truncated
- Results are returned using cat -n format, with line numbers starting at 1
- This tool cannot read binary files, including images
- You have the capability to call multiple tools in a single response. It is always better to speculatively read multiple files as a batch that are potentially useful.
- If you read a file that exists but has empty contents you will receive a system reminder warning in place of file contents.

AGI isn't here yet, but watching a model ingest a tool description, interpret it, and emit a correctly shaped tool call is seriously impressive.
4.2 Bash
export const BashTool = Tool.define("bash", {
description: DESCRIPTION,
parameters: z.object({
command: z.string().describe("The command to execute"),
timeout: z.number().describe("Optional timeout in milliseconds").optional(),
description: z
.string()
.describe(
"Clear, concise description of what this command does in 5-10 words. Examples:\nInput: ls\nOutput: Lists files in current directory\n\nInput: git status\nOutput: Shows working tree status\n\nInput: npm install\nOutput: Installs package dependencies\n\nInput: mkdir foo\nOutput: Creates directory 'foo'",
),
}),
async execute(params, ctx) {
const timeout = Math.min(params.timeout ?? DEFAULT_TIMEOUT, MAX_TIMEOUT)
const tree = await parser().then((p) => p.parse(params.command))
const permissions = await Agent.get(ctx.agent).then((x) => x.permission.bash)
// ...
if (needsAsk) {
await Permission.ask({
type: "bash",
pattern: params.command,
sessionID: ctx.sessionID,
messageID: ctx.messageID,
callID: ctx.callID,
title: params.command,
metadata: {
command: params.command,
},
})
}
const process = exec(params.command, {
cwd: Instance.directory,
signal: ctx.abort,
timeout,
})
//...
process.stdout?.on("data", (chunk) => {
output += chunk.toString()
ctx.metadata({
metadata: {
output: output,
description: params.description,
},
})
})
process.stderr?.on("data", (chunk) => {
output += chunk.toString()
ctx.metadata({
metadata: {
output: output,
description: params.description,
},
})
})
//...
if (output.length > MAX_OUTPUT_LENGTH) {
output = output.slice(0, MAX_OUTPUT_LENGTH)
output += "\n\n(Output was truncated due to length limit)"
}
return {
title: params.command,
metadata: {
output,
exit: process.exitCode,
description: params.description,
},
output,
    }
  },
})

That's the bash tool. A lot is omitted here (notably the checks for whether a command touches files outside the project directory). The execute function first checks whether permission is needed (e.g., the plan agent must ask the user for approval to use bash). If permission is granted, the command is executed, and stdout and stderr are captured and included in the returned output.
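The permission behavior is configurable per command pattern; a sketch of what that might look like in the config (the pattern syntax and value names are assumptions):

```json
{
  "permission": {
    "bash": {
      "git push *": "ask",
      "rm *": "deny",
      "*": "allow"
    }
  }
}
```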
The description given to the LLM reads:
Executes a given bash command in a persistent shell session with optional timeout, ensuring proper handling and security measures.
Before executing the command, please follow these steps:
1. Directory Verification:
- If the command will create new directories or files, first use the LS tool to verify the parent directory exists and is the correct location
- For example, before running "mkdir foo/bar", first use LS to check that "foo" exists and is the intended parent directory
2. Command Execution:
- Always quote file paths that contain spaces with double quotes (e.g., cd "path with spaces/file.txt")
- Examples of proper quoting:
- cd "/Users/name/My Documents" (correct)
- cd /Users/name/My Documents (incorrect - will fail)
- python "/path/with spaces/script.py" (correct)
- python /path/with spaces/script.py (incorrect - will fail)
- After ensuring proper quoting, execute the command.
- Capture the output of the command.
...

4.3 todoread and todowrite
The first time you watch a coding agent create a todo list and tick items off, it feels magical. OpenCode gives the LLM the following todo tools:
const TodoInfo = z.object({
content: z.string().describe("Brief description of the task"),
status: z.string().describe("Current status of the task: pending, in_progress, completed, cancelled"),
priority: z.string().describe("Priority level of the task: high, medium, low"),
id: z.string().describe("Unique identifier for the todo item"),
})
type TodoInfo = z.infer<typeof TodoInfo>
const state = Instance.state(() => {
const todos: {
[sessionId: string]: TodoInfo[]
} = {}
return todos
})
export const TodoWriteTool = Tool.define("todowrite", {
description: DESCRIPTION_WRITE,
parameters: z.object({
todos: z.array(TodoInfo).describe("The updated todo list"),
}),
async execute(params, opts) {
const todos = state()
todos[opts.sessionID] = params.todos
return {
title: `${params.todos.filter((x) => x.status !== "completed").length} todos`,
output: JSON.stringify(params.todos, null, 2),
metadata: {
todos: params.todos,
},
}
},
})
export const TodoReadTool = Tool.define("todoread", {
description: "Use this tool to read your todo list",
parameters: z.object({}),
async execute(_params, opts) {
const todos = state()[opts.sessionID] ?? []
return {
title: `${todos.filter((x) => x.status !== "completed").length} todos`,
metadata: {
todos,
},
output: JSON.stringify(todos, null, 2),
}
},
})

There's a global todo state per session. To write, the LLM supplies a list of TodoInfo items, each with content and a status (pending, in_progress, completed, or cancelled), which are then stored in the global state under the session ID. todoread simply returns the list mapped to the current session ID. When this data flows back to the TUI, it is rendered with the appropriate checkboxes or strikethroughs.
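For instance, the model might call todowrite with a payload like this (illustrative data matching the TodoInfo schema above):

```json
{
  "todos": [
    { "id": "1", "content": "Reproduce the failing test", "status": "completed", "priority": "high" },
    { "id": "2", "content": "Fix off-by-one in pagination", "status": "in_progress", "priority": "high" },
    { "id": "3", "content": "Run the full test suite", "status": "pending", "priority": "medium" }
  ]
}
```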
The tool itself is simple, but the LLM's ability to use it effectively is a small sign of understanding, provided of course that the generated entries aren't complete garbage.
4.4 LSP
OpenCode includes an LSP tool that helps provide code diagnostics after changes. An LSP server for a given language (gopls for Go, pyright for Python, ruby-lsp for Ruby, and so on) maintains a model of the project, understands the language, and can determine whether a change makes sense. Those squiggly lines you see in your IDE? They come from an LSP server.
Under the hood, the LSP server and the LSP client (your IDE, or in OpenCode's case an instantiated client) talk to each other over standard I/O using JSON-RPC:
const connection = createMessageConnection(
new StreamMessageReader(input.server.process.stdout),
new StreamMessageWriter(input.server.process.stdin),
)

A diagnostics message from the LSP server might look like this:
{
"uri": "file:///path/to/file.js",
"diagnostics": [
{
"range": { "start": {"line": 10,"character": 4}, "end": {"line": 10,"character": 10} },
"severity": 1,
"code": "no-undef",
"source": "eslint",
"message": "'myVar' is not defined."
}
]
}

OpenCode neatly spawns the LSP servers and clients and handles the communication using the global event bus.
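The receiving side is conceptually simple; a sketch using the connection from above (the handler body is illustrative, not OpenCode's exact code):

```ts
// Sketch: the LSP server pushes diagnostics; stash them per-file and
// announce them on the event bus so other parts of the app can react.
connection.onNotification("textDocument/publishDiagnostics", (params) => {
  diagnosticsByFile.set(params.uri, params.diagnostics) // hypothetical global map
  Bus.publish("lsp.diagnostics", { uri: params.uri })   // schematic event-bus call
})
```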
// example of an LSP server definition:
export const Pyright: Info = {
id: "pyright",
extensions: [".py", ".pyi"],
root: NearestRoot(["pyproject.toml", "setup.py", "setup.cfg", "requirements.txt", "Pipfile", "pyrightconfig.json"]),
async spawn(root) {
let binary = Bun.which("pyright-langserver")
const args = []
if (!binary) {
const js = path.join(Global.Path.bin, "node_modules", "pyright", "dist", "pyright-langserver.js")
if (!(await Bun.file(js).exists())) {
if (Flag.OPENCODE_DISABLE_LSP_DOWNLOAD) return
await Bun.spawn([BunProc.which(), "install", "pyright"], {
cwd: Global.Path.bin,
env: {
...process.env,
BUN_BE_BUN: "1",
},
}).exited
}
binary = BunProc.which()
args.push(...["run", js])
}
args.push("--stdio")
const initialization: Record<string, string> = {}
const potentialVenvPaths = [process.env["VIRTUAL_ENV"], path.join(root, ".venv"), path.join(root, "venv")].filter(
(p): p is string => p !== undefined,
)
for (const venvPath of potentialVenvPaths) {
const isWindows = process.platform === "win32"
const potentialPythonPath = isWindows
? path.join(venvPath, "Scripts", "python.exe")
: path.join(venvPath, "bin", "python")
if (await Bun.file(potentialPythonPath).exists()) {
initialization["pythonPath"] = potentialPythonPath
break
}
}
const proc = spawn(binary, args, {
cwd: root,
env: {
...process.env,
BUN_BE_BUN: "1",
},
})
return {
process: proc,
initialization,
}
},
}
// inside the LSP module init, we launch the LSP servers and clients
for (const [name, item] of Object.entries(cfg.lsp ?? {})) {
const existing = servers[name]
//...
servers[name] = {
...existing,
id: name,
root: existing?.root ?? (async () => Instance.directory),
extensions: item.extensions ?? existing.extensions,
spawn: async (root) => {
return {
process: spawn(item.command[0], item.command.slice(1), {
cwd: root,
env: {
...process.env,
...item.env,
},
}),
initialization: item.initialization,
}
},
}
}
// ..
for (const server of Object.values(s.servers)) {
  // ...
const handle = await server.spawn(root).catch((err) => {
s.broken.add(root + server.id)
log.error(`Failed to spawn LSP server ${server.id}`, { error: err })
return undefined
})
if (!handle) continue
const client = await LSPClient.create({
serverID: server.id,
server: handle,
root,
})
// ...
}

When the LLM runs a tool that changes a file, OpenCode queries the LSP servers, collects diagnostics, and feeds them back to the LLM. That way, if the LLM uses myVar without defining it (as in the example above), it can adjust its next step. In the edit tool's function, after the change is applied, the following code runs:
await LSP.touchFile(filePath, true)
const diagnostics = await LSP.diagnostics()

This feedback loop is extremely useful: it keeps the LLM grounded and stops it from going off the rails.
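From there, folding the diagnostics into what the model sees is straightforward; a sketch (the formatting is illustrative):

```ts
// Illustrative: append any errors for the edited file to the edit tool's output.
const errors = (diagnostics[filePath] ?? []).filter((d) => d.severity === 1) // 1 = Error in LSP
if (errors.length > 0) {
  output += "\n\nThis file has errors, please fix:\n"
  output += errors.map((d) => `line ${d.range.start.line + 1}: ${d.message}`).join("\n")
}
```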
4.5 Subagents (the task tool)
All of OpenCode's work is done by agents. Two primary agents are configured by default: **plan and build**. The plan agent handles planning and code analysis without making changes, while the build agent builds on plan's analysis and can actually make changes by invoking tools that modify files or the environment.
Beyond the primary agents, OpenCode lets you define subagents, which can be invoked by a primary agent as a plain task tool, or directly by the user with an @-mention (e.g., @security-auditor review this code and check for security issues). Here's an example subagent definition:
---
description: Reviews code for quality and best practices
mode: subagent
model: anthropic/claude-sonnet-4-20250514
temperature: 0.1
tools:
write: false
edit: false
bash: false
---
You are in code review mode. Focus on:
- Code quality and best practices
- Potential bugs and edge cases
- Performance implications
- Security considerations
Provide constructive feedback without making direct changes.

Agents can be configured in many ways, but ultimately they need:
- a model
- some tools
- a system prompt explaining what they do and how
Agents invoke other agents through the task tool:
export const TaskTool = Tool.define("task", async () => {
const agents = await Agent.list().then((x) => x.filter((a) => a.mode !== "primary"))
const description = DESCRIPTION.replace(
"{agents}",
agents
.map((a) => `- ${a.name}: ${a.description ?? "This subagent should only be called manually by the user."}`)
.join("\n"),
)
return {
description,
parameters: z.object({
description: z.string().describe("A short (3-5 words) description of the task"),
prompt: z.string().describe("The task for the agent to perform"),
subagent_type: z.string().describe("The type of specialized agent to use for this task"),
}),
async execute(params, ctx) {
const agent = await Agent.get(params.subagent_type)
const session = await Session.create(ctx.sessionID, params.description + ` (@${agent.name} subagent)`)
const msg = await Session.getMessage(ctx.sessionID, ctx.messageID)
if (msg.info.role !== "assistant") throw new Error("Not an assistant message")
// ...
const result = await Session.prompt({
messageID,
sessionID: session.id,
model: { modelID: model.modelID, providerID: model.providerID },
agent: agent.name,
tools: {
todowrite: false,
todoread: false,
task: false,
...agent.tools,
},
parts: [{
id: Identifier.ascending("part"),
type: "text",
text: params.prompt,
}],
})
unsub()
return {
title: params.description,
metadata: {
summary: result.parts.filter((x: any) => x.type === "tool"),
},
output: (result.parts.findLast((x: any) => x.type === "text") as any)?.text ?? "",
}
},
}
})

So what's happening here? The task tool works like any other tool:
- Its description is the list of available agents and their descriptions.
- Its execute function spins up a new session for the selected subagent, gives it the right tools and system prompt, and lets it run independently.
Here's an excerpt from the task tool's description:
Available agent types and the tools they have access to:
{agents}
When using the Task tool, you must specify a subagent_type parameter to select which agent type to use.
When to use the Agent tool:
- When you are instructed to execute custom slash commands. Use the Agent tool with the slash command invocation as the entire prompt. The slash command can take arguments. For example: Task(description="Check the file", prompt="/check-file path/to/file.py")
...
Usage notes:
1. Launch multiple agents concurrently whenever possible, to maximize performance; to do that, use a single message with multiple tool uses
2. When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.
3. Each agent invocation is stateless. You will not be able to send additional messages to the agent, nor will the agent be able to communicate with you outside of its final report. Therefore, your prompt should contain a highly detailed task description for the agent to perform autonomously and you should specify exactly what information the agent should return back to you in its final and only message to you.
4. The agent's outputs should generally be trusted
5. Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent
6. If the agent description mentions that it should be used proactively, then you should try your best to use it without the user having to ask for it first. Use your judgement.
...

In practice, a new session is created for the subagent, kicked off by the primary agent. The subagent gets its own tools and system prompt and runs in its own context window, possibly even with a different LLM.
This recursive behavior, where one LLM decides to invoke another (or several), is an early sign of the full autonomy we're starting to see.
The task tool's output is simply the output of the subagent's LLM prompt, which may itself include tool calls. Neat.
5. The Trillion-Dollar Loop
Let's recap. We have the system prompt, the tools with their descriptions and execute functions, and the user's input. All of this is sent to the LLM API, and that's where the magic happens.
All the pieces come together in the Session.prompt function. When the user submits a prompt in the TUI (terminal), the Go code sends an HTTP request to the JS Hono server. This flow is typically driven by the TUI, but any client capable of making HTTP requests can talk to the server and drive the agent.
Prompt receives the user's input. Since the conversation history can grow quickly and fill the model's context, OpenCode automatically summarizes the session once tokens > Math.max((model.info.limit.context - outputLimit) * 0.9, 0). The summarization prompt looks like this:
const stream = streamText({
maxRetries: 10,
abortSignal: abort.signal,
model: model.language,
messages: [
...system.map(
(x): ModelMessage => ({
role: "system",
content: x,
}),
),
...MessageV2.toModelMessage(filtered),
{
role: "user",
content: [
{
type: "text",
text: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next.",
},
],
},
],
})

This keeps the conversation moving. Personally, though, I think it's better to start fresh when you get close to the limit.
There are two main modes, build and plan, each with a corresponding agent. The plan agent is exploratory: it produces read-only plans without editing files. The build agent, as the name suggests, can edit files, run commands, and execute tasks. Switching from plan to build is flagged by a special system reminder, aptly named build-switch:
<system-reminder>
Your operational mode has changed from plan to build.
You are no longer in read-only mode.
You are permitted to make file changes, run shell commands, and utilize your arsenal of tools as needed.
</system-reminder>

if (lastAssistantMsg?.mode === "plan" && agent.name === "build") {
msgs.at(-1)?.parts.push({
id: Identifier.ascending("part"),
messageID: userMsg.id,
sessionID: input.sessionID,
type: "text",
text: BUILD_SWITCH,
synthetic: true,
})
}

At the core is the AI SDK's streamText, which returns a fullStream of events: text, tool calls, tool results, and errors. A process function iterates over these events and handles each accordingly:
const stream = streamText({
onError(e) {
log.error("streamText error", { error: e })
},
maxRetries: 3,
activeTools: Object.keys(tools).filter((x) => x !== "invalid"),
maxOutputTokens: outputLimit,
abortSignal: abort.signal,
stopWhen: async ({ steps }) => steps.length >= 1000 || processor.getShouldStop(),
temperature: params.temperature,
topP: params.topP,
messages: [
...system.map(
(x): ModelMessage => ({
role: "system",
content: x,
}),
),
...MessageV2.toModelMessage(msgs.filter((m) => !(m.info.role === "assistant" && m.info.error))),
],
tools: model.info.tool_call === false ? undefined : tools,
})
const result = await processor.process(stream)
return result

The process function consumes the result stream and updates structured message parts for each event type (start-step, finish-step, tool-call, tool-result, tool-error, text-start, text-delta, and so on).
The AI SDK supports multi-step tool use, effectively implementing the "LLM loop with actions". The model runs continuously, executing tools step by step, with the stopWhen parameter controlling when the loop ends (e.g., after five tool calls).
For example:
async process(stream: StreamTextResult<Record<string, AITool>, never>) {
for await (const value of stream.fullStream) {
switch (value.type) {
case "tool-call": {
const match = toolcalls[value.toolCallId]
if (match) {
const part = await updatePart({
...match,
tool: value.toolName,
state: {
status: "running",
input: value.input,
time: { start: Date.now() },
},
})
toolcalls[value.toolCallId] = part as MessageV2.ToolPart
}
break
}
case "tool-result": {
await updatePart({
...match,
state: {
status: "completed",
input: value.input,
output: value.output.output,
metadata: value.output.metadata,
title: value.output.title,
time: {
start: match.state.time.start,
end: Date.now(),
},
},
})
break
}
case "tool-error": {
const match = toolcalls[value.toolCallId]
if (match && match.state.status === "running") {
if (value.error instanceof Permission.RejectedError) {
shouldStop = true
}
}
break
}
case "text-delta":
if (currentText) {
currentText.text += value.text
if (currentText.text) await updatePart(currentText)
}
break
case "text-end":
if (currentText) {
currentText.text = currentText.text.trimEnd()
currentText.time = { start: Date.now(), end: Date.now() }
await updatePart(currentText)
}
currentText = undefined
break
case "start-step":
snapshot = await Snapshot.track()
break
case "finish-step":
const usage = getUsage(model, value.usage, value.providerMetadata)
break
}
}
}

As the stream is processed, results are written to disk via updatePart. If a tool-error occurs because a permission was rejected, shouldStop is set to true.
One key detail: finish-step calls getUsage, which computes token usage and cost. Each chunk carries usage stats (input, output, and reasoning tokens). Combined with model pricing data (from models.dev), OpenCode computes the cost of each run.
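The math itself is simple (a sketch; models.dev prices are quoted per million tokens, and the field names here are illustrative):

```ts
// Sketch: cost of one step from token usage and per-1M-token pricing.
function stepCost(
  usage: { input: number; output: number },
  price: { input: number; output: number }, // dollars per 1M tokens
) {
  return (usage.input / 1_000_000) * price.input + (usage.output / 1_000_000) * price.output
}

// e.g. 12,000 input + 800 output tokens at $3 / $15 per 1M:
// 0.012 * 3 + 0.0008 * 15 ≈ $0.048
```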
Another neat feature is the snapshot taken at step-start. It uses Git to capture the working state without touching history, essentially a temporary commit. If something goes wrong, OpenCode can restore by reading the snapshot back into the index and checking it out.
export async function track() {
  const git = gitdir() // snapshot-specific git dir, separate from the project's own history
await $`git --git-dir ${git} add .`.quiet().cwd(Instance.directory).nothrow()
const hash = await $`git --git-dir ${git} write-tree`.quiet().cwd(Instance.directory).nothrow().text()
return hash.trim()
}
export async function restore(snapshot: string) {
log.info("restore", { commit: snapshot })
const git = gitdir()
await $`git --git-dir=${git} read-tree ${snapshot} && git --git-dir=${git} checkout-index -a -f`
}

So: the LLM runs, the result stream is processed, and everything is persisted to disk as part of the session. Every persisted message part (text, tool call, result, and so on) also emits an event on a shared bus inside the application. That bus is exposed over HTTP as a continuous SSE event stream. The Go TUI client receives updates in real time, but so can any HTTP client subscribed to /sse.
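Tapping into that stream from a custom client is nearly a one-liner; a minimal sketch (the /sse path comes from above, the event shape is illustrative):

```ts
// Minimal sketch of following a session in real time over SSE.
const events = new EventSource("http://localhost:4096/sse")
events.onmessage = (e) => {
  const event = JSON.parse(e.data) // e.g. message part created/updated
  console.log(event.type, event.properties)
}
```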
In short: the TUI sends a prompt → the HTTP server calls Session.prompt → the prompt, history, summaries, tools, and execute functions are prepared → the LLM responds with text and tool calls → results are persisted and broadcast in real time on the event bus to its clients, including the TUI, so the developer sees messages live, along with any other subscribed client (a mobile app, another TUI, and so on).
6. The Gorgeous TUI
OpenCode puts heavy emphasis on its TUI, stating in the README: "We're going to push the limits of what's possible in the terminal." A deep dive into the TUI code deserves its own blog post, but let's take a quick look at how it integrates with the JS side.
The opencode command launches a standalone binary produced with bun build .. --compile. This bundles the code, imported packages, and the Bun runtime into a single executable. Particularly interesting: the Golang TUI (compiled separately with go build) is bundled into the Bun executable alongside it.
"build() {",
` cd "opencode-\${pkgver}"`,
` bun install`,
" cd packages/tui",
` CGO_ENABLED=0 go build -ldflags="-s -w -X main.Version=\${pkgver}" -o tui cmd/opencode/main.go`,
" cd ../opencode",
` bun build --define OPENCODE_TUI_PATH="'$(realpath ../tui/tui)'" --define OPENCODE_VERSION="'\${pkgver}'" --compile --target=bun-linux-x64 --outfile=opencode ./src/index.ts`,
"}",当用户运行 opencode 时,他们调用 bun 命令,该命令将启动 HTTP 服务器然后启动 TUI
const server = Server.listen({
port: args.port,
hostname: args.hostname,
})
let cmd = [] as string[]
const tui = Bun.embeddedFiles.find((item) => (item as File).name.includes("tui")) as File
if (tui) {
let binaryName = tui.name
if (process.platform === "win32" && !binaryName.endsWith(".exe")) {
binaryName += ".exe"
}
const binary = path.join(Global.Path.cache, "tui", binaryName)
const file = Bun.file(binary)
if (!(await file.exists())) {
await Bun.write(file, tui, { mode: 0o755 })
await fs.chmod(binary, 0o755)
}
cmd = [binary]
}
if (!tui) {
const dir = Bun.fileURLToPath(new URL("../../../../tui/cmd/opencode", import.meta.url))
await $`go build -o ./dist/tui ./main.go`.cwd(dir)
cmd = [path.join(dir, "dist/tui")]
}
Log.Default.info("tui", {
cmd,
})
const proc = Bun.spawn({
cmd: [
...cmd,
...(args.model ? ["--model", args.model] : []),
...(args.prompt ? ["--prompt", args.prompt] : []),
...(args.agent ? ["--agent", args.agent] : []),
...(sessionID ? ["--session", sessionID] : []),
],
cwd,
stdout: "inherit",
stderr: "inherit",
stdin: "inherit",
env: {
...process.env,
CGO_ENABLED: "0",
OPENCODE_SERVER: server.url.toString(),
OPENCODE_PROJECT: JSON.stringify(Instance.project),
},
onExit: () => {
server.stop()
},
})

Bun spawns the TUI, which then takes over the opencode/bun process's stdin and stdout. If no TUI binary is embedded, the TUI is built from the current source with go build ... and launched from there. That's what happens when you run bun dev in development mode. Very nice.
Another nice detail is how the TUI relies on the Stainless-generated SDK to talk to the backend. By generating client code from the OpenAPI/Stainless spec, OpenCode gets clean extensibility and uniform interfaces between components.
7. Closing Thoughts
It was great fun poring over the code and seeing how things are implemented! I learned a lot, and since I'm no TS/JS expert, I enjoyed the exploration too. That said, TS types are far from a blessing for human readers.
The OpenCode codebase is very readable, and, ironically, it doesn't look AI-generated at all.
One key takeaway is that LLMs and tools are only a small part of the puzzle. A well-architected application and solid UX/DX matter just as much, if not more.
Another is that we already have some incredibly powerful building blocks, and composing them well can produce excellent products.
Original article: How Coding Agents Actually Work: Inside OpenCode