MiniMax M3

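M3 brings together three capabilities that agentic software products have long wanted from a single stack: vision, long context, and coding-agent capability.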

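M3 is highly competitive with GPT-5.5 and Gemini 3.1 Pro, and it uses MiniMax Sparse Attention (MSA) to make long context practical.

The numbers are interesting, but the architecture is more interesting.

Personally, I think we finally have a low-cost model with vision, long context, and coding-agent capability, but I can't help asking: "Is it benchmark-optimized but weak in real use? Can it run locally, or is it too big for local hardware?"

M3 is a strong signal that the agent stack is moving from closed frontier APIs toward hybrid, cheaper, multimodal, long-context systems that engineering teams can actually route, evaluate, and eventually self-host.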

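Here are the independent DeepSWE results:

Let's look at what has shipped, what hasn't, how to get started, and what an M3-based agent workflow should look like if you care about reliability.

If you've already started swapping GPT or Gemini models for MiniMax, Kimi, or GLM variants, I'd love to hear about your experience in the comments.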

1. The 30-second overview

MiniMax M3 was released on June 1, 2026, and the API is available now.

You can call it through:

OpenAI-compatible Chat Completions at https://api.minimax.io/v1
Anthropic-compatible Messages at https://api.minimax.io/anthropic
Claude Code-style tools
OpenCode
Cursor, Cline, Roo Code, TRAE, and other tools that accept a custom OpenAI-compatible or Anthropic-compatible endpoint

The model ID is:

MiniMax-M3

The OpenAI-compatible Chat Completions endpoint supports text, image, video, and tool-call content for M3.
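If you want to try the image path from Python, here is a minimal sketch that builds one user message with a text part and an inline image. It assumes MiniMax accepts OpenAI-style image_url content parts with base64 data URLs; the article does not show the request format, so check MiniMax's API reference (especially for video, which this sketch does not cover). image_message is a hypothetical helper name.

```python
from __future__ import annotations

import base64
import mimetypes
import tempfile
from pathlib import Path
from typing import Any


def image_message(prompt: str, image_path: str | Path) -> dict[str, Any]:
    """Build one user message with a text part and an inline image part.

    Assumption: the endpoint accepts OpenAI-style "image_url" parts carrying
    base64 data URLs. Verify formats and size limits in MiniMax's docs.
    """
    path = Path(image_path)
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Demo with a throwaway file; the bytes are placeholders, not a real PNG.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "diagram.png"
    demo_path.write_bytes(b"placeholder-image-bytes")
    msg = image_message("What does this architecture diagram show?", demo_path)
```

You would then pass [msg] as the messages argument to client.chat.completions.create(model="MiniMax-M3", ...), the same way the harness in section 5 sends text.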

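Pricing has a breakpoint at 512K input tokens: calls with 512K input tokens or fewer are billed at the standard rate, and calls above 512K are billed at a higher long-context rate.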
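Because the tier is chosen per call, one extra input token can move a whole request to the higher rate. The sketch below makes that cliff explicit. Only the 512K threshold comes from the article; the Rates values are hypothetical placeholders, and the sketch assumes (unconfirmed) that both input and output tokens are billed at the tier selected by input size.

```python
from dataclasses import dataclass

# The 512K input-token breakpoint comes from the article. The rates below are
# HYPOTHETICAL placeholders: substitute the real numbers from MiniMax's pricing page.
LONG_CONTEXT_THRESHOLD = 512_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


STANDARD = Rates(input_per_million=1.00, output_per_million=4.00)      # placeholder
LONG_CONTEXT = Rates(input_per_million=2.00, output_per_million=8.00)  # placeholder


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one call's cost, assuming the whole call is billed at a single tier."""
    rates = STANDARD if input_tokens <= LONG_CONTEXT_THRESHOLD else LONG_CONTEXT
    return (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    ) / 1_000_000


at_limit = call_cost(512_000, 2_000)   # standard tier
just_over = call_cost(512_001, 2_000)  # long-context tier
```

With these placeholder rates, a call at exactly 512,000 input tokens costs about $0.52 and one at 512,001 costs about $1.04. If your agent regularly lands just above the line, trimming context below 512K may pay for itself.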

MiniMax's announcement lists three Token Plan tiers: Plus at $20/month for about 1.7B tokens, Max at $50/month for about 5.1B tokens, and Ultra at $120/month for about 9.8B tokens.
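Dividing each plan's price by its approximate token allowance gives a rough effective price per million tokens, assuming you actually use the whole allowance:

```python
# Token Plan tiers as listed in MiniMax's announcement:
# (USD per month, approximate tokens per month).
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million_tokens(price_usd: float, tokens: float) -> float:
    # Effective unit price if the full monthly allowance is consumed.
    return price_usd / tokens * 1_000_000


per_million = {name: usd_per_million_tokens(price, tokens) for name, (price, tokens) in PLANS.items()}
cheapest = min(per_million, key=per_million.get)

for name, cost in per_million.items():
    print(f"{name}: ${cost:.4f} per 1M tokens")
print(f"Lowest unit price: {cheapest}")
```

On the announced numbers this comes to roughly $0.0118 per 1M tokens for Plus, $0.0098 for Max, and $0.0122 for Ultra, so Max has the lowest unit price and Ultra buys volume rather than a cheaper rate.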

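Here is what is available today:

2. MSA is the most important technical detail

The key architectural idea in M3 is MiniMax Sparse Attention.

Full attention has a familiar long-context problem: because every token attends to every other token, compute grows quadratically with sequence length. For simple product use, that's an annoyance. For agents, it's structural.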
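A quick count makes "quadratic" concrete. This sketch counts attention-score entries (per head, per layer, ignoring causal masking) for dense attention versus a generic block-sparse scheme in which each query attends to a fixed number of key blocks. The block size of 128 and top-k of 64 are hypothetical values chosen for illustration, not M3's configuration.

```python
def full_attention_scores(n_tokens: int) -> int:
    # Dense attention: every query scores every key, so n * n entries.
    return n_tokens * n_tokens


def block_sparse_scores(n_tokens: int, block_size: int, top_k_blocks: int) -> int:
    # Block-sparse attention: each query scores at most top_k_blocks * block_size keys.
    return n_tokens * min(n_tokens, block_size * top_k_blocks)


BLOCK_SIZE, TOP_K = 128, 64  # hypothetical settings, not M3's actual configuration

for n in (128_000, 512_000, 1_000_000):
    dense = full_attention_scores(n)
    sparse = block_sparse_scores(n, BLOCK_SIZE, TOP_K)
    print(f"{n:>9,} tokens: dense/sparse = {dense / sparse:.1f}x")
```

Dense work quadruples when the context doubles, while the block-sparse count only doubles; at 1M tokens these illustrative settings do about 122x less scoring work than dense attention.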

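Agent traces are long by default.

With an ordinary short-context model, you summarize aggressively and lose detail
With a naive long-context model, you keep everything and pay for it
With a weak retrieval layer, you pick the wrong files and the model confidently edits the wrong abstraction

MSA tackles context scaling at the attention layer. It partitions the KV into blocks more precisely than approaches such as DSA and MoBA, uses a "KV-outer, gather-Q" operator strategy that reads each block with contiguous memory access, and, under M3's head configuration, runs more than 4x faster than the open-source Flash-Sparse-Attention and flash-moba kernels.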
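To make the "KV-outer, gather-Q" idea concrete, here is a toy, pure-Python sketch of block-sparse attention. It is not MiniMax's MSA kernel: the article does not describe how MSA partitions or selects blocks, so the gate here is a MoBA-style assumption (score each block by the query's dot product with the block's mean key, keep the top k). The scheduling loop is one reading of "KV-outer, gather-Q": iterate over KV blocks and, for each block, process every query that selected it, so a block is read once and contiguously instead of once per query. Outputs accumulate with an online softmax, the same trick FlashAttention-style kernels use.

```python
import math
import random
from collections import defaultdict

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_ranges(n_keys: int, block_size: int) -> list[range]:
    # Contiguous KV blocks: [0, B), [B, 2B), ... (the last block may be shorter).
    return [range(s, min(s + block_size, n_keys)) for s in range(0, n_keys, block_size)]


def select_blocks(q: Vector, keys: list[Vector], blocks: list[range], top_k: int) -> list[int]:
    # MoBA-style gate (assumption, not MSA's documented rule): score each block
    # by q . mean(keys in block) and keep the top_k highest-scoring blocks.
    def gate(b: int) -> float:
        idx = blocks[b]
        centroid = [sum(keys[j][d] for j in idx) / len(idx) for d in range(len(q))]
        return dot(q, centroid)
    return sorted(range(len(blocks)), key=gate, reverse=True)[:top_k]


def block_sparse_attention(queries, keys, values, block_size: int, top_k: int):
    blocks = block_ranges(len(keys), block_size)
    scale = 1.0 / math.sqrt(len(queries[0]))
    # Invert the per-query selection: for each KV block, gather the queries that chose it.
    gathered: dict[int, list[int]] = defaultdict(list)
    for qi, q in enumerate(queries):
        for b in select_blocks(q, keys, blocks, top_k):
            gathered[b].append(qi)
    # Per-query online-softmax state: running max, running normalizer, weighted value sum.
    run_max = [-math.inf] * len(queries)
    run_sum = [0.0] * len(queries)
    acc = [[0.0] * len(values[0]) for _ in queries]
    # KV-outer loop: each block's keys/values are walked once, in order,
    # and applied to every query gathered for that block.
    for b, idx in enumerate(blocks):
        for qi in gathered.get(b, []):
            for j in idx:
                s = dot(queries[qi], keys[j]) * scale
                new_max = max(run_max[qi], s)
                rescale = math.exp(run_max[qi] - new_max)  # 0.0 on the first key (exp(-inf))
                w = math.exp(s - new_max)
                run_sum[qi] = run_sum[qi] * rescale + w
                acc[qi] = [a * rescale + w * v for a, v in zip(acc[qi], values[j])]
                run_max[qi] = new_max
    return [[a / run_sum[qi] for a in acc[qi]] for qi in range(len(queries))]


def full_attention(queries, keys, values):
    # Dense reference: softmax over every key for every query.
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [dot(q, k) * scale for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(len(values[0]))])
    return out


rng = random.Random(0)
Q = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
K = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
V = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
dense = full_attention(Q, K, V)
sparse_all = block_sparse_attention(Q, K, V, block_size=4, top_k=3)  # 3 blocks cover all 10 keys
sparse_one = block_sparse_attention(Q, K, V, block_size=4, top_k=1)  # each query sees one block
```

With top_k equal to the number of blocks, the sparse path reproduces dense attention up to floating-point error; shrinking top_k is where the savings come from. Causal masking is omitted for brevity.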

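Before long context became efficient, though, you had to build small-context agents around aggressive retrieval: the agent searches, picks snippets, summarizes, and works inside a small working set.

This works, but it creates its own failure modes: bad retrieval, missing invariants, stale summaries, and hallucinated file relationships.

With a usable 1M-token context, you can give the model far more raw evidence. But if you simply dump everything into the prompt, you get a different failure mode: scope drift.

The model sees too much, reasons about irrelevant details, and wastes tokens.

The right architecture is hybrid:

Use retrieval to build a high-signal context pack.
Use long context to keep the files, logs, specs, and traces that genuinely need to stay lossless.
Keep tool-call history only when it affects the next decision.
Summarize low-value history, not high-value evidence.
Set a hard token budget for each task phase.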
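The five rules above can be sketched as a small greedy packer. Everything here is illustrative: the phase budgets, the 4-characters-per-token estimate, and the truncating summarize() placeholder are assumptions, and a real system would use a proper tokenizer and an actual summarizer (for example, a cheaper model).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical per-phase budgets; tune them for your own tasks.
PHASE_BUDGETS = {"explore": 60_000, "plan": 20_000, "edit": 40_000}


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int             # higher = more important for the current phase
    lossless: bool            # True for files, logs, specs, traces that must stay verbatim
    needed_next: bool = True  # False for tool history that won't affect the next decision


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def summarize(text: str, max_chars: int = 400) -> str:
    # Placeholder: truncation stands in for a real summarizer.
    return text if len(text) <= max_chars else text[:max_chars] + " ...[summarized]"


def build_context_pack(items: list[ContextItem], budget_tokens: int) -> tuple[str, int, list[str]]:
    """Greedily pack items by priority under a hard token budget (headers included)."""
    parts: list[str] = []
    used = 0
    dropped: list[str] = []
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if not item.needed_next:
            dropped.append(item.name)
            continue
        # Summarize low-value history; keep high-value evidence verbatim.
        text = item.text if item.lossless else summarize(item.text)
        block = f"### {item.name}\n{text}"
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            dropped.append(item.name)
            continue
        parts.append(block)
        used += cost
    return "\n\n".join(parts), used, dropped


items = [
    ContextItem("spec.md", "S" * 4_000, priority=10, lossless=True),
    ContextItem("stale search results", "R" * 2_000, priority=5, lossless=False, needed_next=False),
    ContextItem("old tool log", "L" * 8_000, priority=1, lossless=False),
]
tight_pack, tight_used, tight_dropped = build_context_pack(items, budget_tokens=1_050)
roomy_pack, roomy_used, roomy_dropped = build_context_pack(items, budget_tokens=PHASE_BUDGETS["plan"])
```

Lossless items go in verbatim, low-value history is summarized before it is costed, tool history that won't affect the next step never enters the pack, and the budget is a hard stop rather than a target.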

The rule of thumb:

1M context does not replace context engineering. It raises its ceiling.

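3. Benchmarks are useful, but the scaffold is the hidden variable

M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro and close to Opus 4.7 in MiniMax's release comparison.

M3 also beats Opus 4.7 on SVG-Bench and scores higher than Gemini 3.1 Pro on OmniDocBench.

These numbers deserve attention, but the methodology section matters more: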

SWE-Bench Verified and SWE-Bench Pro were run on internal infrastructure with Claude Code as the scaffold and its default system prompt overridden
Terminal-Bench 2.1 uses Terminus 2 as the scaffold
SWE Atlas-Codebase QNA uses Mini-SWE-Agent for some models
NL2Repo uses the Claude Code scaffold for Claude Opus, MiniMax-M2.7, M3, and Gemini 3.1 Pro
GPT-5.5 uses the Codex scaffold
OfficeQA Pro uses the Claude Code scaffold

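So don't read benchmark scores as pure model intelligence. They measure a combination of:

Base model capability
System prompt quality
Tool schema design
File access strategy
Sandbox reliability
Retry logic
Benchmark harness assumptions
Grader behavior
Timeout limits
Max output token settings
Context truncation strategy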
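One practical consequence: store the scaffold configuration next to every score and compare models only when everything except the model is identical. A minimal sketch, with illustrative field names and values (none of them come from MiniMax's methodology):

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ScaffoldConfig:
    """Everything besides the model that can move a benchmark score (illustrative fields)."""
    scaffold: str            # e.g. "claude-code", "terminus-2", "mini-swe-agent"
    system_prompt: str
    tool_schema_version: str
    sandbox_image: str
    max_retries: int
    timeout_s: int
    max_output_tokens: int
    truncation_strategy: str

    def fingerprint(self) -> str:
        # Stable hash of the full config; sort_keys makes field order irrelevant.
        blob = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]


def comparable(a: ScaffoldConfig, b: ScaffoldConfig) -> bool:
    # Only compare two models' scores when every scaffold variable is identical.
    return a.fingerprint() == b.fingerprint()


baseline = ScaffoldConfig(
    scaffold="claude-code",
    system_prompt="default",
    tool_schema_version="v1",
    sandbox_image="python:3.12-slim",
    max_retries=1,
    timeout_s=600,
    max_output_tokens=4096,
    truncation_strategy="head-and-tail",
)
longer_outputs = replace(baseline, max_output_tokens=8192)
```

Changing a single setting, such as max_output_tokens, yields a different fingerprint, which is exactly the kind of difference that can move a score without any change in the model.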

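That is why two teams can use the same model and get very different production behavior.

A weak harness makes a strong model look sloppy, while a strong harness can make a cheaper model feel close to frontier quality.

It is also why Claude Code feels better than "Claude in a text box."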

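4. Where M3 fits in a serious agent stack

M3 is best treated as a high-capability workhorse model for agent tasks that need a mix of code understanding, long context, and visual grounding.

Good fit:

Not a good fit without extra controls:

The pragmatic answer is a router:

Think of an agent as a distributed system, and of model routing as load balancing for cognition.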
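The routing rules themselves are not spelled out here, so the following is a hypothetical policy in the spirit of this section: escalate high-risk work, send tasks that need code understanding, long context, or vision to M3, and keep short text-only work on a cheap model. Only the model ID MiniMax-M3 comes from the article; the other model names and the 128K threshold are placeholders.

```python
from dataclasses import dataclass

# "MiniMax-M3" is the model ID from the article; the other two names are placeholders.
WORKHORSE = "MiniMax-M3"
FRONTIER = "frontier-model"
CHEAP = "small-fast-model"

LONG_CONTEXT_TOKENS = 128_000  # hypothetical threshold for "needs long context"


@dataclass
class Task:
    needs_vision: bool
    touches_code: bool
    input_tokens: int
    high_risk: bool  # e.g. irreversible or security-sensitive changes


def route(task: Task) -> str:
    """Hypothetical routing policy; tune the rules against your own evals."""
    if task.high_risk:
        # Escalate where a mistake is expensive.
        return FRONTIER
    if task.needs_vision or task.touches_code or task.input_tokens > LONG_CONTEXT_TOKENS:
        # The mix this section positions M3 for: code, long context, visual grounding.
        return WORKHORSE
    # Short, text-only, low-stakes work goes to the cheapest model.
    return CHEAP
```

Keeping the whole policy in one function makes it easy to change as your eval results change, the same way you would adjust weights on a load balancer.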

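5. A minimal M3 agent harness

The most useful exercise for a developer is to build a small harness in which the model acts only through tools.

Don't start with a big framework.

Start with four tools:

list_files
read_file
search_repo
run_tests

Add write_file later, once you have guardrails in place.

The goal is to make the model's decisions observable.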
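One simple way to get that observability is an append-only JSONL trace with one line per model turn, tool call, and tool result. TraceLog is a hypothetical helper, not part of the harness that follows:

```python
from __future__ import annotations

import json
import tempfile
import time
from pathlib import Path
from typing import Any


class TraceLog:
    """Append-only JSONL trace: one line per model turn, tool call, or tool result."""

    def __init__(self, path: str | Path) -> None:
        self.path = Path(path)

    def record(self, step: int, kind: str, **fields: Any) -> None:
        entry = {"ts": time.time(), "step": step, "kind": kind, **fields}
        # default=str keeps non-JSON values (e.g. Path objects) from crashing the logger.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, ensure_ascii=False, default=str) + "\n")

    def read(self) -> list[dict[str, Any]]:
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    trace = TraceLog(Path(tmp) / "trace.jsonl")
    trace.record(0, "tool_call", name="read_file", args={"path": "auth/session.py"})
    trace.record(0, "tool_result", name="read_file", chars=1834)
    trace.record(1, "final", content="(model's final answer)")
    entries = trace.read()
```

To wire it into run_agent, you could call trace.record(step, "tool_call", name=name, args=args) before executing each tool and trace.record(step, "tool_result", name=name, chars=len(tool_result)) afterwards; the resulting file can be read line by line or loaded into any JSONL viewer.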

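Below is a compact Python harness that uses the OpenAI-compatible API. It is deliberately narrow: it only reads files, searches the repo, and runs tests. It never writes code on its own. You can add patch application later, once you trust the traces.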

# mini_m3_agent.py
# Minimal read/search/test harness for MiniMax-M3.
# Requires: pip install openai
# Also expects git and ripgrep (rg) on PATH, and Python 3.9+ (for Path.is_relative_to).

from __future__ import annotations

import json
import os
import shlex
import subprocess
from pathlib import Path
from typing import Any

from openai import OpenAI

ROOT = Path(os.environ.get("AGENT_REPO", ".")).resolve()
MAX_FILE_CHARS = 20_000
MAX_TOOL_OUTPUT_CHARS = 12_000
MAX_STEPS = 8

client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.minimax.io/v1"),
    api_key=os.environ["MINIMAX_API_KEY"],
)

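def safe_path(path: str) -> Path:
    # Resolve ".." and symlinks, then require the result to stay inside ROOT.
    # is_relative_to avoids the string-prefix bug where "/repo-other" passes a startswith("/repo") check.
    candidate = (ROOT / path).resolve()
    if not candidate.is_relative_to(ROOT):
        raise ValueError(f"Path escapes repo root: {path}")
    return candidate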

def truncate(text: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n\n[truncated: {len(text) - limit} chars omitted]"

def list_files(pattern: str = "") -> str:
    cmd = ["git", "ls-files"]
    result = subprocess.run(cmd, cwd=ROOT, capture_output=True, text=True, timeout=10)
    if result.returncode != 0:
        return truncate(result.stderr)

    files = result.stdout.splitlines()
    if pattern:
        files = [f for f in files if pattern.lower() in f.lower()]
    return truncate("\n".join(files), 30_000)

def read_file(path: str) -> str:
    p = safe_path(path)
    if not p.exists() or not p.is_file():
        return f"File not found: {path}"
    return truncate(p.read_text(errors="replace"), MAX_FILE_CHARS)

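def search_repo(query: str) -> str:
    # "-e" stops a query that begins with "-" from being parsed as an rg flag.
    result = subprocess.run(
        ["rg", "-n", "--hidden", "--glob", "!node_modules", "--glob", "!.git", "-e", query],
        cwd=ROOT,
        capture_output=True,
        text=True,
        timeout=20,
    )
    # rg exits with 1 when nothing matches and 2 on errors.
    if result.returncode == 1:
        return f"No matches for: {query}"
    output = result.stdout if result.stdout else result.stderr
    return truncate(output)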

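# Allowed test runners, expressed as argv prefixes.
ALLOWED_TEST_COMMANDS = [["pytest"], ["npm", "test"], ["pnpm", "test"], ["go", "test"], ["cargo", "test"]]

def run_tests(command: str = "pytest -q") -> str:
    # Split into argv and run without a shell, so "pytest; rm -rf ~" cannot
    # chain a second command past the prefix check.
    argv = shlex.split(command)
    if not any(argv[: len(prefix)] == prefix for prefix in ALLOWED_TEST_COMMANDS):
        allowed = [" ".join(prefix) for prefix in ALLOWED_TEST_COMMANDS]
        return f"Rejected command: {command}. Allowed prefixes: {allowed}"

    result = subprocess.run(
        argv,
        cwd=ROOT,
        capture_output=True,
        text=True,
        timeout=120,
    )
    return truncate(
        f"exit_code={result.returncode}\n\nSTDOUT:\n{result.stdout}\n\nSTDERR:\n{result.stderr}"
    )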

TOOL_IMPL = {
    "list_files": list_files,
    "read_file": read_file,
    "search_repo": search_repo,
    "run_tests": run_tests,
}

TOOLS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List tracked repository files. Optionally filter by substring.",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository root.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_repo",
            "description": "Search the repository using ripgrep.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run a constrained test command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string", "default": "pytest -q"}},
                "required": [],
            },
        },
    },
]

def run_agent(user_task: str) -> str:
    messages: list[dict[str, Any]] = [
        {
            "role": "system",
            "content": (
                "You are a senior software engineer operating through tools. "
                "Do not guess file contents. Inspect before concluding. "
                "Prefer small, reversible changes. "
                "When you have enough evidence, return a concise plan and the exact files likely needing edits."
            ),
        },
        {"role": "user", "content": user_task},
    ]

    for step in range(MAX_STEPS):
        response = client.chat.completions.create(
            model="MiniMax-M3",
            messages=messages,
            tools=TOOLS,
            max_completion_tokens=4096,
            temperature=0.3,
            extra_body={"thinking": {"type": "adaptive"}, "reasoning_split": True},
        )

        msg = response.choices[0].message
        messages.append(msg.model_dump(exclude_none=True))

        if not msg.tool_calls:
            return msg.content or ""

        for call in msg.tool_calls:
            name = call.function.name
            try:
                # Malformed JSON arguments become a tool error the model can react to,
                # instead of an exception that kills the loop.
                args = json.loads(call.function.arguments or "{}")
                if name not in TOOL_IMPL:
                    tool_result = f"Unknown tool: {name}"
                else:
                    tool_result = TOOL_IMPL[name](**args)
            except Exception as exc:
                tool_result = f"Tool error: {type(exc).__name__}: {exc}"

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call.id,
                    "name": name,
                    "content": tool_result,
                }
            )

    return "Stopped: max agent steps reached. Review the trace and narrow the task."

if __name__ == "__main__":
    import sys

    task = " ".join(sys.argv[1:]) or "Find the most likely cause of the failing tests."
    print(run_agent(task))

Run it:

export MINIMAX_API_KEY="..."
export OPENAI_BASE_URL="https://api.minimax.io/v1"
export AGENT_REPO="/path/to/your/repo"

python mini_m3_agent.py "Investigate why the auth tests fail after the session middleware refactor."
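The harness above deliberately leaves out write_file. Once the traces look trustworthy, one way to add it with guardrails is sketched below: writes are confined to the repo root, capped in size, and every change is shown as a unified diff that an approve callback must accept before anything touches disk. make_write_file, MAX_WRITE_CHARS, and the approve callback are hypothetical names, not part of any MiniMax or OpenAI API; Path.is_relative_to needs Python 3.9+.

```python
from __future__ import annotations

import difflib
import tempfile
from pathlib import Path
from typing import Callable

MAX_WRITE_CHARS = 50_000  # hypothetical cap on a single write


def make_write_file(root: Path, approve: Callable[[str, str], bool]) -> Callable[[str, str], str]:
    """Return a write_file tool confined to root that needs approval for every diff."""
    root = root.resolve()

    def write_file(path: str, content: str) -> str:
        target = (root / path).resolve()
        if not target.is_relative_to(root):
            return f"Rejected: path escapes repo root: {path}"
        if len(content) > MAX_WRITE_CHARS:
            return f"Rejected: content exceeds {MAX_WRITE_CHARS} chars"
        old = target.read_text(errors="replace") if target.exists() else ""
        # Show the change as a unified diff so the reviewer (and the trace) sees exactly what moves.
        diff = "".join(
            difflib.unified_diff(
                old.splitlines(keepends=True),
                content.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            )
        )
        if not diff:
            return "No changes."
        if not approve(path, diff):
            return f"Rejected by reviewer. Proposed diff:\n{diff}"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        return f"Wrote {path}\n{diff}"

    return write_file


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "app.py").write_text("TIMEOUT = 30\n")
    deny_all = make_write_file(repo, approve=lambda path, diff: False)
    allow_all = make_write_file(repo, approve=lambda path, diff: True)
    denied = deny_all("app.py", "TIMEOUT = 60\n")
    escaped = allow_all("../outside.py", "oops\n")
    written = allow_all("app.py", "TIMEOUT = 60\n")
    final_text = (repo / "app.py").read_text()
```

To expose it to the model, you could add TOOL_IMPL["write_file"] = make_write_file(ROOT, approve=...) plus a matching entry in TOOLS with path and content string parameters. During early runs, approve can simply print the diff and ask for confirmation with input().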

I'm thinking about covering the latest open-source models in June, so tell me what you'd like to see. Happy building!

Original article: MiniMax M3 vs GPT-5.5 and Gemini 3.1 Pro: 1M Context and Native-Multimodality

Translated and edited by Huizhi (汇智网); please credit the source when republishing.