Building an Online Course with AI Agents

The input box holds a single line: teach me how to build production-grade retrieval systems for LLM apps.

Ten minutes later, the pipeline returns a directory:
course/
├── syllabus.md
├── lectures/
│ ├── 01_what_retrieval_actually_does.md
│ ├── 02_chunking_strategies_that_dont_ruin_recall.md
│ ├── 03_embedding_models_that_arent_oai_ada_002.md
│ ├── 04_vector_stores_and_when_they_are_overkill.md
│ ├── 05_hybrid_search_bm25_and_dense_together.md
│ ├── 06_rerankers_and_what_they_fix.md
│ ├── 07_query_rewriting_and_when_to_skip_it.md
│ ├── 08_evaluation_beyond_vibe_check.md
│ ├── 09_grounding_and_citation_discipline.md
│ ├── 10_freshness_and_re_indexing.md
│ ├── 11_latency_and_cost_tuning.md
│ └── 12_failure_modes_and_how_to_ship_with_them.md
├── quizzes/
│ ├── module_1.json
│ ├── module_2.json
│ └── module_3.json
├── assignments/
│ ├── module_2_build_a_reranker.md
│ └── module_3_ship_a_grounded_answer_pipeline.md
└── slides/
└── course.pdf
The syllabus for the first module looks like this:
Module 1: Retrieval without the Buzzwords (3 lessons)
Lesson 1: What retrieval actually does
Objectives:
- Explain what the retrieval step is for (not what embeddings are).
- Distinguish retrieval from generation and from ranking.
- Identify where retrieval belongs in the RAG pipeline.
This output cost about $0.26 in API fees and 14 minutes of wall-clock time. According to iSpring's 2026 pricing guide, a Level 1 custom e-learning module (narrated slides with static graphics and simple knowledge checks) costs between $250 and $500 per finished hour.

The pipeline does not ship a finished $500 course on its own. It ships a first draft good enough that an editor can finish it in an afternoon.

This article covers how to build that pipeline. We will use four agents in LangGraph plus an anti-slop reviewer. We will put a human approval gate between planning and writing. We will follow one running example all the way to a real PDF, and look closely at the exact failure modes this architecture prevents.
1. Why course generation is a pipeline, not a prompt

Most developers try to build a course generator with one big prompt. They tell a language model to write a complete course on a topic. The syllabus looks fine, and so does the first module. By the third module the text starts to drift. By the fourth, the model contradicts the first. The tone wanders, and learning objectives quietly disappear.

No single context window is both long enough to hold an entire course and focused enough to make every lesson land.

Course generation fits a multi-agent decomposition almost perfectly. The stages separate cleanly: syllabus, lectures, assessments, and production map directly onto real instructional-design phases. The ADDIE framework (Analyze, Design, Develop, Implement, Evaluate) has used these exact seams for decades.

Each stage has a different optimal model. The syllabus needs a reasoning-heavy model, while lecture body prose needs a cheap, fast one. Slide production needs deterministic code, not a model.

The Instructional Agents paper (Yao et al., Arizona State University, EACL 2026) showed that this works. They evaluated a multi-agent framework across five university-level computer science courses using human expert raters. Their full copilot mode consistently outperformed autonomous generation.

We will build a production-oriented adaptation of that architecture.
2. The architecture

Our system uses four agents, one reviewer, and one human approval gate.

The Curriculum agent takes one input sentence and outputs a structured syllabus. The Content agent takes one lesson outline and outputs a full lecture script; it runs in parallel across all lessons. The Assessment agent takes a module and outputs quizzes and assignments. The Production agent is deterministic Python: it takes all the outputs and assembles the slides and PDF. The anti-slop reviewer is invoked after every content generation and returns either structured corrections or approval. We use a relatively cheap model for it.

A supervisor node wires them together. We use explicit routing for deterministic stage transitions and a dynamic parallel fan-out for the writing phase.

Every agent reads and writes shared state. The state schema is the skeleton of the pipeline. We use Pydantic models to define the data contracts.
from typing import Annotated, Literal, TypedDict

from pydantic import BaseModel, Field

BloomLevel = Literal[
    "remember", "understand", "apply", "analyze", "evaluate", "create"
]

class LearningObjective(BaseModel):
    id: str = Field(description="short stable id, e.g. 'lo-2-3'")
    text: str = Field(description="measurable, learner-centered verb phrase")
    bloom: BloomLevel
    module_id: str

class Lesson(BaseModel):
    id: str
    module_id: str
    title: str
    objectives: list[str] = Field(description="LearningObjective ids covered")
    prerequisites: list[str] = Field(default_factory=list)
    outline: list[str] = Field(description="hook, concept, mechanism, example, ...")

class Module(BaseModel):
    id: str
    title: str
    summary: str
    lessons: list[str]

class Syllabus(BaseModel):
    topic: str
    audience: str
    prerequisites: list[str]
    modules: list[Module]
    lessons: list[Lesson]
    objectives: list[LearningObjective]

class Lecture(BaseModel):
    lesson_id: str
    title: str
    body_markdown: str
    checklist: list[str]
    rewrite_count: int = 0

class ReviewIssue(BaseModel):
    category: Literal[
        "filler_phrase", "buzzword", "missing_specificity",
        "uniform_rhythm", "vague_conclusion", "lexical_repetition",
    ]
    example: str = Field(description="literal quote from the draft")
    fix: str = Field(description="one-sentence correction instruction")

class ReviewVerdict(BaseModel):
    lesson_id: str
    approved: bool
    severity: Literal["low", "medium", "high"] = "low"
    issues: list[ReviewIssue] = Field(default_factory=list)

class Quiz(BaseModel):
    module_id: str
    questions: list[dict]

class Assignment(BaseModel):
    module_id: str
    brief_markdown: str
    rubric: list[str]

def merge_dicts(left: dict, right: dict) -> dict:
    return {**left, **right}

class CourseState(TypedDict, total=False):
    topic: str
    audience: str
    syllabus: Syllabus
    human_feedback: str
    lectures: Annotated[dict[str, Lecture], merge_dicts]
    verdicts: Annotated[dict[str, ReviewVerdict], merge_dicts]
    quizzes: Annotated[dict[str, Quiz], merge_dicts]
    assignments: Annotated[dict[str, Assignment], merge_dicts]
    output_dir: str
    export_bundle_path: str
We use a custom dict reducer for lectures instead of a simple list append. If a lecture is rejected by the reviewer and rewritten, the new draft overwrites the old one under its lesson ID instead of piling up duplicates in state. That keeps the final payload for the downstream production step clean.
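The overwrite behavior is easy to check in isolation. A minimal sketch in plain Python (no LangGraph required; the lesson IDs are invented for illustration):

```python
def merge_dicts(left: dict, right: dict) -> dict:
    # Later writes win per key; untouched keys survive.
    return {**left, **right}

# Two lectures already in state, each keyed by lesson ID.
state = {"lesson-03": "draft v1", "lesson-07": "draft v1"}
# The reviewer rejects lesson 03; the rewrite returns a single-key dict.
rewrite = {"lesson-03": "draft v2"}

state = merge_dicts(state, rewrite)
print(state)
# {'lesson-03': 'draft v2', 'lesson-07': 'draft v1'}
```

A list-append reducer would instead leave both drafts of lesson 03 in state, and the production step would have to deduplicate.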
The syllabus holds a flat list of lessons with explicit back-references to module IDs. That flat shape turns the parallel fan-out later into a one-liner.
2.1 The Curriculum agent

The curriculum is the part most first attempts at a course generator get superficially right and substantively wrong. We need to do it carefully.

A naive prompt asks the model to write a syllabus with twelve lessons. Our Curriculum agent first derives measurable learning objectives, then groups them into modules, then picks the lessons that teach them. This mechanically enforces the alignment standard from the Quality Matters Rubric.

Every learning objective carries a Bloom's taxonomy level tag. The options are remember, understand, apply, analyze, evaluate, and create. This forces the Assessment agent later to produce a real mix of question types instead of ten multiple-choice vocabulary checks.

We enforce hard caps in the prompt. We limit the course to three modules and fifteen lessons total. Beyond that, the system starts generating filler to occupy space.

Once the syllabus is drafted, the graph pauses. A human reviews the syllabus and either approves it or requests changes.
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.types import Command, interrupt

CURRICULUM_PROMPT = """You are a Curriculum Architect. Given one sentence of intent,
produce a full course syllabus designed *backwards from learning objectives*.

Hard constraints:
- 3 modules total. 10 to 15 lessons total. Do not exceed.
- Every learning objective is measurable and learner-centered (starts with a Bloom verb).
- Every module has at least one objective at Apply level or higher.
- Every lesson maps to 1 to 3 objectives from its module.
- Prerequisites surfaced explicitly at course level.

Topic: {topic}
Audience: {audience}

Prior reviewer feedback (apply it verbatim, do not argue with it):
{feedback}
"""

def curriculum_node(state: CourseState) -> Command:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.3)
    architect = llm.with_structured_output(Syllabus)
    syllabus: Syllabus = architect.invoke([
        {
            "role": "system",
            "content": CURRICULUM_PROMPT.format(
                topic=state["topic"],
                audience=state.get("audience", "working software engineers"),
                feedback=state.get("human_feedback") or "none — first draft",
            ),
        }
    ])
    decision = interrupt({
        "phase": "syllabus_review",
        "topic": state["topic"],
        "syllabus": syllabus.model_dump(),
        "options": ["approve", "revise"],
    })
    if decision.get("action") == "approve":
        return Command(
            update={"syllabus": syllabus, "human_feedback": ""},
            goto="fan_out",
        )
    return Command(
        update={"human_feedback": decision.get("feedback", "")},
        goto="curriculum",
    )
The interrupt() call pauses the graph. It returns whatever value the caller passes in when resuming. The payload shape is the contract between the graph and the user interface.

The revise path loops back to the curriculum node. The next run reads the human feedback from state, and the model sees it. That is directed revision. The full state is persisted by the checkpointer, so the human can take hours to review the syllabus and the graph resumes perfectly.
2.2 The Content agent

The Curriculum agent produces twelve lesson outlines. The Content agent writes all twelve in parallel.

Sequential generation takes about fifteen minutes of writing. Parallel generation takes about ninety seconds. Each lesson is generated in an isolated context focused on its own objectives. That eliminates long-context drift entirely.
from langgraph.types import Send

def fan_out_to_content_agents(state: CourseState) -> list[Send]:
    """Conditional edge: one Send per lesson, dispatched in one superstep."""
    syllabus = state["syllabus"]
    objectives_by_id = {o.id: o for o in syllabus.objectives}
    return [
        Send(
            "content",
            {
                "lesson": lesson.model_dump(),
                "objectives": [
                    objectives_by_id[oid].model_dump()
                    for oid in lesson.objectives
                    if oid in objectives_by_id
                ],
                "prerequisites": lesson.prerequisites,
                "topic": state["topic"],
            },
        )
        for lesson in syllabus.lessons
    ]

def fan_out_node(state: CourseState) -> dict:
    """Identity node — exists only to be the source of the conditional edge."""
    return {}
Each Send is a dispatch call to a named node with a custom payload. When fan_out_to_content_agents returns a list of twelve Send objects, LangGraph schedules twelve independent runs of content in the same superstep. The content node does not receive the full CourseState. It receives only the small dict in its Send. This is the mechanism that eliminates long-context drift: lesson seven does not know what lesson three said and cannot contradict it, because the model never sees it.

The outputs merge back into global state through the merge_dicts reducer we defined earlier. Each run returns {"lectures": {lesson_id: Lecture}}, and the reducer combines twelve single-key dicts into one lectures dict keyed by lesson ID. Parallel writes to the same state slot do not conflict, because every run writes a different key.

fan_out_node is a shim. Conditional edges in LangGraph must attach to a source node, and we want the graph visualization to read cleanly. Attaching the fan-out directly to curriculum would tangle two unrelated branches, the human-approval revise loop and the parallel writing phase, on one node, so we keep the diagram clean with a no-op node.
2.3 The anti-slop reviewer

This is the part every demo skips. If you take one concept from this article, it should be this layer.

Slop includes generic filler phrases like "in today's ever-evolving landscape" or "it's worth noting". It includes buzzwords like "game-changer" or "seamless". It lacks specificity: no numbers, no named tools, no code snippets. Sentence lengths are uniform, and conclusions are vague phrases like "only time will tell".

Our reviewer is a structured-output agent using Gemini 2.5 Flash. It has one job. It takes a generated lecture and returns a structured verdict. It does not judge quality in the abstract. It pattern-matches against a concrete checklist.
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.types import Command

MAX_REWRITES = 2

LESSON_PROMPT = """Write one lesson for a course on: {topic}

Lesson title: {title}

Learning objectives (must all be addressed, tagged with Bloom level):
{objectives}

Prerequisites the reader already has: {prereqs}

Skeleton (use these exact H2 headings):
## Hook
## Concept
## Mechanism
## Worked example
## Common failure
## Summary
## Checklist

Hard rules:
- 800 to 1500 words.
- At least two concrete, named artifacts (tools, libraries, papers, or code).
- At least one code snippet OR one concrete numerical example.
- No filler phrases ("in today's evolving landscape", "it's worth noting").
- No vague closers ("the possibilities are endless").
- Reviewer corrections to apply verbatim (if any): {corrections}
"""

def content_node(state: dict) -> dict:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.6)
    writer = llm.with_structured_output(Lecture)
    corrections = state.get("reviewer_corrections") or "none — first draft"
    rewrite_count = state.get("rewrite_count", 0)
    lecture: Lecture = writer.invoke([
        {
            "role": "system",
            "content": LESSON_PROMPT.format(
                topic=state["topic"],
                title=state["lesson"]["title"],
                objectives="\n".join(
                    f"- [{o['bloom']}] {o['text']}" for o in state["objectives"]
                ),
                prereqs=", ".join(state.get("prerequisites") or []) or "none",
                corrections=corrections,
            ),
        }
    ])
    lecture.lesson_id = state["lesson"]["id"]
    lecture.rewrite_count = rewrite_count
    return {"lectures": {lecture.lesson_id: lecture}}

SLOP_PATTERNS = [
    "in today's", "ever-evolving", "it's worth noting", "in conclusion",
    "game-changer", "revolutionary", "seamless", "cutting-edge",
    "the possibilities are endless", "only time will tell", "dive deep",
    "unlock the power", "at the end of the day",
]

REVIEWER_PROMPT = """You are an anti-slop reviewer. You do not judge "quality"
in the abstract. You pattern-match against a concrete checklist and return
structured issues.

Reject the lecture if ANY of these are true:
- Contains filler phrases (see regex hits below).
- Uses buzzwords without concrete referents.
- Any H2 section lacks a specific named tool, library, paper, number, or code.
- Ends with a vague conclusion ("endless", "only time will tell", "stay tuned").
- Any phrase repeats 3+ times across the lecture.

Regex pre-check hits: {regex_hits}

Lecture id: {lesson_id}
Title: {title}

Return a ReviewVerdict. Approve only if clean."""

def reviewer_node(state: dict) -> Command:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)
    judge = llm.with_structured_output(ReviewVerdict)
    lesson_id = state["lesson"]["id"]
    lecture = state["lectures"][lesson_id]
    lowered = lecture.body_markdown.lower()
    regex_hits = [p for p in SLOP_PATTERNS if p in lowered]
    verdict: ReviewVerdict = judge.invoke([
        {
            "role": "system",
            "content": REVIEWER_PROMPT.format(
                regex_hits=regex_hits or "none",
                lesson_id=lecture.lesson_id,
                title=lecture.title,
            ),
        },
        {"role": "user", "content": lecture.body_markdown},
    ])
    verdict.lesson_id = lecture.lesson_id
    if verdict.approved or lecture.rewrite_count >= MAX_REWRITES:
        return Command(update={"verdicts": {lesson_id: verdict}}, goto="collect")
    corrections = "\n".join(f"- [{i.category}] {i.fix}" for i in verdict.issues)
    return Command(
        update={
            "rewrite_count": lecture.rewrite_count + 1,
            "reviewer_corrections": corrections,
        },
        goto="content",
    )

def collect_node(state: CourseState) -> dict:
    return {}
The regex pre-check is deliberately simple. It does literal substring matching against known slop phrases. That saves the language model from re-deriving what counts as slop on every call. It is a cheap prior serving an expensive model.
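The same substring scan can be run standalone. A minimal sketch with a shortened pattern list and an invented draft sentence:

```python
# Shortened pattern list; the full list lives in SLOP_PATTERNS above.
PATTERNS = [
    "in today's", "ever-evolving", "it's worth noting", "in conclusion",
    "game-changer", "seamless", "the possibilities are endless",
]

draft = (
    "In today's ever-evolving landscape, RAG is a game-changer. "
    "BM25 scores each document with a term-frequency formula."
)

# Case-insensitive literal substring match, exactly as in reviewer_node.
hits = [p for p in PATTERNS if p in draft.lower()]
print(hits)
# ["in today's", 'ever-evolving', 'game-changer']
```

The second sentence of the draft (named tool, concrete mechanism) triggers nothing, which is the point: the pre-check flags phrasing, not substance, and leaves the judgment call to the model.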
We use a hard cap on rewrites. The rewrite_count >= MAX_REWRITES branch is the fallback. If the model fails twice, we record the verdict and move on. A human editor can later see which lessons the reviewer gave up on.

The loop is local to each fan-out branch. Lesson three might be on its second rewrite while lesson seven is still on its first. They all converge independently, without blocking each other.

The collect node is an identity node. It acts as the join point where the twelve parallel branches reconverge before the assessment run.
2.4 The Assessment agent

The Assessment agent is structurally identical to the Content agent. It takes module context and generates quizzes and assignments.

The prompt asks the agent to tag every question's Bloom level and target a specific distribution (20% remember, 40% apply, 30% analyze, 10% evaluate). In a production system you would add a post-generation validator here that rejects output whose ratios stray too far from the target or whose questions map to missing objective IDs.

We also generate one multi-part assignment per module, scoped to objectives at the Apply level or higher. It includes a rubric an instructor can grade against.
def assessment_node(state: CourseState) -> dict:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.4)
    quiz_writer = llm.with_structured_output(Quiz)
    brief_writer = llm.with_structured_output(Assignment)
    quizzes: dict[str, Quiz] = {}
    assignments: dict[str, Assignment] = {}
    for module in state["syllabus"].modules:
        module_objectives = [
            o for o in state["syllabus"].objectives if o.module_id == module.id
        ]
        quiz = quiz_writer.invoke(
            f"Write a 10-question quiz for module '{module.title}'. "
            f"Distribute Bloom levels: 20% remember/understand, 40% apply, "
            f"30% analyze, 10% evaluate/create. Each question maps to exactly "
            f"one objective id from: {[o.id for o in module_objectives]}"
        )
        quizzes[module.id] = quiz
        if any(o.bloom in ("apply", "analyze", "create") for o in module_objectives):
            assignment = brief_writer.invoke(
                f"Write one multi-part assignment for module '{module.title}' "
                f"scoped to objectives at Apply or higher. Include a rubric."
            )
            assignments[module.id] = assignment
    return {"quizzes": quizzes, "assignments": assignments}
2.5 Deterministic production with Marp

This is the only part of the pipeline that strictly needs no LLM. Slide assembly is deterministic.

We use Marp CLI. It takes structured Markdown and produces PDF, PPTX, or HTML decks. The agent deterministically generates Marp Markdown from each lecture's Hook, Summary, and Checklist blocks. No language model generates slide content.

LLM-generated slide layout is the biggest source of broken decks. It produces oversized text, overlapping images, and bullet-point mudslides. Templates plus deterministic assembly eliminate that failure class entirely.
import re
import subprocess
from pathlib import Path

MARP_HEADER = """---
marp: true
theme: default
paginate: true
---

# {course_title}

#### {audience}

---

"""

SECTION_RE = re.compile(r"^## (Hook|Summary|Checklist)\s*\n(.*?)(?=\n## |\Z)",
                        re.MULTILINE | re.DOTALL)

def lecture_to_slides_md(lecture: Lecture, module_title: str) -> str:
    """Deterministic: no LLM call. Extract fixed sections, template them."""
    sections = {m.group(1): m.group(2).strip()
                for m in SECTION_RE.finditer(lecture.body_markdown)}
    slides = [
        f"## {module_title}\n### {lecture.title}",
        f"### Why this lesson\n\n{sections.get('Hook', '')}",
        f"### The idea\n\n{sections.get('Summary', '')}",
    ]
    checklist = sections.get("Checklist", "")
    if checklist:
        slides.append(f"### Checklist\n\n{checklist}")
    return "\n\n---\n\n".join(slides) + "\n\n---\n\n"

def build_course_deck(
    syllabus: Syllabus,
    lectures: dict[str, Lecture],
    out_dir: Path,
) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / "course.md"
    pdf_path = out_dir / "course.pdf"
    modules_by_id = {m.id: m for m in syllabus.modules}
    parts = [MARP_HEADER.format(course_title=syllabus.topic,
                                audience=syllabus.audience)]
    for lesson in syllabus.lessons:
        lecture = lectures.get(lesson.id)
        if lecture is None:
            continue
        module_title = modules_by_id[lesson.module_id].title
        parts.append(lecture_to_slides_md(lecture, module_title))
    md_path.write_text("\n".join(parts), encoding="utf-8")
    subprocess.run(
        ["marp", str(md_path), "--pdf", "-o", str(pdf_path), "--allow-local-files"],
        check=True,
    )
    return pdf_path

def production_node(state: CourseState) -> dict:
    out_dir = Path(state.get("output_dir") or "./course")
    pdf = build_course_deck(state["syllabus"], state["lectures"], out_dir / "slides")
    return {"export_bundle_path": str(pdf)}
The regex depends on the exact H2 headings the earlier prompt enforced. The prompt guarantees the structure the downstream parser relies on. This is a key design pattern: the output is deterministic enough when the downstream consumer is templated.
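That contract is easy to sanity-check without running the pipeline. A standalone sketch using the same regex on a toy lecture body:

```python
import re

# Same pattern as SECTION_RE above: capture only the sections the deck uses.
SECTION_RE = re.compile(r"^## (Hook|Summary|Checklist)\s*\n(.*?)(?=\n## |\Z)",
                        re.MULTILINE | re.DOTALL)

body = """## Hook
Retrieval fails silently.

## Concept
Some prose the slides never use.

## Summary
Retrieve, then rerank.

## Checklist
- measure recall
"""

sections = {m.group(1): m.group(2).strip() for m in SECTION_RE.finditer(body)}
print(sorted(sections))     # ['Checklist', 'Hook', 'Summary']
print(sections["Summary"])  # Retrieve, then rerank.
```

Note that the Concept section is silently dropped: the alternation in the pattern is an allowlist, so a lecture that renames a heading (say, "## Key takeaways" instead of "## Summary") produces an empty slide rather than an error. A stricter version would assert that all three expected keys are present.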
2.6 Wiring the full graph

We have five working nodes plus two shims, a reducer-driven fan-out, and a human approval gate. We compile them into one graph.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.types import Command

def build_course_graph(checkpointer=None):
    g = StateGraph(CourseState)
    g.add_node("curriculum", curriculum_node)
    g.add_node("fan_out", fan_out_node)
    g.add_node("content", content_node)
    g.add_node("reviewer", reviewer_node)
    g.add_node("collect", collect_node)
    g.add_node("assessment", assessment_node)
    g.add_node("production", production_node)
    g.add_edge(START, "curriculum")
    g.add_conditional_edges("fan_out", fan_out_to_content_agents, ["content"])
    g.add_edge("content", "reviewer")
    g.add_edge("collect", "assessment")
    g.add_edge("assessment", "production")
    g.add_edge("production", END)
    return g.compile(checkpointer=checkpointer or MemorySaver())

def run_course_pipeline(topic: str, audience: str, thread_id: str) -> str:
    graph = build_course_graph()
    config = {"configurable": {"thread_id": thread_id}}
    result = graph.invoke(
        {"topic": topic, "audience": audience, "output_dir": f"./course/{thread_id}"},
        config=config,
    )
    while "__interrupt__" in result:
        payload = result["__interrupt__"][0].value
        print(f"\n=== Review syllabus for: {payload['topic']} ===")
        for mod in payload["syllabus"]["modules"]:
            print(f"  Module: {mod['title']} — {len(mod['lessons'])} lessons")
        choice = input("\n[a]pprove / [r]evise: ").strip().lower()
        if choice.startswith("r"):
            feedback = input("What should change? ")
            result = graph.invoke(
                Command(resume={"action": "revise", "feedback": feedback}),
                config=config,
            )
        else:
            result = graph.invoke(
                Command(resume={"action": "approve"}),
                config=config,
            )
    return result["export_bundle_path"]

if __name__ == "__main__":
    path = run_course_pipeline(
        topic="Teach me how to build production-grade retrieval systems for LLM apps.",
        audience="working AI engineers who have shipped a RAG demo",
        thread_id="course-2026-04-19",
    )
    print(f"\nDone. Course PDF at: {path}")
The edge list lays the whole architecture bare. The transitions from curriculum to fan_out and from reviewer to collect are driven by Command(goto=...) directives inside the nodes. This is the modern LangGraph style.

The while "__interrupt__" in result loop handles the human-approval contract. If the syllabus is revised three times, the loop runs three times. The thread ID keeps the state pinned across all of it.

3. Closing notes

If you want to keep building after the basic pipeline runs, you can feed the lecture scripts into a text-to-speech API and pair it with FFmpeg to produce narrated MP4 exports. You can also add an alignment auditor, a second reviewer that checks whether the assessments actually measure the stated objectives, which is worth doing if the output will go in front of an accreditation body.

Course creation is not replaced by this pipeline. The agents do not replace the course creator. They replace the first three weeks of the course creator's timeline.
Original article: Building a Multi-Agent System That Turns One Sentence Into a $500 Online Course

Translated and edited by 汇智网; please credit the source when reposting.