Building an AI Agent with Long-Term Memory
I built a prototype AI customer-support agent with semantic long-term memory using ChromaDB (a vector database), Ollama (a local LLM), and TypeScript. The agent remembers conversations across sessions, understands context semantically, and costs nothing to run. It is an MVP that demonstrates the architecture, with full source code included.
1. The Problem: AI Agents That Forget
Traditional chatbots have a critical flaw: they forget.
You tell them your name is Alice on Monday. By Wednesday, they're asking again. You explain your technical background in one conversation. Next session, they re-explain basic concepts you already know.
That's not just annoying; for production AI applications, it's a fatal flaw.
Why do AI agents forget?
Most implementations use one of these approaches:
- Session-only memory — everything vanishes when the tab closes
- SQL keyword search — SELECT * WHERE message LIKE '%payment%' misses semantic meaning
- Full conversation history — works only until you hit the token limit (expensive, hard to scale)
None of these solves the real problem: AI agents need semantic long-term memory.
2. The Solution: Semantic Memory with Vector Embeddings
What if your AI agent could remember the way a human does?
- "payment problem" → recalls the "checkout failure" from three days ago
- "explain it technically" → remembers you're a developer and adjusts its tone
- "What's my name?" → instantly recalls your name from any previous session
That's exactly what I built. Here's how it works.
Architecture: The Memory Stack
┌─────────────────┐
│ Web UI │ ← User interacts here
│ (HTML/CSS/JS) │
└────────┬────────┘
│ HTTP/REST
▼
┌─────────────────┐
│ Express API │ ← Routes requests
│ (TypeScript) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Support Agent │ ← Brain (Genkit + Ollama)
│ (Genkit) │
└────┬───────┬────┘
│ │
▼ ▼
┌─────────┐ ┌──────────┐
│ Memory │ │ Ollama │ ← LLM
│ Manager │ │ Client │
└────┬────┘ └──────────┘
│
▼
┌─────────────────┐
│ ChromaDB │ ← Persistent vector storage
│ (Vector Store) │
└─────────────────┘
Key components:
- Ollama — local LLM (qwen2.5:7b) for chat and embeddings (nomic-embed-text)
- ChromaDB — vector database with HNSW indexing for semantic search
- Genkit — agent framework with session management
- Express — REST API server with CORS and error handling
- TypeScript — type-safe development with strict validation
Total cost: $0 (fully local, no API calls)
Production features:
- LLM-based fact extraction with confidence scoring
- Automatic memory conflict resolution (corrects outdated information)
- Context window management (2000-token limit)
- Type validation and sanitization
- PII extraction (names, emails, phone numbers, locations)
- Sentiment analysis separated from bug reports
3. How Semantic Memory Works
3.1 Memory Storage
When the user says: "My name is Alice"
- Extract the key fact: "Customer's name is Alice"
- Generate an embedding: convert the fact into a 768-dimensional vector with nomic-embed-text
- Store in ChromaDB: save it with metadata (userId, timestamp, importance score)
// Memory storage example
async addMemory(memory: MemoryInput): Promise<string> {
  const memoryId = generateId(); // generate once so we can return it below
  const embedding = await this.ollama.generateEmbedding(memory.content);
  await this.vectorStore.addMemory({
    id: memoryId,
userId: memory.userId,
content: memory.content,
embedding: embedding,
importance: this.calculateImportance(memory),
timestamp: Date.now(),
metadata: memory.metadata
});
return memoryId;
}
Importance scoring:
- Personal information (names, preferences): 0.9–1.0
- Technical issues: 0.7–0.9
- General questions: 0.3–0.5
3.2 Memory Retrieval
When the user asks: "What's my name?"
- Generate a query embedding — convert the question into a 768-dimensional vector
- Semantic search — ChromaDB finds similar vectors using cosine similarity
- Re-rank results — combine relevance, importance, and recency
- Return the top memories — use them as context to generate the answer
// Memory retrieval example
async searchRelevantMemories(
userId: string,
query: string
): Promise<Memory[]> {
// Semantic search in ChromaDB
const results = await this.vectorStore.searchMemories(query, userId, 10);
// Re-rank: relevance(50%) + importance(30%) + recency(20%)
const reRanked = results.map(r => ({
...r,
score: (r.relevance * 0.5) + (r.importance * 0.3) + (r.recency * 0.2)
}));
return reRanked.sort((a, b) => b.score - a.score).slice(0, 5);
}
Re-ranking formula:
final_score = (relevance × 0.5) + (importance × 0.3) + (recency × 0.2)
3.3 Context-Aware Responses
The agent uses the retrieved memories as context:
// Agent response generation
async chat(userId: string, message: string): Promise<ChatResponse> {
  // Get relevant memories (searchRelevantMemories re-ranks and returns the top 5)
  const memories = await this.memoryManager.searchRelevantMemories(
    userId,
    message
  );
// Build context
const context = memories.map(m =>
`[${m.timestamp}] ${m.content} (relevance: ${m.relevance})`
).join('\n');
// Generate response with context
const prompt = `
You are a customer support agent with memory of previous conversations.
CONTEXT FROM MEMORY:
${context}
CURRENT MESSAGE: ${message}
Respond naturally, referencing relevant context when helpful.
`;
const response = await this.ollama.chat([
{ role: 'system', content: prompt },
{ role: 'user', content: message }
]);
// Store this interaction
await this.memoryManager.addInteraction({
userId,
userMessage: message,
assistantMessage: response
});
return { response, context: memories };
}
**Result:** The agent still answers "Your name is Alice!" after a server restart or in a brand-new session.
4. The Magic: Semantic Search vs. Keyword Search
Let's look at the difference in practice.
Test case:
User says: "checkout is broken"
3 days later, the user asks: "Any update on my payment issue?"
Traditional SQL search:
SELECT * FROM conversations
WHERE user_id = 'alice'
AND (message LIKE '%payment%' OR message LIKE '%issue%')
ORDER BY timestamp DESC;
Result: 0 matches ❌ (no keyword overlap)
Semantic vector search:
const embedding = await generateEmbedding("payment issue");
const results = await chromaDB.query(embedding, { userId: 'alice' });
Result: finds "checkout is broken" at 87% similarity ✅
Why it works:
- Both phrases describe purchase-related problems
- Vector embeddings capture semantic meaning
- Cosine similarity: 0.87 (highly related)
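To make the 0.87 figure concrete, here is a minimal sketch of the cosine-similarity math ChromaDB applies when a collection is configured with 'hnsw:space': 'cosine' (the commented embedding calls are illustrative; real nomic-embed-text vectors have 768 dimensions):
// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Hypothetical usage:
// const a = await generateEmbedding('checkout is broken');
// const b = await generateEmbedding('payment issue');
// cosineSimilarity(a, b); // ≈ 0.87 for these two phrases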
5. Implementation Details
5.1 Setting Up ChromaDB
// src/memory/vectorStore.ts
import { ChromaClient, Collection } from 'chromadb';
export class VectorStore {
  private client: ChromaClient;
  private collection: Collection;
  private ollama: OllamaClient; // injected; generates query embeddings in searchMemories below
async initialize(): Promise<void> {
this.client = new ChromaClient();
this.collection = await this.client.getOrCreateCollection({
name: 'customer_support_memories',
metadata: {
'hnsw:space': 'cosine', // Cosine similarity
'hnsw:M': 16, // HNSW graph connections
'hnsw:construction_ef': 200,
'hnsw:search_ef': 50
}
});
}
async addMemory(memory: VectorMemory): Promise<void> {
await this.collection.add({
ids: [memory.id],
embeddings: [memory.embedding],
documents: [memory.content],
metadatas: [{
userId: memory.userId,
importance: memory.importance,
timestamp: memory.timestamp,
category: memory.category
}]
});
}
async searchMemories(
query: string,
userId: string,
limit: number = 5
): Promise<SearchResult[]> {
const queryEmbedding = await this.ollama.generateEmbedding(query);
const results = await this.collection.query({
queryEmbeddings: [queryEmbedding],
nResults: limit,
where: { userId: userId }
});
return results.ids[0].map((id, idx) => ({
id: id,
content: results.documents[0][idx],
relevance: 1 - results.distances[0][idx], // Convert distance to similarity
metadata: results.metadatas[0][idx]
}));
}
}
Key decisions:
- Cosine similarity — the best fit for text embeddings
- HNSW indexing — fast approximate nearest-neighbor search
- User isolation — each user gets a separate memory space
5.2 Embedding Caching
Generating embeddings is expensive. Let's cache them.
// src/models/ollama.ts
import { createHash } from 'crypto';
import { Ollama } from 'ollama';
export class OllamaClient {
  private ollama = new Ollama(); // talks to the local Ollama server (default http://localhost:11434)
  private embeddingCache = new Map<string, number[]>();
  private cacheHits = 0;
  private cacheMisses = 0;
async generateEmbedding(
text: string,
useCache: boolean = true
): Promise<number[]> {
// Create cache key (hash of text)
const cacheKey = createHash('sha256')
.update(text.toLowerCase().trim())
.digest('hex');
// Check cache
if (useCache && this.embeddingCache.has(cacheKey)) {
this.cacheHits++;
return this.embeddingCache.get(cacheKey)!;
}
// Generate embedding via Ollama
this.cacheMisses++;
const response = await this.ollama.embeddings({
model: 'nomic-embed-text',
prompt: text
});
const embedding = response.embedding; // 768-dimensional vector
// Cache it (LRU eviction when > 1000 entries)
if (this.embeddingCache.size >= 1000) {
const firstKey = this.embeddingCache.keys().next().value;
this.embeddingCache.delete(firstKey);
}
this.embeddingCache.set(cacheKey, embedding);
return embedding;
}
getCacheStats() {
const total = this.cacheHits + this.cacheMisses;
return {
hits: this.cacheHits,
misses: this.cacheMisses,
hitRate: total > 0 ? (this.cacheHits / total * 100).toFixed(1) : '0'
};
}
}
Performance impact:
- 40–60% cache hit rate in production
- Embedding generation: ~200ms per call
- Cache retrieval: <1ms
- Net savings: roughly 40–50% of response time
5.3 Memory Manager with Intelligent Extraction
Not everything is worth remembering. Extract the key facts intelligently.
// src/memory/memoryManager.ts
export class MemoryManager {
async addInteraction(interaction: Interaction): Promise<void> {
// Store the raw conversation
await this.vectorStore.addMemory({
content: `User: ${interaction.userMessage}\nAssistant: ${interaction.assistantMessage}`,
type: MemoryType.CONVERSATION,
importance: 0.5
});
// Extract key facts with the LLM
const facts = await this.extractKeyFacts(interaction);
// Store each fact separately, with higher importance
for (const fact of facts) {
await this.vectorStore.addMemory({
content: fact.content,
type: MemoryType.EXTRACTED_FACT,
importance: fact.importance
});
}
}
private async extractKeyFacts(interaction: Interaction): Promise<Memory[]> {
const prompt = `
Extract key facts from this conversation that should be remembered long-term.
Focus on: personal info, preferences, issues, requests, decisions.
Conversation:
User: ${interaction.userMessage}
Assistant: ${interaction.assistantMessage}
Return JSON array of facts:
[{ "content": "Customer's name is Alice", "importance": 0.95 }]
`;
const response = await this.ollama.chat([
{ role: 'system', content: 'You extract key facts from conversations.' },
{ role: 'user', content: prompt }
]);
return JSON.parse(response);
}
private calculateImportance(memory: Memory): number {
let score = 0.5; // Base score
// Personal information
if (memory.content.match(/name is|called|prefer/i)) score += 0.4;
// Technical issues
if (memory.content.match(/bug|error|broken|issue/i)) score += 0.3;
// Strong sentiment
if (memory.content.match(/love|hate|frustrated|excited/i)) score += 0.2;
return Math.min(score, 1.0);
}
}
Memory types:
- CONVERSATION — full exchanges (importance: 0.3–0.5)
- EXTRACTED_FACT — key facts (importance: 0.7–1.0)
- PREFERENCE — user preferences (importance: 0.8–0.9)
- EVENT — significant actions (importance: 0.7–0.9)
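A minimal sketch of how these types might be declared; the enum name follows the MemoryType references in the code above, though the exact declaration in the repo may differ:
// Memory types with their typical importance ranges
export enum MemoryType {
  CONVERSATION = 'conversation',     // full exchanges (0.3–0.5)
  EXTRACTED_FACT = 'extracted_fact', // key facts (0.7–1.0)
  PREFERENCE = 'preference',         // user preferences (0.8–0.9)
  EVENT = 'event'                    // significant actions (0.7–0.9)
}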
5.4 REST API
// src/api/routes.ts
import express from 'express';
const router = express.Router();
// Chat endpoint
router.post('/chat', async (req, res) => {
try {
const { userId, message, sessionId } = req.body;
// Validate input
if (!userId || !message) {
return res.status(400).json({ error: 'userId and message required' });
}
// Get response from agent
const response = await agent.chat(userId, message, sessionId);
res.json({
response: response.response,
memoriesUsed: response.context.length,
      avgRelevance: response.context.length > 0
        ? (response.context.reduce((sum, m) =>
            sum + m.relevance, 0) / response.context.length * 100).toFixed(0)
        : '0', // avoid NaN when no memories were used
responseTime: response.metadata.responseTime,
timestamp: response.timestamp
});
} catch (error) {
console.error('Chat error:', error);
res.status(500).json({ error: 'Internal server error' });
}
});
// Get memories
router.get('/memories/:userId', async (req, res) => {
try {
const { userId } = req.params;
const limit = parseInt(req.query.limit as string) || 10;
const memories = await memoryManager.getRecentMemories(userId, limit);
res.json({ memories, count: memories.length });
} catch (error) {
res.status(500).json({ error: 'Failed to fetch memories' });
}
});
// Delete user data (GDPR compliance)
router.delete('/memories/:userId', async (req, res) => {
try {
const { userId } = req.params;
await memoryManager.deleteUserData(userId);
res.json({ message: 'User data deleted successfully' });
} catch (error) {
res.status(500).json({ error: 'Failed to delete user data' });
}
});
// Health check
router.get('/health', async (req, res) => {
const ollamaStatus = await ollama.isAvailable();
const memoryCount = await memoryManager.getMemoryCount();
res.json({
status: 'healthy',
ollama: ollamaStatus ? 'connected' : 'disconnected',
memories: memoryCount,
uptime: process.uptime()
});
});
export default router;
6. Production-Grade Features
6.1 Intelligent Fact Extraction
The system uses LLM-based extraction (qwen2.5:7b) backed by comprehensive pattern matching:
// Extraction prompt covers:
- Names: "my name is X", "I am X", "call me X"
- Contact: "my email is X", "call me at Y", "my phone is Z"
- Location: "I'm from X", "I live in Y"
- Problems: "X is broken", "error with X"
- Requests: "I need X", "can you add X"
- Preferences: "I prefer X", "I like X"
- Sentiment: "I love X", "I hate Y", "frustrated with Z"
Example:
User: "Hi, I'm Alice and my email is alice@example.com"
✅ Extracted 2 facts:
1. "User's name is Alice" (importance: 1.0, confidence: 1.0)
2. "User's email is alice@example.com" (importance: 1.0, confidence: 1.0)
6.2 Automatic Memory Conflict Resolution
The system detects and resolves conflicting information:
User: "My name is Alice"
→ Stored: "User's name is Alice"
User: "Actually, it's spelled Alicia"
→ Detects conflict with existing name memory
→ Deletes old memory
→ Stores corrected memory with version tracking
User: "What's my name?"
→ Agent: "Your name is Alicia" ✅
Conflict detection handles:
- Name corrections and changes
- Email updates
- Company/job changes
- Location updates
- Phone number changes
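The resolution code isn't shown in the article; this is a minimal sketch of the detect-and-replace flow, assuming a hypothetical deleteMemory helper alongside the searchMemories API from section 5.1:
// Sketch: replace a semantically similar fact of the same category,
// keeping the audit-trail metadata shown in section 6.7
async function resolveConflict(userId: string, newFact: VectorMemory): Promise<void> {
  const similar = await vectorStore.searchMemories(newFact.content, userId, 3);
  const conflict = similar.find(m =>
    m.metadata.category === newFact.category && m.relevance > 0.85
  );
  if (conflict) {
    await vectorStore.deleteMemory(conflict.id); // hypothetical helper
    newFact.metadata = {
      ...newFact.metadata,
      replacedMemoryId: conflict.id,
      replacedAt: Date.now()
    };
  }
  await vectorStore.addMemory(newFact);
}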
6.3 Robust Validation Pipeline
Every extracted fact passes through strict validation:
// Validation checks:
✅ Required fields (content, type, category)
✅ Type enum validation (preference, event, sentiment, extracted_fact)
✅ Category enum validation (bug_report, feature_request, general, etc.)
✅ Content length limits (max 500 characters)
✅ Importance score range (0-1)
✅ Confidence score range (0-1)
✅ Minimum confidence threshold (>0.5)
Invalid types are corrected automatically:
LLM returns type: "contact" (invalid)
→ System corrects to: "extracted_fact" ✅
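A sketch of what this pipeline might look like in code; the field and constant names are assumptions based on the checks listed above:
interface ExtractedFact {
  content: string; type: string; category: string;
  importance: number; confidence: number;
}
const VALID_TYPES = ['preference', 'event', 'sentiment', 'extracted_fact'];
const VALID_CATEGORIES = ['bug_report', 'feature_request', 'general'];

function validateFact(fact: ExtractedFact): ExtractedFact | null {
  if (!fact.content || !fact.type || !fact.category) return null; // required fields
  if (fact.content.length > 500) return null;                     // content length limit
  if (fact.confidence < 0.5) return null;                         // confidence threshold
  if (!VALID_TYPES.includes(fact.type)) fact.type = 'extracted_fact';       // auto-correct
  if (!VALID_CATEGORIES.includes(fact.category)) fact.category = 'general'; // auto-correct
  fact.importance = Math.min(Math.max(fact.importance, 0), 1);    // clamp to [0, 1]
  return fact;
}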
6.4 Context Window Management
Context overflow is avoided through intelligent prioritization:
// Token estimation: ~4 characters = 1 token
// Max context: 2000 tokens
Priority order:
1. Customer Profile (preferences, facts) - Most important
2. Known Issues (bug reports) - High priority
3. Feature Requests - Medium priority
4. Sentiment History - Context
5. Recent Interactions (last 3) - Temporal context
Logs show token usage:
[Context] Built context: 64/2000 tokens, 3 memories
[Context] Built context: 1847/2000 tokens, 47 memories
... (15 items truncated due to context limit)
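A minimal sketch of the context builder implied by those logs, using the ~4-characters-per-token estimate; the priority map and memory shape are assumptions:
const MAX_CONTEXT_TOKENS = 2000;
const estimateTokens = (text: string) => Math.ceil(text.length / 4); // ~4 chars per token

// Assemble memories in priority order until the token budget is spent
function buildContext(memories: Array<{ content: string; category: string }>): string {
  const priority: Record<string, number> = {
    profile: 1, bug_report: 2, feature_request: 3, sentiment: 4, conversation: 5
  };
  const sorted = [...memories].sort(
    (a, b) => (priority[a.category] ?? 9) - (priority[b.category] ?? 9)
  );
  let used = 0;
  const lines: string[] = [];
  for (const m of sorted) {
    const cost = estimateTokens(m.content);
    if (used + cost > MAX_CONTEXT_TOKENS) break;
    used += cost;
    lines.push(m.content);
  }
  console.log(`[Context] Built context: ${used}/${MAX_CONTEXT_TOKENS} tokens, ${lines.length} memories`);
  return lines.join('\n');
}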
6.5 Sentiment Analysis
Sentiment context is extracted independently, separate from bug reports:
User: "The app crashes when I login, I'm so frustrated"
✅ Extracted 2 facts:
1. "User reported issue: app crashes"
(type: extracted_fact, category: bug_report, importance: 0.85)
2. "User expressed frustration about app crashing"
(type: sentiment, category: general, importance: 0.7)
This makes it possible to track user-satisfaction trends on their own, without conflating them with technical issues.
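As an illustration, a sketch of what that trend tracking could look like; getMemoriesByType is a hypothetical helper, and the keyword scoring stands in for a real sentiment model:
// Naive satisfaction trend: +1 / -1 per sentiment memory, ordered by time
async function sentimentTrend(userId: string): Promise<number[]> {
  const sentiments = await memoryManager.getMemoriesByType(userId, 'sentiment'); // hypothetical
  return sentiments
    .sort((a, b) => a.timestamp - b.timestamp)
    .map(m => /love|excited|happy/i.test(m.content) ? 1
            : /hate|frustrated|angry/i.test(m.content) ? -1 : 0);
}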
6.6 Confidence Scoring
Every extraction carries a confidence score:
{
"content": "User's name is Alice",
"confidence": 0.98, // High confidence
"importance": 0.95,
"type": "preference"
}
Facts with confidence below 0.5 are rejected automatically, which keeps quality high.
6.7 Memory Versioning and Audit Trail
Memory updates are fully tracked:
{
"content": "User's name is Alicia",
"metadata": {
"replacedMemoryId": "5c67f3fd-7751-4d10-bd02-5984ea6cf8bf",
"replacedAt": 1771042850123,
"confidence": 0.98,
"extractedFrom": "llm",
"originalMessage": "Actually, it's spelled Alicia..."
}
}
7. Use Cases
This architecture works well for:
Personal AI assistants
- Remember your preferences, schedule, and habits
- Adapt to your communication style
- Track tasks and goals across sessions
Healthcare applications
- Patient history with HIPAA-friendly local data storage
- Medical context carried across visits
- Personalized treatment suggestions
Education and tutoring
- A student's learning style and progress
- Difficulty adjusted to performance
- Remembered misconceptions, so they can be corrected
Sales CRM
- Customer conversation history
- Deal stages and objections
- Relationship insights
Research assistants
- Paper summaries and connections
- Research questions and findings
- Literature review management
8. Closing Thoughts
Building an AI agent with semantic long-term memory isn't just possible; it's practical and affordable.
With ChromaDB, Ollama, and TypeScript, you can create a working prototype that:
- Remembers conversations across sessions
- Understands semantic meaning, not just keywords
- Costs $0 to run (fully local)
- Respects user privacy (data never leaves your server)
This prototype demonstrates the core architecture. For a production deployment, you would add:
- Session persistence (SQLite/PostgreSQL)
- Authentication and authorization
- Monitoring and observability
- Rate limiting and error handling
- Load testing and optimization
The future of AI agents isn't bigger models; it's better memory.
Original article: Building an AI Agent with Long-Term Memory: ChromaDB + Ollama + TypeScript