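What Is Agent Context Management? A Detailed Guide to Context Management for AI Agents

There is a technique called Context Management, and it works like an Agent's "short-term memory": it lets the Agent remember what you said 3 minutes ago instead of greeting you like a stranger every time with "Hello, who are you?"

At 1:13 a.m., my Agent forgot the earlier conversation for the fifth time during an API call. I couldn't help asking: "How many times have I told you to stop sending me to the database?" It replied: "Is this our first conversation? I'm not sure." That was the moment I decided to look into context management.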

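📚 Definition

Agent Context Management is the set of techniques for managing an AI Agent's context, which includes conversation history, system state, user information, and task state.

Core challenge: an LLM's context window is finite (typically 2K to 200K tokens, depending on the model), but a conversation can go on indefinitely.

Context window: the maximum number of tokens an LLM can process in a single request
Context compression: representing the same information in fewer tokens
Context fallback: retrieving older context again after it has been dropped
State synchronization: keeping state consistent across multiple turns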

🔬 Core Strategies

Strategy 1: Sliding Window
├── Keep the most recent N turns
├── Older turns are dropped automatically
└── Best for: short Q&A scenarios
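
The sliding window can be sketched in a few lines. This is a minimal illustration, assuming messages are `{ role, content }` objects and that one turn is a user message plus the assistant reply; the name `slidingWindow` and the `maxTurns` parameter are made up for this example.

```javascript
// Hypothetical helper: keep only the most recent `maxTurns` turns.
// Assumption: one turn = one user message followed by one assistant message.
function slidingWindow(history, maxTurns) {
  if (maxTurns <= 0) return [];
  const maxMessages = maxTurns * 2;
  // slice() with a negative start returns the tail; older messages are dropped
  return history.length > maxMessages ? history.slice(-maxMessages) : history.slice();
}

const history = [
  { role: 'user', content: 'Q1' },
  { role: 'assistant', content: 'A1' },
  { role: 'user', content: 'Q2' },
  { role: 'assistant', content: 'A2' },
  { role: 'user', content: 'Q3' },
  { role: 'assistant', content: 'A3' },
];

const windowed = slidingWindow(history, 2);
console.log(windowed.map(m => m.content)); // [ 'Q2', 'A2', 'Q3', 'A3' ]
```

The function returns a copy rather than mutating the input, so the full history can still be archived elsewhere if needed.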

Strategy 2: Summary Compression
├── Generate a summary every N turns
├── Replace the original messages with the summary
└── Best for: long conversations, complex tasks

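Strategy 3: Hierarchical Memory
├── L1: current conversation (sliding window)
├── L2: session summary (compressed)
├── L3: long-term memory (vector database)
└── Best for: multi-session, long-running interaction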
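
To make the three tiers concrete, here is a rough sketch. The class name `LayeredMemory` and its methods are hypothetical, and L3 is just an array with keyword matching standing in for a real vector database.

```javascript
// Hypothetical three-tier memory. L3 is a plain array with keyword matching,
// standing in for a real vector database.
class LayeredMemory {
  constructor(windowSize = 4) {
    this.windowSize = windowSize;
    this.recent = [];    // L1: current conversation (sliding window)
    this.summary = '';   // L2: session summary
    this.longTerm = [];  // L3: long-term memory
  }

  add(message) {
    this.recent.push(message);
    // Messages that fall out of L1 are demoted to L3 instead of being lost
    while (this.recent.length > this.windowSize) {
      this.longTerm.push(this.recent.shift());
    }
  }

  setSummary(text) {
    this.summary = text;
  }

  // Naive keyword lookup; a real system would run an embedding similarity search
  recall(keyword, topK = 5) {
    const needle = keyword.toLowerCase();
    return this.longTerm
      .filter(m => m.content.toLowerCase().includes(needle))
      .slice(-topK);
  }
}

const memory = new LayeredMemory(2);
memory.add({ role: 'user', content: 'My deploy target is staging.' });
memory.add({ role: 'assistant', content: 'Noted.' });
memory.add({ role: 'user', content: 'What is the weather?' });
console.log(memory.recent.length);            // 2
console.log(memory.recall('staging').length); // 1
```

The point of the sketch is the demotion path: nothing is deleted when it leaves the window, it only becomes more expensive to reach.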

Strategy 4: Intelligent Retrieval
├── Retrieve relevant history based on the current query
├── Inject it selectively
└── Best for: knowledge-intensive tasks
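
Retrieval and selective injection can be illustrated with a toy scorer. This sketch uses word overlap (Jaccard similarity) as a swapped-in stand-in for embedding similarity; `retrieveRelevant`, `topK`, and `minScore` are names invented for the example.

```javascript
// Toy retrieval: score each stored message against the query by word overlap
// (Jaccard similarity). A real system would compare embedding vectors instead.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

function jaccard(a, b) {
  let shared = 0;
  for (const word of a) if (b.has(word)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Return up to topK messages, best match first, dropping anything below minScore
function retrieveRelevant(query, history, topK = 2, minScore = 0.1) {
  const q = tokenize(query);
  return history
    .map(m => ({ message: m, score: jaccard(q, tokenize(m.content)) }))
    .filter(x => x.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(x => x.message);
}

const pastMessages = [
  { role: 'user', content: 'The database password rotates every 30 days' },
  { role: 'user', content: 'I prefer tabs over spaces' },
  { role: 'user', content: 'Our database runs on Postgres 15' },
];

const hits = retrieveRelevant('which database version runs in production', pastMessages);
console.log(hits.map(m => m.content)); // [ 'Our database runs on Postgres 15' ]
```

The `minScore` threshold is what makes the injection selective: weakly related history stays out of the prompt even when there is room for it.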

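🚀 OpenClaw in Practice

# OpenClaw context management configuration
agent:
  context:
    # Sliding window
    sliding_window:
      max_turns: 20
      max_tokens: 8192
    
    # Summary compression
    compression:
      enabled: true
      summarize_every: 10    # every 10 turns
      model: claude-3-haiku  # use a lightweight model
      template: |
        The following is a summary of the conversation between {username} and the AI assistant.
        User's main interests: {interests}
        Completed tasks: {completed_tasks}
        Pending tasks: {pending_tasks}
        Active topics: {active_topics}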
    
    # Memory system
    memory:
      type: vector_db  # chromadb | sqlite | redis
      embedding_model: text-embedding-3-small
      collections:
        - name: working_memory    # short-term: lasts for the session
          ttl_hours: 24
        - name: long_term_memory  # long-term: never expires
          top_k: 5
    
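    # Context injection strategy
    injection:
      priority:  # injection priority, highest first
        - system_prompt    # system instructions (fixed)
        - working_memory   # session working memory (earlier turns)
        - summary          # conversation summary
        - long_term_memory # long-term memory (on demand)
        - recent_history   # recent conversation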
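
One way to read the `priority` list above is as an order for spending a token budget. The sketch below is an assumption about how such a list could be applied, not OpenClaw's actual behavior; `assembleContext` and the characters/4 token estimate are both hypothetical.

```javascript
// Hypothetical assembler: walk the sources in priority order and add each one
// only if it still fits in the token budget. Token counts use a crude
// characters/4 estimate, which is an assumption rather than a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function assembleContext(sources, priority, maxTokens) {
  const included = [];
  let used = 0;
  for (const name of priority) {
    const text = sources[name];
    if (!text) continue;                   // source missing or empty
    const cost = estimateTokens(text);
    if (used + cost > maxTokens) continue; // skip what does not fit
    included.push({ name, text });
    used += cost;
  }
  return { included, used };
}

const priority = ['system_prompt', 'working_memory', 'summary', 'long_term_memory', 'recent_history'];
const sources = {
  system_prompt: 'You are a helpful assistant.',
  summary: 'User is migrating a service to a new database.',
  long_term_memory: 'x'.repeat(400), // too large for the budget below
  recent_history: 'user: which step is next?',
};

const result = assembleContext(sources, priority, 40);
console.log(result.included.map(s => s.name));
// [ 'system_prompt', 'summary', 'recent_history' ]
```

Skipping an oversized source instead of stopping lets lower-priority but smaller sources still make it in; stopping at the first miss would be the stricter alternative.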

💻 Code Example

// context-manager.js
export class ContextManager {
  constructor(options = {}) {
    this.maxTokens = options.maxTokens || 8192;
    this.summaryInterval = options.summaryInterval || 10;
    this.history = [];
    this.summary = null;
    this.turnCount = 0;
  }
  
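  // Append a new message
  async addMessage(role, content) {
    this.history.push({ role, content });
    this.turnCount++; // counts messages; one user/assistant exchange adds 2
    
    // Summarize every `summaryInterval` messages
    if (this.turnCount % this.summaryInterval === 0) {
      await this._summarize();
    }
    
    // Prune if the estimated token count exceeds the budget
    if (this._estimateTokens() > this.maxTokens) {
      this._prune();
    }
  }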
  
  // Build the final context to send to the model
  buildContext(systemPrompt) {
    const context = [
      { role: 'system', content: systemPrompt }
    ];
    
    // If a summary exists, inject it first
    if (this.summary) {
      context.push({
        role: 'system',
        content: `[Conversation summary] ${this.summary}`
      });
    }
    
    // Then inject the most recent raw messages
    const recentHistory = this._getRecentHistory();
    context.push(...recentHistory);
    
    return context;
  }
  
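  async _summarize() {
    // Summarize with a lightweight model; _callLLM is a stub, replace it with a real API call
    const keep = 5; // same number of messages that _getRecentHistory() returns
    const older = this.history.slice(0, -keep);
    if (older.length === 0) return;
    const textToSummarize = older
      .map(m => `${m.role}: ${m.content}`)
      .join('\n');
    
    // Fold the previous summary in, so earlier turns survive when the summary is overwritten
    const previous = this.summary ? `Previous summary: ${this.summary}\n` : '';
    this.summary = await this._callLLM(`Summarize this conversation:\n${previous}${textToSummarize}`);
    
    // Drop the summarized messages and keep the unsummarized tail verbatim
    this.history = this.history.slice(-keep);
  }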
  
  _getRecentHistory() {
    // Return the 5 most recent messages (individual messages, not full turns)
    return this.history.slice(-5);
  }
  
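  _estimateTokens() {
    // Rough heuristic, not a real tokenizer: about 2 tokens per CJK character and
    // about 1 token per 4 other characters. Use the model's tokenizer for exact counts.
    const text = this.history.map(m => m.content).join('');
    const cjk = (text.match(/[\u4e00-\u9fff]/g) || []).length;
    return cjk * 2 + Math.ceil((text.length - cjk) / 4);
  }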
  
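  _prune() {
    // Over budget: drop the oldest messages, but always keep at least 2
    while (this._estimateTokens() > this.maxTokens && this.history.length > 2) {
      this.history.shift();
    }
  }
  
  async _callLLM(prompt) {
    // Stub standing in for a real LLM call
    return `Conversation summary (${this.turnCount} messages so far)...`;
  }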
}

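🎯 Best Practices

✅ Good habits:

Set the conversation window to 20-30 turns to balance memory against token cost
Use a lightweight model for summarization (Claude Haiku / GPT-4o-mini)
Save context snapshots regularly so a session can be restored
Order injected context by importance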
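
The snapshot habit can be as simple as serializing the state to JSON. A minimal sketch, assuming the state has the same `history`, `summary`, and `turnCount` fields as the `ContextManager` class; `takeSnapshot`, `restoreSnapshot`, and the `version` field are invented here, and where the string is stored (file, Redis, database) is left open.

```javascript
// Hypothetical snapshot helpers: serialize the context state to JSON and restore it later.
function takeSnapshot(state) {
  return JSON.stringify({
    version: 1, // lets restoreSnapshot reject formats it does not understand
    savedAt: new Date().toISOString(),
    history: state.history,
    summary: state.summary,
    turnCount: state.turnCount,
  });
}

function restoreSnapshot(json) {
  const data = JSON.parse(json);
  if (data.version !== 1) {
    throw new Error(`Unsupported snapshot version: ${data.version}`);
  }
  return { history: data.history, summary: data.summary, turnCount: data.turnCount };
}

const saved = takeSnapshot({
  history: [{ role: 'user', content: 'hello' }],
  summary: null,
  turnCount: 1,
});
const restored = restoreSnapshot(saved);
console.log(restored.history[0].content); // hello
```

The restored fields can be assigned back onto a fresh manager instance before the next message is added.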

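⚠️ Pitfalls:

Lossy summaries: a summary drops details, so important information needs to be stored separately
Context pollution: incorrect information in the history keeps influencing later decisions
Multi-turn forgetting: past about 20 turns, the Agent may "forget" key commitments it made earlier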
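
One possible guard against the first and third pitfalls is to keep key facts and commitments outside the summarized history. The `PinnedFacts` class below is a hypothetical sketch of that idea, not a feature of any particular framework.

```javascript
// Hypothetical pinned-facts store: key facts live outside the summarized
// history, so summarization and pruning never touch them.
class PinnedFacts {
  constructor() {
    this.facts = new Map();
  }

  pin(key, value) {
    this.facts.set(key, value); // pinning the same key again overwrites the stale value
  }

  unpin(key) {
    this.facts.delete(key);
  }

  // Render as one system message that can be injected on every request
  toSystemMessage() {
    if (this.facts.size === 0) return null;
    const lines = [...this.facts].map(([k, v]) => `- ${k}: ${v}`);
    return { role: 'system', content: `[Pinned facts]\n${lines.join('\n')}` };
  }
}

const pinned = new PinnedFacts();
pinned.pin('deadline', 'Friday');
pinned.pin('promise', 'never suggest querying the database directly');
console.log(pinned.toSystemMessage().content);
```

Overwriting by key also helps with context pollution: a corrected fact replaces the wrong one instead of sitting next to it in the history.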
