OpenClaw Observability OpenTelemetry Monitoring Activity Tab
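It's 1:42 a.m., production is broken, and I have no idea where. That is the pain of running without observability.
Observability = Metrics + Logs + Traces. It lets you know, at 3 a.m. when the Agent is running on its own, what it is doing, whether it messed up, and why it messed up.
OpenClaw supports OpenTelemetry natively; a short configuration file turns on end-to-end tracing: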
# otel-config.yaml
observability:
  enabled: true
  provider: "opentelemetry"
  otel:
    endpoint: "otel-collector:4317"
    protocol: "grpc"
    service_name: "miaoquai-openclaw"
    sampler:
      type: "trace_id_ratio"
      arg: 0.1  # sample 10% of traces
  exports:
    - type: "jaeger"
      endpoint: "http://jaeger:14268/api/traces"
    - type: "prometheus"
      endpoint: "http://prometheus:9090/metrics"
    - type: "loki"
      endpoint: "http://loki:3100/loki/api/v1/push"
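To make the `trace_id_ratio` sampler with `arg: 0.1` more concrete, here is a minimal sketch of how ratio-based trace sampling is commonly done in OpenTelemetry-style samplers: the low 64 bits of the trace ID are compared against a bound derived from the ratio. Whether OpenClaw's sampler follows exactly this rule is an assumption, and the function names below are illustrative only.

```python
import random

TRACE_ID_MASK = (1 << 64) - 1  # only the low 64 bits of the 128-bit trace ID are compared


def bound_for_ratio(ratio: float) -> int:
    """Exclusive upper bound on the low 64 bits for a trace to be sampled."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0.0, 1.0]")
    return round(ratio * (TRACE_ID_MASK + 1))


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    return (trace_id & TRACE_ID_MASK) < bound_for_ratio(ratio)


def sampled_fraction(ratio: float, n: int = 100_000, seed: int = 42) -> float:
    """Empirical share of random trace IDs that would be kept (hypothetical helper)."""
    rng = random.Random(seed)
    hits = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    print(f"kept about {sampled_fraction(0.1):.1%} of traces")
```

Because the decision depends only on the trace ID, every service that sees the same trace makes the same keep-or-drop choice, so sampled traces stay complete across the whole call chain.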
# custom-metrics.yaml
metrics:
  # Agent execution metrics
  agent_execution_duration:
    type: "histogram"
    labels: ["agent_name", "task_type", "status"]
    buckets: [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
  agent_cost_per_task:
    type: "counter"
    labels: ["agent_name", "model"]
    unit: "USD"
  skill_error_rate:
    type: "gauge"
    labels: ["skill_name", "version"]
  context_budget_utilization:
    type: "gauge"
    labels: ["session_id"]
    max: 1.0
  # Custom business metrics
  seo_page_generated:
    type: "counter"
    labels: ["quality_score_range"]
  shrimp_rate:  # the "shrimp rate" metric!
    type: "gauge"
    labels: ["task_type"]
    description: "Correct completion rate"
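As a rough illustration of what the `histogram` buckets and the `shrimp_rate` gauge above would hold, the sketch below sorts observed durations into the configured bucket bounds and computes a correct-completion ratio. The `Histogram` class and `shrimp_rate` function are hypothetical stand-ins written with the standard library, not OpenClaw APIs, and the cumulative counting convention is borrowed from Prometheus as an assumption.

```python
from bisect import bisect_left

# Upper bounds in seconds, copied from agent_execution_duration above.
BUCKETS = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]


class Histogram:
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf overflow bucket
        self.total = 0.0

    def observe(self, value: float) -> None:
        # bisect_left returns the first bound >= value, i.e. the smallest bucket that fits.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self) -> dict:
        """Cumulative counts per upper bound (each bucket includes all smaller ones)."""
        out, running = {}, 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out[bound] = running
        return out


def shrimp_rate(correct: int, total: int) -> float:
    """Share of tasks completed correctly; 0.0 when nothing has run yet."""
    return correct / total if total else 0.0


if __name__ == "__main__":
    h = Histogram(BUCKETS)
    for seconds in (0.05, 0.5, 0.7, 3.0, 45.0):
        h.observe(seconds)
    print(h.cumulative())
    print(shrimp_rate(17, 20))
```

A run slower than 30 seconds only shows up in the overflow bucket, so if that bucket grows the bounds are probably too tight for the workload.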
The Activity Tab, introduced in OpenClaw v2026.5.25, is the built-in observability UI:
# activity-tab-config.yaml
activity_tab:
  enabled: true
  retention: "7d"
  views:
    - name: "agent_timeline"
      description: "Agent execution timeline"
      default: true
    - name: "cost_analytics"
      description: "Cost analytics"
      charts: ["cost_per_hour", "cost_per_agent", "cost_trend"]
    - name: "skill_health"
      description: "Skill health"
      metrics: ["error_rate", "latency_p99", "throughput"]
  # Live updates
  live_updates:
    enabled: true
    websocket: true
    update_interval: 5000  # 5 seconds (value in milliseconds)
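💡 Miaoquai in practice: the Activity Tab has saved me more times than I can count. At 3 a.m. I saw one Agent's cost_per_task suddenly spike, stepped in manually right away, and found it was calling an API in an infinite loop. With it in place, I can sleep soundly even at 1:42 a.m. 👍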
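The runaway-cost situation described above is the sort of thing a simple baseline check can catch without a human watching the dashboard. The sketch below flags a task whose cost is several times the median of recent tasks; the function name, the factor of 3 and the minimum sample count are all illustrative assumptions, not OpenClaw settings.

```python
from statistics import median


def is_cost_spike(history, latest, factor=3.0, min_samples=5):
    """Return True when `latest` exceeds `factor` times the median of recent costs."""
    if len(history) < min_samples:
        return False  # too little data to form a baseline
    baseline = median(history)
    return baseline > 0 and latest > factor * baseline


if __name__ == "__main__":
    recent_costs_usd = [0.02, 0.03, 0.02, 0.025, 0.03]
    print(is_cost_spike(recent_costs_usd, 0.50))  # a looping Agent burning budget
    print(is_cost_spike(recent_costs_usd, 0.04))  # ordinary variation
```

The median is used instead of the mean so that one earlier outlier does not inflate the baseline and hide the next spike.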
# logging-config.yaml
logging:
  level: "info"
  format: "json"
  outputs:
    - type: "file"
      path: "/var/log/openclaw/agent.log"
      rotation: "100MB"
      max_files: 10
    - type: "loki"
      endpoint: "http://loki:3100"
      labels:
        app: "openclaw"
        env: "production"
  # Structured log fields
  fields:
    - "session_id"
    - "agent_name"
    - "task_id"
    - "cost_usd"
    - "tokens_used"
    - "error_code"  # makes troubleshooting easier
  # Redact sensitive values
  redact:
    - "api_key"
    - "password"
    - "token"
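To show what JSON-formatted logs with redaction amount to in practice, here is a small sketch built on Python's standard `logging` module. It is not OpenClaw's logger: `JsonFormatter`, `redact` and the `fields` key passed through `extra` are hypothetical names, and matching redaction keys exactly (rather than by substring) is an assumption.

```python
import json
import logging

REDACT_KEYS = {"api_key", "password", "token"}  # mirrors the redact list above


def redact(fields: dict) -> dict:
    """Return a copy with sensitive values masked, recursing into nested dicts."""
    clean = {}
    for key, value in fields.items():
        if key in REDACT_KEYS:
            clean[key] = "***"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname.lower(), "message": record.getMessage()}
        payload.update(redact(getattr(record, "fields", {})))
        return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("openclaw-demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "task finished",
        extra={"fields": {"session_id": "s-1", "tokens_used": 1200, "api_key": "sk-secret"}},
    )
```

Exact key matching means `tokens_used` is still logged even though `token` is on the redact list; a substring match would have masked it too.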
# alerts-config.yaml
alerts:
  # Cost alerts
  - name: "High Hourly Cost"
    condition: "sum(rate(cost_usd[1h])) > 50"
    severity: "warning"
    notify: ["slack:#alerts", "email:ops@miaoquai.com"]
  - name: "Budget Exhausted"
    condition: "budget_remaining < 0.1"
    severity: "critical"
    notify: ["pagerduty", "sms:+1234567890"]
    auto_action: "pause_new_tasks"
  # Quality alerts
  - name: "Low Shrimp Rate"
    condition: "shrimp_rate < 0.85"
    severity: "warning"
    notify: ["slack:#quality"]
  # System alerts
  - name: "Agent Stuck"
    condition: "agent_last_heartbeat > 10m"
    severity: "critical"
    auto_action: "restart_agent"
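The "Agent Stuck" rule above boils down to comparing the age of each Agent's last heartbeat with a duration threshold. A minimal sketch of that check follows, assuming heartbeats are stored as timezone-aware timestamps; `parse_duration` and `stuck_agents` are hypothetical helpers, and how OpenClaw actually evaluates its `condition` strings is not shown in the source.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '10m' or '7d' into a timedelta."""
    value, unit = int(text[:-1]), text[-1]
    if unit not in _UNITS:
        raise ValueError(f"unknown duration unit: {unit!r}")
    return timedelta(**{_UNITS[unit]: value})


def stuck_agents(last_heartbeats: dict, now: datetime, threshold: str = "10m") -> list:
    """Names of Agents whose last heartbeat is older than the threshold."""
    limit = parse_duration(threshold)
    return sorted(name for name, seen in last_heartbeats.items() if now - seen > limit)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    beats = {
        "seo-writer": now - timedelta(minutes=12),
        "researcher": now - timedelta(minutes=2),
    }
    for name in stuck_agents(beats, now):
        print(f"{name} looks stuck; a restart would be the configured auto_action")
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check easy to test with fixed timestamps.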
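# Add tracing inside Agent code
from openclaw.observability import trace, span

# web_search, write_content, seo_optimize and calculate_density are the
# pipeline's own helpers, defined elsewhere in the Agent code.


@span(name="content_generation_pipeline")
def generate_seo_content(keyword):
    # Each stage gets its own child span. The variables are named per stage so
    # they do not shadow the imported `span` decorator.
    with trace.start_as_current_span("research") as research_span:
        research_span.set_attribute("keyword", keyword)
        results = web_search(keyword)
        research_span.set_attribute("results_count", len(results))

    with trace.start_as_current_span("content_writing") as writing_span:
        content = write_content(results)
        writing_span.set_attribute("word_count", len(content.split()))

    with trace.start_as_current_span("seo_optimization") as seo_span:
        optimized = seo_optimize(content)
        seo_span.set_attribute("keyword_density", calculate_density(optimized))

    return optimized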
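| Level | Capability | miaoquai.com status |
|---|---|---|
| Level 1 | Basic logging | ✅ Implemented |
| Level 2 | Metrics monitoring | ✅ Implemented |
| Level 3 | Distributed tracing | ✅ Implemented |
| Level 4 | Intelligent alerting | ✅ Implemented |
| Level 5 | Automated remediation | 🔄 In progress |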
At 1:42 a.m., with observability in place, you can finally sleep soundly instead of sitting there staring at logs 😴
© 2026 妙趣AI (miaoquai.com) 🤖