
Chapter 18: Performance Optimization and Scaled Deployment

Difficulty: ⭐⭐⭐⭐ Advanced · Estimated reading time: 25 minutes · Prerequisites: Chapters 2 and 8

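As OpenClaw grows from a personal tool into team-wide or enterprise infrastructure, performance optimization and scaled deployment become central concerns. This chapter walks through Token consumption analysis, model selection, caching, concurrent task management, multi-node deployment, monitoring and alerting, and cost control, so that you can build an efficient, stable, and scalable OpenClaw production environment.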

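18.1 Token Consumption Analysis

How Tokens Are Counted

Every interaction between OpenClaw and a large language model consumes Tokens. Understanding how they are counted is the basis for cost optimization.

Component	Description	Typical share
System prompt	Configuration files such as SOUL.md + AGENTS.md	15-25%
Memory context	MEMORY.md + loaded memory files	10-20%
Tool definitions	Descriptions of the tools registered in Skills	10-15%
Conversation history	Multi-turn history of the current session	30-40%
User input	The current message and its attachments	5-10%
Model output	The Agent's replies and tool calls	10-20%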

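[!NOTE] Total Token consumption is input Tokens plus output Tokens. Output Tokens are usually priced several times higher than input Tokens (roughly 4-5x for the models listed in section 18.2), so limiting output length is an effective way to cut cost.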
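
To make the arithmetic concrete, here is a minimal sketch of how the cost of one call follows from its input and output Token counts. The function name `request_cost` and the sample Token counts are illustrative assumptions; the prices are the GPT-4o figures from the pricing table in section 18.2.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the cost in USD of one call; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 8,000 input tokens (prompt, memory, tools, history)
# and 1,000 output tokens, priced at 2.50 / 10.00 USD per 1M tokens.
cost = request_cost(8_000, 1_000, input_price=2.50, output_price=10.00)
input_share = (8_000 * 2.50) / (8_000 * 2.50 + 1_000 * 10.00)

print(f"cost per call: ${cost:.4f}")
print(f"input share of cost: {input_share:.0%}")
```

Even though output is priced higher per Token, input usually dominates the bill in this kind of request because the context is so much larger than the reply, which is why the later sections spend as much effort on trimming context as on limiting output.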

Consumption Breakdown

Use the following command to analyze how your Token consumption is distributed:

# Total the Token consumption logged over the last 7 days
python3 -c "
import sys, json
from datetime import datetime, timedelta
cutoff = datetime.now() - timedelta(days=7)
total_in, total_out = 0, 0
for line in sys.stdin:
    try:
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry.get('timestamp', ''))
        if ts.tzinfo is not None:
            # Convert aware timestamps to naive local time so they compare with cutoff
            ts = ts.astimezone().replace(tzinfo=None)
        if ts > cutoff and 'tokens' in entry:
            total_in += entry['tokens'].get('input', 0)
            total_out += entry['tokens'].get('output', 0)
    except (ValueError, TypeError, AttributeError):
        continue  # skip malformed lines and entries without a usable timestamp
print(f'Input Tokens: {total_in:,}')
print(f'Output Tokens: {total_out:,}')
print(f'Total: {total_in + total_out:,}')
print(f'Input/output ratio: {total_in/max(total_out,1):.1f}:1')
" < ~/.openclaw/logs/config-audit.jsonl

Token Statistics Tool

Create a script that monitors Token consumption:

#!/bin/bash
# token-stats.sh - Token consumption statistics
# Usage: ./token-stats.sh [days]

DAYS=${1:-7}
LOG_FILE="$HOME/.openclaw/logs/config-audit.jsonl"

echo "📊 OpenClaw Token consumption (last ${DAYS} days)"
echo "============================================"

# Per-day totals. "-" makes python3 read the script from stdin (the heredoc),
# so the two values after it arrive as sys.argv[1] and sys.argv[2].
python3 - "$DAYS" "$LOG_FILE" << 'EOF'
import sys, json
from datetime import datetime, timedelta
from collections import defaultdict

days = int(sys.argv[1])
log_file = sys.argv[2]
cutoff = datetime.now() - timedelta(days=days)
daily = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0})

with open(log_file, "r") as f:
    for line in f:
        try:
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry.get("timestamp", ""))
            if ts.tzinfo is not None:
                # Convert aware timestamps to naive local time before comparing
                ts = ts.astimezone().replace(tzinfo=None)
            if ts > cutoff and "tokens" in entry:
                day = ts.strftime("%Y-%m-%d")
                daily[day]["input"] += entry["tokens"].get("input", 0)
                daily[day]["output"] += entry["tokens"].get("output", 0)
                daily[day]["calls"] += 1
        except (ValueError, TypeError, AttributeError):
            continue  # skip malformed lines

print(f"{'Date':<14} {'Input':>12} {'Output':>12} {'Calls':>8} {'Avg/call':>14}")
print("-" * 64)
for day in sorted(daily.keys()):
    d = daily[day]
    total = d["input"] + d["output"]
    avg = total // max(d["calls"], 1)
    print(f"{day:<14} {d['input']:>12,} {d['output']:>12,} {d['calls']:>8} {avg:>14,}")
EOF

[!TIP] Run this script as a daily Cron job and pair it with Feishu notifications to get an automated daily cost report.

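18.2 Model Selection Strategy

Cost vs. Quality vs. Speed

Models differ substantially in cost, quality, and speed. Matching the model to the task type is the first step in performance optimization.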

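Model	Input price ($/1M tokens)	Output price ($/1M tokens)	Response speed	Reasoning quality	Best for
GPT-4o	2.50	10.00	⭐⭐⭐ Fast	⭐⭐⭐⭐⭐	Complex reasoning, code generation, creative tasks
GPT-4o-mini	0.15	0.60	⭐⭐⭐⭐⭐ Very fast	⭐⭐⭐	Simple Q&A, classification, format conversion
Claude 3.5 Sonnet	3.00	15.00	⭐⭐⭐ Fast	⭐⭐⭐⭐⭐	Long-document analysis, code review
Claude 3.5 Haiku	0.80	4.00	⭐⭐⭐⭐⭐ Very fast	⭐⭐⭐	Summarization, quick classification
DeepSeek-V3	0.27	1.10	⭐⭐⭐⭐	⭐⭐⭐⭐	Chinese-language tasks, general conversation
Qwen-Max	Free/low-cost	Free/low-cost	⭐⭐⭐	⭐⭐⭐⭐	Chinese-language scenarios on a tight budget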

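[!WARNING] Model pricing changes frequently, so treat the figures above as indicative only. Check each vendor's current pricing page before relying on them.

Dynamic Model Routing

OpenClaw can select a model automatically based on task complexity: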

{
  "model_routing": {
    "enabled": true,
    "default_model": "gpt-4o-mini",
    "rules": [
      {
        "name": "complex_reasoning",
        "condition": {
          "task_type": ["code_generation", "analysis", "planning"],
          "estimated_complexity": "high"
        },
        "model": "gpt-4o"
      },
      {
        "name": "simple_tasks",
        "condition": {
          "task_type": ["classification", "extraction", "formatting"],
          "estimated_complexity": "low"
        },

        "model": "gpt-4o-mini"
      },
      {
        "name": "chinese_content",
        "condition": {
          "language": "zh",
          "task_type": ["writing", "translation"]
        },
        "model": "deepseek-v3"
      },
      {
        "name": "long_document",
        "condition": {
          "input_tokens_gt": 50000
        },
        "model": "claude-3-5-sonnet"
      }
    ]
  }
}
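
The matching semantics of these rules are not spelled out above, so the following is only a sketch under stated assumptions: rules are checked in order and the first match wins, every key in a `condition` must be satisfied, a list value means "one of", and a key ending in `_gt` means "greater than". The `route` and `matches` helpers and the task dictionaries are hypothetical names for illustration, not OpenClaw APIs.

```python
from typing import Any

DEFAULT_MODEL = "gpt-4o-mini"

# A subset of the rules from the config above, in the same shape.
RULES: list[dict[str, Any]] = [
    {"name": "complex_reasoning",
     "condition": {"task_type": ["code_generation", "analysis", "planning"],
                   "estimated_complexity": "high"},
     "model": "gpt-4o"},
    {"name": "chinese_content",
     "condition": {"language": "zh", "task_type": ["writing", "translation"]},
     "model": "deepseek-v3"},
    {"name": "long_document",
     "condition": {"input_tokens_gt": 50000},
     "model": "claude-3-5-sonnet"},
]


def matches(condition: dict[str, Any], task: dict[str, Any]) -> bool:
    """Return True when every key in the condition is satisfied by the task."""
    for key, expected in condition.items():
        if key.endswith("_gt"):
            # "input_tokens_gt": 50000 is read as task["input_tokens"] > 50000
            if task.get(key[:-3], 0) <= expected:
                return False
        elif isinstance(expected, list):
            if task.get(key) not in expected:
                return False
        elif task.get(key) != expected:
            return False
    return True


def route(task: dict[str, Any]) -> str:
    """Pick the model of the first matching rule, else the default model."""
    for rule in RULES:
        if matches(rule["condition"], task):
            return rule["model"]
    return DEFAULT_MODEL


print(route({"task_type": "analysis", "estimated_complexity": "high"}))
print(route({"task_type": "classification", "estimated_complexity": "low"}))
```

With first-match-wins semantics the order of the rules matters: a long Chinese translation job would hit `chinese_content` before `long_document`, so put the rule you want to take precedence first.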

Model Fallback Strategy

When the primary model is unavailable or times out, requests fall back automatically to the backup models:

{
  "model_fallback": {
    "primary": "gpt-4o",
    "fallback_chain": ["claude-3-5-sonnet", "deepseek-v3", "gpt-4o-mini"],
    "timeout_ms": 30000,
    "max_retries": 2,
    "retry_delay_ms": 1000
  }
}

Request ──▶ GPT-4o ──timeout──▶ Claude 3.5 Sonnet ──failure──▶ DeepSeek-V3 ──failure──▶ GPT-4o-mini
              │                                                                            │
              ▼ success                                                                    ▼ success
        return result                                                                return result
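
The fallback flow can be sketched in a few lines of Python. This is an illustration, not OpenClaw's implementation: `call_with_fallback` and `flaky_backend` are hypothetical names, and the sketch assumes that `max_retries` counts extra attempts per model before moving to the next one in the chain.

```python
import time
from typing import Callable

FALLBACK_CHAIN = ["gpt-4o", "claude-3-5-sonnet", "deepseek-v3", "gpt-4o-mini"]


def call_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: list[str] = FALLBACK_CHAIN,
    max_retries: int = 2,
    retry_delay_s: float = 1.0,
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first success."""
    last_error = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return model, call_model(model, prompt)
            except (TimeoutError, ConnectionError) as exc:
                last_error = exc
                if attempt < max_retries:
                    time.sleep(retry_delay_s)  # wait before retrying the same model
    raise RuntimeError("all models in the fallback chain failed") from last_error


def flaky_backend(model: str, prompt: str) -> str:
    """Stand-in for a real API client: the first two models always time out."""
    if model in ("gpt-4o", "claude-3-5-sonnet"):
        raise TimeoutError(f"{model} timed out")
    return f"[{model}] reply to: {prompt}"


model, reply = call_with_fallback("hello", flaky_backend, retry_delay_s=0)
print(model, "->", reply)
```

Note the worst case: with a 30-second timeout, two retries, and four models, a request can wait several minutes before failing, so keep the chain short for latency-sensitive traffic.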

18.3 Caching

Request Cache

For repeated or similar requests, OpenClaw supports request-level caching to avoid calling the API again:

{
  "cache": {
    "request_cache": {
      "enabled": true,
      "backend": "file",
      "directory": "~/.openclaw/cache/requests",
      "ttl_seconds": 3600,
      "max_size_mb": 500,
      "hash_strategy": "semantic",
      "similarity_threshold": 0.95
    }
  }
}

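Cache strategy	Description	Hit rate	Best for
`exact`	Matches the request text exactly	Low (10-20%)	Scenarios that require high precision
`semantic`	Matches on semantic similarity	Medium (40-60%)	General Q&A and retrieval
`template`	Matches on template parameters	High (60-80%)	Formatting and conversion tasks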
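
To show what the `exact` strategy amounts to, here is a minimal in-memory sketch with a TTL. It is an assumption-laden illustration: the class name `ExactMatchCache` is hypothetical, the key is simply a SHA-256 hash of the model name and prompt, and entries live in a dictionary rather than in the file backend configured above.

```python
import hashlib
import json
import time
from typing import Callable, Optional


class ExactMatchCache:
    """In-memory request cache keyed on a hash of (model, prompt)."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        key = self.make_key(model, prompt)
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] <= self._ttl:
            self.hits += 1
            return entry[1]
        self._entries.pop(key, None)  # drop an expired entry, if any
        self.misses += 1
        return None

    def put(self, model: str, prompt: str, reply: str) -> None:
        self._entries[self.make_key(model, prompt)] = (self._clock(), reply)


cache = ExactMatchCache(ttl_seconds=3600)
cache.put("gpt-4o-mini", "Classify: great product!", "positive")
print(cache.get("gpt-4o-mini", "Classify: great product!"))  # identical text: hit
print(cache.get("gpt-4o-mini", "Classify: great product"))   # one character differs: miss
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.0%}")
```

The second lookup shows why exact matching has the lowest hit rate in the table: any difference in wording produces a different key. The `semantic` strategy trades that precision for more hits by comparing embeddings against `similarity_threshold` instead.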

Cache maintenance commands:

# Check cache status
du -sh ~/.openclaw/cache/requests/
find ~/.openclaw/cache/requests/ -type f | wc -l

# Count cache hits and misses (from the log)
grep -c "cache_hit" ~/.openclaw/logs/config-audit.jsonl
grep -c "cache_miss" ~/.openclaw/logs/config-audit.jsonl

# Remove expired cache files (-mmin +1440 = last modified more than 24 hours ago)
find ~/.openclaw/cache/requests/ -type f -mmin +1440 -delete
echo "Removed cache files older than 1 day"

# Empty the cache completely
rm -rf ~/.openclaw/cache/requests/*
echo "Cache cleared"

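Knowledge Base Cache

Knowledge base files are parsed and indexed the first time they are loaded; later accesses read directly from the cache: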

{
  "cache": {
    "knowledge_cache": {
      "enabled": true,
      "index_format": "embeddings",
      "rebuild_on_change": true,
      "watch_paths": [
        "~/.openclaw/workspace/memory/",
        "~/.openclaw/workspace/skills/"
      ],
      "preload_on_startup": true
    }
  }
}

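[!TIP] With preload_on_startup enabled, the Agent starts slightly slower (roughly 2-5 seconds), but the first conversation responds noticeably faster. This suits production environments that are sensitive to response time.

Session Context Cache

Long conversations accumulate a large amount of context. A sliding window combined with summary compression reduces Token consumption: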

{
  "session": {
    "context_management": {
      "max_context_tokens": 32000,
      "sliding_window": {
        "enabled": true,
        "keep_recent_messages": 20,
        "summarize_older": true
      },
      "compression": {
        "enabled": true,
        "trigger_threshold": 0.8,
        "target_ratio": 0.5,
        "preserve_system_prompt": true,
        "preserve_tool_results": true
      }
    }
  }
}

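Conversation message flow:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ [System prompt] [Memory] [Summary: messages 1-30] [Message 31] ... [Message 50] │
│ ◀──────── kept ────────▶ ◀───── compressed ─────▶ ◀────── kept in full ───────▶ │
└─────────────────────────────────────────────────────────────────────────────────┘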
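
The sliding window can be sketched as follows. This is a simplified illustration under stated assumptions: `build_context` is a hypothetical helper, messages are plain strings, and the default summarizer is a placeholder that only records how many messages were folded away, where a real system would call a model to write the summary.

```python
from typing import Callable


def placeholder_summary(older: list[str]) -> str:
    """Stand-in summarizer; a real one would ask a model to condense the text."""
    return f"[summary of {len(older)} earlier messages]"


def build_context(
    system_prompt: str,
    messages: list[str],
    keep_recent: int = 20,
    summarize: Callable[[list[str]], str] = placeholder_summary,
) -> list[str]:
    """Keep the system prompt and the newest messages; compress the rest."""
    cut = max(len(messages) - keep_recent, 0)
    older, recent = messages[:cut], messages[cut:]
    context = [system_prompt]
    if older:
        context.append(summarize(older))  # one summary replaces all older messages
    context.extend(recent)
    return context


history = [f"message {i}" for i in range(1, 51)]  # a 50-message conversation
context = build_context("SYSTEM PROMPT", history, keep_recent=20)

print(len(context))
print(context[1])
print(context[2], "...", context[-1])
```

With 50 messages and `keep_recent_messages` set to 20, the first 30 collapse into a single summary entry, which matches the diagram: messages 31 through 50 are kept in full.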

18.4 Concurrent Task Management

Task Queue Architecture

When several tasks arrive at once, OpenClaw processes them in order through a task queue:

                         ┌──────────────────┐
   Feishu message ──────▶│                  │
   Cron job       ──────▶│    Task queue    │──▶ Worker thread 1 ──▶ Agent 1
   API call       ──────▶│ (priority-sorted)│──▶ Worker thread 2 ──▶ Agent 2
   Manual trigger ──────▶│                  │──▶ Worker thread 3 ──▶ Agent 3
                         └──────────────────┘

Check the current queue state:

# List the pending delivery queue
ls -la ~/.openclaw/delivery-queue/
echo "---"
echo "Pending tasks: $(find ~/.openclaw/delivery-queue/ -maxdepth 1 -name '*.json' | wc -l)"
echo "Failed tasks: $(find ~/.openclaw/delivery-queue/failed/ -name '*.json' 2>/dev/null | wc -l)"

# Show details of each queued task
for f in ~/.openclaw/delivery-queue/*.json; do
  [ -f "$f" ] || continue
  echo "Task: $(basename "$f")"
  # Pass the path as an argument so quotes in file names cannot break the Python code
  python3 -c 'import json, sys
d = json.load(open(sys.argv[1]))
print("  Source:", d.get("source", "unknown"))
print("  Time:", d.get("timestamp", "N/A"))' "$f"
  echo "---"
done

Rate Limit Configuration

To stay within API quotas, configure rate limits both globally and per model:

{
  "rate_limiting": {
    "global": {
      "requests_per_minute": 60,
      "tokens_per_minute": 200000,
      "concurrent_requests": 5
    },
    "per_model": {
      "gpt-4o": {
        "requests_per_minute": 20,
        "tokens_per_minute": 100000
      },
      "gpt-4o-mini": {
        "requests_per_minute": 100,
        "tokens_per_minute": 500000
      },
      "deepseek-v3": {
        "requests_per_minute": 30,
        "tokens_per_minute": 150000

      }
    },
    "backoff": {
      "strategy": "exponential",
      "initial_delay_ms": 1000,
      "max_delay_ms": 60000,
      "multiplier": 2.0
    }
  }
}

[!WARNING] Exceeding an API quota causes requests to be rejected (HTTP 429). Set your limits to about 80% of the API quota to leave some headroom.
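
Two small calculations sit behind this configuration: the retry delays that an exponential backoff produces, and the 80% headroom rule. The sketch below works both out; `backoff_delays` and `safe_limit` are hypothetical helper names, and the parameter defaults are copied from the `backoff` block above.

```python
def backoff_delays(initial_delay_ms: int = 1000, max_delay_ms: int = 60000,
                   multiplier: float = 2.0, attempts: int = 8) -> list[int]:
    """Return the wait in milliseconds before each successive retry."""
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(attempts):
        delays.append(int(min(delay, max_delay_ms)))  # never exceed the cap
        delay *= multiplier
    return delays


def safe_limit(quota: int, percent: int = 80) -> int:
    """Return the limit to configure so that `percent` of the quota is used."""
    return quota * percent // 100


delays = backoff_delays()
print(delays)
print(f"total wait across {len(delays)} retries: {sum(delays) / 1000:.0f} s")
print(f"configure {safe_limit(500)} requests/minute for a 500 requests/minute quota")
```

The delay doubles until it hits `max_delay_ms` and then stays flat. Production implementations usually add random jitter on top so that many clients do not retry at the same instant; that is omitted here for clarity.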

Priority Scheduling

Tasks from different sources can be assigned different priorities:

{
  "task_priority": {
    "levels": {
      "critical": 0,
      "high": 1,
      "normal": 2,
      "low": 3,
      "background": 4
    },
    "source_mapping": {
      "feishu_direct_message": "high",
      "feishu_group_mention": "normal",
      "cron_job": "low",
      "api_call": "normal",
      "manual_trigger": "high"
    },
    "preemption": {
      "enabled": true,
      "allow_preempt_levels": ["critical"],

      "preempt_threshold": 2
    }
  }
}

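Priority	Level value	Example sources	Maximum wait
Critical	0	System alerts, security events	Handled immediately
High	1	Direct user messages, manual triggers	< 10 seconds
Normal	2	Group @mentions, API calls	< 30 seconds
Low	3	Cron jobs, batch processing	< 5 minutes
Background	4	Memory consolidation, cache warm-up	Handled when idle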
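
A priority queue of this kind is commonly built on a binary heap. The sketch below uses Python's `heapq` with the level values and source mapping from the config above; the `TaskQueue` class is a hypothetical illustration, and treating unknown sources as `normal` is an assumption. Preemption is not modelled.

```python
import heapq
import itertools

LEVELS = {"critical": 0, "high": 1, "normal": 2, "low": 3, "background": 4}

SOURCE_MAPPING = {
    "feishu_direct_message": "high",
    "feishu_group_mention": "normal",
    "cron_job": "low",
    "api_call": "normal",
    "manual_trigger": "high",
}


class TaskQueue:
    """Pops the lowest level value first; ties are served in arrival order."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, str]] = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, source: str, payload: str) -> None:
        level = LEVELS[SOURCE_MAPPING.get(source, "normal")]
        heapq.heappush(self._heap, (level, next(self._arrival), source, payload))

    def pop(self) -> tuple[str, str]:
        _level, _seq, source, payload = heapq.heappop(self._heap)
        return source, payload

    def __len__(self) -> int:
        return len(self._heap)


queue = TaskQueue()
queue.push("cron_job", "nightly report")
queue.push("feishu_group_mention", "question in group chat")
queue.push("feishu_direct_message", "urgent question")
queue.push("api_call", "webhook request")

order = []
while queue:
    order.append(queue.pop()[0])
print(order)
```

The arrival counter matters: without it, two tasks at the same level would be compared by their source strings, and equal-priority work would no longer be handled first come, first served.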

18.5 Large-Scale Deployment Architecture

Multi-Node Gateway

When a single Gateway cannot handle the load, deploy a multi-node architecture:

                          ┌────────────────┐
                     ┌───▶│ Gateway node 1 │──▶ Agent Pool 1
                     │    └────────────────┘
┌───────────────┐    │    ┌────────────────┐
│     Nginx     │────┼───▶│ Gateway node 2 │──▶ Agent Pool 2
│ load balancer │    │    └────────────────┘
└───────────────┘    │    ┌────────────────┐
                     └───▶│ Gateway node 3 │──▶ Agent Pool 3
                          └────────────────┘
                                  │
                          ┌───────▼────────┐
                          │ Shared storage │
                          │    (NFS/S3)    │
                          └────────────────┘

Nginx Load Balancing

Below is a sample Nginx configuration for production:

# /etc/nginx/conf.d/openclaw-gateway.conf

# Rate-limit zone: 10 requests per second per client IP
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

upstream openclaw_backend {
    # Session persistence keyed on client IP. The load-balancing method
    # must be declared before the keepalive directive.
    ip_hash;

    # Server weights (ip_hash honors them)
    server 10.0.1.10:3000 weight=3;
    server 10.0.1.11:3000 weight=3;
    server 10.0.1.12:3000 weight=2;

    # Pool of idle keepalive connections to the upstream servers.
    # This is not a health check: open-source Nginx only checks passively,
    # through the max_fails and fail_timeout server parameters.
    keepalive 32;
}

server {
    listen 443 ssl http2;
    server_name openclaw.example.com;

    ssl_certificate /etc/ssl/certs/openclaw.crt;
    ssl_certificate_key /etc/ssl/private/openclaw.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000; includeSubdomains" always;

    # Request body size limit (file uploads)
    client_max_body_size 50m;

    # Timeouts (Agent processing can be slow)
    proxy_connect_timeout 10s;
    proxy_read_timeout 300s;
    proxy_send_timeout 60s;

    # WebSocket support
    location /ws {
        proxy_pass http://openclaw_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # API routes
    location /api/ {
        proxy_pass http://openclaw_backend;
        # HTTP/1.1 with an empty Connection header is required for the
        # upstream keepalive pool to be used
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Rate limit: 10 requests per second, bursts of up to 20
        limit_req zone=api_limit burst=20 nodelay;
    }

    # Static files
    location /static/ {
        alias /var/www/openclaw/static/;
        expires 7d;
        add_header Cache-Control "public, immutable";
    }

    # Health check endpoint
    location /health {
        proxy_pass http://openclaw_backend;
        access_log off;
    }
}

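[!NOTE] proxy_read_timeout is set to 300 seconds because complex Agent tasks such as code generation or multi-step reasoning can take a long time. Adjust it to fit your workload.

Docker Compose Orchestration

A Docker Compose configuration for bringing up a multi-node OpenClaw cluster quickly: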

# docker-compose.prod.yml
version: "3.8"

services:
  # Nginx load balancer
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/ssl:/etc/ssl:ro
    depends_on:
      - gateway-1
      - gateway-2
      - gateway-3
    restart: always
    networks:
      - openclaw-net

  # Gateway node 1
  gateway-1:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-1
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

  # Gateway node 2
  gateway-2:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-2
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

  # Gateway node 3
  gateway-3:
    image: openclaw/gateway:latest
    environment:
      - NODE_ID=gateway-3
      - SHARED_STORAGE=/data/shared
      - LOG_LEVEL=info
      - MAX_AGENTS=10
    volumes:
      - shared-data:/data/shared
      - ./config/openclaw.json:/root/.openclaw/openclaw.json:ro
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "1.0"
          memory: 2G
    restart: always

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - openclaw-net

  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    restart: always
    networks:
      - openclaw-net

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
    depends_on:
      - prometheus
    restart: always
    networks:
      - openclaw-net

volumes:
  shared-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.0.1.100,rw,nfsvers=4
      device: ":/exports/openclaw"
  prometheus-data:
  grafana-data:

networks:
  openclaw-net:
    driver: bridge

Startup commands:

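# Start all services
docker compose -f docker-compose.prod.yml up -d

# Show the status of each node
docker compose -f docker-compose.prod.yml ps

# Follow the logs of one node
docker compose -f docker-compose.prod.yml logs -f gateway-1

# Horizontal scaling: the file above defines gateway-1..3 as separate services,
# so there is no "gateway" service for --scale to act on. Add a gateway-4 service
# block (and a matching upstream server entry in Nginx), then start it:
docker compose -f docker-compose.prod.yml up -d gateway-4

# Rolling update: pull the new image, then recreate one node at a time
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-1
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-2
docker compose -f docker-compose.prod.yml up -d --no-deps gateway-3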

18.6 Monitoring and Alerting

Prometheus Metrics Collection

Configure Prometheus to scrape OpenClaw's metrics:

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: "openclaw-gateway"
    static_configs:
      - targets:
          - "gateway-1:3000"
          - "gateway-2:3000"
          - "gateway-3:3000"
    metrics_path: "/metrics"
    scrape_interval: 10s

  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-exporter:9113"]

  - job_name: "node"
    static_configs:
      - targets:
          - "node-exporter-1:9100"
          - "node-exporter-2:9100"
          - "node-exporter-3:9100"

Core metrics exposed by the OpenClaw Gateway:

Metric	Type	Description
`openclaw_requests_total`	Counter	Total number of requests
`openclaw_request_duration_seconds`	Histogram	Request processing time
`openclaw_tokens_consumed_total`	Counter	Total Tokens consumed
`openclaw_active_agents`	Gauge	Number of currently active Agents
`openclaw_queue_depth`	Gauge	Task queue depth
`openclaw_cache_hit_ratio`	Gauge	Cache hit ratio
`openclaw_model_errors_total`	Counter	Number of model call errors
`openclaw_memory_files_total`	Gauge	Total number of memory files

Grafana Dashboard Configuration

JSON configuration for a dedicated OpenClaw dashboard:

{
  "dashboard": {
    "title": "OpenClaw Runtime Monitoring",
    "panels": [
      {
        "title": "Request QPS",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(openclaw_requests_total[5m])",
            "legendFormat": ""
          }
        ]
      },
      {
        "title": "Token Consumption Trend",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(openclaw_tokens_consumed_total[1h])",
            "legendFormat": ""
          }
        ]
      },
      {
        "title": "P99 Response Time",
        "type": "stat",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, rate(openclaw_request_duration_seconds_bucket[5m]))"
          }
        ]
      },
      {
        "title": "Queue Depth",
        "type": "gauge",
        "targets": [
          {
            "expr": "openclaw_queue_depth"
          }
        ],
        "thresholds": [
          { "value": 0, "color": "green" },
          { "value": 10, "color": "yellow" },
          { "value": 50, "color": "red" }
        ]
      }
    ]
  }
}

Alert Rule Definitions

# monitoring/alert_rules.yml
groups:
  - name: openclaw_alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: rate(openclaw_model_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw model call error rate is high"
          description: "Error rate {{ $value }}/s has exceeded the 0.1/s threshold for 5 minutes"

      # Queue backlog
      - alert: QueueBacklog
        expr: openclaw_queue_depth > 50
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Task queue backlog"
          description: "Queue depth {{ $value }} has exceeded the threshold of 50 for 10 minutes"

      # Abnormal Token consumption
      - alert: TokenBudgetWarning
        expr: sum(increase(openclaw_tokens_consumed_total[1h])) > 500000
        labels:
          severity: warning
        annotations:
          summary: "Token consumption is too fast"
          description: "{{ $value }} Tokens consumed in the past hour, approaching the budget cap"

      # High response time
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(openclaw_request_duration_seconds_bucket[5m])) > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 response time is high"
          description: "P95 response time {{ $value }}s exceeds the 60s threshold"

      # Node offline
      - alert: GatewayDown
        expr: up{job="openclaw-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gateway node offline"
          description: "Node {{ $labels.instance }} has been offline for more than 1 minute"

Day-to-day monitoring commands:

# Check that Prometheus is scraping its targets
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | head -30

# Query a metric by hand
curl -s 'http://localhost:9090/api/v1/query?query=openclaw_active_agents' | python3 -m json.tool

# Check alert status
curl -s http://localhost:9090/api/v1/alerts | python3 -m json.tool

# List Grafana dashboards (via the search API)
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  'http://localhost:3001/api/search?type=dash-db' | python3 -m json.tool

18.7 Cost Control

Token Budget Management

Set Token budget caps for different purposes:

{
  "budget": {
    "enabled": true,
    "period": "monthly",
    "limits": {
      "total_tokens": 10000000,
      "total_cost_usd": 50.00,
      "per_agent": {
        "main": { "tokens": 5000000, "cost_usd": 30.00 },
        "assistant": { "tokens": 3000000, "cost_usd": 15.00 },
        "monitor": { "tokens": 2000000, "cost_usd": 5.00 }
      }
    },
    "actions_on_limit": {
      "80_percent": "notify",
      "95_percent": "downgrade_model",
      "100_percent": "pause_non_critical"
    },
    "notification": {

      "channel": "feishu",
      "recipients": ["admin_group"]
    }
  }
}
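
The `actions_on_limit` thresholds boil down to a simple lookup from usage ratio to action. The sketch below shows one way to express it; `budget_action` is a hypothetical helper, and returning only the most severe applicable action (rather than every action whose threshold has been crossed) is an assumption.

```python
from typing import Optional

# (threshold as a fraction of the budget, action), most severe first
THRESHOLDS = [
    (1.00, "pause_non_critical"),
    (0.95, "downgrade_model"),
    (0.80, "notify"),
]


def budget_action(used: float, limit: float) -> Optional[str]:
    """Return the action for the current usage, or None below 80%."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    for threshold, action in THRESHOLDS:
        if ratio >= threshold:
            return action
    return None


monthly_limit = 10_000_000  # total_tokens from the config above
for used in (5_000_000, 8_500_000, 9_600_000, 10_000_000):
    print(f"{used:>12,} tokens used -> {budget_action(used, monthly_limit)}")
```

The same function works for the `cost_usd` limits, since it only depends on the ratio of usage to limit.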

Usage Dashboard

Create a cost tracking script:

#!/usr/bin/env python3
"""openclaw-cost-tracker.py - monthly cost tracking tool"""

import json
import os
from datetime import datetime
from collections import defaultdict

# Model pricing (USD per million Tokens); update from the vendors' pricing pages
PRICING = {
    "gpt-4o":              {"input": 2.50,  "output": 10.00},
    "gpt-4o-mini":         {"input": 0.15,  "output": 0.60},
    "claude-3-5-sonnet":   {"input": 3.00,  "output": 15.00},
    "claude-3-5-haiku":    {"input": 0.80,  "output": 4.00},
    "deepseek-v3":         {"input": 0.27,  "output": 1.10},
}

def calculate_cost(model, input_tokens, output_tokens):
    """Return the cost of one call in USD."""
    # Models missing from PRICING fall back to a rough placeholder price
    pricing = PRICING.get(model, {"input": 1.0, "output": 3.0})
    cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
    return cost

def generate_report(log_path):
    """Print the cost report for the current month."""
    current_month = datetime.now().strftime("%Y-%m")
    stats = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0, "cost": 0.0})

    with open(log_path, "r") as f:
        for line in f:
            try:
                entry = json.loads(line)
                ts = entry.get("timestamp", "")
                if not ts.startswith(current_month):
                    continue
                model = entry.get("model", "unknown")
                tokens = entry.get("tokens", {})
                inp = tokens.get("input", 0)
                out = tokens.get("output", 0)
            except (ValueError, TypeError, AttributeError):
                continue  # skip malformed lines
            stats[model]["input"] += inp
            stats[model]["output"] += out
            stats[model]["calls"] += 1
            stats[model]["cost"] += calculate_cost(model, inp, out)

    # Print the report, most expensive model first
    print(f"\n📊 OpenClaw monthly cost report ({current_month})")
    print("=" * 72)
    print(f"{'Model':<22} {'Calls':>8} {'Input':>12} {'Output':>12} {'Cost ($)':>10}")
    print("-" * 72)

    total_cost = 0
    for model, data in sorted(stats.items(), key=lambda x: -x[1]["cost"]):
        print(f"{model:<22} {data['calls']:>8,} {data['input']:>12,} {data['output']:>12,} {data['cost']:>10.2f}")
        total_cost += data["cost"]

    print("-" * 72)
    print(f"{'Total':<22} {'':>8} {'':>12} {'':>12} {total_cost:>10.2f}")
    # Linear projection from the days elapsed so far; assumes a 30-day month
    print(f"\n💡 Projected cost at month end: ${total_cost * 30 / max(datetime.now().day, 1):.2f}")

if __name__ == "__main__":
    log_path = os.path.expanduser("~/.openclaw/logs/config-audit.jsonl")
    generate_report(log_path)

# Run the cost tracker
python3 openclaw-cost-tracker.py

# Sample output (run on the 6th day of the month):
# 📊 OpenClaw monthly cost report (2026-03)
# ========================================================================
# Model                     Calls        Input       Output   Cost ($)
# ------------------------------------------------------------------------
# gpt-4o                      245    1,234,567      456,789       7.65
# claude-3-5-sonnet            89      567,890      234,567       5.22
# gpt-4o-mini               1,234    2,345,678      987,654       0.94
# deepseek-v3                 432      890,123      345,678       0.62
# ------------------------------------------------------------------------
# Total                                                          14.44
#
# 💡 Projected cost at month end: $72.21

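Cost Optimization Strategies

Strategy	Expected savings	Implementation difficulty	Description
Model downgrading (small models for simple tasks)	40-60%	⭐ Low	Use a mini model for classification and formatting tasks
Request caching	20-40%	⭐⭐ Medium	Cache the results of repeated queries
Prompt trimming	10-20%	⭐⭐ Medium	Compress the system prompt and memory context
Context sliding window	15-25%	⭐ Low	Limit the length of the conversation history
Batch processing	10-30%	⭐⭐⭐ High	Merge several small tasks into one call
Local models	70-90%	⭐⭐⭐⭐ Very high	Deploy a local inference engine such as Ollama

[!TIP] The quickest win is model downgrading: route the roughly 80% of tasks that are simple to GPT-4o-mini or DeepSeek-V3, and reserve GPT-4o/Claude for the 20% that are complex. Combined with request caching, this can cut total cost by more than 50%.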

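Caveats and Common Mistakes

Common mistakes and pitfalls in performance optimization:

Common mistake	Consequence	What to do instead
Optimizing blindly without finding the bottleneck	Effort goes into non-critical paths and yields little	Diagnose with `openclaw doctor` first and confirm the bottleneck
Setting the memory limit too low	Agents crash frequently with OOM errors	Set a reasonable cap with `openclaw config set`
Running multiple Agents without resource isolation	Resource contention slows everything down	Monitor resource allocation with `openclaw gateway status`

# Common performance diagnostics commands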
openclaw doctor
openclaw config set gateway.maxMemory 512
openclaw gateway status
openclaw gateway restart

Hands-On Exercises

Exercise 1: Token Consumption Analysis

Check the size of your OpenClaw log file:

wc -l ~/.openclaw/logs/config-audit.jsonl
du -sh ~/.openclaw/logs/config-audit.jsonl

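Total the Token consumption of the last 3 days and find the operations that consume the most.
Calculate the input/output Token ratio and assess whether there is room for optimization.

Exercise 2: Cache Configuration

Create the cache directory and configure the request cache: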
```
mkdir -p ~/.openclaw/cache/requests
```
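Watch how fast the cache directory grows and design a sensible cleanup policy.
Write a script that computes the cache hit rate and prints a report.

Exercise 3: Simulated Multi-Node Deployment

Use the Docker Compose configuration from this chapter to create a local test cluster.

Verify that Nginx load balancing works: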

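# Send several requests and confirm that each one succeeds.
# -k accepts the self-signed certificate of a local test cluster.
for i in $(seq 1 10); do
  curl -sk -o /dev/null -w "Request $i: HTTP %{http_code}\n" \
    https://localhost/health
done
# To see which node served each request, add Nginx's $upstream_addr variable
# to the access log format and read the access log.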

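Simulate a node failure by stopping one of the Gateways, and verify that requests are forwarded to the remaining nodes automatically.

Exercise 4: Cost Tracking

Run the openclaw-cost-tracker.py script to generate your monthly report.
Use the report to draw up a budget plan: decide on a monthly cap and set it in the budget configuration.
Set up Feishu notifications so that an alert fires automatically when consumption reaches 80% of the budget.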

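Frequently Asked Questions (FAQ)

Q: Does this chapter require prior knowledge?

A: Work through the earlier chapters first so that you understand OpenClaw's basic concepts and how to install it.

Q: What should I do if a command fails?

A: Check that OpenClaw is installed correctly by running openclaw --version to confirm the version. If the problem persists, consult the troubleshooting chapter or open a GitHub Issue.

Q: Where can I get more help?

A: Help is available through the following channels:

OpenClaw GitHub Issues
ClawHub community discussions
The FAQ page of the official documentation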

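Chapter Summary

Token analysis: understand what makes up Token consumption (system prompt, memory, conversation history, input/output) and quantify the breakdown with the statistics tools.
Model selection: route dynamically by task complexity, with small models for simple tasks and large models for complex ones, backed by a fallback strategy for availability.
Caching: request cache (exact/semantic/template matching) + knowledge base cache + context sliding window; together these three layers cut API calls substantially.
Concurrency management: task queue + rate limiting + priority scheduling, so that high-priority tasks run first and API quotas are not exceeded.
Multi-node deployment: Nginx load balancing + Docker Compose orchestration, with horizontal scaling and rolling updates.
Monitoring and alerting: Prometheus collects the core metrics, Grafana visualizes them, and the alert rules cover error rate, queue backlog, Token consumption, and node availability.
Cost control: Token budget management + usage tracking + six optimization strategies for fine-grained control of AI call costs.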
