OpenClaw Browser Automation Guide

Give your AI "eyes" and "hands": fully automated web operations

📅 Updated April 12, 2026 ⏱️ Reading time: 18 minutes 🏷️ Browser, automation, web scraping, UI testing

🎯 Why browser automation?

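There is one kind of pain called "filling in forms by hand." There is a worse kind called "filling in the same form every day." And the worst kind is called "wanting to automate it but not knowing how."

"At 4 a.m., I logged in to the admin console by hand for the 100th time to export data. In that moment I decided: either the AI learns browser automation, or I find a job that doesn't involve logging in to an admin console."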

OpenClaw's browser tool lets your AI agent:

Log in to websites automatically
Scrape web page content
Fill in and submit forms
Take screenshots as evidence
Run UI tests

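✨ Browser tool features

🔍 Page snapshots - list the interactive elements on a page
🖱️ Simulated actions - click, type, scroll, drag
📸 Screenshots - capture a full page or a single element
🔐 Persistent login - reuse an already logged-in browser session
📝 Form handling - automatically identify and fill in forms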

🚀 Quick start

Basic syntax

// Open a web page
browser({
  action: "open",
  url: "https://example.com"
});

// Take a page snapshot (lists the interactive elements)
browser({
  action: "snapshot"
});

// Click an element
browser({
  action: "act",
  kind: "click",
  ref: "e12"  // element ref ID from the snapshot
});

// Type text
browser({
  action: "act",
  kind: "type",
  ref: "i5",
  text: "text to type"
});

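Execution flow

1. browser open - open the target page
2. browser snapshot - get the list of interactive elements
3. browser act - perform an action (click, type, select, and so on)
4. Repeat steps 2-3 until the task is done
5. browser screenshot - save evidence (optional)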
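
The flow above can be sketched as a small driver loop. This is only a sketch under assumptions: it treats `browser` as something that can be called like a synchronous function (the way the snippets in this guide write it), and `runFlow`, `pickAction`, and `stubBrowser` are hypothetical names made up for illustration, not part of OpenClaw.

```javascript
// Sketch of the open -> snapshot -> act loop.
// `browser` is injected so the loop can be tried with a stub;
// in a real agent it would be the browser tool itself (assumption).
function runFlow(browser, url, pickAction, maxSteps = 10) {
  browser({ action: "open", url });
  let steps = 0;
  while (steps < maxSteps) {
    const snapshot = browser({ action: "snapshot" });
    // pickAction is a hypothetical callback: it inspects the snapshot and
    // returns the next act payload, or null/undefined when the task is done.
    const next = pickAction(snapshot);
    if (next == null) break;
    browser({ action: "act", ...next });
    steps += 1;
  }
  browser({ action: "screenshot" }); // optional evidence step
  return steps;
}

// Usage with a stub that only records the requests it receives.
const calls = [];
const stubBrowser = (request) => {
  calls.push(request);
  return "snapshot text";
};
const queue = [{ kind: "click", ref: "e12" }, null];
const steps = runFlow(stubBrowser, "https://example.com", () => queue.shift());
console.log(steps, calls.map((c) => c.action).join(" > "));
```

The `maxSteps` cap is there so a task that never reports "done" cannot loop forever.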

💻 Hands-on examples

Example 1: Logging in to a website automatically

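// Scenario: log in to a website automatically

// 1. Open the login page
browser({ action: "open", url: "https://example.com/login" });

// 2. Take a snapshot to identify the input fields and the button
browser({ action: "snapshot" });
// Example result:
// - Input 'username' [ref=i3]
// - Input 'password' [ref=i5]
// - Button 'Login' [ref=b8]

// 3. Type the username
browser({
  action: "act",
  kind: "type",
  ref: "i3",
  text: "my-username"
});

// 4. Type the password
browser({
  action: "act",
  kind: "type",
  ref: "i5",
  text: "my-password"
});

// 5. Click the login button
browser({
  action: "act",
  kind: "click",
  ref: "b8"
});

// 6. Wait for the page to load, then take a screenshot to verify
browser({ action: "act", kind: "wait", loadState: "networkidle" });
browser({
  action: "screenshot",
  fullPage: true
});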
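
Rather than copying refs by eye, the snapshot listing can be parsed and searched by label. The sketch below assumes the snapshot comes back as text shaped like the sample result in step 2 (`- Input 'username' [ref=i3]`); real output may look different, and `parseSnapshot` and `findRef` are hypothetical helpers, not OpenClaw functions.

```javascript
// Parse lines shaped like "- Input 'username' [ref=i3]".
// This mirrors the sample result shown in step 2; it is an assumed format.
function parseSnapshot(text) {
  const pattern = /^-\s*(\w+)\s+'([^']*)'\s+\[ref=([^\]]+)\]$/;
  const elements = [];
  for (const line of text.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match) {
      elements.push({ role: match[1], name: match[2], ref: match[3] });
    }
  }
  return elements;
}

// Look up a ref by role and label; returns undefined when nothing matches.
function findRef(elements, role, name) {
  const hit = elements.find((el) => el.role === role && el.name === name);
  return hit ? hit.ref : undefined;
}

const sample = [
  "- Input 'username' [ref=i3]",
  "- Input 'password' [ref=i5]",
  "- Button 'Login' [ref=b8]",
].join("\n");

const elements = parseSnapshot(sample);
console.log(findRef(elements, "Button", "Login")); // prints "b8"
```

Looking refs up by label keeps the login script readable and makes a missing element show up as `undefined` instead of a click on the wrong ref.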

Example 2: Filling in a form automatically

// Scenario: fill in a registration form automatically

browser({ action: "open", url: "https://example.com/register" });

// Take a snapshot to identify the form elements
const snapshot = browser({ action: "snapshot" });

// Fill in each field in turn.
// The ref values below are placeholders; use the ref IDs from your snapshot.
// Name
browser({ action: "act", kind: "type", ref: "name-input", text: "Zhang San" });

// Email
browser({ action: "act", kind: "type", ref: "email-input", text: "zhangsan@example.com" });

// Choose an option from the dropdown
browser({
  action: "act",
  kind: "select",
  ref: "country-select",
  values: ["China"]
});

// Tick the checkbox
browser({
  action: "act",
  kind: "click",
  ref: "agree-checkbox"
});

// Submit the form
browser({
  action: "act",
  kind: "click",
  ref: "submit-btn"
});
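
When a form has many fields, it can help to describe it as data and generate the act payloads from that. This is a minimal sketch: `buildFormActions` and the `type` labels ("text", "select", "checkbox", "button") are hypothetical conventions invented here, and the refs are the same placeholders used in the example above.

```javascript
// Turn a list of field descriptions into act payloads, one per field.
// Ref values are placeholders; real ones come from a snapshot.
function buildFormActions(fields) {
  return fields.map((field) => {
    switch (field.type) {
      case "text":
        return { action: "act", kind: "type", ref: field.ref, text: field.value };
      case "select":
        return { action: "act", kind: "select", ref: field.ref, values: [field.value] };
      case "checkbox":
      case "button":
        return { action: "act", kind: "click", ref: field.ref };
      default:
        throw new Error(`Unsupported field type: ${field.type}`);
    }
  });
}

const formActions = buildFormActions([
  { type: "text", ref: "name-input", value: "Zhang San" },
  { type: "text", ref: "email-input", value: "zhangsan@example.com" },
  { type: "select", ref: "country-select", value: "China" },
  { type: "checkbox", ref: "agree-checkbox" },
  { type: "button", ref: "submit-btn" },
]);

// Each payload could then be passed to the browser tool in order.
console.log(formActions.map((a) => `${a.kind}:${a.ref}`).join(", "));
```

Keeping the field list separate from the calls means a changed form only needs a changed list.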

Example 3: Scraping web page content

// Scenario: scrape the article list from a website

// 1. Open the target page
browser({ action: "open", url: "https://news.example.com" });

// 2. Extract the data with evaluate
const articles = browser({
  action: "act",
  kind: "evaluate",
  // ?. guards against items that lack the expected child node
  fn: `() => {
    const items = document.querySelectorAll('.article-item');
    return Array.from(items).map(item => ({
      title: item.querySelector('.title')?.textContent ?? null,
      link: item.querySelector('a')?.href ?? null,
      summary: item.querySelector('.summary')?.textContent ?? null
    }));
  }`
});

// 3. Go to the next page for more content
browser({
  action: "act",
  kind: "click",
  ref: "next-page-btn"
});

// 4. Wait for it to load
browser({
  action: "act",
  kind: "wait",
  timeMs: 2000
});

// 5. Continue extracting...
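
Steps 2 to 5 can be folded into one loop that stops when a page brings nothing new. This is a sketch under assumptions: it supposes that an `evaluate` call hands back the function's return value directly as an array, and that `browser` can be called synchronously as in the snippets above. `scrapePages`, `fakePages`, and `fakeBrowser` are hypothetical names for illustration.

```javascript
// Collect articles across pages, skipping links already seen.
// Stops early when a page adds nothing new (last page or no navigation).
function scrapePages(browser, extractFn, maxPages = 5) {
  const seen = new Set();
  const articles = [];
  for (let page = 0; page < maxPages; page++) {
    // Assumption: evaluate returns the array produced by extractFn.
    const batch = browser({ action: "act", kind: "evaluate", fn: extractFn });
    let added = 0;
    for (const article of batch) {
      if (!seen.has(article.link)) {
        seen.add(article.link);
        articles.push(article);
        added += 1;
      }
    }
    if (added === 0) break;
    browser({ action: "act", kind: "click", ref: "next-page-btn" });
    browser({ action: "act", kind: "wait", timeMs: 2000 });
  }
  return articles;
}

// Stub that serves three canned pages, with overlap between them.
const fakePages = [
  [{ link: "/a" }, { link: "/b" }],
  [{ link: "/b" }, { link: "/c" }],
  [{ link: "/c" }],
];
let evaluateCount = 0;
const fakeBrowser = (request) =>
  request.kind === "evaluate"
    ? fakePages[Math.min(evaluateCount++, fakePages.length - 1)]
    : undefined;

const scraped = scrapePages(fakeBrowser, "() => []");
console.log(scraped.map((a) => a.link).join(" "));
```

Deduplicating by link protects against sites that repeat the last items of one page at the top of the next.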

Example 4: Saving screenshots as evidence

// Scenario: keep screenshots as evidence after an automated test

// Full-page screenshot
browser({
  action: "screenshot",
  fullPage: true,
  outputFormat: "png",
  filePath: "/var/www/miaoquai/screenshots/test-result.png"
});

// Screenshot of a single element
browser({
  action: "screenshot",
  ref: "error-message",  // capture only this element
  width: 800,
  height: 600
});

Example 5: Working with multiple tabs

// Scenario: you need to work with several pages at once

// List all tabs
browser({ action: "tabs" });

// Open a new page
browser({
  action: "open",
  url: "https://example.com/page2",
  target: "new-tab"
});

// Switch to a tab
browser({
  action: "focus",
  targetId: "tab-2"
});

// Close a tab
browser({
  action: "close",
  targetId: "tab-2"
});

📖 Act reference

Kind	Description	Example
click	Click an element	`kind: "click", ref: "e12"`
type	Type text	`kind: "type", ref: "i3", text: "xxx"`
fill	Clear the field, then fill it	`kind: "fill", ref: "i3", text: "xxx"`
press	Press a key	`kind: "press", key: "Enter"`
select	Choose a dropdown option	`kind: "select", ref: "s5", values: ["A"]`
hover	Hover over an element	`kind: "hover", ref: "e8"`
drag	Drag and drop	`kind: "drag", startRef: "a", endRef: "b"`
wait	Wait	`kind: "wait", timeMs: 3000`
evaluate	Run JavaScript	`kind: "evaluate", fn: "() => document.title"`
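
The table can double as a checklist before a payload is sent. The sketch below encodes the parameters each example in the table uses and reports which ones are missing. It reflects this guide's examples only, not an official schema, and `REQUIRED_PARAMS` and `missingParams` are hypothetical names.

```javascript
// Parameters used by each act kind in the reference table.
// "wait" lists none because this guide shows it with either timeMs or loadState.
const REQUIRED_PARAMS = new Map([
  ["click", ["ref"]],
  ["type", ["ref", "text"]],
  ["fill", ["ref", "text"]],
  ["press", ["key"]],
  ["select", ["ref", "values"]],
  ["hover", ["ref"]],
  ["drag", ["startRef", "endRef"]],
  ["wait", []],
  ["evaluate", ["fn"]],
]);

// Return the names of parameters the payload is missing for its kind.
function missingParams(payload) {
  const required = REQUIRED_PARAMS.get(payload.kind);
  if (required === undefined) {
    throw new Error(`Unknown act kind: ${payload.kind}`);
  }
  return required.filter((name) => payload[name] === undefined);
}

console.log(missingParams({ kind: "type", ref: "i3" })); // prints [ 'text' ]
console.log(missingParams({ kind: "click", ref: "e12" })); // prints []
```

Catching a missing `text` or `ref` locally is cheaper than finding out from a failed browser action.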

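🎯 Best practices

✅ Reference elements by ref

Prefer the ref IDs returned by snapshot over selectors. Refs are more stable and less likely to break when the page structure changes.

✅ Add waits

Page loads and AJAX requests take time. Wait appropriately after each action so that a step doesn't fail because the page hasn't finished loading.

// Wait for the page to finish loading
browser({ action: "act", kind: "wait", loadState: "networkidle" });

// Wait for a fixed amount of time
browser({ action: "act", kind: "wait", timeMs: 2000 });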
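
A fixed wait is either too short or wastefully long. One alternative is to poll snapshots until the element you need appears. This sketch assumes that `snapshot` returns text and that `browser` can be called synchronously as in this guide's snippets. `waitFor`, `isReady`, and `pollingStub` are hypothetical names.

```javascript
// Poll snapshots until isReady(snapshot) is true or the attempts run out.
// Returns true on success and false on timeout.
function waitFor(browser, isReady, { attempts = 5, intervalMs = 1000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const snapshot = browser({ action: "snapshot" });
    if (isReady(snapshot)) return true;
    browser({ action: "act", kind: "wait", timeMs: intervalMs });
  }
  return false;
}

// Stub: the button only shows up in the third snapshot.
const cannedSnapshots = ["loading", "loading", "- Button 'Export' [ref=b2]"];
let snapshotIndex = 0;
let waitCalls = 0;
const pollingStub = (request) => {
  if (request.action === "snapshot") {
    return cannedSnapshots[Math.min(snapshotIndex++, cannedSnapshots.length - 1)];
  }
  if (request.kind === "wait") waitCalls += 1;
  return undefined;
};

const ready = waitFor(pollingStub, (text) => text.includes("Export"));
console.log(ready, waitCalls);
```

The attempt limit matters: without it, a page that never finishes loading would stall the whole task.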

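✅ Reuse login state

Logging in too often can trigger a site's risk controls. Use a persistent browser session to stay logged in.

// Use the user profile to stay logged in
browser({ 
  action: "open", 
  url: "https://example.com",
  profile: "user"  // use the already logged-in browser
});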

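⚠️ Pitfalls to avoid

Element location - page structure can change, so have a fallback ready
CAPTCHAs - complex CAPTCHAs may need a human to step in
Anti-scraping measures - rapid, repeated actions can trigger them, so slow down
Page navigation - watch for URL changes and update targetId promptly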
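
For the "slow down" advice, one common approach is exponential backoff with a little random jitter. It is offered here as a general technique, not something the browser tool provides. `backoffDelayMs` and `withJitter` are hypothetical helpers, and the default numbers are arbitrary starting points.

```javascript
// Exponential backoff with a cap: baseMs, 2x, 4x, ... up to maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Scale the delay by a random factor between 0.8 and 1.2 so actions
// do not land on a perfectly regular rhythm.
// `random` is injectable so the function can be tested deterministically.
function withJitter(delayMs, random = Math.random) {
  const factor = 0.8 + random() * 0.4;
  return Math.round(delayMs * factor);
}

// The result can be used as the timeMs of a wait action, for example:
// browser({ action: "act", kind: "wait", timeMs: withJitter(backoffDelayMs(2)) });
console.log(backoffDelayMs(0), backoffDelayMs(3), backoffDelayMs(10));
```

Growing the delay after each blocked or failed attempt gives the site time to relax its limits instead of hammering it at a constant rate.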

🚀 Advanced techniques

Handling dynamically loaded content

// Scroll to the bottom of the page to load more
for (let i = 0; i < 5; i++) {
  browser({
    action: "act",
    kind: "press",
    key: "End"
  });
  browser({
    action: "act",
    kind: "wait",
    timeMs: 2000
  });
}
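
A fixed five scrolls may stop too early or keep going after the list has ended. A variant is to count items after each scroll and stop once the count no longer grows. This sketch assumes that `evaluate` returns the function's numeric result directly and that `browser` is synchronous as in this guide. `scrollUntilStable`, `countFn`, and `scrollStub` are hypothetical names, and `.article-item` is the selector borrowed from Example 3.

```javascript
// Scroll to the bottom repeatedly until no new items appear.
// Returns the final item count.
function scrollUntilStable(browser, countFn, maxScrolls = 20) {
  let previous = browser({ action: "act", kind: "evaluate", fn: countFn });
  for (let i = 0; i < maxScrolls; i++) {
    browser({ action: "act", kind: "press", key: "End" });
    browser({ action: "act", kind: "wait", timeMs: 2000 });
    const current = browser({ action: "act", kind: "evaluate", fn: countFn });
    if (current <= previous) return current; // nothing new was loaded
    previous = current;
  }
  return previous;
}

const countFn = "() => document.querySelectorAll('.article-item').length";

// Stub: item counts grow 10 -> 20 -> 30, then stay at 30.
const cannedCounts = [10, 20, 30, 30];
let countIndex = 0;
let endPresses = 0;
const scrollStub = (request) => {
  if (request.kind === "evaluate") {
    return cannedCounts[Math.min(countIndex++, cannedCounts.length - 1)];
  }
  if (request.kind === "press") endPresses += 1;
  return undefined;
};

const finalCount = scrollUntilStable(scrollStub, countFn);
console.log(finalCount, endPresses);
```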

Handling file uploads

// Upload a file
browser({
  action: "upload",
  ref: "file-input",
  paths: ["/path/to/file.pdf"]
});

Handling pop-ups and dialogs

// Accept a confirmation dialog
browser({
  action: "dialog",
  accept: true,
  button: "ok"
});

// Dismiss a dialog
browser({
  action: "dialog",
  accept: false
});

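🎬 Closing thoughts

Browser automation gives an AI agent its "eyes" and "hands." With it, the AI no longer just describes what it sees; it can actually do the work.

"There is a kind of power called the power of automation. While others are still clicking by hand, your AI has already finished a full day's work."

Learn browser automation and your AI agent evolves from "smart chatbot" into "digital employee."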