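Let the Browser Work by Itself: A Complete Guide to Putting AI Automation into Practice [Track: AI-Powered Productivity for Everyone]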


地推 · Published 2025-08-25 16:16:56

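The Evolution and Current State of Automation

In the wave of digital transformation, automation has grown from simple script execution into complex systems capable of intelligent decision-making. According to a recent Gartner report, by 2025 more than 70% of enterprises will have adopted some form of AI-driven automation in their business processes. This shift not only improves efficiency; more importantly, it gives automation systems a degree of adaptability and creativity they did not have before.

Traditional automation tools can handle repetitive tasks, but they often fall short when faced with dynamically changing page elements and complex user-interaction scenarios. This is where AI can make a difference: machine-learning models interpret the context, make decisions, and adjust the execution strategy in real time.

Traditional Automation vs. Intelligent Automation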

[Flowcharts: traditional automation and intelligent automation]

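Characteristics of each

Dimension	Traditional automation	Intelligent automation
Element locating	Exact selector matching	Hybrid locating based on visual features and semantic understanding
Flow design	Fixed workflow	Goal-based, dynamically generated paths
Error handling	Predefined try-catch blocks	Real-time diagnosis and autonomous recovery
Test data	Static data sets	Dynamically generated data that follows business rules
Maintenance cost	Changes break large numbers of scripts	Adapts automatically to some UI changes
Execution speed	Fast (millisecond-level response)	Slower (needs AI inference time)
Locating accuracy	Exact (100%) while selectors hold, but brittle	Approximate (about 95%), but robust
Suitable scenarios	Stable business processes	Dynamic, complex scenarios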
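
The trade-off in the table (selectors are fast but brittle, AI locating is slower but more tolerant) suggests trying the cheap path first and paying for AI inference only when it fails. The following is a minimal sketch of that idea, not something provided by Playwright or Midscene.js: Locator, LocateResult and locateWithFallback are hypothetical names introduced here for illustration, and the two strategies are passed in as plain functions.

```typescript
// Hypothetical sketch: try a fast selector lookup first and fall back to a
// slower AI-based lookup only when the selector no longer matches.
// None of these names come from Playwright or Midscene.js.

type Locator<T> = () => Promise<T | null>;

interface LocateResult<T> {
  element: T;
  strategy: "selector" | "ai";
}

async function locateWithFallback<T>(
  bySelector: Locator<T>,
  byAi: Locator<T>,
): Promise<LocateResult<T>> {
  // Cheap path: exact selector matching.
  const fast = await bySelector();
  if (fast !== null) {
    return { element: fast, strategy: "selector" };
  }
  // Expensive path: description-based lookup, used only after a miss.
  const slow = await byAi();
  if (slow !== null) {
    return { element: slow, strategy: "ai" };
  }
  throw new Error("element not found by selector or by AI description");
}
```

Returning the strategy that succeeded makes it possible to log how often the fallback fires, which is a rough signal that the selectors need maintenance.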

Code comparison

Traditional automation

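import { expect } from '@playwright/test';

// Traditional script: every step is bound to a hard-coded selector.
async function testLogin(page) {
  await page.fill('#username', 'testuser');
  await page.fill('#password', 'Pass123!');
  await page.click('#login-btn');
  // Passes only if navigation ends on a URL containing "dashboard".
  await expect(page).toHaveURL(/dashboard/);
}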

Pain point: the script fails as soon as an element ID changes.

Intelligent automation

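// Illustrative pseudocode: `ai` is a hypothetical helper object, not a real library API.
async function smartLogin(page, ai) {
  const context = {
    pageHTML: await page.content(),
    task: "Complete the login",
    constraints: "Use valid test credentials"
  };

  // Ask the model for a sequence of actions that achieves the task.
  const plan = await ai.generateActionPlan(context);

  for (const action of plan.actions) {
    if (action.type === 'fill') {
      // Locate the field by its description instead of a fixed selector.
      const element = await ai.locateElement({
        page: page,
        description: action.field
      });
      await element.fill(await ai.generateTestData(action.field));
    }
    // Handle other action types...
  }

  // Verify the outcome and hand the result back to the caller.
  const result = await ai.verifyOutcome({
    page: page,
    expected: "Login succeeded"
  });
  return result;
}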

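Advantage: adapts automatically when the structure of the login form changes.

Technologies used

What is Playwright?

Playwright is a cross-browser, cross-platform web automation and testing tool developed by Microsoft. It supports Chromium (Chrome/Edge), Firefox, and WebKit (Safari), and provides a single unified API for automating browser operations. It is suited to:

End-to-end (E2E) testing
UI automation
Page screenshots and PDF generation
Scraping dynamically rendered pages
Performance monitoring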


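What is Midscene.js?

Midscene.js is an AI framework for intelligent automation driven by scenario descriptions. Through natural-language interaction and machine-learning capabilities, it gives traditional automation tools such as Playwright the ability to understand and make decisions. Its core positioning is:

AI-enhanced automation: combines large language models (LLMs) with automation scripts
Low-code/no-code friendly: task scenarios can be described in natural language
Multimodal interaction: handles text, images, structured data, and other kinds of input
Enterprise-grade extensibility: supports private deployment and fine-tuning for vertical domains

Technical architecture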

[Figure: Midscene.js technical architecture diagram]

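Web or mobile applications

Web automation

Integrating with Puppeteer
Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode by default, but it can be configured to run in a visible (headed) browser.

Install dependencies

The demo below imports puppeteer and @midscene/web, so both packages need to be installed in the project.

Demo script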

import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({
      headless: false, // here we use headed mode to help debug
    });

    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 800,
      deviceScaleFactor: 1,
    });

    await page.goto("https://www.ebay.com");
    await sleep(5000);

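    // 👀 Initialize the Midscene agent
    const agent = new PuppeteerAgent(page);

    // 👀 Run a search
    // Note: although this is an English page, it can also be driven with Chinese instructions.
    // The prompt below means: type "Headphones" into the search box and press Enter.
    await agent.aiAction('在搜索框输入 "Headphones" ，敲回车');
    await sleep(5000);

    // 👀 Understand the page and extract data
    // Prompt: find the product titles and prices in the list.
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], 找到列表里的商品标题和价格',
    );
    console.log("headphone items", items);

    // 👀 Assert with AI
    // Prompt: there is a category filter on the left side of the page.
    await agent.aiAssert("界面左侧有类目筛选功能");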

    await browser.close();
  })()
);
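
Because the data returned by aiQuery is produced by a model, it can be worth checking its shape before relying on it. The sketch below assumes the result arrives as an untyped value and narrows it to the shape requested in the prompt. ItemInfo, isItemInfo and parseItems are hypothetical helpers written for this article; they are not part of Midscene.js.

```typescript
// Hypothetical helpers: narrow an untyped aiQuery result to the shape the
// prompt asked for ({itemTitle: string, price: number}[]), dropping any
// entries that do not match.

interface ItemInfo {
  itemTitle: string;
  price: number;
}

function isItemInfo(value: unknown): value is ItemInfo {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.itemTitle === "string" &&
    typeof record.price === "number" &&
    Number.isFinite(record.price)
  );
}

function parseItems(raw: unknown): ItemInfo[] {
  if (!Array.isArray(raw)) {
    throw new TypeError("expected an array of items");
  }
  // Keep only entries with a string title and a finite numeric price.
  return raw.filter(isItemInfo);
}
```

In the demo above this would be used as parseItems(items) right after the aiQuery call, so that a price returned as text such as "$12.99" is filtered out instead of reaching later arithmetic.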

Integrating with Playwright

Install dependencies

Demo code

import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';
import 'dotenv/config'; // read environment variables from .env file

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const browser = await chromium.launch({
      headless: true, // 'true' means we can't see the browser window
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });

    const page = await browser.newPage();
    await page.setViewportSize({
      width: 1280,
      height: 768,
    });
    await page.goto('https://www.ebay.com');
    await sleep(5000);

    // 👀 init Midscene agent
    const agent = new PlaywrightAgent(page);

    // 👀 type keywords, perform a search
    await agent.aiAction('type "Headphones" in search box, hit Enter');

    // 👀 wait for the loading
    await agent.aiWaitFor('there is at least one headphone item on page');
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand the page content, find the items
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find item in list and corresponding price',
    );
    console.log('headphones in stock', items);

    const isMoreThan1000 = await agent.aiBoolean(
      'Is the price of the headphones more than 1000?',
    );
    console.log('isMoreThan1000', isMoreThan1000);

    const price = await agent.aiNumber(
      'What is the price of the first headphone?',
    );
    console.log('price', price);

    const name = await agent.aiString(
      'What is the name of the first headphone?',
    );
    console.log('name', name);

    const location = await agent.aiLocate(
      'What is the location of the first headphone?',
    );
    console.log('location', location);

    // 👀 assert by AI
    await agent.aiAssert('There is a category filter on the left');

    // 👀 click on the first item
    await agent.aiTap('the first item in the list');

    await browser.close();
  })(),
);
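
Both demos pause with fixed sleep(5000) calls, and the Playwright demo notes that aiWaitFor can replace a plain sleep. For conditions that can be checked without AI, a generic polling helper is another option. The following is a minimal sketch under that assumption; delay and waitUntil are hypothetical names and are not Playwright or Midscene.js APIs.

```typescript
// Hypothetical polling helper: resolves as soon as `check` returns true and
// rejects once `timeoutMs` has elapsed without success.

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitUntil(
  check: () => Promise<boolean> | boolean,
  timeoutMs = 15000,
  intervalMs = 500,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  do {
    // The condition is always evaluated at least once.
    if (await check()) return;
    await delay(intervalMs);
  } while (Date.now() < deadline);
  throw new Error(`condition not met within ${timeoutMs} ms`);
}
```

The check could also be an AI call such as the aiBoolean query from the demo, but every poll would then cost one model request, so a longer interval is advisable in that case.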

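Chrome Bridge Mode
With the Bridge Mode of the Midscene Chrome extension, a local script can control the desktop version of Chrome. The script can connect either to a new tab or to the currently active tab.

Using desktop Chrome lets you reuse existing cookies, extensions, page state, and so on. The automation script can also work alongside a human operator to complete a task.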


Install dependencies

Demo script

import { AgentOverChromeBridge } from "@midscene/web/bridge-mode";

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const agent = new AgentOverChromeBridge();

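    // This connects to a new tab in your desktop Chrome.
    // Remember to start the Chrome extension and click the 'allow connection' button; otherwise you will get a timeout error.
    await agent.connectNewTabWithUrl("https://www.bing.com");

    // These methods are the same as on a regular Midscene agent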
    await agent.ai('type "AI 101" and hit Enter');
    await sleep(3000);

    await agent.aiAssert("there are some search results");
    await agent.destroy();
  })()
);

Start the Chrome extension

[Screenshot: the Midscene Chrome extension]

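Run the script

Android automation

Android devices can be operated by installing the MCP tool.

Key features

Faster: enabling the cache can greatly reduce the execution time of steps that call the AI service.

The full command combines the three parts explained below:

MIDSCENE_CACHE=1 playwright test --config=playwright.config.ts

MIDSCENE_CACHE=1
An environment variable; setting it to 1 enables Midscene.js caching. On later runs, Midscene.js tries to reuse results cached from earlier runs. According to the Midscene.js caching documentation, these are the AI planning steps and the located-element information (not rendered pages or static files), so fewer model calls are needed and the tests run faster.

playwright test
Runs the Playwright test scripts.

--config=playwright.config.ts
Specifies the path to the Playwright configuration file (here, a TypeScript configuration file).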
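
Why a cache shortens AI-related steps can be shown with a small sketch: if the same instruction is issued on the same page again, the stored plan is returned and the slow model call is skipped. This is not how Midscene.js implements its cache. PlanCache, the Planner type and the key format are assumptions made purely for illustration.

```typescript
// Hypothetical sketch of result caching for AI steps. This is NOT the
// Midscene.js implementation; it only shows why a cache hit avoids a
// slow model call.

type Planner = (prompt: string) => Promise<string[]>;

class PlanCache {
  private readonly store = new Map<string, string[]>();
  private readonly planner: Planner;
  hits = 0;
  misses = 0;

  constructor(planner: Planner) {
    this.planner = planner;
  }

  // The key combines the page URL and the prompt, so the same instruction
  // on a different page is planned again.
  async plan(pageUrl: string, prompt: string): Promise<string[]> {
    const key = `${pageUrl}\n${prompt}`;
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const steps = await this.planner(prompt);
    this.store.set(key, steps);
    return steps;
  }
}
```

A real cache also has to decide when an entry is stale, for example when the page layout has changed since the plan was recorded. This sketch leaves that out.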

More standard: MCP support

{
  "mcpServers": {
    "mcp-midscene": {
      "command": "npx",
      "args": ["-y", "@midscene/mcp"],
      "env": {
        "MIDSCENE_MODEL_NAME": "REPLACE_WITH_YOUR_MODEL_NAME",
        "OPENAI_API_KEY": "REPLACE_WITH_YOUR_OPENAI_API_KEY",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

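API

Interaction methods
agent.aiAction() or .ai()  # perform UI steps described in natural language
agent.aiTap()  # click an element
agent.aiHover()  # hover the mouse over an element
agent.aiInput()  # type text into an element
agent.aiKeyboardPress()  # press a key on the keyboard
agent.aiScroll()  # scroll the page or an element
agent.aiRightClick()  # right-click an element
Data extraction
agent.aiAsk()  # ask the AI model a question about the current page and get the answer as a string
agent.aiQuery()  # extract structured data directly from the UI
agent.aiBoolean()  # extract a boolean value from the UI
agent.aiNumber()  # extract a number from the UI
agent.aiString()  # extract a string from the UI
Assertions and more
agent.aiAssert()  # assert on the result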
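
The methods listed above can be combined into a short flow. The sketch below expresses a login as natural-language steps and uses only aiAction and aiAssert, called with a single prompt string as in the demos earlier in this article. PromptAgent is a deliberately narrow structural type assumed here, not the full Midscene.js typings, and loginWithAgent, Credentials and the prompt wording are hypothetical.

```typescript
// Minimal structural type covering only the two calls used below. It is an
// assumption modelled on how the demos call aiAction(prompt) and
// aiAssert(prompt), not the complete Midscene.js agent interface.
interface PromptAgent {
  aiAction(prompt: string): Promise<unknown>;
  aiAssert(prompt: string): Promise<unknown>;
}

interface Credentials {
  username: string;
  password: string;
}

// Hypothetical login flow expressed as natural-language steps.
async function loginWithAgent(agent: PromptAgent, creds: Credentials): Promise<void> {
  await agent.aiAction(`type "${creds.username}" in the username field`);
  await agent.aiAction(`type "${creds.password}" in the password field`);
  await agent.aiAction("click the login button");
  // aiAssert is expected to reject when the page does not match.
  await agent.aiAssert("the page shows that the user is logged in");
}
```

Any object with these two methods fits the type, so a PuppeteerAgent or PlaywrightAgent from the demos should be usable here, and a recording fake can stand in for unit tests. Since the prompts are sent to the model service, only test credentials should be passed this way.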

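For more, see the official Midscene.js website.

Hands-on case

The following hands-on run uses the current test login page as an example.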

[Screenshot: hands-on run against the test login page]

Building together

Anyone with ideas is welcome to join in and build this together, so that AI automation can help all of us.
