I Built an Automated Testing System with an AI Agent: From Test Case Generation to Bug Localization, a 10x Efficiency Gain

haoxinpoju · Published 2026-06-23 14:10:00

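I'm an independent developer. As my project grew, testing became the biggest headache: writing test cases is slow, coverage is patchy, and every regression run takes several hours. So I tried handing the testing workflow to an AI agent: generating test cases, running them automatically, and locating bugs. Once the whole pipeline was in place, testing time dropped from 3 hours to under 20 minutes. This post records the full implementation, and the code is ready to use.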

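Why traditional testing is so inefficient

The traditional testing workflow has three pain points:

Pain point	What it looks like	Share of time
Writing test cases	Hand-written asserts; coverage depends on experience	about 40%
Regression testing	The full suite reruns after every change	about 35%
Bug localization	Reading error messages and hunting for context	about 25%

AI can take over all three stages. The sections below go through them one by one.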

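System architecture: 4 modules

Source code input
    │
    ▼
┌─────────────────┐
│  Module 1       │  ← Parse code structure; extract function signatures / classes / boundary values
│  Code analyzer  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Module 2       │  ← Call an LLM to generate test cases (including boundary values and error paths)
│  Test generator │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Module 3       │  ← Run pytest and collect coverage statistics
│  Test runner    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Module 4       │  ← Parse the failure report; AI locates the root cause
│  Bug locator    │
└─────────────────┘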

Module 1: Code analyzer

Parses a Python file and extracts function signatures, parameter types, and potential boundary conditions.

import ast
import re
from typing import Dict, List

class CodeAnalyzer:
    """Parse code structure to provide input for test generation."""

    def analyze_file(self, file_path: str) -> List[Dict]:
        """Analyze a Python file and extract information about every function."""
        with open(file_path, 'r', encoding='utf-8') as f:
            source = f.read()

        tree = ast.parse(source)
        functions = []

        # Note: ast.walk also visits methods and nested functions
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                func_info = self._extract_function_info(node, source)
                functions.append(func_info)

        return functions

    def _extract_function_info(self, node: ast.FunctionDef, source: str) -> Dict:
        """Extract detailed information about one function."""
        # Collect positional parameters (ast.unparse needs Python 3.9+)
        args = []
        for arg in node.args.args:
            arg_info = {'name': arg.arg}
            if arg.annotation:
                arg_info['type'] = ast.unparse(arg.annotation)
            args.append(arg_info)

        # Return type annotation, if any
        return_type = None
        if node.returns:
            return_type = ast.unparse(node.returns)

        # Extract the function source (passed to the AI for analysis)
        start_line = node.lineno - 1
        end_line = node.end_lineno
        func_body = '\n'.join(source.split('\n')[start_line:end_line])

        # Heuristic: look for boundary-related keywords as whole words,
        # so that text such as "notify" or "10" does not match "if" or "0"
        boundary_pattern = r'\b(?:if|elif|raise|assert|None|0|len)\b'
        has_boundaries = re.search(boundary_pattern, func_body) is not None

        return {
            'name': node.name,
            'args': args,
            'return_type': return_type,
            'source': func_body,
            'has_boundaries': has_boundaries,
            'line': node.lineno
        }


# Usage example
analyzer = CodeAnalyzer()
functions = analyzer.analyze_file('my_module.py')
for f in functions:
    print(f"Function: {f['name']}, args: {f['args']}, has boundary conditions: {f['has_boundaries']}")
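
The boundary check in the analyzer works on source text. As a possible alternative, the same idea can be sketched on the syntax tree, which avoids matching keywords inside identifiers, strings, or comments. This is only a sketch: `find_boundary_hints` and its hint labels are hypothetical names that do not appear in the original system.

```python
import ast
from typing import List


def find_boundary_hints(func_source: str) -> List[str]:
    """Return the kinds of boundary-related syntax found in a function's source."""
    tree = ast.parse(func_source)
    hints = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.If):  # covers both `if` and `elif`
            hints.add("branch")
        elif isinstance(node, ast.Raise):
            hints.add("raise")
        elif isinstance(node, ast.Assert):
            hints.add("assert")
        elif isinstance(node, ast.Constant) and node.value is None:
            hints.add("none-literal")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "len"
        ):
            hints.add("len-call")
    return sorted(hints)


sample = '''
def safe_div(a, b):
    if b == 0:
        raise ValueError("b must not be zero")
    return a / b
'''
print(find_boundary_hints(sample))  # ['branch', 'raise']
```

A function such as `def notify(x): return x + 10` yields an empty list here, whereas a plain substring search would find "if" inside "notify" and "0" inside "10".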

Module 2: AI test case generator

Sends the function information to an LLM and asks it to generate test cases covering normal, boundary, and error scenarios.

from openai import OpenAI
import json
from typing import Dict, List

class TestCaseGenerator:
    """Call the AI to generate test cases."""

    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def generate_tests(self, func_info: Dict) -> str:
        """Generate test cases for a single function."""

        # 'return_type' is always present but may be None, so fall back explicitly
        return_type = func_info.get('return_type') or 'unknown'

        prompt = f"""You are a professional Python test engineer. Generate complete pytest test cases for the function below.

Function information:
- Name: {func_info['name']}
- Parameters: {json.dumps(func_info['args'], ensure_ascii=False)}
- Return type: {return_type}
- Source code:
{func_info['source']}

Requirements:
1. Generate 4-6 test cases covering the normal path, boundary values, and error cases
2. Name test functions as test_<function_name>_<scenario>
3. Add a comment to each test explaining its intent
4. Use pytest.raises for cases that may raise an exception
5. Output Python code only, with no explanation

Output format:
```python
import pytest
# test code
```"""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3  # keep the temperature low so generated tests vary less between runs
        )

        # Extract the code block
        content = response.choices[0].message.content
        if "```python" in content:
            content = content.split("```python")[1].split("```")[0].strip()

        return content

    def generate_for_file(self, functions: List[Dict], output_path: str):
        """Generate one test file for a whole source file."""
        all_tests = ["import pytest\n\n"]

        for func in functions:
            # Skip private and dunder functions
            if func['name'].startswith('_'):
                continue

            print(f"  Generating test cases for {func['name']}...")
            test_code = self.generate_tests(func)

            # Drop the duplicate import
            test_code = test_code.replace("import pytest\n", "")
            all_tests.append(f"# === Tests for {func['name']} ===\n")
            all_tests.append(test_code + "\n\n")

        with open(output_path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(all_tests))

        print(f"  Test file written: {output_path}")


# Usage example
generator = TestCaseGenerator(api_key="your-api-key")
generator.generate_for_file(functions, output_path="test_my_module.py")

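Measured result: for a 200-line Python file with 8 functions, the AI takes about 30 seconds to generate the test file, and coverage rises from 60% with hand-written tests to about 85%.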

Module 3: Test runner

Runs pytest automatically and collects structured results.

import subprocess
import sys
import json
import re
from pathlib import Path
from typing import Dict

class TestRunner:
    """Run the tests and collect results."""

    def run(self, test_file: str, source_file: str) -> Dict:
        """Run the tests and return structured results."""

        cmd = [
            sys.executable, "-m", "pytest",      # use the current interpreter
            test_file,
            f"--cov={Path(source_file).stem}",  # coverage for the module under test
            "--cov-report=json:coverage.json",
            "--json-report",                      # JSON report (pytest-json-report plugin)
            "--json-report-file=test_report.json",
            "-v",
            "--tb=short"                          # short tracebacks
        ]

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            encoding='utf-8'
        )

        # Parse the test results
        test_result = self._parse_results(result, test_file)
        return test_result

    def _parse_results(self, result: subprocess.CompletedProcess, test_file: str) -> Dict:
        """Parse pytest output."""
        output = result.stdout + result.stderr

        # Read the counts from pytest's final summary line, e.g. "1 failed, 2 passed in 0.05s".
        # Counting every FAILED token would count each failure twice, because a failure
        # appears both in the -v progress lines and in the short test summary.
        def count(label: str) -> int:
            matches = re.findall(rf'(\d+) {label}\b', output)
            return int(matches[-1]) if matches else 0

        passed = count('passed')
        failed = count('failed')
        errors = count('errors?')

        # Extract details of the failed cases
        failed_cases = []
        # Matches lines of the form: FAILED test_xxx - AssertionError: xxx
        fail_pattern = r'FAILED (.+?) - (.+?)(?:\n|$)'
        for match in re.finditer(fail_pattern, output):
            failed_cases.append({
                'test': match.group(1).strip(),
                'error': match.group(2).strip()
            })

        # Read the coverage data
        coverage_pct = 0
        try:
            with open('coverage.json', 'r') as f:
                cov_data = json.load(f)
            total = cov_data.get('totals', {})
            coverage_pct = total.get('percent_covered', 0)
        except FileNotFoundError:
            pass

        return {
            'passed': passed,
            'failed': failed,
            'errors': errors,
            'coverage': round(coverage_pct, 1),
            'failed_cases': failed_cases,
            'raw_output': output
        }


# Usage example
runner = TestRunner()
result = runner.run("test_my_module.py", "my_module.py")
print(f"Passed: {result['passed']} | Failed: {result['failed']} | Coverage: {result['coverage']}%")
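
The runner only reads the overall percentage from coverage.json, but the same file also records which lines were never executed. The sketch below assumes coverage.py's JSON report layout, where each entry under `files` carries a `missing_lines` list. `uncovered_lines` is a hypothetical helper, and the sample dictionary is made-up data in that shape rather than output from a real run.

```python
from typing import Dict, List


def uncovered_lines(cov_data: Dict) -> Dict[str, List[int]]:
    """Map each measured file to the line numbers the tests never executed."""
    result = {}
    for path, info in cov_data.get("files", {}).items():
        missing = info.get("missing_lines", [])
        if missing:
            result[path] = sorted(missing)
    return result


# Made-up data in the shape of a coverage.json file.
sample = {
    "files": {
        "my_module.py": {"executed_lines": [1, 2, 3, 5], "missing_lines": [6, 4]},
        "helpers.py": {"executed_lines": [1, 2], "missing_lines": []},
    },
    "totals": {"percent_covered": 75.0},
}
print(uncovered_lines(sample))  # {'my_module.py': [4, 6]}
```

The line numbers could be handed back to the test generator as a hint about which branches still need a test.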

Module 4: AI bug locator

Sends the failed test information to the AI and asks it to find the root cause and suggest a fix.

import json
from typing import Dict, List
from openai import OpenAI

class BugLocator:
    """AI-driven bug localization."""

    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)

    def locate(self, failed_case: Dict, source_code: str) -> Dict:
        """Locate the bug behind a single failed test case."""

        prompt = f"""You are an experienced Python debugging expert. Analyze the test failure below, find where the bug is, and suggest a fix.

[Failed test case]
{failed_case['test']}

[Error message]
{failed_case['error']}

[Source code]
{source_code}

Respond in the following format (JSON):
{{
  "root_cause": "root cause in one sentence",
  "location": "the faulty line of code or function name",
  "fix": "suggested fix (concrete code or steps)",
  "severity": "high/medium/low"
}}"""

        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,
            response_format={"type": "json_object"}
        )

        analysis = json.loads(response.choices[0].message.content)
        analysis['test'] = failed_case['test']
        return analysis

    def locate_all(self, test_result: Dict, source_code: str) -> List[Dict]:
        """Locate the bugs behind all failed test cases."""
        analyses = []

        for failed_case in test_result['failed_cases']:
            print(f"  Analyzing failed case: {failed_case['test'][:50]}...")
            analysis = self.locate(failed_case, source_code)
            analyses.append(analysis)

        return analyses


# Example output:
# {
#   "root_cause": "The function does not handle an empty input string",
#   "location": "process_text(), line 12",
#   "fix": "Add `if not text: return None` at the start of the function",
#   "severity": "medium"
# }
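
Even with a JSON response format, a reply can omit a field or use a severity value outside the three expected ones. As a sketch of one way to guard against that, the hypothetical helper `parse_analysis` below normalizes a raw reply into the four fields requested in the prompt; the `"unknown"` default is an assumption, not part of the original system.

```python
import json
from typing import Dict

REQUIRED_KEYS = ("root_cause", "location", "fix", "severity")
ALLOWED_SEVERITIES = {"high", "medium", "low"}


def parse_analysis(raw: str) -> Dict[str, str]:
    """Parse a model reply into the four expected fields, filling gaps with 'unknown'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    analysis = {key: str(data.get(key, "unknown")) for key in REQUIRED_KEYS}
    severity = analysis["severity"].strip().lower()
    analysis["severity"] = severity if severity in ALLOWED_SEVERITIES else "unknown"
    return analysis


good = parse_analysis(
    '{"root_cause": "empty input", "location": "process_text()", '
    '"fix": "return early", "severity": "HIGH"}'
)
bad = parse_analysis("this is not JSON")
print(good["severity"], bad["severity"])  # high unknown
```

An `"unknown"` severity falls through to the default icon when the pipeline prints its report, so a malformed reply degrades gracefully instead of raising.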

Putting it together: run the whole pipeline in one step

Chain the 4 modules together so that a single command runs the entire workflow.

import os
from pathlib import Path
from typing import Dict

class AITestingPipeline:
    """AI automated testing pipeline."""

    def __init__(self, api_key: str):
        self.analyzer = CodeAnalyzer()
        self.generator = TestCaseGenerator(api_key)
        self.runner = TestRunner()
        self.locator = BugLocator(api_key)

    def run(self, source_file: str) -> Dict:
        """Run the complete testing pipeline."""

        print(f"\n{'='*50}")
        print(f"AI automated testing: {source_file}")
        print(f"{'='*50}\n")

        # Step 1: analyze the code
        print("📊 Step 1: Analyzing code structure...")
        functions = self.analyzer.analyze_file(source_file)
        print(f"  Found {len(functions)} functions")

        # Step 2: generate test cases
        # Build the test file name from the stem so only the final suffix changes
        source_path = Path(source_file)
        test_file = str(source_path.with_name(source_path.stem + '_test_ai.py'))
        print("\n🤖 Step 2: Generating test cases with AI...")
        self.generator.generate_for_file(functions, test_file)

        # Step 3: run the tests
        print("\n🧪 Step 3: Running tests...")
        test_result = self.runner.run(test_file, source_file)
        print(f"  Passed: {test_result['passed']} | Failed: {test_result['failed']} | Coverage: {test_result['coverage']}%")

        # Step 4: locate bugs (only if something failed)
        bug_reports = []
        if test_result['failed'] > 0:
            print(f"\n🔍 Step 4: AI is locating the cause of {test_result['failed']} failures...")
            with open(source_file, 'r', encoding='utf-8') as f:
                source_code = f.read()
            bug_reports = self.locator.locate_all(test_result, source_code)

            for report in bug_reports:
                severity_icon = {"high": "🔴", "medium": "🟡", "low": "🟢"}.get(report['severity'], "⚪")
                print(f"\n  {severity_icon} {report['test'][:40]}...")
                print(f"     Root cause: {report['root_cause']}")
                print(f"     Fix: {report['fix']}")
        else:
            print("\n✅ All tests passed!")

        return {
            'test_result': test_result,
            'bug_reports': bug_reports,
            'test_file': test_file
        }


# Run it
pipeline = AITestingPipeline(api_key=os.getenv("OPENAI_API_KEY"))
report = pipeline.run("my_module.py")

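Results in practice

I ran this system against 4 modules I had written myself. The numbers:

Module	Lines of code	Hand-written test time	AI test time	Coverage (hand-written)	Coverage (AI)
Data parser	180	45 min	4 min	62%	81%
API wrapper	220	60 min	5 min	71%	88%
File handling	150	35 min	3 min	58%	79%
Config management	90	20 min	2 min	75%	92%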

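Summary: test-writing time drops by about 90% on average, and coverage rises by about 18.5 percentage points on average (from 66.5% to 85%).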
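
The summary figures can be recomputed directly from the table rows, which is a quick sanity check on the headline numbers. The snippet below only restates the table; the variable names are illustrative.

```python
rows = [
    # (module, manual minutes, AI minutes, manual coverage %, AI coverage %)
    ("Data parser", 45, 4, 62, 81),
    ("API wrapper", 60, 5, 71, 88),
    ("File handling", 35, 3, 58, 79),
    ("Config management", 20, 2, 75, 92),
]

manual_total = sum(r[1] for r in rows)  # 160 minutes
ai_total = sum(r[2] for r in rows)      # 14 minutes

# Share of test-writing time saved across all four modules
time_saved_pct = (manual_total - ai_total) / manual_total * 100

# Average coverage change per module, in percentage points
avg_gain = sum(r[4] - r[3] for r in rows) / len(rows)

print(f"time saved: {time_saved_pct:.2f}%")
print(f"average coverage gain: {avg_gain:.1f} points")
```

Across the four modules that is 160 minutes of hand-written work against 14 minutes with the pipeline, and per-module coverage gains of 19, 17, 21, and 17 points.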

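Pitfalls

Pitfall 1: Wrong import paths in AI-generated tests

Symptom: `from my_module import xxx` in the generated test file raises ModuleNotFoundError.

Cause: the AI does not know your project's directory layout.

Solution: add a line to the prompt such as "The test file and the source file are in the same directory; import directly by module name, e.g. from my_module import ...", or rewrite the import paths automatically after generation.

# Fix: check and correct the import after generation
test_code = test_code.replace(
    "from src.my_module import",
    "from my_module import"
)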
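
The replacement above handles one specific package prefix. A slightly more general variant can be sketched with a regular expression that strips any package path in front of the module name. `normalize_imports` is a hypothetical helper, and it assumes the test file sits next to the module it imports.

```python
import re


def normalize_imports(test_code: str, module_name: str) -> str:
    """Rewrite `from <any.package.path>.<module_name> import` to `from <module_name> import`."""
    pattern = re.compile(
        rf"^([ \t]*)from\s+(?:[\w.]*\.)?{re.escape(module_name)}\s+import\b",
        re.MULTILINE,
    )
    return pattern.sub(rf"\1from {module_name} import", test_code)


generated = (
    "import pytest\n"
    "from src.utils.my_module import parse\n"
    "from .my_module import load\n"
    "from other_my_module import keep\n"
)
print(normalize_imports(generated, "my_module"))
```

Imports from modules whose names merely end in the same text, such as `other_my_module` here, are left untouched.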

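Pitfall 2: Temperature set too high, so the generated tests differ on every run

Symptom: for the same function, two runs produce different test names and different assertion values, which makes version comparison impossible.

Cause: temperature was set to 0.7, so the output was too random.

Solution: always use temperature=0.3 for test generation and temperature=0.1 for bug analysis. A lower temperature reduces variation between runs but does not guarantee identical output.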

Pitfall 3: The pytest-json-report plugin is not installed, so pytest rejects --json-report

Symptom: running the tests fails with pytest: error: unrecognized arguments: --json-report.

Solution:

pip install pytest-json-report pytest-cov

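Pitfall 4: Coverage reports 0%

Symptom: percent_covered in coverage.json is always 0.

Cause: the module name passed to --cov is wrong because it includes the .py suffix.

Solution: use Path(source_file).stem to strip the suffix, so that only the module name is passed (for example my_module rather than my_module.py).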

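Pitfall 5: Tests generated for complex functions (with external dependencies) cannot run

Symptom: the test functions call the real database or API directly, so they cannot run at all.

Cause: the AI did not realize that external dependencies need to be mocked.

Solution: add this to the prompt: "For database operations, network requests, and file I/O, use unittest.mock.patch to mock them."

# Extra instruction appended to the prompt
prompt += "\nNote: if the function depends on an external service (database/API/file), you must mock it with unittest.mock.patch."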
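
For reference, this is roughly the shape of test the extra prompt line is meant to elicit. It is an illustration only: `fetch_user_name` and the `api.example.com` URL are made up, and the test replaces `urllib.request.urlopen` so no network call is made.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_user_name(user_id: int) -> str:
    """Hypothetical function under test: reads a user's name from a remote API."""
    with urllib.request.urlopen(f"https://api.example.com/users/{user_id}") as resp:
        return json.load(resp)["name"]


def test_fetch_user_name_returns_name_from_api():
    # Fake HTTP response that works as a context manager and returns canned JSON
    fake_resp = MagicMock()
    fake_resp.read.return_value = b'{"name": "Ada"}'
    fake_resp.__enter__.return_value = fake_resp

    # Replace the real network call for the duration of the test
    with patch("urllib.request.urlopen", return_value=fake_resp) as mock_open:
        assert fetch_user_name(7) == "Ada"

    mock_open.assert_called_once_with("https://api.example.com/users/7")


test_fetch_user_name_returns_name_from_api()
print("mocked test passed")
```

The patch target is the name as the function under test looks it up, which here is `urllib.request.urlopen`.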

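Summary

3 core lessons:

The value of AI-generated tests is not in replacing hand-written ones but in providing a fast first pass of coverage. The AI can exercise the main paths within 3 minutes, after which you only need to add 2-3 special cases for your business logic.
Use a low temperature for AI bug localization. Test generation can tolerate a little randomness, but bug analysis has to be precise; temperature=0.1 is a value that has worked in practice.
Higher coverage does not mean fewer bugs. AI-generated tests mainly cover code paths and do not necessarily cover business boundaries. Look at both metrics together.

What testing approaches work well for you? Feel free to share in the comments.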
