Design of an LLM-Based Intelligent GDB Analysis Tool

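This article presents an LLM-based intelligent GDB debugging tool in which a Python script acts as the middle layer that automates the interaction between GDB and the LLM. The tool first drives GDB from Python to collect analysis results from a core dump (call stack, thread information, and register state), then builds a prompt and sends that information to the LLM for analysis. The article shows how to control GDB from Python, how to register a custom GDB command, and how to pass the results to an LLM API for diagnosis. The approach effectively addresses …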

加油2019 · Published 2025-09-27 21:50:42

Contents

Background
Design

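Background

LLMs are known to be useful for analyzing runtime problems in open-source software, for example abnormal call stacks from kernel panics and core dumps. With GDB, however, the call stack and the corresponding code still have to be fed to the LLM by hand over several rounds of interaction. This article describes an intelligent GDB tool in which a Python script connects to an LLM and drives GDB to analyze a core dump automatically.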

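Design

Current LLMs are still limited to parsing and exchanging text; they cannot analyze binary files such as core dumps directly. The tool therefore uses an LLM + Python + GDB architecture: GDB parses the core dump, its output is fed to the LLM, and the debugging commands suggested by the LLM are sent back to GDB over multiple rounds. The Python code is the glue between the LLM and GDB. The numbered sections below walk through each piece.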
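The multi-round exchange described above could be sketched roughly as follows. This is only an illustration under stated assumptions: run_gdb and ask_llm are hypothetical callables standing in for the GDB and LLM helpers, the "CMD:" / "DONE:" reply convention is invented for this sketch, and the allowlist of read-only commands is one possible safeguard rather than part of the original tool.

```python
from typing import Callable, List

# Hypothetical allowlist: only read-only inspection commands are forwarded to GDB.
SAFE_PREFIXES = ("bt", "backtrace", "info ", "frame ", "print ", "p ",
                 "x/", "list", "thread ", "up", "down")


def is_safe(command: str) -> bool:
    """Return True if the command starts with an allowed read-only prefix."""
    return command.strip().startswith(SAFE_PREFIXES)


def analysis_loop(run_gdb: Callable[[str], str],
                  ask_llm: Callable[[str], str],
                  first_command: str = "bt full",
                  max_rounds: int = 5) -> List[str]:
    """Alternate between GDB and the LLM and return the transcript.

    Convention assumed by this sketch: the LLM answers "CMD:" followed by a
    GDB command when it wants more data; any other reply ends the loop.
    """
    transcript = []
    command = first_command
    for _ in range(max_rounds):
        output = run_gdb(command)
        transcript.append(f"(gdb) {command}\n{output}")
        # The LLM always sees the whole conversation so far.
        reply = ask_llm("\n\n".join(transcript)).strip()
        if reply.startswith("CMD:"):
            command = reply[len("CMD:"):].strip()
            if not is_safe(command):
                transcript.append(f"rejected unsafe command: {command}")
                break
        else:
            # "DONE: ..." or a free-form diagnosis ends the loop.
            transcript.append(reply)
            break
    return transcript
```

Filtering the suggested commands matters because GDB can run arbitrary shell commands; forwarding LLM output unchecked would hand that ability to the model.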

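1. Controlling GDB from Python

The idea is to launch gdb from Python with subprocess and interact with it through its standard input and output. The code of gdb_control.py is as follows: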

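import subprocess

# Marker echoed after each command so the reader knows where its output ends
END_MARKER = "__GDB_CMD_DONE__"

# Start GDB on the executable and its core dump
gdb_process = subprocess.Popen(
    ['gdb', 'test_core', '/tmp/core_dump.core.7037'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr so an unread pipe cannot stall GDB
    text=True
)

# Send one command to GDB and return its output
def send_gdb_command(command):
    # GDB prints "(gdb) " without a trailing newline, so the prompt never
    # arrives as a line of its own. Echo a marker after the command and
    # read until that marker appears.
    gdb_process.stdin.write(f"{command}\necho {END_MARKER}\\n\n")
    gdb_process.stdin.flush()
    output = []
    while True:
        line = gdb_process.stdout.readline()
        if not line or END_MARKER in line:  # EOF or end of this command's output
            break
        while line.startswith('(gdb) '):  # drop prompt text glued to the line
            line = line[len('(gdb) '):]
        output.append(line.strip())
    return "\n".join(output)


backtrace_output = send_gdb_command("backtrace")
print(backtrace_output)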

Running the script prints the function call stack:
[Screenshot: function call stack output]

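2. Registering a custom GDB command

To give the LLM more information in one go, such as the call stack, threads, and registers, you can define a custom command in GDB that wraps several commands. GDB can load a Python script that defines such a command. The llm_gdb.py script below defines the custom command hang_analyze: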

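import gdb

class HangAnalyzer(gdb.Command):
    """Custom GDB command that prints threads, a full backtrace, and registers."""

    def __init__(self):
        # Register the command under the name "hang_analyze"
        super(HangAnalyzer, self).__init__("hang_analyze", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        # to_string=True captures the output instead of printing it directly
        threads_output = gdb.execute("info threads", to_string=True)
        backtrace_output = gdb.execute("bt full", to_string=True)
        info_regs_output = gdb.execute("info reg", to_string=True)

        print(f"Threads info:\n{threads_output}")
        print(f"Backtrace:\n{backtrace_output}")
        print(f"Regs info:\n{info_regs_output}")

# Instantiating the class registers the command with GDB
HangAnalyzer()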

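Start GDB:

gdb test_core /tmp/core_dump.core.7037

[Screenshot: GDB session]

After loading the script (for example with source llm_gdb.py at the (gdb) prompt), run the hang_analyze command. It may fail with the following error: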

Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xf5 in position 3079: invalid start byte

Error occurred in Python: 'utf-8' codec can't decode byte 0xf5 in position 3079: invalid start byte

Fix: export the following environment variables before starting gdb so that Python uses UTF-8 by default:

export LC_ALL=en_US.UTF-8
export PYTHONIOENCODING=UTF-8

The command then produces the correct output:
[Screenshot: hang_analyze output]
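When GDB is launched from a Python wrapper like the one in section 1, the same two variables can be passed through the env argument of subprocess instead of being exported in the shell. The sketch below assumes that setup; utf8_env and start_gdb are hypothetical helper names, and errors="replace" only protects the wrapper's own decoding of GDB's output, not the Python interpreter embedded in GDB.

```python
import os
import subprocess


def utf8_env(base=None):
    """Return a copy of the environment with the two UTF-8 variables set."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "en_US.UTF-8"
    env["PYTHONIOENCODING"] = "UTF-8"
    return env


def start_gdb(executable, core_file):
    """Start GDB with UTF-8 settings; undecodable bytes become U+FFFD."""
    return subprocess.Popen(
        ["gdb", executable, core_file],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        encoding="utf-8",
        errors="replace",  # tolerate raw bytes from registers or memory dumps
        env=utf8_env(),
    )
```

A call such as start_gdb("test_core", "/tmp/core_dump.core.7037") would then replace the plain Popen call.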

Notes on core dump debugging:

1) Compile with the -g option

2) Enable core dumps

ulimit -c unlimited
echo /tmp/core_dump.core > /proc/sys/kernel/core_pattern

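3) gdb must have Python scripting enabled. GDB 12 worked here; Python scripting has been available since GDB 7.0, but some builds are compiled without it. Test with the following command: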

gdb -ex "python print('Python scripting enabled!')" -ex quit
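The same check can be run from the Python wrapper before any analysis starts. The following is a small sketch; gdb_has_python is a hypothetical helper name, and it only assumes that a gdb without Python support will not print the test string on stdout.

```python
import shutil
import subprocess


def gdb_has_python(gdb_path="gdb"):
    """Return True if the given gdb binary can run an embedded Python statement."""
    if shutil.which(gdb_path) is None:
        return False  # gdb is not installed or not on PATH
    result = subprocess.run(
        [gdb_path, "-batch", "-ex", "python print('Python scripting enabled!')"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return "Python scripting enabled!" in result.stdout
```

Calling gdb_has_python() at startup lets the tool fail with a clear message instead of an obscure error later on.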

Reference:

如何用python实现GDB交互式调试程序的功能 (How to implement interactive GDB debugging in Python)

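3. Connecting to the LLM

With the Python script now talking to GDB correctly, the next step is to hand GDB's output to the LLM: design the prompt, call the API, and automate the analysis.

For setting up the LLM script locally, see: LLM API使用教程：NVIDIA免费API KEY (LLM API tutorial: free NVIDIA API key)

The prompt is shown below. It is a simple first version that will keep evolving.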

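As a C programmer: the program hit a segmentation fault, and the GDB analysis results are shown below. What could be the cause? (Original prompt: 作为一个C语言程序员，当前程序遇到segment fault，使用gdb分析的结果如下，请分析下可能是什么原因？)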
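As the prompt evolves, it may help to assemble it in a small function rather than in one long string. The sketch below is one possible shape, not the author's implementation: build_prompt, the section labels, and the per-section character limit are all assumptions made for illustration.

```python
def build_prompt(sections, max_chars_per_section=4000):
    """Assemble the diagnosis prompt from labelled GDB outputs.

    sections maps a label (for example "backtrace") to raw GDB text.
    Each section is truncated so one huge output cannot crowd out the rest.
    """
    header = (
        "As a C programmer: the program hit a segmentation fault. "
        "The GDB analysis of the core dump is shown below. "
        "What could be the cause?"
    )
    parts = [header]
    for label, text in sections.items():
        text = text.strip()
        if len(text) > max_chars_per_section:
            text = text[:max_chars_per_section] + "\n[truncated]"
        parts.append(f"### {label}\n{text}")
    return "\n\n".join(parts)
```

Capping each section keeps a very long bt full or register dump from consuming the whole context window of the model.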

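Because the custom Python script could not be loaded when GDB was driven from the Python wrapper, the final version sends the three commands separately instead.

The script: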
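One option that might get around this limitation, not verified against the setup used here, is GDB's batch mode: -x loads a script file and -ex runs a single command, so the custom command could be loaded and invoked in one non-interactive run. In the sketch below, build_gdb_argv and run_gdb_batch are hypothetical helper names.

```python
import subprocess


def build_gdb_argv(executable, core_file, commands, script=None):
    """Build a non-interactive GDB command line.

    -batch makes GDB exit after the last command. -x and -ex are processed
    in command-line order, so the script is loaded before the commands
    that rely on it.
    """
    argv = ["gdb", "-q", "-batch"]
    if script is not None:
        argv += ["-x", script]
    for command in commands:
        argv += ["-ex", command]
    argv += [executable, core_file]
    return argv


def run_gdb_batch(executable, core_file, commands, script=None):
    """Run GDB once and return everything it printed."""
    result = subprocess.run(
        build_gdb_argv(executable, core_file, commands, script),
        capture_output=True,
        text=True,
        errors="replace",
        timeout=120,
    )
    return result.stdout + result.stderr
```

For example, run_gdb_batch("test_core", "/tmp/core_dump.core.7037", ["hang_analyze"], script="llm_gdb.py") would return the combined report as one string, with no prompt parsing needed.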

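import subprocess
import requests

# Replace with your API key
API_KEY = "YOUR_API_KEY"

def llm_ask(prompt):
    invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
    stream = False
    headers = {
        # The key is sent as a Bearer token
        "Authorization": f"Bearer {API_KEY}",
        "Accept": "text/event-stream" if stream else "application/json"
    }

    payload = {
        "model": "meta/llama-4-maverick-17b-128e-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 1.00,
        "top_p": 1.00,
        "frequency_penalty": 0.00,
        "presence_penalty": 0.00,
        "stream": stream
    }

    response = requests.post(invoke_url, headers=headers, json=payload, stream=stream)
    response.raise_for_status()  # fail early on HTTP errors such as a bad key

    if stream:
        for line in response.iter_lines():
            if line:
                print(line.decode("utf-8"))
    else:
        print(response.json())

# Marker echoed after each command so the reader knows where its output ends
END_MARKER = "__GDB_CMD_DONE__"

# Start GDB on the executable and its core dump
gdb_process = subprocess.Popen(
    # Replace with your executable and core dump file
    ['gdb', 'test_core', '/tmp/core_dump.core.7037'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr so an unread pipe cannot stall GDB
    text=True
)

# Send one command to GDB and return its output
def send_gdb_command(command):
    # GDB prints "(gdb) " without a trailing newline, so echo a marker
    # after the command and read until that marker appears.
    gdb_process.stdin.write(f"{command}\necho {END_MARKER}\\n\n")
    gdb_process.stdin.flush()
    output = []
    while True:
        line = gdb_process.stdout.readline()
        if not line or END_MARKER in line:  # EOF or end of this command's output
            break
        while line.startswith('(gdb) '):  # drop prompt text glued to the line
            line = line[len('(gdb) '):]
        output.append(line.strip())
    return "\n".join(output)

backtrace_output = send_gdb_command("bt full")
# print(backtrace_output)
threads_output = send_gdb_command("info threads")
info_regs_output = send_gdb_command("info reg")
prompt = (
    "As a C programmer: the program hit a segmentation fault, and the GDB "
    "analysis of the core dump is shown below. What could be the cause?\n"
    f"backtrace:{backtrace_output}\n threads:{threads_output}\n regs:{info_regs_output}"
)

llm_ask(prompt)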

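LLM output (raw JSON response). The reply is in Chinese because the run used the original Chinese prompt. The model concludes that test() at test_core.c:5 passes a NULL pointer to printf for %s; the reply stops mid-sentence because it hit the 512-token limit (finish_reason is 'length'):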

{'id': 'chatcmpl-a57221de0de24181999b72ffecdf94e4', 'object': 'chat.completion', 'created': 1758979705, 'model': 'meta/llama-4-maverick-17b-128e-instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '根据gdb的backtrace输出，程序在`__strlen_sse2`函数处发生了段错误（Segmentation fault）。让我们一步步分析：\n\n1. **`__strlen_sse2`函数**: 这是一个计算字符串长度的函数，由glibc库提供。它使用SSE2指令集来优化字符串操作。这个函数被调用时，传入了一个无效的指针，导致了段错误。\n\n2. **调用栈**: 查看调用栈，我们看到：\n   - `__strlen_sse2` 是被 `__vfprintf_internal` 调用。\n   - `__vfprintf_internal` 是 `__printf` 调用的。\n   - `__printf` 是由 `test_core.c` 的第5行调用的（在函数 `test` 中）。\n   - `test` 是由 `main` 函数调用的（在 `test_core.c` 的第12行）。\n\n3. **关键信息**: \n   - 在 `#3 0x00006039b5b92178 in test (p=0x0) at test_core.c:5` 这里，我们看到 `test` 函数被调用时，`p` 的值是 `0x0`，即 `NULL`。\n   - `test_core.c` 的第5行代码涉及到对 `printf` 的调用，而 `printf` 里使用了 `%s` 来打印字符串。\n\n由此，我们可以推断，程序在 `test` 函数中试图打印一个 `NULL` 字符串指针，导致 `printf` 内部调用 `strlen` 时出错，因为 `strlen` 不能接受 `NULL` 作为参数。\n\n可能的修复方法是，在 `test` 函数中检查指针是否为 `NULL`，如果是，则避免打印它，或者打印一个默认值，如 "(null)"。\n\n例如，如果 `test_core.c` 的相关代码类似于：\n```c\nvoid test(char *p) {\n    printf("str:%s\\n", p); // 第5行\n}\n\nint main() {\n    char *p = NULL;\n    test(p); // 第12行\n    return 0;\n}\n```\n你可以通过在 `test` 函数中添加一个简单的检查来修复这个问题：\n```c\nvoid test(char *p) {\n    if (p == NULL) {\n        printf("str:(null)\\n");\n    } else {\n        printf("str:%s\\n", p);\n    }\n}\n```\n或者，更优雅的方式是使用 `%s` 的一种特殊用法，它可以在参数为 `NULL` 时打印 "(null)"，尽管这不是标准', 'refusal': None, 'annotations': None, 'audio': None, 'function_call': None, 'tool_calls': [], 'reasoning_content': None}, 'logprobs': None, 'finish_reason': 'length', 'stop_reason': None}], 'service_tier': None, 'system_fingerprint': None, 'usage': {'prompt_tokens': 2000, 'total_tokens': 2512, 'completion_tokens': 512, 'prompt_tokens_details': None}, 'prompt_logprobs': None, 'kv_transfer_params': None}
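The response above is a nested dictionary, and the text of interest sits in choices[0].message.content. A small helper can pull it out and flag a truncated reply; extract_reply is a hypothetical name, and the sketch relies only on the fields visible in the output above.

```python
def extract_reply(response_json):
    """Return (text, truncated) from a chat completion response dict."""
    choice = response_json["choices"][0]
    text = choice["message"]["content"]
    # finish_reason == "length" means the reply was cut off at max_tokens
    truncated = choice.get("finish_reason") == "length"
    return text, truncated
```

When truncated is True, as in the run shown here, raising max_tokens or asking the model to continue would be the natural next step.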
