The Killer Technique for Multithreaded Debugging: GDB's non-stop Mode


Posted by zb872676223 on 2014-07-17 14:23:10


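Author: 破砂锅

The open-source debugger GDB is widely used on Linux, OS X, Unix, and all kinds of embedded systems (mobile phones, for example). This time it brings us a pleasant surprise.

The pain of multithreaded debugging

Debuggers such as VS2008 and older versions of GDB usually support only all-stop mode: when you debug a multithreaded program and one thread hits a breakpoint, the debugger freezes the whole program, and the other threads resume only when you continue the stopped thread. Because of this restriction, the program being debugged cannot behave as it does in a real environment, where the other threads would keep running in parallel while one thread sits at a breakpoint.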

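The non-stop mode introduced in GDB v7.0 solves this problem. In this mode:

* when one or more threads are stopped at a breakpoint, the other threads keep running in parallel
* you can pick any stopped thread and let it continue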

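Imagine what this makes possible:

* While other threads are stopped at breakpoints, the program's timer thread keeps running normally, which avoids unnecessary timeouts.
* While other threads are stopped at breakpoints, the program's watchdog thread keeps running normally, so embedded hardware does not conclude that the system has crashed and reboot it.
* You can control the order in which threads run and thereby reproduce deadlock scenarios. Because GDB debugging sessions can be driven by Python scripts, it is in theory possible to test a program automatically under different thread interleavings.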
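To make the watchdog scenario concrete, here is a minimal sketch of a heartbeat thread. It is not taken from the original article: the names `Heartbeat` and `kicks_during` are hypothetical, and a plain counter stands in for the write to a real watchdog device. In all-stop mode a breakpoint on the marked line freezes the heartbeat thread as well; in non-stop mode the counter should keep growing while the calling thread is stopped.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a watchdog feeder: a background thread that
// bumps a counter at a fixed period until the object is destroyed.
class Heartbeat {
public:
    explicit Heartbeat(std::chrono::milliseconds period)
        : period_(period), running_(true), kicks_(0) {
        assert(period_.count() > 0);
        worker_ = std::thread([this] { loop(); });
    }

    ~Heartbeat() {
        running_ = false;   // ask the loop to finish, then wait for it
        worker_.join();
    }

    Heartbeat(const Heartbeat&) = delete;
    Heartbeat& operator=(const Heartbeat&) = delete;

    int kicks() const { return kicks_.load(); }

private:
    void loop() {
        while (running_) {
            ++kicks_;       // a real feeder would write to the watchdog here
            std::this_thread::sleep_for(period_);
        }
    }

    std::chrono::milliseconds period_;
    std::atomic<bool> running_;
    std::atomic<int> kicks_;
    std::thread worker_;
};

// Runs a heartbeat while the calling thread "works" for the given time and
// returns how many times the watchdog was fed in the meantime.
int kicks_during(std::chrono::milliseconds work,
                 std::chrono::milliseconds period) {
    Heartbeat hb(period);
    std::this_thread::sleep_for(work);  // breakpoint here: compare all-stop and non-stop
    return hb.kicks();
}
```

Calling `kicks_during` from `main` and printing the result before and after a long pause at the breakpoint is one way to see the difference between the two modes.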

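Non-stop mode therefore deserves to be called the killer technique for multithreaded debugging. Linux distributions released since the second half of 2009 generally ship GDB v7.0 or later. I am curious whether VS2010 supports a similar debugging mode.

Demonstrating GDB's non-stop mode

Let me demo this technique with a small C++ program on Ubuntu Linux 9.10. The demo uses command-line gdb, but if you prefer a graphical debugger, Eclipse releases from May 2009 onward can drive this feature easily; for details see http://live.eclipse.org/node/723

1. Compile the following program, nonstop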

    
   

     1  // gdb non-stop mode demo
     2  // build instruction: g++ -g -o nonstop nonstop.cpp -lboost_thread
     3
     4  #include <iostream>
     5  #include <boost/thread/thread.hpp>
     6
     7  struct op
     8  {
     9          op(int id): m_id(id) {}
    10
    11          void operator()()
    12          {
    13                  std::cout << m_id << " begin" << std::endl;
    14                  std::cout << m_id << " end" << std::endl;
    15          }
    16
    17          int m_id;
    18  };
    19
    20  int main(int argc, char **argv)
    21  {
    22          boost::thread t1(op(1)), t2(op(2)), t3(op(3));
    23          t1.join(); t2.join(); t3.join();
    24          return 0;
    25  }
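If Boost is not installed, the same experiment can probably be run with `std::thread`. The following is a sketch under that assumption, not the author's program: it requires a C++11 compiler, writes into a stream that is passed in, and adds a mutex (`g_out_mutex`, a hypothetical name) so that output lines do not interleave. `run_demo` is likewise a name introduced here for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical addition: serializes writes so lines from different threads
// do not interleave. The original demo writes to std::cout without a lock.
std::mutex g_out_mutex;

struct op {
    op(int id, std::ostream* out) : m_id(id), m_out(out) {
        assert(out != nullptr);
    }

    void operator()() const {
        print(" begin");
        print(" end");  // counterpart of line 14 in the Boost version
    }

    void print(const char* what) const {
        std::lock_guard<std::mutex> lock(g_out_mutex);
        *m_out << m_id << what << '\n';
    }

    int m_id;
    std::ostream* m_out;
};

// Starts three threads as the original main() does and returns what they wrote.
std::string run_demo() {
    std::ostringstream out;
    std::thread t1(op(1, &out)), t2(op(2, &out)), t3(op(3, &out));
    t1.join(); t2.join(); t3.join();
    return out.str();
}
```

To try it under gdb, call `run_demo()` from `main`, print the returned string, and build with something like `g++ -g -std=c++11 -pthread`. The breakpoint then goes on the `print(" end")` line instead of line 14.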
   

    
   

2. Add the following three lines to ~/.gdbinit to enable non-stop mode

    set target-async 1
    set pagination off
    set non-stop on

   

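3. Start gdb, set the breakpoints, and run. Main thread 1 is shown as running, and all three child threads are stopped at the breakpoint rather than just one of them.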

    
   

    ~/devroot/nonstop$ gdb ./nonstop
    GNU gdb (GDB) 7.0-ubuntu
    Reading symbols from /home/frankwu/devroot/nonstop/nonstop...done.
    (gdb) break 14
    Breakpoint 1 at 0x402058: file nonstop.cpp, line 14.
    (gdb) break 24
    Breakpoint 3 at 0x401805: file nonstop.cpp, line 24.
    (gdb) run
    Starting program: /home/frankwu/devroot/nonstop/nonstop
    [Thread debugging using libthread_db enabled]
    [New Thread 0x7ffff6c89910 (LWP 2762)]
    [New Thread 0x7ffff6488910 (LWP 2763)]
    1 begin
    Breakpoint 1, op::operator() (this=0x605118) at nonstop.cpp:14
    14                  std::cout << m_id << " end" << std::endl;
    2 begin
    Breakpoint 1, op::operator() (this=0x605388) at nonstop.cpp:14
    14                  std::cout << m_id << " end" << std::endl;
    [New Thread 0x7ffff5c87910 (LWP 2764)]
    3 begin
    Breakpoint 1, op::operator() (this=0x605618) at nonstop.cpp:14
    14                  std::cout << m_id << " end" << std::endl;
    (gdb) info threads
      4 Thread 0x7ffff5c87910 (LWP 2764)  op::operator() (this=0x605618) at nonstop.cpp:14
      3 Thread 0x7ffff6488910 (LWP 2763)  op::operator() (this=0x605388) at nonstop.cpp:14
      2 Thread 0x7ffff6c89910 (LWP 2762)  op::operator() (this=0x605118) at nonstop.cpp:14
    * 1 Thread 0x7ffff7fe3710 (LWP 2759)  (running)
   

    
   

4. Let thread 3 continue. Note that I deliberately continue main thread 1 as well; this is a workaround I found, because otherwise gdb cannot switch back to thread 1.

    
    (gdb) thread apply 3 1 continue

    Thread 3 (Thread 0x7ffff6488910 (LWP 2763)):
    Continuing.

    Thread 1 (Thread 0x7ffff7fe3710 (LWP 2759)):
    Continuing.
    Cannot execute this command while the selected thread is running.
    2 end
    [Thread 0x7ffff6488910 (LWP 2763) exited]

    warning: Unknown thread 3.

    Thread 1 (Thread 0x7ffff7fe3710 (LWP 2759)):
    Continuing.
    Cannot execute this command while the selected thread is running.
    (gdb) info threads
      4 Thread 0x7ffff5c87910 (LWP 2764)  op::operator() (this=0x605618) at nonstop.cpp:14
      2 Thread 0x7ffff6c89910 (LWP 2762)  op::operator() (this=0x605118) at nonstop.cpp:14
    * 1 Thread 0x7ffff7fe3710 (LWP 2759)  (running)

5. Let the other two threads continue and finish. The main thread then stops at line 24 and finally exits.

    
    (gdb) thread apply 4 2 1 continue

    Thread 4 (Thread 0x7ffff5c87910 (LWP 2764)):
    Continuing.

    Thread 2 (Thread 0x7ffff6c89910 (LWP 2762)):
    Continuing.

    Thread 1 (Thread 0x7ffff7fe3710 (LWP 2759)):
    Continuing.
    Cannot execute this command while the selected thread is running.
    3 end
    1 end
    [Thread 0x7ffff5c87910 (LWP 2764) exited]
    [Thread 0x7ffff6c89910 (LWP 2762) exited]

    Breakpoint 3, main (argc=1, argv=0x7fffffffe348) at nonstop.cpp:24
    24          return 0;
    (gdb) continue
    Thread 1 (Thread 0x7ffff7fe3710 (LWP 2759)):
    Continuing.

    Program exited normally.

References

Debugging with GDB

Reverse Debugging, Multi-Process and Non-Stop Debugging Come to the CDT

Debugging Signals, Multiple Threads, and Multiple Processes with GDB

2011-07-05 14:35

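GDB is very powerful. This article covers how to use GDB to debug signals, multiple processes, and multiple threads, as follows:

(1) Signals

GDB can deal with any kind of signal while you debug a program, and you can tell it which signals to deal with. You can ask GDB to stop the running program as soon as it receives a signal you specify, so that you can debug it. The GDB handle command provides this.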

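    handle <signal> <keywords...>
        Defines how GDB handles a signal. <signal> may be written with or without the leading SIG; it may also be a range of signal numbers in the form low-high, or the keyword all to cover every signal. <keywords> is one or more of the keywords below, which decide what GDB does when the program being debugged receives the signal.

        nostop
            When the program being debugged receives the signal, GDB does not stop it, but may still print a message telling you that the signal arrived.
        stop
            When the program being debugged receives the signal, GDB stops your program.
        print
            When the program being debugged receives the signal, GDB prints a message.
        noprint
            When the program being debugged receives the signal, GDB does not tell you that it arrived.
        pass
        noignore
            GDB does not consume the signal; it passes the signal on to the program being debugged, which handles it itself.
        nopass
        ignore
            GDB does not let the program being debugged see the signal.

    info signals
    info handle
        Print a table of the signals and how GDB has been told to handle each of them.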
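A small program that signals itself makes the handle keywords easy to try out. The sketch below is an illustration rather than part of the original article; `on_signal`, `g_signal_seen`, and `raise_and_check` are hypothetical names, and SIGUSR1 is assumed to be available (POSIX systems).

```cpp
#include <cassert>
#include <csignal>

// Set by the handler; sig_atomic_t is the type that is safe to write
// from inside a signal handler.
volatile std::sig_atomic_t g_signal_seen = 0;

void on_signal(int) { g_signal_seen = 1; }

// Installs the handler, sends the signal to the current process, restores
// the previous handler, and reports whether our handler actually ran.
bool raise_and_check(int signo) {
    g_signal_seen = 0;
    auto previous = std::signal(signo, on_signal);
    assert(previous != SIG_ERR);
    std::raise(signo);              // GDB intercepts the signal at this point
    std::signal(signo, previous);
    return g_signal_seen == 1;
}
```

Calling `raise_and_check(SIGUSR1)` from `main` under gdb after `handle SIGUSR1 nostop print pass` should print a message without stopping, and the function should still return true. After `handle SIGUSR1 nopass` the program never sees the signal, so the function is expected to return false.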

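(2) Threads

If your program is multithreaded, you can choose whether a breakpoint applies to all threads or only to one particular thread. GDB makes this easy.

    break <linespec> thread <threadno>
    break <linespec> thread <threadno> if ...
        linespec specifies the source line at which the breakpoint is set. threadno is the thread ID; note that this ID is assigned by GDB, and you can look it up with the "info threads" command, which lists the threads of the running program. If you omit thread <threadno>, the breakpoint applies to all threads. You can also attach a condition to a thread-specific breakpoint, for example:

        (gdb) break frik.c:13 thread 28 if bartab > lim

    When GDB stops your program, all of its running threads are stopped (this is the default all-stop behavior), which makes it easy to examine the overall state of the program. When you resume the program, all threads resume as well, even while you are single-stepping.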
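A thread-specific conditional breakpoint is easiest to try on a function that every thread calls. The following sketch is an assumption-laden illustration, not code from the article: `accumulate`, `run_workers`, and `g_total` are hypothetical names chosen here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<long> g_total(0);

// Shared by every worker thread, so it is a convenient target for
//   break accumulate thread <threadno> if value > <limit>
void accumulate(int value) {
    g_total += value;
}

// Starts thread_count workers; each one adds 1..items_per_thread to the total.
long run_workers(int thread_count, int items_per_thread) {
    assert(thread_count > 0 && items_per_thread >= 0);
    g_total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([items_per_thread] {
            for (int i = 1; i <= items_per_thread; ++i) {
                accumulate(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return g_total.load();
}
```

With `run_workers(3, 10)` called from `main`, a breakpoint such as `break accumulate thread 2 if value > 5` should stop only when GDB thread 2 reaches the function with a large enough argument; the thread number has to be taken from `info threads` once the workers exist.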

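Each thread has its own registers and its own run-time stack, and perhaps private memory.
GDB provides the following facilities for debugging multithreaded programs:
* automatic notification of new threads
* "thread THREADNO", a command to switch among threads
* "info threads", a command to inquire about existing threads
* "thread apply [THREADNO] [ALL] ARGS", a command to apply a command to a list of threads
* thread-specific breakpoints
Note: these facilities are not available in every GDB configuration; ultimately it depends on whether the operating system supports them.
If your GDB does not support threads, "info threads" prints nothing and the thread command reports an error:
(gdb) info threads
(gdb) thread 1
Thread ID 1 not known. Use the "info threads" command to
see the IDs of currently known threads.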
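GDB's thread debugging facility lets you observe all the threads of your program while it runs, but whenever GDB takes control, one thread is always the "current" thread. Debugging commands act on the current thread.
Whenever GDB detects a new thread in your program, it automatically displays the target system's identification for that thread, for example:
[New process 35 thread 27]
The exact format depends on the operating system.
For debugging purposes, GDB assigns its own thread numbers.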
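"info threads"
Displays a summary of all threads in the process. For each thread GDB shows, in order:
1. the thread number (assigned by GDB)
2. the target system's thread identifier
3. the current stack frame of the thread
An asterisk "*" in front of a thread marks it as the current thread.
For example:
(gdb) info threads
  3 process 35 thread 27  0x34e5 in sigpause ()
  2 process 35 thread 23  0x34e5 in sigpause ()
* 1 process 35 thread 13  main (argc=1, argv=0x7ffffff8)
    at threadtest.c:68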
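"thread THREADNO"
Makes the thread numbered THREADNO the current thread. The argument THREADNO is GDB's internal thread number, which you can look up with the "info threads" command. GDB responds by showing the system identifier of the selected thread and its stack frame, for example:

(gdb) thread 2
[Switching to process 35 thread 23]
0x34e5 in sigpause ()
What follows "Switching to" depends on how your operating system identifies threads.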

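"thread apply [THREADNO] [ALL] ARGS"
This command lets you issue the same command ARGS to one or more threads; [THREADNO] has the same meaning as above. To send the command to every thread in the process, use the [ALL] option.
Whenever GDB stops your program, because of a breakpoint or a signal, it automatically selects the thread in which the breakpoint or signal occurred as the current thread. GDB reports this with a message of the form "[Switching to SYSTAG]".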

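(3) Processes

1. Attaching to the child process
GDB can attach to a process that is already running by means of the attach <pid> command, so we can use that command to attach to the child process and debug it.

The usual steps are:
1. In the code that the child process executes first, add a short piece of code that makes the child sleep and wait, then run the program to be debugged.
2. Use ps -ef | grep to find the PID of the child process that was created.
3. Start gdb, attach <pid> to the process, set a breakpoint after the sleeping code, and start debugging.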

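For example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            printf("fork err\n");
            exit(-1);
        } else if (pid == 0) {
            /* in child */
            sleep(60);      /* (!) wait here so that gdb has time to attach */

            int value = 10;
            /* ... rest of the child code ... */
        }
        /* in parent */
        return 0;
    }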
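A fixed sleep(60) either wastes time or runs out too early. A common alternative, sketched here as an assumption and not something the article itself proposes, is to spin on a flag that the debugger clears after attaching. The names `g_wait_for_debugger` and `wait_for_debugger_if_requested`, and the idea of gating the wait on an environment variable, are all hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <unistd.h>

// Cleared from the debugger after attaching with:
//   (gdb) set var g_wait_for_debugger = 0
volatile int g_wait_for_debugger = 1;

// Blocks until a debugger clears the flag, but only when the environment
// variable named by env_name is set. Returns true if it waited and false
// if the wait was skipped, so normal runs are not delayed.
bool wait_for_debugger_if_requested(const char* env_name) {
    assert(env_name != nullptr);
    if (std::getenv(env_name) == nullptr) {
        return false;
    }
    while (g_wait_for_debugger) {
        sleep(1);   // volatile forces the flag to be re-read on every pass
    }
    return true;
}
```

The child would call something like `wait_for_debugger_if_requested("WAIT_FOR_GDB")` in place of sleep(60). After attach, set the breakpoints, clear the flag with `set var`, and continue.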

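Run the program (say it is called examp):
examp & --- run examp in the background (running it in the foreground also works)

Find the process ID:
ps -ef

Run gdb:
gdb
(gdb) attach xxxxx --- xxxxx is the child process ID obtained with ps; GDB stops the process as soon as it attaches, so you can set breakpoints and watchpoints right away
(gdb) break xx
Breakpoint 1 at 0x10808: file eg1.c, line 37.
(gdb) c
Continuing.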

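2. Debugging multiple processes with follow-fork-mode
GDB 6.6 or later is recommended.

The usual steps when debugging multiple processes with follow-fork-mode are:
1. Start gdb.
2. Run set follow-fork-mode [child/parent]: use child to debug the child process and parent to debug the parent.
3. Load the program with file ./hello ---------- make sure the path is correct
4. break num ----------- when debugging the child process, num is a line number in the child's code,
another small trick is break fork, which sets a breakpoint at the fork function,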