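A while ago, while reading the source of Thunar (the Xfce file manager), I ran into calls to G_LIKELY and G_UNLIKELY. I roughly knew what they meant, and they should be the same as the likely and unlikely macros used in the Linux kernel, but I still want to summarize them here.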

The definitions of G_LIKELY and G_UNLIKELY can be found in the glib source (glib/gmacros.h):

#define _G_BOOLEAN_EXPR(expr)                   \
 G_GNUC_EXTENSION ({                            \
   int _g_boolean_var_;                         \
   if (expr)                                    \
      _g_boolean_var_ = 1;                      \
   else                                         \
      _g_boolean_var_ = 0;                      \
   _g_boolean_var_;                             \
})
#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
#define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))
The implementation relies on GCC's built-in function __builtin_expect. To understand what this built-in does, here is its description from the GCC manual:

long __builtin_expect (long exp, long c) [Built-in Function]
You may use __builtin_expect to provide the compiler with branch prediction
information. In general, you should prefer to use actual profile feedback for this
(‘-fprofile-arcs’), as programmers are notoriously bad at predicting how their
programs actually perform. However, there are applications in which this data is
hard to collect.
The return value is the value of exp, which should be an integral expression. The
semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0))
  foo ();
would indicate that we do not expect to call foo, since we expect x to be zero. Since
you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);
when testing pointer or floating-point values.

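From this description, __builtin_expect gives the compiler branch prediction information: it tells the compiler ahead of time which outcome of a condition is the common one, so that the compiler can optimize for it. In practice the compiler arranges the generated code so that the likely path runs straight through with as few taken jumps as possible and moves the unlikely path out of the way, which reduces the cycles spent on the common path.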

The description also says that the return value of __builtin_expect is simply the value of the expression x, so if (__builtin_expect (x, 0)) and if (__builtin_expect (x, 1)) both behave exactly like if (x).

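The only purpose of if (__builtin_expect (x, 0)) is to tell the compiler that we expect the expression x to be 0. In other words, we do not expect the statements right after the if to run but the ones after the else, so the compiler can place the else code directly after the conditional jump as the fall-through path. Likewise, if (__builtin_expect (x, 1)) only tells the compiler that we expect x to be 1, i.e. that we expect the statements right after the if to run.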

In short, __builtin_expect lets the compiler place the statements we expect to run directly after the branch instruction.

Here is an example:

#include <stdio.h>

int main (int argc, char **argv)
{
        int a = argc;

        /* Hint: we expect a to be 0, i.e. we expect the else branch. */
        if (__builtin_expect (a, 0))
        {
                a = 0x5a;
        }
        else
        {
                a = 0xaa;
        }

        printf ("a = 0x%x\n", a);

        return 0;
}
Compile it with:

gcc -fprofile-arcs -O2 test_expect.c -o test_expect

The disassembly of main in the resulting binary:
 173 08048870 <main>:
 174  8048870:       55                      push   %ebp
 175  8048871:       89 e5                   mov    %esp,%ebp
 176  8048873:       83 e4 f0                and    $0xfffffff0,%esp
 177  8048876:       83 ec 10                sub    $0x10,%esp
 178  8048879:       8b 45 08                mov    0x8(%ebp),%eax
 179  804887c:       83 05 e8 c0 04 08 01    addl   $0x1,0x804c0e8
 180  8048883:       83 15 ec c0 04 08 00    adcl   $0x0,0x804c0ec
 181  804888a:       85 c0                   test   %eax,%eax
 182  804888c:       75 3d                   jne    80488cb <main+0x5b>
 183  804888e:       83 05 f0 c0 04 08 01    addl   $0x1,0x804c0f0
 184  8048895:       b8 aa 00 00 00          mov    $0xaa,%eax
 185  804889a:       83 15 f4 c0 04 08 00    adcl   $0x0,0x804c0f4
 186  80488a1:       89 44 24 08             mov    %eax,0x8(%esp)
 187  80488a5:       c7 44 24 04 f0 a0 04    movl   $0x804a0f0,0x4(%esp)
 188  80488ac:       08
 189  80488ad:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 190  80488b4:       e8 67 ff ff ff          call   8048820 <__printf_chk@plt>
 191  80488b9:       83 05 f8 c0 04 08 01    addl   $0x1,0x804c0f8
 192  80488c0:       83 15 fc c0 04 08 00    adcl   $0x0,0x804c0fc
 193  80488c7:       31 c0                   xor    %eax,%eax
 194  80488c9:       c9                      leave
 195  80488ca:       c3                      ret
 196  80488cb:       b8 5a 00 00 00          mov    $0x5a,%eax
 197  80488d0:       eb cf                   jmp    80488a1 <main+0x31>
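Note lines 184 and 196 of the listing (addresses 8048895 and 80488cb): the else assignment (0xaa) sits on the fall-through path right after the jne, while the if assignment (0x5a) has been moved to the end of the function and jumps back. That is the branch layout optimization at work. Optimization has to be enabled (-O2 here) for the hint to affect code layout. The -fprofile-arcs option is not something __builtin_expect depends on: it instruments the program with arc counters for profiling, which is where the addl/adcl pairs in the listing come from. On an example this small, plain -O2 may remove the branch altogether, so the output can look different without it.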

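For comparison, here is the disassembly when compiling without -fprofile-arcs -O2. The code simply follows source order and the hint has no visible effect: the je skips to the else assignment, and the 0x5a assignment is the fall-through path.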

080483e4 <main>:
 80483e4:       55                      push   %ebp
 80483e5:       89 e5                   mov    %esp,%ebp
 80483e7:       83 e4 f0                and    $0xfffffff0,%esp
 80483ea:       83 ec 20                sub    $0x20,%esp
 80483ed:       8b 45 08                mov    0x8(%ebp),%eax
 80483f0:       89 44 24 1c             mov    %eax,0x1c(%esp)
 80483f4:       8b 44 24 1c             mov    0x1c(%esp),%eax
 80483f8:       85 c0                   test   %eax,%eax
 80483fa:       74 0a                   je     8048406 <main+0x22>
 80483fc:       c7 44 24 1c 5a 00 00    movl   $0x5a,0x1c(%esp)
 8048403:       00
 8048404:       eb 08                   jmp    804840e <main+0x2a>
 8048406:       c7 44 24 1c aa 00 00    movl   $0xaa,0x1c(%esp)
 804840d:       00
 804840e:       b8 00 85 04 08          mov    $0x8048500,%eax
 8048413:       8b 54 24 1c             mov    0x1c(%esp),%edx
 8048417:       89 54 24 04             mov    %edx,0x4(%esp)
 804841b:       89 04 24                mov    %eax,(%esp)
 804841e:       e8 dd fe ff ff          call   8048300 <printf@plt>
 8048423:       b8 00 00 00 00          mov    $0x0,%eax
 8048428:       c9                      leave
 8048429:       c3                      ret
 804842a:       90                      nop
 804842b:       90                      nop
 804842c:       90                      nop
 804842d:       90                      nop
 804842e:       90                      nop
 804842f:       90                      nop