Application crash on Linux caused by the libc function __get_thread failing to read TLS
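Background
The Android emulator runs on a PC, and Android apps run inside the emulator. When hardware virtualization (VT-x on Intel CPUs, AMD-V on AMD CPUs) is not enabled in the PC's BIOS, games that ship ARM libraries crash inside the emulator, either right away or after running for a while. The crash point is __get_thread()+6, immediately after the TLS base has been loaded through gs. Taking the game 当乐.apk as an example: delete the x86 libraries under libs, keep only the ARM libraries, then install and run it. The full crash log is as follows: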
03-27 15:51:21.236 E/ZKOPCountUtil( 3290): find Name = 当乐
03-27 15:51:21.344 D/dalvikvm( 4203): GC_CONCURRENT freed 1093K, 8% free 13255K/14404K, paused 15ms+12ms, total 212ms
03-27 15:51:21.344 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 96ms
03-27 15:51:21.348 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 89ms
03-27 15:51:21.360 D/dalvikvm( 4203): WAIT_FOR_CONCURRENT_GC blocked 96ms
03-27 15:51:21.404 W/View ( 4203): requestLayout() improperly called by android.support.v7.widget.AppCompatTextView{52831f4c V.ED.... ......I. 20,0-148,91 #7f0d0438 app:id/expand_title} during layout: running second layout pass
03-27 15:51:21.584 D/Volley ( 4203): [148] b.a: HTTP response for request=<[ ] http://res5.d.cn/cp/img/502487/o_1bbl6epie170sbec184qs9i1ggou.png 0x22e400ee LOW 2> [lifetime=4156], [size=67], [rc=200], [retryCount=0]
03-27 15:51:21.920 F/libc ( 4203): Fatal signal 11 (SIGSEGV) at 0x24244c8d (code=1), thread 4246 (Thread-133)
03-27 15:51:22.044 I/DEBUG ( 112): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-27 15:51:22.044 I/DEBUG ( 112): Build fingerprint: 'SAMSUNG/hlteatt/hlteuc:4.4.4/tt/eng.jenkins.20170306.140753:userdebug/test-keys'
03-27 15:51:22.044 I/DEBUG ( 112): Revision: '0'
03-27 15:51:22.044 I/DEBUG ( 112): pid: 4203, tid: 4246, name: Thread-133 >>> com.diguayouxi <<<
03-27 15:51:22.044 I/DEBUG ( 112): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 24244c8d
03-27 15:51:23.412 D/dalvikvm( 784): GC_CONCURRENT freed 437K, 28% free 3834K/5272K, paused 7ms+6ms, total 75ms
03-27 15:51:23.568 I/GAv4-SVC( 2937): Google Analytics 8.7.03 is starting up.
03-27 15:51:23.884 I/DEBUG ( 112): eax 24244c89 ebx b76b7fcc ecx 00000018 edx 00004000
03-27 15:51:23.888 I/DEBUG ( 112): esi b76c694c edi 00000000
03-27 15:51:23.888 I/DEBUG ( 112): xcs 00000073 xds 0000007b xes 0000007b xfs 0000003b xss 0000007b
03-27 15:51:23.888 I/DEBUG ( 112): eip b76343c6 ebp 00004000 esp 956396cc flags 00210206
03-27 15:51:23.900 D/dalvikvm( 697): GC_CONCURRENT freed 489K, 12% free 4671K/5284K, paused 10ms+1ms, total 74ms
03-27 15:51:23.976 I/DEBUG ( 112):
03-27 15:51:23.976 I/DEBUG ( 112): backtrace:
03-27 15:51:23.984 I/DEBUG ( 112): #00 pc 000183c6 /system/lib/libc.so (__get_thread+6)
03-27 15:51:23.984 I/DEBUG ( 112): #01 pc 0000de2d /system/lib/libc.so (pthread_mutex_lock+205)
03-27 15:51:23.988 I/DEBUG ( 112): #02 pc 0005a745 /system/lib/libc.so (flockfile+37)
03-27 15:51:23.988 I/DEBUG ( 112): #03 pc 0004651f /system/lib/libc.so (fread+335)
03-27 15:51:23.992 I/DEBUG ( 112): #04 pc 00075f6a /system/lib/libc.so (android_getaddrinfo_proxy+1050)
03-27 15:51:23.996 I/DEBUG ( 112): #05 pc 00078c30 /system/lib/libc.so (android_getaddrinfoforiface+1936)
03-27 15:51:24.000 I/DEBUG ( 112): #06 pc 00078e97 /system/lib/libc.so (getaddrinfo+55)
03-27 15:51:24.000 I/DEBUG ( 112): #07 pc 00037160 /system/lib/libjavacore.so (Posix_getaddrinfo(_JNIEnv*, _jobject*, _jstring*, _jobject*)+336)
03-27 15:51:24.000 I/DEBUG ( 112): #08 pc 0002a4ab /system/lib/libdvm.so (dvmPlatformInvoke+79)
03-27 15:51:24.008 I/DEBUG ( 112): #09 pc 00077a27 [heap]
03-27 15:51:24.008 I/DEBUG ( 112): #10 pc 00086da2 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+434)
03-27 15:51:24.008 I/DEBUG ( 112): #11 pc 001775b8 /system/lib/libdvm.so
03-27 15:51:24.008 I/DEBUG ( 112): #12 pc 00003cf7 <unknown>
03-27 15:51:24.008 I/DEBUG ( 112): #13 pc 0003b962 /system/lib/libdvm.so (dvmMterpStd(Thread*)+66)
03-27 15:51:24.008 I/DEBUG ( 112): #14 pc 00037029 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+217)
03-27 15:51:24.008 I/DEBUG ( 112): #15 pc 000bd027 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, char*)+759)
03-27 15:51:24.008 I/DEBUG ( 112): #16 pc 000bd437 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+55)
03-27 15:51:24.008 I/DEBUG ( 112): #17 pc 000993c3 /system/lib/libdvm.so (interpThreadStart(void*)+995)
03-27 15:51:24.008 I/DEBUG ( 112): #18 pc 0000bc3c /system/lib/libc.so (__thread_entry+236)
03-27 15:51:24.008 I/DEBUG ( 112): #19 pc 0003e1b5 /system/lib/libc.so (__pthread_clone+69)
03-27 15:51:24.008 I/DEBUG ( 112): #20 pc 00098fdf /system/lib/libdvm.so (internalThreadStart(void*)+655)
03-27 15:51:24.008 I/DEBUG ( 112):
03-27 15:51:24.008 I/DEBUG ( 112): stack:
03-27 15:51:24.008 I/DEBUG ( 112): 9563968c b4db080e /system/lib/libdvm.so (dvmMterp_OP_RETURN_VOID_BARRIER+158)
03-27 15:51:24.008 I/DEBUG ( 112): 95639690 b8cadbc0 [heap]
03-27 15:51:24.008 I/DEBUG ( 112): 95639694 00000001
03-27 15:51:24.008 I/DEBUG ( 112): 95639698 00000000
03-27 15:51:24.008 I/DEBUG ( 112): 9563969c b7629f39 /system/lib/libc.so (pthread_mutex_unlock+25)
03-27 15:51:24.008 I/DEBUG ( 112): 956396a0 00000000
03-27 15:51:24.024 I/DEBUG ( 112): 956396a4 9db6fdee /data/dalvik-cache/system@framework@core.jar@classes.dex
03-27 15:51:24.024 I/DEBUG ( 112): 956396a8 9563dce4
03-27 15:51:24.024 I/DEBUG ( 112): 956396ac b7629fba /system/lib/libc.so (pthread_mutex_unlock+154)
03-27 15:51:24.024 I/DEBUG ( 112): 956396b0 00000000
03-27 15:51:24.024 I/DEBUG ( 112): 956396b4 b8cadbd0 [heap]
03-27 15:51:24.024 I/DEBUG ( 112): 956396b8 9dd30518 /dev/ashmem/dalvik-LinearAlloc (deleted)
03-27 15:51:24.024 I/DEBUG ( 112): 956396bc b7629fba /system/lib/libc.so (pthread_mutex_unlock+154)
03-27 15:51:24.032 I/DEBUG ( 112): 956396c0 00004000
03-27 15:51:24.032 I/DEBUG ( 112): 956396c4 b8cae030 [heap]
03-27 15:51:24.036 I/DEBUG ( 112): 956396c8 b7629d69 /system/lib/libc.so (pthread_mutex_lock+9)
03-27 15:51:24.052 I/DEBUG ( 112): #00 956396cc b7629e2e /system/lib/libc.so (pthread_mutex_lock+206)
03-27 15:51:24.052 I/DEBUG ( 112): #01 956396d0 a59e7eec /dev/ashmem/dalvik-heap (deleted)
03-27 15:51:24.052 I/DEBUG ( 112): 956396d4 b8ea6808 [heap]
03-27 15:51:24.052 I/DEBUG ( 112): 956396d8 b76bc718
03-27 15:51:24.052 I/DEBUG ( 112): 956396dc b762ed4f /system/lib/libc.so (dlmalloc+351)
03-27 15:51:24.052 I/DEBUG ( 112): 956396e0 b76bc800
03-27 15:51:24.052 I/DEBUG ( 112): 956396e4 b8cae030 [heap]
03-27 15:51:24.052 I/DEBUG ( 112): 956396e8 00004000
03-27 15:51:24.052 I/DEBUG ( 112): 956396ec 00004000
03-27 15:51:24.052 I/DEBUG ( 112): 956396f0 00000050
03-27 15:51:24.052 I/DEBUG ( 112): 956396f4 b8e2bee8 [heap]
03-27 15:51:24.052 I/DEBUG ( 112): 956396f8 b7629d69 /system/lib/libc.so (pthread_mutex_lock+9)
03-27 15:51:24.052 I/DEBUG ( 112): 956396fc b76b7fcc /system/lib/libc.so
03-27 15:51:24.052 I/DEBUG ( 112): 95639700 b8ea6808 [heap]
03-27 15:51:24.052 I/DEBUG ( 112): 95639704 00000001
03-27 15:51:24.052 I/DEBUG ( 112): 95639708 b76c63a0
03-27 15:51:24.052 I/DEBUG ( 112): 9563970c b7676746 /system/lib/libc.so (flockfile+38)
03-27 15:51:24.052 I/DEBUG ( 112): #02 95639710 b76c694c
03-27 15:51:24.052 I/DEBUG ( 112): 95639714 b8e2bee8 [heap]
03-27 15:51:24.052 I/DEBUG ( 112): 95639718 00001000
03-27 15:51:24.068 I/DEBUG ( 112): 9563971c b76b7fcc /system/lib/libc.so
03-27 15:51:24.068 I/DEBUG ( 112): 95639720 956397da [stack:4246]
03-27 15:51:24.068 I/DEBUG ( 112): 95639724 b7676726 /system/lib/libc.so (flockfile+6)
03-27 15:51:24.068 I/DEBUG ( 112): 95639728 b76b7fcc /system/lib/libc.so
03-27 15:51:24.068 I/DEBUG ( 112): 9563972c b7662520 /system/lib/libc.so (fread+336)
03-27 15:51:24.296 I/DEBUG ( 112):
03-27 15:51:24.296 I/DEBUG ( 112): memory map around fault addr 24244c8d:
03-27 15:51:24.304 I/DEBUG ( 112): 1c142000-1c145000 rw-
03-27 15:51:24.308 I/DEBUG ( 112): (no map for address)
03-27 15:51:24.308 I/DEBUG ( 112): 9296d000-9296e000 ---
03-27 15:51:24.940 I/PhenotypeConfigurator( 697): Scheduling Phenotype for one-off execution 667 seconds from now (1490601084941)
03-27 15:51:25.244 D/dalvikvm( 2937): GC_CONCURRENT freed 214K, 6% free 5252K/5572K, paused 19ms+26ms, total 106ms
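The register dump and the backtrace in this log can be cross-checked with a little address arithmetic. The sketch below is only an illustration: the helper names are made up here and are not part of debuggerd or bionic, and the constants are copied from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the crash log above. */
#define LOG_EIP        0xb76343c6u  /* eip in the register dump          */
#define LOG_PC_OFFSET  0x000183c6u  /* frame #00 pc, relative to libc.so */
#define LOG_EAX        0x24244c89u  /* eax in the register dump          */
#define LOG_FAULT_ADDR 0x24244c8du  /* fault addr reported with SIGSEGV  */

/* Hypothetical helper: load address of a library, given an absolute
 * instruction pointer and the library-relative pc from the backtrace. */
uint32_t load_base(uint32_t abs_pc, uint32_t rel_pc)
{
    assert(abs_pc >= rel_pc);
    return abs_pc - rel_pc;
}

/* Hypothetical helper: distance between the faulting address and a
 * register value, i.e. the "+N" of an operand such as [eax+N]. */
uint32_t displacement(uint32_t fault_addr, uint32_t reg)
{
    return fault_addr - reg;
}
```

With the values from the log, load_base(LOG_EIP, LOG_PC_OFFSET) gives 0xb761c000 as the address where libc.so is mapped, and displacement(LOG_FAULT_ADDR, LOG_EAX) gives 4, so the faulting access is a read 4 bytes past the address held in eax.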
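Locating the problem
The crash log points at __get_thread(), which begins by fetching the TLS base in the same way as __get_tls(). In the source, __get_tls() is implemented as follows: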
//android-4.4.4\bionic\libc\arch-x86\bionic\__get_tls.c
/* see the implementation of __set_tls and pthread.c to understand this
* code. Basically, the content of gs:[0] always is a pointer to the base
* address of the tls region
*/
void* __get_tls(void)
{
    void* tls;
    asm ( " movl %%gs:0, %0" : "=r"(tls) );
    return tls;
}
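As the code comment says, gs:[0] holds a pointer to the base address of the TLS (Thread Local Storage) region. IDA shows the crash point more directly. Below is __get_thread() from libc.so opened in IDA. The crash at __get_thread()+6 is the line mov eax, [eax+4], an indirect load that faults.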
.text:000183C0
.text:000183C0 ; =============== S U B R O U T I N E =======================================
.text:000183C0
.text:000183C0
.text:000183C0 public __get_thread
.text:000183C0 __get_thread proc near ; CODE XREF: __pthread_cleanup_push+1Bp
.text:000183C0 ; __pthread_cleanup_pop+1Bp ...
.text:000183C0 mov eax, large gs:0
.text:000183C6 mov eax, [eax+4]
.text:000183C9 nop
.text:000183CA nop
.text:000183CB nop
.text:000183CC nop
.text:000183CD retn
.text:000183CD __get_thread endp
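So here is the question: eax is loaded from gs:0, and the indirect access at eax+4 fails. In the crash log, eax (0x24244c89) is the value read from gs:0, and the fault address 0x24244c8d is exactly eax+4, so the value obtained through gs must be bad. What we need to find out now is where the gs register is set up and what it is for.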
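To make the two loads concrete, here is a small user-space model of what the two instructions do. It is a sketch under assumptions: slot 0 of the TLS area points at the area itself (as the source comment says), and slot 1 holds the thread descriptor pointer, which on a 32-bit target sits at byte offset 4. The struct and function names are invented for this sketch and are not bionic's.

```c
#include <assert.h>
#include <stddef.h>

/* Slot numbers assumed for this sketch. */
enum { SLOT_SELF = 0, SLOT_THREAD = 1, SLOT_COUNT = 4 };

/* Stand-in for the per-thread descriptor; not bionic's real struct. */
struct fake_thread { int tid; };

/* Fill a TLS area the way the source comment describes: slot 0 holds
 * the base address of the area itself. */
void fake_tls_init(void *tls[SLOT_COUNT], struct fake_thread *thread)
{
    tls[SLOT_SELF] = tls;
    tls[SLOT_THREAD] = thread;
}

/* Model of "mov eax, gs:0": gs_base stands for wherever gs points. */
void **fake_get_tls(void **gs_base)
{
    assert(gs_base != NULL);
    return (void **)gs_base[SLOT_SELF];
}

/* Model of "mov eax, [eax+4]": read the thread slot through the pointer
 * returned by the first load. If that pointer is garbage, this is the
 * access that faults. */
struct fake_thread *fake_get_thread(void **gs_base)
{
    void **tls = fake_get_tls(gs_base);
    return (struct fake_thread *)tls[SLOT_THREAD];
}
```

In the model, a correctly initialized area returns the thread descriptor. In the crash, the first load produced 0x24244c89, which is not a mapped address, so the second load could not complete.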
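Since the code comment says that gs is used to reach the TLS base pointer, and the descriptors of the TLS segments live in the kernel's GDT, gs should be set up by the kernel. Taking the x86 segment layout as an example, the segments are defined in asm\Segment.h, as follows: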
// genymotion_kernel_3.10\arch\x86\include\asm\Segment.h
/*
* The layout of the per-CPU GDT under Linux:
*
* 0 - null
* 1 - reserved
* 2 - reserved
* 3 - reserved
*
* 4 - unused <==== new cacheline
* 5 - unused
*
* ------- start of TLS (Thread-Local Storage) segments:
*
* 6 - TLS segment #1 [ glibc's TLS segment ]
* 7 - TLS segment #2 [ Wine's %fs Win32 segment ]
* 8 - TLS segment #3
* 9 - reserved
* 10 - reserved
* 11 - reserved
*
* ------- start of kernel segments:
*
* 12 - kernel code segment <==== new cacheline
* 13 - kernel data segment
* 14 - default user CS
* 15 - default user DS
* 16 - TSS
* 17 - LDT
* 18 - PNPBIOS support (16->32 gate)
* 19 - PNPBIOS support
* 20 - PNPBIOS support
* 21 - PNPBIOS support
* 22 - PNPBIOS support
* 23 - APM BIOS support
* 24 - APM BIOS support
* 25 - APM BIOS support
*
* 26 - ESPFIX small SS
* 27 - per-cpu [ offset to per-cpu data area ]
* 28 - stack_canary-20 [ for stack protector ]
* 29 - unused
* 30 - unused
* 31 - TSS for double fault handler
*/
... ...
// part of the code omitted
/*
* Save a segment register away
*/
#define savesegment(seg, value) \
asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
/*
* x86_32 user gs accessors.
*/
#ifdef CONFIG_X86_32
#ifdef CONFIG_X86_32_LAZY_GS
#define get_user_gs(regs) (u16)({unsigned long v; savesegment(gs, v); v;})
#define set_user_gs(regs, v) loadsegment(gs, (unsigned long)(v))
#define task_user_gs(tsk) ((tsk)->thread.gs)
#define lazy_save_gs(v) savesegment(gs, (v))
#define lazy_load_gs(v) loadsegment(gs, (v))
#else /* X86_32_LAZY_GS */
#define get_user_gs(regs) (u16)((regs)->gs)
#define set_user_gs(regs, v) do { (regs)->gs = (v); } while (0)
#define task_user_gs(tsk) (task_pt_regs(tsk)->gs)
#define lazy_save_gs(v) do { } while (0)
#define lazy_load_gs(v) do { } while (0)
#endif /* X86_32_LAZY_GS */
#endif /* X86_32 */
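The GDT layout quoted above can be tied back to the crash log by decoding the segment selectors in the register dump. The sketch below assumes the standard x86 selector format (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0); the type and function names are made up for illustration. The dump as quoted does not include a gs value, so the checks use the selectors that are present.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector_fields {
    unsigned index;  /* bits 15..3: descriptor table index   */
    unsigned is_ldt; /* bit 2: 0 = GDT, 1 = LDT              */
    unsigned rpl;    /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index  = sel >> 3;
    f.is_ldt = (sel >> 2) & 1u;
    f.rpl    = sel & 3u;
    return f;
}

/* GDT index range of the TLS entries in the layout quoted above. */
enum { GDT_TLS_MIN = 6, GDT_TLS_MAX = 8 };

int selects_tls_entry(uint16_t sel)
{
    struct selector_fields f = decode_selector(sel);
    return !f.is_ldt && f.index >= GDT_TLS_MIN && f.index <= GDT_TLS_MAX;
}

/* Selectors from the register dump: xcs=0x73, xds=0x7b, xfs=0x3b. */
void check_log_selectors(void)
{
    assert(decode_selector(0x73).index == 14); /* default user CS */
    assert(decode_selector(0x7b).index == 15); /* default user DS */
    assert(decode_selector(0x3b).index == 7);  /* TLS segment #2  */
}
```

All three selectors have RPL 3 and refer to the GDT, which matches the user-mode entries in the table. A gs selector that is valid for user TLS would be expected to decode to one of the indexes 6 to 8 in the same way.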
Fixing the problem
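The table above shows how the whole GDT is laid out, including the TLS segments. The key part is at the end: the accessors for the user gs value. When the kernel is configured with CONFIG_X86_32, there are two ways to get the gs value, depending on whether CONFIG_X86_32_LAZY_GS is defined.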
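Checking the kernel configuration shows that CONFIG_X86_32_LAZY_GS is selected. In that case get_user_gs() returns the value through the local variable v. v may look like an uninitialized local holding a random value, but it is the output operand of savesegment(), so what comes back is whatever is loaded in the gs register at that moment. If CONFIG_X86_32_LAZY_GS is not selected, the value is returned from the saved regs->gs instead; where regs->gs gets set is left aside for now. All this may still not make clear what role gs plays in the kernel or how it flows through it. That is fine, it will be dug into later. As for fixing the problem: since CONFIG_X86_32_LAZY_GS changes how the gs value is obtained, I removed the option from the kernel configuration, rebuilt, and verified that 当乐.apk runs normally. This shows that the option affects the value seen through gs.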
The fix is the patch below; merge it into the x86 defconfig file:
@@ -37,7 +37,6 @@ CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
-CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -452,7 +451,7 @@ CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
-# CONFIG_CC_STACKPROTECTOR is not set
+CONFIG_CC_STACKPROTECTOR=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
- CONFIG_X86_32_LAZY_GS and CONFIG_CC_STACKPROTECTOR above depend on each other: to remove CONFIG_X86_32_LAZY_GS you have to select CONFIG_CC_STACKPROTECTOR=y
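- If enabling the kernel option above produces the build error error: undefined reference to '__stack_chk_guard', see another article of mine: Linux编译x86架构内核出现_stack_chk_guard未定义错误 ("Undefined _stack_chk_guard error when building an x86 Linux kernel")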
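The dependency between the two options can be written down as a one-line rule. The sketch below is my reading of it, not kernel code: it assumes that lazy gs handling is selected automatically on 32-bit x86 unless the stack protector is enabled, which is consistent with the patch having to switch both options at once. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed rule: lazy gs handling is on for 32-bit x86 whenever the
 * stack protector is off, and off whenever the stack protector is on. */
bool lazy_gs_selected(bool x86_32, bool cc_stackprotector)
{
    return x86_32 && !cc_stackprotector;
}

/* The configuration before and after the patch above. */
void check_patch_effect(void)
{
    assert(lazy_gs_selected(true, false));  /* before: lazy gs enabled  */
    assert(!lazy_gs_selected(true, true));  /* after: lazy gs disabled  */
}
```

Under this rule there is no way to turn lazy gs off on its own in a 32-bit configuration, which is why the patch also sets CONFIG_CC_STACKPROTECTOR=y.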
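Summary
Well, this problem is solved, but many questions remain unanswered, and that is the worst part: as a developer, not understanding the whole flow leaves me uneasy. Still, one step at a time. The next step is to study the GDT and memory management as a whole.
Thanks
2017 …… , roll up your trouser legs and run, roll up your sleeves and get to work!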