What is ASAN

ASAN (AddressSanitizer) is a memory error detector. It ships with GCC 4.8 and later. It supports multiple architectures (x86, ARM, MIPS (both 32- and 64-bit versions of all architectures), PowerPC64) and multiple operating systems (Linux, Darwin (OS X and iOS Simulator), FreeBSD, Android).

ASAN consists of two main parts: an instrumentation module and a run-time library.

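The instrumentation module does two things:

  1. For every memory access, it checks the state of the shadow memory that corresponds to the accessed memory. This is static instrumentation inserted at compile time, so the program has to be recompiled;
  2. It creates protected areas (redzones) before and after every stack object and global object, in preparation for detecting overflows.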

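The run-time library also does two things:

  1. It replaces the default malloc/free functions. It creates redzones before and after every heap object, and it keeps freed heap regions in quarantine for a while so that they are not handed out again immediately.
  2. It reports errors, including stack traces.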
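The heap side of the run-time library can be illustrated with a toy wrapper around malloc/free. This is only a sketch under stated assumptions: the names toy_malloc, toy_free, and toy_quarantine_size, the 16-byte redzone, and the 4-slot quarantine are all hypothetical, and the real ASAN run-time records redzones in shadow memory rather than writing a marker byte into the padding itself.

```c
#include <stdlib.h>
#include <string.h>

#define REDZONE 16          /* padding on each side, in bytes (illustrative) */
#define QUARANTINE_SLOTS 4  /* freed blocks held back before the real free  */

static void *quarantine[QUARANTINE_SLOTS];
static size_t quarantine_next = 0;  /* slot to fill next; the oldest once full */
static size_t quarantine_used = 0;  /* number of blocks currently held back   */

/* Allocate `size` usable bytes with a redzone on each side. */
void *toy_malloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * REDZONE);
    if (raw == NULL)
        return NULL;
    memset(raw, 0xfa, REDZONE);                   /* left redzone marker  */
    memset(raw + REDZONE + size, 0xfa, REDZONE);  /* right redzone marker */
    return raw + REDZONE;
}

/* Defer the real free so the block is not reused right away. */
void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    unsigned char *raw = (unsigned char *)ptr - REDZONE;
    if (quarantine_used == QUARANTINE_SLOTS)
        free(quarantine[quarantine_next]);        /* evict the oldest block */
    else
        quarantine_used++;
    quarantine[quarantine_next] = raw;
    quarantine_next = (quarantine_next + 1) % QUARANTINE_SLOTS;
}

/* Number of freed blocks still waiting in quarantine. */
size_t toy_quarantine_size(void)
{
    return quarantine_used;
}
```

A block passed to toy_free stays allocated until four later frees push it out of the ring, which is the same idea as the quarantine described above, only far smaller.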

What is shadow memory

The virtual address space is divided into 2 disjoint classes:

  • Main application memory (Mem): this memory is used by the regular application code.
  • Shadow memory (Shadow): this memory contains the shadow values (or metadata). There is a correspondence between the shadow and the main application memory. Poisoning a byte in the main memory means writing some special value into the corresponding shadow memory.

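Shadow memory is also a region of memory, but it differs from main memory in that it plays the role of metadata: its contents reflect the state of main memory. Shadow memory can therefore be viewed as metadata about main memory, while main memory holds the program's actual data.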

Mapping between shadow memory and main memory

One byte of shadow memory records the addressability state of 8 bytes of main memory.

There are only 9 different values for any aligned 8 bytes of the application memory:

  • All 8 bytes in qword are unpoisoned (i.e. addressable). The shadow value is 0.
  • All 8 bytes in qword are poisoned (i.e. not addressable). The shadow value is negative.
  • First k bytes are unpoisoned, the rest 8-k are poisoned. The shadow value is k. This is guaranteed by the fact that malloc returns 8-byte aligned chunks of memory. The only case where different bytes of an aligned qword have different state is the tail of a malloc-ed region. For example, if we call malloc(13), we will have one full unpoisoned qword and one qword where 5 first bytes are unpoisoned.

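In other words, the addressability of an aligned 8-byte block of memory can be described by 9 states:

  • all 8 bytes are addressable: the shadow value is 0;
  • none of the 8 bytes is addressable: the shadow value is negative;
  • the first k (1 ≤ k ≤ 7) bytes are addressable and the remaining 8-k bytes are not: the shadow value is k.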

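malloc normally returns 8-byte-aligned addresses, so every aligned 8-byte block of memory falls into one of the 9 states above, and a single shadow byte is enough to encode them. A byte can hold 256 (2^8) distinct values, which is more than enough. For example, malloc(13) yields one fully addressable 8-byte block followed by an 8-byte block in which only the first 5 bytes are addressable.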
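The encoding can be sketched as a small function that fills in the shadow bytes for one allocation. The names shadow_len and encode_shadow are hypothetical, and the sketch assumes the block starts on an 8-byte boundary; redzones are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of shadow bytes needed to describe `size` application bytes. */
size_t shadow_len(size_t size)
{
    return (size + 7) / 8;
}

/* Fill shadow[0 .. shadow_len(size) - 1] for an 8-byte-aligned block. */
void encode_shadow(int8_t *shadow, size_t size)
{
    assert(shadow != NULL || size == 0);
    size_t full = size / 8;   /* fully addressable qwords            */
    size_t tail = size % 8;   /* addressable bytes in the last qword */

    for (size_t i = 0; i < full; i++)
        shadow[i] = 0;                /* all 8 bytes addressable */
    if (tail != 0)
        shadow[full] = (int8_t)tail;  /* only the first `tail` bytes */
}
```

For a 13-byte block this writes two shadow bytes, 0 and 5, matching the malloc(13) example.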

On a 64-bit machine (this offset is the one used on x86-64 Linux), main memory is mapped to shadow memory as follows: Shadow = (Mem >> 3) + 0x7fff8000
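The mapping, and the check that instrumented code performs before an access, can be sketched as below. The function names are hypothetical, the offset is the 0x7fff8000 from the formula, and the check assumes an access of 1 to 8 bytes that does not cross an 8-byte boundary. It mirrors the logic described in the AddressSanitizer algorithm notes and is not the code the compiler actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE  3
#define SHADOW_OFFSET UINT64_C(0x7fff8000)

/* Address of the shadow byte that describes application address `mem`. */
static inline uint64_t mem_to_shadow(uint64_t mem)
{
    return (mem >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* True if accessing `size` bytes at `mem` touches a poisoned byte,
 * given the value stored in the corresponding shadow byte. */
static inline bool access_is_bad(int8_t shadow_value, uint64_t mem, size_t size)
{
    assert(size >= 1 && size <= 8);
    if (shadow_value == 0)
        return false;                 /* whole qword is addressable */
    /* Offset, within the qword, of the last byte being accessed. */
    int last = (int)(mem & 7) + (int)size - 1;
    /* For k in 1..7 only offsets 0..k-1 are valid; a negative
     * shadow value makes this comparison true for every access. */
    return last >= shadow_value;
}
```

As a usage example, mem_to_shadow(0x602000000021) evaluates to 0xc047fff8004, and access_is_bad(-6, 0x602000000021, 1) returns true because a negative shadow value marks the whole qword as poisoned.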

Let's look at shadow memory and main memory through a concrete example.

/**
 * Copyright (c) 2021 junfu0903@aliyun.com.
 *
 * Unpublished copyright. All rights reserved. This material contains
 * proprietary information that should be used or copied only within
 * junfu0903@aliyun.com, except with written permission of junfu0903@aliyun.com.
 *
 * @file heap_buffer_overflow.c
 * @brief
 * @author junfu0903@aliyun.com
 * @version 1.0.0
 * @date 2021-06-15 10:18:45
 */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    char *p = NULL;

    p = (char*)malloc(16);

    p[17] = 12;  /* intentional out-of-bounds write: valid indices are 0..15 */

    free(p);

    return 0;
}

The code above allocates 16 bytes with malloc, so p[17] = 12 writes past the end of the allocation, which is a heap buffer overflow.

Reading the error report

The report contains "heap-buffer-overflow on address 0x602000000021" and "WRITE of size 1 at 0x602000000021 thread T0", which tell us that the faulting address is 0x602000000021.

The line "0x602000000021 is located 1 bytes to the right of 16 byte region [0x602000000010,0x602000000020)" identifies the half-open region [0x602000000010, 0x602000000020) as exactly the 16 bytes allocated by malloc.

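As shown earlier, Shadow = (Mem >> 3) + 0x7fff8000. Here Mem is 0x602000000021, so Mem >> 3 is 0xC0400000004 and the shadow address is 0xC047FFF8004. That is the byte marked with square brackets, [fa], in the report's shadow byte dump. Read as a signed byte, 0xfa is negative, so address 0x602000000021 is not addressable. The shadow bytes at 0xC047FFF8002 and 0xC047FFF8003 are both 0, which means the main memory they describe is fully addressable: these are the 16 bytes allocated by malloc.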

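"Shadow byte legend (one shadow byte represents 8 application bytes)" introduces the explanation of the shadow byte values:

A shadow byte of 0 means the corresponding 8 application bytes are all Addressable.

A shadow byte of 1-7 means only the first 1-7 of the corresponding 8 application bytes are addressable, i.e. Partially addressable.

The entries that follow, such as heap left redzone, form a key-value table. A shadow value of fa, for example, means the corresponding application bytes are a heap left redzone and must not be accessed.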
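The legend can be read as a lookup from a shadow byte to a description, sketched below. The function name describe_shadow is hypothetical, and only a handful of entries are listed; the full set of values is whatever the legend in your own report prints, and it can differ between ASAN versions.

```c
#include <stdint.h>

/* Map a shadow byte to a short description (partial table). */
const char *describe_shadow(uint8_t value)
{
    if (value == 0x00)
        return "addressable";
    if (value <= 0x07)
        return "partially addressable";  /* first `value` bytes only */

    switch (value) {
    case 0xfa: return "heap left redzone";
    case 0xfd: return "freed heap region";
    case 0xf1: return "stack left redzone";
    case 0xf2: return "stack mid redzone";
    case 0xf3: return "stack right redzone";
    case 0xf9: return "global redzone";
    default:   return "other poisoned value";
    }
}
```

For the example report, describe_shadow(0xfa) returns "heap left redzone", which is the kind of memory the overflowing write landed in.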

The redzone is a region of unaccessible data both to the left and to the right of an allocation. ASan keeps a bitmask of the entire memory and determines for each 8-byte region what kind of memory it is.
