oom-killer
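OOM is short for Out of Memory, meaning the system has run out of memory. There are many possible causes, a memory leak in a program being a typical one. A leak can be spotted and contained by periodically stopping and restarting the offending program. Relatively recent Linux kernels include a mechanism called the OOM (Out Of Memory) killer, which kills some processes when necessary to free memory.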
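The OOM killer does not simply kill the process that uses the most memory. Instead, it decides which process to kill based on the value in /proc/$pid/oom_score; the higher the value, the more likely the process is to be killed. The system also gives each process a weight, /proc/$pid/oom_adj, which is combined with the process's memory usage, CPU time, and run time to compute oom_score.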
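As a rough illustration of the scoring described above, the following Python sketch lists the processes the kernel currently rates as the most likely victims by reading /proc/<pid>/oom_score. The helper names rank_by_oom_score and read_name, and the proc_root parameter, are assumptions made for this example and are not part of any kernel or library API.

```python
import os


def read_name(status_path):
    """Return the process name from the 'Name:' line of /proc/<pid>/status."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("Name:"):
                return line.split(":", 1)[1].strip()
    return "?"


def rank_by_oom_score(proc_root="/proc", limit=10):
    """Return up to `limit` (score, pid, name) tuples, highest oom_score first.

    Hypothetical helper; `proc_root` is a parameter only so the function can
    be pointed at a copy of /proc.
    """
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric directories are processes
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            name = read_name(os.path.join(base, "status"))
        except (OSError, ValueError):
            continue  # the process exited or the file could not be read
        ranked.append((score, int(entry), name))
    # Highest score first; ties are ordered by pid for a stable result.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:limit]


# Usage on a live Linux system:
#     for score, pid, name in rank_by_oom_score():
#         print(score, pid, name)
```

The process at the top of this list is the one the kernel would most likely pick if it ran out of memory at that moment.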
System environment:
MySQL version information
mysql Ver 14.12 Distrib 5.0.45, for redhat-linux-gnu (x86_64) using readline 5.0
OS version information
root@bhdc006.baihui.com~>cat /etc/redhat-release
CentOS release 5 (Final)
Kernel information
root@bhdc006.baihui.com~>uname -a
Linux bhdc006.baihui.com 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:55 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
The error messages are shown below:
Sep 8 13:45:44 bhdc005 kernel: hald invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Sep 8 13:45:45 bhdc005 kernel:
Sep 8 13:45:45 bhdc005 kernel: Call Trace:
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff800bed1c>] out_of_memory+0x8e/0x2f5
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff8000f082>] __alloc_pages+0x22b/0x2b4
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff80012731>] __do_page_cache_readahead+0x95/0x1d9
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff800618e1>] __wait_on_bit_lock+0x5b/0x66
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff8003f113>] __lock_page+0x5e/0x64
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff800130bc>] filemap_nopage+0x148/0x322
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff800087ed>] __handle_mm_fault+0x1f8/0xdf4
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff80064a6a>] do_page_fault+0x4b8/0x81d
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff80060f29>] thread_return+0x0/0xeb
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff8006058f>] __sched_text_start+0x17f/0xb19
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff8002c864>] mntput_no_expire+0x19/0x89
Sep 8 13:45:45 bhdc005 kernel: [<ffffffff8005bde9>] error_exit+0x0/0x84
Sep 8 13:45:45 bhdc005 kernel:
Sep 8 13:45:45 bhdc005 kernel: Mem-info:
Sep 8 13:45:45 bhdc005 kernel: Node 0 DMA32 per-cpu:
Sep 8 13:45:45 bhdc005 kernel: cpu 0 hot: high 186, batch 31 used:158
Sep 8 13:45:45 bhdc005 kernel: cpu 0 cold: high 62, batch 15 used:61
Sep 8 13:45:45 bhdc005 kernel: cpu 1 hot: high 186, batch 31 used:158
Sep 8 13:45:45 bhdc005 kernel: cpu 1 cold: high 62, batch 15 used:53
Sep 8 13:45:45 bhdc005 kernel: cpu 2 hot: high 186, batch 31 used:126
Sep 8 13:45:45 bhdc005 kernel: cpu 2 cold: high 62, batch 15 used:46
Sep 8 13:45:45 bhdc005 kernel: cpu 3 hot: high 186, batch 31 used:75
Sep 8 13:45:45 bhdc005 kernel: cpu 3 cold: high 62, batch 15 used:56
Sep 8 13:45:45 bhdc005 kernel: cpu 4 hot: high 186, batch 31 used:174
Sep 8 13:45:45 bhdc005 kernel: cpu 4 cold: high 62, batch 15 used:49
Sep 8 13:45:45 bhdc005 kernel: cpu 5 hot: high 186, batch 31 used:114
Sep 8 13:45:45 bhdc005 kernel: cpu 5 cold: high 62, batch 15 used:47
Sep 8 13:45:45 bhdc005 kernel: cpu 6 hot: high 186, batch 31 used:49
Sep 8 13:45:45 bhdc005 kernel: cpu 6 cold: high 62, batch 15 used:55
Sep 8 13:45:45 bhdc005 kernel: cpu 7 hot: high 186, batch 31 used:134
Sep 8 13:45:45 bhdc005 kernel: cpu 7 cold: high 62, batch 15 used:53
Sep 8 13:45:45 bhdc005 kernel: Node 0 HighMem per-cpu: empty
Sep 8 13:45:45 bhdc005 kernel: Free pages: 79816kB (0kB HighMem)
Sep 8 13:45:45 bhdc005 kernel: Active:2037497 inactive:2028979 dirty:0 writeback:0 unstable:0 free:19954 slab:3527 mapped-file:216 mapped-anon:4067361 pagetables:9892
Sep 8 13:45:45 bhdc005 kernel: Node 0 DMA free:11136kB min:8kB low:8kB high:12kB active:0kB inactive:0kB present:10784kB pages_scanned:0 all_unreclaimable? yes
Sep 8 13:45:45 bhdc005 kernel: lowmem_reserve[]: 0 2995 16125 16125
Sep 8 13:45:45 bhdc005 kernel: Node 0 DMA32 free:55512kB min:3016kB low:3768kB high:4524kB active:1497228kB inactive:1468520kB present:3067424kB pages_scanned:4632969 all_unreclaimable? yes
Sep 8 13:45:45 bhdc005 kernel: lowmem_reserve[]: 0 0 13130 13130
Sep 8 13:45:45 bhdc005 kernel: Node 0 Normal free:13168kB min:13224kB low:16528kB high:19836kB active:6652464kB inactive:6647412kB present:13445120kB pages_scanned:23161924 all_unreclaimable? yes
Sep 8 13:45:45 bhdc005 kernel: lowmem_reserve[]: 0 0 0 0
Sep 8 13:45:45 bhdc005 kernel: Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 8 13:45:45 bhdc005 kernel: lowmem_reserve[]: 0 0 0 0
Sep 8 13:45:45 bhdc005 kernel: Node 0 DMA: 2*4kB 3*8kB 2*16kB 4*32kB 3*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11136kB
Sep 8 13:45:45 bhdc005 kernel: Node 0 DMA32: 0*4kB 9*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 13*4096kB = 55512kB
Sep 8 13:45:46 bhdc005 kernel: Node 0 Normal: 12*4kB 2*8kB 7*16kB 2*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 3*4096kB = 13168kB
Sep 8 13:45:46 bhdc005 kernel: Mem-info:
Sep 8 13:45:46 bhdc005 kernel: Node 0 DMA per-cpu:
Sep 8 13:45:47 bhdc005 kernel: Node 0 HighMem: empty
Sep 8 13:45:47 bhdc005 kernel: Swap cache: add 384502, delete 384502, find 52394/53198, race 0+0
Sep 8 13:45:47 bhdc005 kernel: Free swap = 0kB
Sep 8 13:45:47 bhdc005 kernel: Total swap = 1506584kB
Sep 8 13:45:47 bhdc005 kernel: Free swap: 0kB
Sep 8 13:45:47 bhdc005 kernel: 4456448 pages of RAM
Sep 8 13:45:47 bhdc005 kernel: 348373 reserved pages
Sep 8 13:45:47 bhdc005 kernel: 3673 pages shared
Sep 8 13:45:47 bhdc005 kernel: 0 pages swap cached
Sep 8 13:45:47 bhdc005 kernel: Out of memory: Killed process 4429 (mysqld).
Sep 8 13:45:47 bhdc005 kernel: xinetd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Sep 8 13:45:47 bhdc005 kernel:
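Failure analysis:
The console output shows that free swap has dropped to zero, which points to memory exhaustion.
1. According to free, available physical memory is very low, and both cached and buffers are low;
2. Swap usage is high, very little free swap remains, and swap I/O is frequent;
3. For some time before the machine hung, some processes died for no apparent reason.
From this we can conclude that the kernel was still running when the failure occurred, but the memory shortage triggered the OOM killer, which forcibly terminated some services and caused the problem.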
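The figures this analysis relies on can be pulled out of the kernel report mechanically. The sketch below assumes the 2.6.18-era message format seen in this log and a 4 kB page size (the x86_64 default); the function name summarize_oom_report is made up for this example.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default


def summarize_oom_report(log_text):
    """Extract a few key figures from a 2.6.18-style OOM report."""
    summary = {}
    # The kernel prints both "Free swap = 0kB" and "Free swap: 0kB".
    m = re.search(r"Free swap\s*[=:]\s*(\d+)kB", log_text)
    if m:
        summary["free_swap_kb"] = int(m.group(1))
    m = re.search(r"Total swap\s*=\s*(\d+)kB", log_text)
    if m:
        summary["total_swap_kb"] = int(m.group(1))
    m = re.search(r"(\d+) pages of RAM", log_text)
    if m:
        summary["ram_kb"] = int(m.group(1)) * PAGE_KB
    m = re.search(r"Killed process (\d+) \(([^)]+)\)", log_text)
    if m:
        summary["killed"] = (int(m.group(1)), m.group(2))
    return summary


# Four lines copied from the log in this article.
sample = """\
Sep 8 13:45:47 bhdc005 kernel: Free swap = 0kB
Sep 8 13:45:47 bhdc005 kernel: Total swap = 1506584kB
Sep 8 13:45:47 bhdc005 kernel: 4456448 pages of RAM
Sep 8 13:45:47 bhdc005 kernel: Out of memory: Killed process 4429 (mysqld).
"""
report = summarize_oom_report(sample)
print(report)
```

For this log the result is 0 kB of free swap out of 1506584 kB, 17825792 kB (17 GiB) of reported RAM, and mysqld (pid 4429) as the killed process, which matches the reading that both RAM and swap were exhausted.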
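Handling the failure
A possible procedure:
After the server has been restarted
Use uptime to check the system load
Use top to find the processes that are consuming resources
Use lsof and ps to analyze what they are doing:
lsof -c <process name> | more
ps -ef --forest
Try to stop the services that are consuming memory in the normal way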
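The step of finding the memory-hungry processes can also be scripted rather than read off top. The sketch below parses the output of `ps -eo pid,rss,comm` (procps syntax) and sorts by resident memory; the function names top_rss and snapshot are assumptions made for this example.

```python
import subprocess


def top_rss(ps_output, limit=5):
    """Parse `ps -eo pid,rss,comm` output.

    Returns (rss_kb, pid, command) tuples, largest resident set first.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # ignore malformed lines
        rows.append((int(parts[1]), int(parts[0]), parts[2].strip()))
    rows.sort(key=lambda r: (-r[0], r[1]))
    return rows[:limit]


def snapshot(limit=5):
    """Run ps and return the largest processes by resident memory."""
    out = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return top_rss(out, limit)


# Usage on a live system:
#     for rss_kb, pid, command in snapshot():
#         print(rss_kb, pid, command)
```

Keeping the parser separate from the ps call means the same function can be run against ps output saved before the machine hung.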
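In addition, because of the message "Out of memory: Killed process 4429 (mysqld).", the MySQL configuration should be reviewed to check that the following settings are correct and that their values are sized appropriately: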
key_buffer
innodb_buffer_pool_size
max_allowed_packet
table_cache
sort_buffer_size
read_buffer_size
read_rnd_buffer_size
myisam_sort_buffer_size
thread_cache_size
query_cache_size=
long_query_time
join_buffer_size=3M
thread_concurrency
set-variable=max_connections=
The above is purely my personal opinion.
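One way to sanity-check the MySQL settings listed earlier is a worst-case memory budget: global buffers are allocated once, while per-connection buffers can be allocated by every connection. The sketch below applies that common rule of thumb. It is an approximation, not MySQL's actual memory accounting, and apart from join_buffer_size=3M every value in the example is a made-up placeholder rather than a setting from this server.

```python
def parse_size(value):
    """Convert a my.cnf-style size such as '3M' or '512K' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = str(value).strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


# Allocated once for the whole server.
GLOBAL_BUFFERS = ("key_buffer", "innodb_buffer_pool_size", "query_cache_size")
# May be allocated separately by each connection.
PER_CONNECTION_BUFFERS = (
    "sort_buffer_size", "read_buffer_size",
    "read_rnd_buffer_size", "join_buffer_size",
)


def worst_case_bytes(settings):
    """Rough upper bound: global buffers + max_connections * per-connection buffers."""
    global_total = sum(parse_size(settings.get(k, 0)) for k in GLOBAL_BUFFERS)
    per_conn = sum(parse_size(settings.get(k, 0)) for k in PER_CONNECTION_BUFFERS)
    return global_total + int(settings.get("max_connections", 0)) * per_conn


# Placeholder values for illustration only.
example = {
    "key_buffer": "256M",
    "innodb_buffer_pool_size": "1G",
    "query_cache_size": "64M",
    "sort_buffer_size": "2M",
    "read_buffer_size": "1M",
    "read_rnd_buffer_size": "1M",
    "join_buffer_size": "3M",
    "max_connections": 100,
}
print(worst_case_bytes(example) // 1024 ** 2, "MiB")
```

If the estimate approaches or exceeds physical RAM plus swap, mysqld is a likely OOM killer target under load, and the per-connection buffers or max_connections are the first values to reconsider.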