While debugging an unavailable Kubernetes node, you may come across an event like this:

System OOM encountered, victim process: xx

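To understand what this OOM event is and where it comes from, we dug into the code and summarized the findings below. (This article focuses on how the OOM event is generated and delivered; how cAdvisor decides that an OOM has occurred is outside its scope.)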

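Analysis

Main source files:

1) pkg/kubelet/oom/oom_watcher_linux.go

oom_watcher describes how kubelet receives the OOM events produced by the system and records them.

2) oom_watcher_linux.go:

NewWatcher returns an object of type Watcher that holds a recorder and an oomStreamer. The recorder records events; the oomStreamer is an OomParser (from cAdvisor) that writes OomInstance objects into the outStream channel.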

package oom

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/runtime"
    "k8s.io/client-go/tools/record"
    "k8s.io/klog/v2"

    "github.com/google/cadvisor/utils/oomparser"
)

// The streamer interface declares a single method, StreamOoms,
// which takes a send-only channel of *oomparser.OomInstance and writes OomInstance values into it.
type streamer interface {
    StreamOoms(chan<- *oomparser.OomInstance)
}

var _ streamer = &oomparser.OomParser{}

type realWatcher struct {
    recorder    record.EventRecorder
    oomStreamer streamer
}

var _ Watcher = &realWatcher{}

// NewWatcher creates and initializes a OOMWatcher backed by Cadvisor as
// the oom streamer.
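// Creates a new OOM watcher; the only parameter is an EventRecorder.
// EventRecorder is an interface that accepts events and queues them for delivery.
// In a Go function declaration, the first parenthesized list holds the parameters and the second holds the return values.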
func NewWatcher(recorder record.EventRecorder) (Watcher, error) {
    // Create an oomStreamer using cAdvisor's oomparser.
    oomStreamer, err := oomparser.New()
    if err != nil {
        return nil, err
    }
    // Build the watcher from the two objects above: recorder and oomStreamer.
    watcher := &realWatcher{
        recorder:    recorder,
        oomStreamer: oomStreamer,
    }

    return watcher, nil
}

// Start watches for system oom's and records an event for every system oom encountered.
func (ow *realWatcher) Start(ref *v1.ObjectReference) error {
    // Create outStream, a buffered channel of *oomparser.OomInstance with capacity 10.
    // Then start a goroutine that calls ow.oomStreamer.StreamOoms with outStream as its argument.
    // That method keeps writing data (OOM instance objects) into the outStream channel.
    outStream := make(chan *oomparser.OomInstance, 10)
    go ow.oomStreamer.StreamOoms(outStream)

    go func() {
        defer runtime.HandleCrash()
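        // Read events from outStream. Only events whose victim container equals
        // recordEventContainerName (a constant defined in the same file, omitted from this excerpt)
        // are treated as system OOMs; those are logged and recorded as a warning event.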
        for event := range outStream {
            if event.VictimContainerName == recordEventContainerName {
                klog.V(1).InfoS("Got sys oom event", "event", event)
                eventMsg := "System OOM encountered"
                if event.ProcessName != "" && event.Pid != 0 {
                    eventMsg = fmt.Sprintf("%s, victim process: %s, pid: %d", eventMsg, event.ProcessName, event.Pid)
                }
                ow.recorder.Eventf(ref, v1.EventTypeWarning, systemOOMEvent, eventMsg)
            }
        }
        klog.ErrorS(nil, "Unexpectedly stopped receiving OOM notifications")
    }()
    return nil
}

Next, let's look at how kubelet.go uses it.
kubelet.go:
Creating the oomWatcher

// Create a new oomWatcher with the NewWatcher function shown above.
oomWatcher, err := oomwatcher.NewWatcher(kubeDeps.Recorder)
// If creating the oomWatcher fails, check why before deciding whether to abort.
if err != nil {
    if libcontaineruserns.RunningInUserNS() {
        if utilfeature.DefaultFeatureGate.Enabled(features.KubeletInUserNamespace) {
            // oomwatcher.NewWatcher returns "open /dev/kmsg: operation not permitted" error,
            // when running in a user namespace with sysctl value `kernel.dmesg_restrict=1`.
            klog.V(2).InfoS("Failed to create an oomWatcher (running in UserNS, ignoring)", "err", err)
            oomWatcher = nil
        } else {
            klog.ErrorS(err, "Failed to create an oomWatcher (running in UserNS, Hint: enable KubeletInUserNamespace feature flag to ignore the error)")
            return nil, err
        }
    } else {
        return nil, err
    }
}

Starting the oomWatcher

    // Start out of memory watcher.
    if kl.oomWatcher != nil {
        if err := kl.oomWatcher.Start(kl.nodeRef); err != nil {
            return fmt.Errorf("failed to start OOM watcher: %w", err)
        }
    }

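Diagram

The code above implements the flow shown in the diagram, which gives a fairly complete picture of how an OOM event is read by cAdvisor and finally surfaces as a node event.

[Figure: flow of an OOM event from cAdvisor to a node event]

Image source: "启动oomWatcher" (Starting the oomWatcher)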

References

1) https://www.jianshu.com/p/ef524b0b0119

2) "启动oomWatcher" (Starting the oomWatcher)
