Introduction

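Goals of log collection:

  1. Collect log data from distributed systems in one place so it can be queried and managed centrally.
  2. Ingest log data from many different sources.
  3. Reliably parse, filter, and ship log data to a storage system to make troubleshooting easier.
  4. Support security information and event management.
  5. Provide reporting, statistics, and visualization.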

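Value of log collection:

  1. Log queries and troubleshooting
  2. Application log analysis and error alerting
  3. Performance analysis and user behavior analysis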

Log collection flow:

[Figure: log collection flow diagram]

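Overview of log collection approaches:

Official documentation: https://kubernetes.io/zh/docs/concepts/cluster-administration/logging/

1. Node-level collection: deploy the log collector as a DaemonSet and collect json-file logs (standard output /dev/stdout and standard error /dev/stderr).

2. Sidecar container (one pod, multiple containers): collect the logs of one or more application containers in the same pod, usually sharing the log directory between the application container and the sidecar through an emptyDir volume.

3. Run a log collection process inside the application container itself.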

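DaemonSet log collection in detail:

A log collector running as a DaemonSet mainly collects two kinds of logs:

  1. Node-level container logs: the DaemonSet-deployed collector reads json-file logs (standard output /dev/stdout and standard error /dev/stderr), that is, whatever the application writes to stdout and stderr.
  2. Logs stored as files on the host, such as host system logs.

Pros and cons:
Pro: low resource usage. The collector runs once per node and is non-intrusive to the application.
Con: for application logs, it only works when the application in the container writes its logs to standard output.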

containerd:
Log storage paths:
Actual path: /var/log/pods/$CONTAINER_NAMES
Symlink: kubelet also creates symlinks under /var/log/containers that point to /var/log/pods/$CONTAINER_NAMES
Log configuration parameters:
Configuration file: /etc/systemd/system/kubelet.service
Parameters:
--container-log-max-files=5 \
--container-log-max-size="100Mi" \
--logging-format="json" \

Remove unused images:
nerdctl -n k8s.io image prune --all -f  # -f skips the confirmation prompt
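The two locations above encode pod metadata in their names. The following is a minimal Python sketch that pulls namespace, pod, and container out of such a path. It assumes the layout used by recent kubelet versions (`/var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log` and `/var/log/containers/<pod>_<namespace>_<container>-<container-id>.log`); the function names and the sample paths are hypothetical and only serve as illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PodLogRef(NamedTuple):
    namespace: str
    pod: str
    container: str


def parse_pods_path(path: str) -> PodLogRef:
    """Parse /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log (assumed layout)."""
    parts = PurePosixPath(path).parts
    pod_dir, container = parts[-3], parts[-2]
    namespace, pod, _uid = pod_dir.split("_", 2)
    return PodLogRef(namespace, pod, container)


def parse_containers_link(path: str) -> PodLogRef:
    """Parse /var/log/containers/<pod>_<namespace>_<container>-<id>.log (assumed layout)."""
    stem = PurePosixPath(path).stem  # file name without the .log suffix
    pod, namespace, rest = stem.split("_", 2)
    container, _container_id = rest.rsplit("-", 1)
    return PodLogRef(namespace, pod, container)


if __name__ == "__main__":
    print(parse_pods_path("/var/log/pods/demo_web-5d4f_1234-uid/nginx/0.log"))
    print(parse_containers_link("/var/log/containers/web-5d4f_demo_nginx-0a1b2c.log"))
```

A collector such as Filebeat derives the same metadata through its Kubernetes integration; the sketch only shows why the directory layout is enough to tell which pod a log file belongs to.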

docker:
Actual path: /var/lib/docker/containers/$CONTAINERID
Symlink: kubelet creates symlinks under /var/log/pods and /var/log/containers that point to /var/lib/docker/containers/$CONTAINERID
Configuration file: /etc/docker/daemon.json
Parameters:
"log-driver": "json-file",
"log-opts": {
  "max-file": "5",
  "max-size": "100m"
}
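To make the json-file settings above more concrete, here is a small Python sketch. It assumes the usual json-file record shape, one JSON object per line with `log`, `stream`, and `time` keys, and it estimates the worst-case disk usage per container as `max-file` times `max-size`. The helper names and the sample line are made up for illustration.

```python
import json


def parse_json_file_line(line: str) -> dict:
    """Decode one json-file record and strip the trailing newline from the message."""
    record = json.loads(line)
    return {
        "message": record["log"].rstrip("\n"),
        "stream": record["stream"],  # "stdout" or "stderr"
        "time": record["time"],
    }


def worst_case_disk_mb(max_file: int, max_size_mb: int) -> int:
    """Upper bound on log disk usage for one container, in MB."""
    return max_file * max_size_mb


# Hypothetical sample line in the assumed json-file format.
sample = '{"log":"GET /healthz 200\\n","stream":"stdout","time":"2022-03-01T08:00:00.000000000Z"}'

if __name__ == "__main__":
    print(parse_json_file_line(sample))
    print(worst_case_disk_mb(5, 100))
```

With the values shown above (`max-file` 5, `max-size` 100m), each container can keep up to roughly 500 MB of logs on the node.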

Configuration details

filebeat-service-account.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat-serviceaccount-clusterrole
  labels:
    k8s-app: filebeat-serviceaccount-clusterrole
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat-serviceaccount-clusterrolebinding
subjects:
- kind: ServiceAccount
  name: filebeat  # must match the ServiceAccount name used by the DaemonSet
  namespace: application
roleRef:
  kind: ClusterRole
  name: filebeat-serviceaccount-clusterrole
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: application
  labels:
    k8s-app: filebeat-serviceaccount-clusterrole

filebeat.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: application
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
      - type: kubernetes
        node: ${NODE_NAME}
        templates:
          - config:
            - type: container
              paths:
              #docker  - /var/lib/docker/containers/*/*-json.log 
                - /var/log/pods/*/*/*.log
              fields:
                topic: topic-filebeat-applog
    filebeat.inputs:
    - type: log
      paths:
        - /var/log/*.log
      fields:
        topic: topic-filebeat-syslog
    output.kafka:
      enabled: true
      hosts: ${KAFKA_HOST}
      topic: '%{[fields.topic]}'              
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: application
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: elastic/filebeat:7.16.3
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: KAFKA_HOST
          value: '["192.168.2.131:9092","192.168.2.132:9092","192.168.2.133:9092"]'
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            cpu:  1000m
            memory: 2000Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          #mountPath: /var/lib/docker/containers  # mount path when using Docker
          mountPath: /var/log/pods
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          #path: /var/lib/docker/containers  # host log path when using Docker
          path: /var/log/pods
      - name: varlog
        hostPath:
          path: /var/log
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
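The ConfigMap above routes events to Kafka with `topic: '%{[fields.topic]}'`, so each input's `fields.topic` value decides the destination topic. The Python sketch below is a simplified model of that lookup, not Filebeat's actual implementation; `resolve_format_string` and the sample events are hypothetical.

```python
import re

# Matches references of the form %{[a.b.c]}
_FIELD_REF = re.compile(r"%\{\[([^\]]+)\]\}")


def resolve_format_string(template: str, event: dict) -> str:
    """Replace each %{[dotted.path]} reference with the value found in the event."""

    def lookup(match: "re.Match") -> str:
        value = event
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return _FIELD_REF.sub(lookup, template)


# Hypothetical events, one per input defined in the ConfigMap.
app_event = {"message": "app line", "fields": {"topic": "topic-filebeat-applog"}}
sys_event = {"message": "sys line", "fields": {"topic": "topic-filebeat-syslog"}}

if __name__ == "__main__":
    print(resolve_format_string("%{[fields.topic]}", app_event))
    print(resolve_format_string("%{[fields.topic]}", sys_event))
```

Keeping the topic in a per-input field means a single Kafka output can serve both the container logs and the host logs.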

logstash:
vim /etc/logstash/pipelines.yml

- pipeline.id: applog
  path.config: "/etc/logstash/conf.d/logstash_applog.conf"
- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/logstash_syslog.conf"

vim /etc/logstash/conf.d/logstash_applog.conf


input {
  kafka {
    bootstrap_servers => "192.168.2.131:9092,192.168.2.132:9092,192.168.2.133:9092"
    topics => ["topic-filebeat-applog"]
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.2.135:9200"]
    index => "topic-filebeat-applog-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}

vim /etc/logstash/conf.d/logstash_syslog.conf

input {
  kafka {
    bootstrap_servers => "192.168.2.131:9092,192.168.2.132:9092,192.168.2.133:9092"
    topics => ["topic-filebeat-syslog"]
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.2.135:9200"]
    index => "topic-filebeat-syslog-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}
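Both pipelines above write to daily indices through the `%{+YYYY.MM.dd}` suffix. As a rough illustration, the Python sketch below builds the same kind of index name from an event timestamp, assuming the date is taken from the event's `@timestamp` in UTC. The `daily_index` helper and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def daily_index(prefix: str, timestamp: str) -> str:
    """Return '<prefix>-YYYY.MM.dd' for an ISO 8601 timestamp, evaluated in UTC."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    utc = parsed.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


if __name__ == "__main__":
    print(daily_index("topic-filebeat-applog", "2022-03-01T23:30:00.000Z"))
    # A local time of 07:30 at +08:00 still falls on the previous day in UTC.
    print(daily_index("topic-filebeat-syslog", "2022-03-02T07:30:00+08:00"))
```

Under that assumption, events logged shortly after local midnight in a UTC+8 time zone land in the previous day's index, which is worth remembering when searching by date in Kibana.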

![Figure](https://img-blog.csdnimg.cn/f0765c1ca11d4e93a2c05151425a8a23.png)

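Collecting logs in sidecar mode

A sidecar container (one pod, multiple containers) collects the logs of one or more application containers in the same pod, usually sharing the log directory between the application container and the sidecar through an emptyDir volume.

Pros and cons:
Con: the node ends up holding two copies of the same logs, one from the application's standard output and one collected by the sidecar, which wastes a lot of disk space.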

Building the filebeat-sidecar container

Dockerfile

FROM ubuntu:20.04
WORKDIR /usr/local
ADD filebeat-7.16.3-linux-x86_64.tar.gz .
RUN ln -sv filebeat-7.16.3-linux-x86_64 filebeat 
RUN cd filebeat && mkdir conf 
COPY filebeat.yml /usr/local/filebeat/conf
ENTRYPOINT ["/usr/local/filebeat/filebeat","-c","/usr/local/filebeat/conf/filebeat.yml"]

filebeat.yml

filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/access.log
  fields:  # custom fields attached to each event and passed to the output
    type: access-log
  enabled: true
  backoff: 1s  # how long Filebeat waits before checking the file again after reaching EOF
  backoff_factor: 2  # multiplier applied to the wait after each idle check, up to max_backoff; default 2
  close_inactive: 5m  # close the harvester after the file has been idle for 5m; set this higher than the normal write interval
  encoding: plain  # file encoding; plain does not validate or transform the input
  harvester_buffer_size: 131072  # buffer size in bytes used by each harvester
  max_backoff: 5s  # maximum time Filebeat waits before checking the file again after reaching EOF
  max_bytes: 10485760  # maximum size in bytes of a single log message; bytes beyond this are discarded; default 10 MB
  scan_frequency: 10s  # how often the input scans for new files; default 10s
  tail_files: true  # true or false; when true, Filebeat starts reading new files at the end, so with log rotation the first lines of a new file can be skipped
- type: log
  paths:
    - /var/log/nginx/error.log
  fields:
    type: error-log
  enabled: true
  backoff: 1s
  backoff_factor: 2
  close_inactive: 1h
  encoding: plain
  harvester_buffer_size: 131072
  max_backoff: 10s
  max_bytes: 10485760
  scan_frequency: 10s
  tail_files: true

max_procs: 1  # global setting, not an input option: maximum number of CPU cores Filebeat uses

output.kafka:
  enabled: true
  hosts:
  - 192.168.2.131:9092
  - 192.168.2.132:9092
  - 192.168.2.133:9092
  partition.round_robin:
    reachable_only: false
  topic: 'applog-sidecar-nginx'
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
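The `backoff`, `backoff_factor`, and `max_backoff` options in the inputs above interact: the wait starts at `backoff`, is multiplied by `backoff_factor` after each idle check, and is capped at `max_backoff`. The Python sketch below models that schedule; `backoff_schedule` is a hypothetical helper, not part of Filebeat.

```python
from typing import List


def backoff_schedule(backoff: float, factor: float, max_backoff: float, checks: int) -> List[float]:
    """Return the wait (in seconds) before each of the next `checks` idle checks."""
    waits = []
    wait = backoff
    for _ in range(checks):
        waits.append(wait)
        # Grow the wait geometrically, but never beyond max_backoff.
        wait = min(wait * factor, max_backoff)
    return waits


if __name__ == "__main__":
    # access.log input: backoff 1s, factor 2, max_backoff 5s
    print(backoff_schedule(1, 2, 5, 5))
    # error.log input: backoff 1s, factor 2, max_backoff 10s
    print(backoff_schedule(1, 2, 10, 6))
```

For the access log input this gives waits of 1, 2, 4, 5, 5 seconds, so an idle file is polled at most every 5 seconds once the cap is reached.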

build-command.sh

#!/bin/bash
#docker build -t reg.zhangjw.com/library/filebeat-sidecar-nginx:v1.0 .
#docker push reg.zhangjw.com/library/filebeat-sidecar-nginx:v1.0
nerdctl build -t reg.zhangjw.com/library/filebeat-sidecar-nginx:v1.0 .
nerdctl push reg.zhangjw.com/library/filebeat-sidecar-nginx:v1.0

filebeat-sidecar-nginx.yml

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: sidecar-nginx-deployment-label
  name: sidecar-nginx-deployment
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sidecar-nginx-selector
  template:
    metadata:
      labels:
        app: sidecar-nginx-selector
    spec:
      containers:
      - name: filebeat-sidecar-nginx
        image: reg.zhangjw.com:8888/library/filebeat-sidecar-nginx:v7.0
        imagePullPolicy: IfNotPresent
        env:
        - name: "KAFKA_SERVER"
          value: "192.168.2.131:9092,192.168.2.132:9092,192.168.2.133:9092"
        - name: "TOPIC_ID"
          value: "filebeat-sidecar-nginx"
        - name: "CODEC"
          value: "json"
        volumeMounts:
        - name: applogs
          mountPath: /var/log/nginx
      - name: sidecar-nginx-container
        image: nginx
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
          name: http
        - containerPort: 443
          protocol: TCP
          name: https
        volumeMounts:
        - name: applogs
          mountPath: /var/log/nginx
      volumes:
      - name: applogs
        emptyDir: {}

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: sidecar-nginx-service-label
  name: sidecar-nginx-service
  namespace: demo
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30094
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
    nodePort: 30095
  selector:
    app: sidecar-nginx-selector

filebeat-sidecar-nginx-logstash.conf

input {
  kafka {
    bootstrap_servers => "192.168.2.131:9092,192.168.2.132:9092,192.168.2.133:9092"
    topics => ["applog-sidecar-nginx"]
    codec => "json"
  }
}

filter {
  grok {
    match => {
      "message" => "%{IP:clientip} - (%{USERNAME:user}|-) \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{NOTSPACE:request_uri} HTTP/%{NUMBER:http_protocol}\" %{NUMBER:http_status} %{NUMBER:body_bytes_sent} \"-\" \"%{GREEDYDATA:http_user_agent}\" \"-\""
    }
  }
}

output {
  # Use an if / else if / else chain so that each event is written to exactly one index.
  if [fields][type] == "access-log" {
    elasticsearch {
      hosts => ["http://192.168.2.135:9200"]
      index => "topic-sidecar_accesslog-%{+YYYY.MM.dd}"
    }
  } else if [fields][type] == "error-log" {
    elasticsearch {
      hosts => ["http://192.168.2.135:9200"]
      index => "topic-sidecar_errorlog-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["http://192.168.2.135:9200"]
      index => "topic-sidecar_unknownlog-%{+YYYY.MM.dd}"
    }
  }
}
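To see what the grok pattern above extracts, the following Python sketch uses a regular expression that approximates it. The character classes are simplifications of the grok patterns (for example `\S+` instead of `%{IP}`), and the sample lines are invented in the style of the nginx default access log format.

```python
import re
from typing import Optional

# Approximation of the grok pattern; named groups mirror the grok field names.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) - (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_method>\w+) (?P<request_uri>\S+) HTTP/(?P<http_protocol>[\d.]+)" '
    r'(?P<http_status>\d+) (?P<body_bytes_sent>\d+) '
    r'"-" "(?P<http_user_agent>.*)" "-"'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None


# Hypothetical sample lines.
no_referer = '10.0.0.1 - - [01/Mar/2022:08:00:00 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0" "-"'
with_referer = '10.0.0.1 - - [01/Mar/2022:08:00:00 +0000] "GET /index.html HTTP/1.1" 200 612 "http://example.com/" "curl/7.68.0" "-"'

if __name__ == "__main__":
    print(parse_access_line(no_referer))
    print(parse_access_line(with_referer))
```

Because the pattern hard-codes `"-"` for the referer and the forwarded-for field, the second sample does not match. In Logstash, events that fail a grok match are tagged `_grokparsefailure`, so requests carrying a referer would need a more general pattern.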

vim /etc/logstash/pipelines.yml

- pipeline.id: sidecar-nginx-log
  path.config: "/etc/logstash/conf.d/filebeat-sidecar-nginx-logstash.conf"

Kibana summary dashboard:
[Figure: Kibana summary dashboard]

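Built-in log collection process

A built-in log collection process works much like the sidecar approach, except that the collector is baked into the application container image. This couples the collector too tightly to the application and is rarely used, so it is not covered in detail here.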
