Installing EFK on k8s with Helm
You have probably heard of the ELK stack: E for Elasticsearch, L for Logstash, K for Kibana. Logstash's weakness is that it is heavyweight: performance is relatively low, resource consumption is high, there is no built-in message-queue buffering, and data can be lost. Fluentd appeared as an alternative that is easier to use, lighter on resources, faster, and more efficient and reliable at processing data, so it has been widely adopted by companies as a Logstash replacement; Amazon calls this combination, EFK, the best option for log collection.
1. Environment
- OS: CentOS 7.6
- Docker: client 18.09.7, server 18.09.7
- Kubernetes: v1.16.2
- Helm: client v2.13.1, server (Tiller) v2.13.1
Confirm the Helm chart repositories and update them.
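A minimal sketch of checking and refreshing the repositories (the stable repository URL is the one in use at the time; substitute a mirror of your own if it is unreachable):
[root@ops1 test]# helm repo list
[root@ops1 test]# helm repo add stable https://kubernetes-charts.storage.googleapis.com   # or a local mirror
[root@ops1 test]# helm repo update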
2. Quick EFK install
Note that my storage uses the storageClass "nfs2"; change it to match your cluster.
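To check which storage classes are available before editing the values files below:
[root@ops1 test]# kubectl get storageclass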
2.1 Install Elasticsearch with Helm
[root@ops1 test]# cat <<EOF> elasticsearch-values.yaml
image:
  repository: "docker.elastic.co/elasticsearch/elasticsearch-oss"
  # repository: "registry.cn-beijing.aliyuncs.com/wangzt/k8s/elasticsearch-oss:6.7.0"  # personal mirror
  tag: "6.7.0"
client:
  serviceType: "NodePort"
  httpNodePort: 30920
master:
  persistence:
    enabled: false   # set to true to keep elasticsearch-master data on a PVC; false is fine for testing
    storageClass: "nfs2"
data:
  persistence:
    enabled: false   # set to true to keep elasticsearch-data on a PVC; false is fine for testing
    storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name elasticsearch -f elasticsearch-values.yaml --namespace=efk --version=1.32.4 stable/elasticsearch
[root@ops1 test]# kubectl get all -n efk
# once all pods are running, query any k8s worker node
[root@ops1 test]# curl http://127.0.0.1:30920/
{
  "name" : "elasticsearch-client-65bfdd647c-kl9zb",
  "cluster_name" : "elasticsearch",
  ...
  "tagline" : "You Know, for Search"
}
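A quick cluster health check (standard Elasticsearch API, via the NodePort configured above):
[root@ops1 test]# curl http://127.0.0.1:30920/_cluster/health?pretty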
2.2 Install Fluentd with Helm
# these settings are optional; I added them because my log volume is large and to expose monitoring
[root@ops1 test]# cat <<EOF> fluentd-values.yaml
image:
  repository: gcr.io/google-containers/fluentd-elasticsearch  # the default registry may be unreachable
  # repository: registry.cn-beijing.aliyuncs.com/wangzt/kubernetes/fluentd-elasticsearch
elasticsearch:
  buffer_chunk_limit: 32M  # in-memory buffer
service:  # expose the monitor-agent
  type: NodePort
  ports:
    - name: "monitor-agent"
      port: 24231
env:
  OUTPUT_BUFFER_CHUNK_LIMIT: "32M"  # output buffer chunk size
podAnnotations:  # let Prometheus scrape the monitor-agent
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
tolerations:  # also collect logs from the master nodes
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF
[root@ops1 test]# helm install --name fluentd-elasticsearch -f fluentd-values.yaml \
    --namespace=efk --version=2.0.7 stable/fluentd-elasticsearch
[root@ops1 test]# kubectl get pod -n efk | grep fluentd
# once all pods are running, indices should start to appear
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/indices
green open logstash-2020.03.18 om-LUsRXQUGcBfww4ioa3w 5 1 26071 0 27.9mb 13.9mb
green open logstash-2020.03.16 3RAWut3DQkqlLWgQu9DxSQ 5 1 22269 0 23.7mb 11.8mb
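To peek at one collected log record (standard Elasticsearch search API):
[root@ops1 test]# curl 'http://127.0.0.1:30920/logstash-*/_search?size=1&pretty'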
2.3 Install Kibana with Helm
[root@ops1 test]# cat <<EOF> kibana-values.yaml
files:
  kibana.yml:
    elasticsearch.hosts: http://elasticsearch-client:9200
service:
  type: NodePort
  nodePort: 30922
persistentVolumeClaim:
  enabled: true   # change to false if you only want to test without a PVC
  storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name kibana -f kibana-values.yaml --namespace=efk --version=3.2.6 stable/kibana
[root@ops1 test]# kubectl get pod -n efk | grep kibana
kibana-7bf95fb48-nb2z4 1/1 Running 0 36s
2.4 Once all services are up, open the web UI
http://192.168.70.122:30922/app/kibana#/home?_g=()
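If the page does not load, Kibana's status API (available in Kibana 6.x) is a quick check; the IP and NodePort are the ones used above:
[root@ops1 test]# curl http://192.168.70.122:30922/api/status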
Introduction
For a single-node application, when something goes wrong you can simply log in to the server and read the log files. For a distributed application running across many nodes, the logs are scattered over those nodes, and logging in to each server to inspect them one by one is clearly impractical; you need a unified log management platform that collects the logs from all nodes in one place.
I. What logs do Kubernetes and Docker produce?
- The logs described above are the defaults; this is what you get without any logging configuration.
- kubectl logs works the same way as docker logs: both show the logs of the application running inside the container.
- stdout and stderr produced by the application inside a container are always captured by the container engine. If the application additionally writes its logs to files inside the container, you end up with two copies: the raw log written by the application itself, and the processed one, e.g. the json-file driver's copy under /var/lib/docker/containers/<container-id>/<container-id>-json.log (see the commands after this list).
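A quick way to see both views on a node, assuming the default json-file log driver (the pod, namespace, and container names are placeholders):
[root@ops1 test]# kubectl logs <pod-name> -n <namespace> --tail=5
[root@ops1 test]# docker inspect --format '{{.LogPath}}' <container-id>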
II. EFK as the Kubernetes logging best practice (collection)
The recipe for the three EFK components in Kubernetes:
https://github.com/kubernetes/kubernetes/tree/v1.16.1/cluster/addons/fluentd-elasticsearch
In fact you only need to install fluentd-es into your k8s cluster; Elasticsearch and Kibana can perfectly well be deployed separately, or, if you have the budget, bought as a managed service from a cloud provider. In that case all you need to change is the output section of the fluentd-es ConfigMap:
output.conf: |-
<match **>
@id elasticsearch
@type elasticsearch
...
...
host elasticsearch-logging # replace with the external host
port 9200
user elastic # if authentication is enabled
password elastic # if authentication is enabled
...
...
</match>
Reference: www.cnblogs.com/cocowool/p/…
III. Troubleshooting
1. The fluentd-es buffer
Looking at the fluentd-es pod log, there is a [warn]:
[elasticsearch] failed to write data into buffer by buffer overflow action=:block\n
Fluentd error 2: buffer flush took longer time than slow_flush_log_threshold, so increase the buffer size:
2020-03-30 03:27:16 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=18 next_retry_seconds=2020-03-30 03:27:45 +0000 chunk="5a2088ffc6b9665d350a939c7e599407" error_class=Fluent::Plugin::ElasticsearchOutput::ConnectionFailure error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-client\", :port=>9200, :scheme=>\"http\"})! connect_write timeout reached"
2020-03-30 03:27:47 +0000 [info]: [elasticsearch] Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-client", :port=>9200, :scheme=>"http"}
2020-03-30 03:27:47 +0000 [info]: [elasticsearch] Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-client", :port=>9200, :scheme=>"http"}
2020-03-30 03:27:48 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="5a208901af1d77c4c45817d20e3d640a"
2020-03-30 03:50:05 +0000 [warn]: [elasticsearch] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=29.358517162967473 slow_flush_log_threshold=20.0 plugin_id="elasticsearch"
2020-03-30 03:50:05 +0000 [warn]: [elasticsearch] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=27.023319482104853 slow_flush_log_threshold=20.0 plugin_id="elasticsearch"
Looking at how collection actually flows, we can raise buffer.chunk_limit_size in the fluentd-es ConfigMap:
output.conf: |-
<match **>
@id elasticsearch
@type elasticsearch
@log_level info
type_name _doc
include_tag_key true
host es-cn-36200000.public.elasticsearch.aliyuncs.com
port 9200
user elastic
password elastic
logstash_format true
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
...
chunk_limit_size 100M # raised from the original 2M to 100M; 2M is usually enough, this is just for testing :)
...
overflow_action block # this "block" is the overflow action seen in the warning above
</buffer>
</match>
Kibana error 1: version mismatch
{"statusCode":503,"error":"Service Unavailable","message":"Service Unavailable"}
{"publish_address":"10.100.252.234:9200"},"ip":"10.100.252.234"},{"version":"6.8.6","http":{"publish_address":"10.100.35.208:9200"},"ip":"10.100.35.208"},{"version":"6.8.6","http":{"publish_address":"10.100.252.235:9200"},"ip":"10.100.252.235"},{"version":"6.8.6","http":{"publish_address":"10.100.35.205:9200"},"ip":"10.100.35.205"},{"version":"6.8.6","http":{"publish_address":"10.100.35.211:9200"},"ip":"10.100.35.211"}],"message":"You're running Kibana 6.7.0 with some different versions of Elasticsearch. Update Kibana or Elasticsearch to the same version to prevent compatibility issues: v6.8.6 @ 10.100.35.206:9200 (10.100.35.206), v6.8.6 @ 10.100.252.237:9200 (10.100.252.237), v6.8.6 @ 10.100.252.234:9200 (10.100.252.234), v6.8.6 @ 10.100.35.208:9200 (10.100.35.208), v6.8.6 @ 10.100.252.235:9200 (10.100.252.235), v6.8.6 @ 10.100.35.205:9200 (10.100.35.205), v6.8.6 @ 10.100.35.211:9200 (10.100.35.211)"}
Elasticsearch ingestion error: gc overhead causes data nodes to drop out of the cluster
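A plausible mitigation (not shown in the original) is to inspect JVM memory pressure and then give the data nodes a larger heap. The stats call is a standard Elasticsearch API; the heapSize/resources keys and the extra file name are assumptions about the stable/elasticsearch chart, so verify them against its values.yaml before upgrading:
[root@ops1 test]# curl http://127.0.0.1:30920/_nodes/stats/jvm?pretty    # check heap usage and GC activity per node
[root@ops1 test]# cat <<EOF> es-heap-values.yaml
data:
  heapSize: "2048m"          # assumed chart key; larger JVM heap for data nodes
  resources:
    limits:
      memory: "4Gi"          # keep the container limit roughly 2x the heap
EOF
[root@ops1 test]# helm upgrade elasticsearch stable/elasticsearch -f elasticsearch-values.yaml -f es-heap-values.yaml --version=1.32.4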
3. Detailed EFK installation with Helm
3.1 Detailed Elasticsearch installation with Helm
[root@ops1 test]# helm search elasticsearch
stable/elasticsearch 1.32.4 6.8.6 Flexible and powerful open source, distributed real-time ...
[root@ops1 test]# helm fetch stable/elasticsearch --version 1.32.4
[root@ops1 test]# tar -zxf elasticsearch-1.32.4.tgz
[root@ops1 test]# cd elasticsearch/ # have a look at the chart's configuration files
The -oss suffix denotes the Elasticsearch image without X-Pack; this split was introduced from version 6.0 onwards, and the other two flavors are basic (the default) and platinum.
[root@ops1 test]# cat <<EOF> elasticsearch-values.yaml
image:
  repository: "docker.elastic.co/elasticsearch/elasticsearch-oss"
  # repository: "registry.cn-beijing.aliyuncs.com/wangzt/k8s/elasticsearch-oss:6.7.0"  # personal mirror
  tag: "6.7.0"
client:
  serviceType: "NodePort"
  httpNodePort: 30920
master:
  persistence:
    enabled: true   # PVC-backed persistent storage for elasticsearch-master; change to false for a quick test
    storageClass: "nfs2"
data:
  persistence:
    enabled: true   # PVC-backed persistent storage for elasticsearch-data; change to false for a quick test
    storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name elasticsearch -f elasticsearch-values.yaml --namespace=efk --version=1.32.4 stable/elasticsearch
NAME: elasticsearch
LAST DEPLOYED: Mon Mar 30 20:41:19 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
elasticsearch 4 2s
elasticsearch-test 1 1s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
elasticsearch-client 0/2 2 0 1s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
elasticsearch-client-65bfdd647c-f4mmc 0/1 Init:0/1 0 1s
elasticsearch-client-65bfdd647c-kl9zb 0/1 Init:0/1 0 1s
elasticsearch-data-0 0/1 Pending 0 1s
elasticsearch-master-0 0/1 Pending 0 1s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-client NodePort 10.96.29.121 <none> 9200:30920/TCP 1s
elasticsearch-discovery ClusterIP None <none> 9300/TCP 1s
==> v1/ServiceAccount
NAME SECRETS AGE
elasticsearch-client 1 1s
elasticsearch-data 1 1s
elasticsearch-master 1 1s
==> v1/StatefulSet
NAME READY AGE
elasticsearch-data 0/2 1s
elasticsearch-master 0/3 1s
NOTES:
The elasticsearch cluster has been installed.
Elasticsearch can be accessed:
* Within your cluster, at the following DNS name at port 9200:
elasticsearch-client.efk.svc
* From outside the cluster, run these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services elasticsearch-client)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get all -n efk
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-client-65bfdd647c-f4mmc 1/1 Running 0 5m32s
pod/elasticsearch-client-65bfdd647c-kl9zb 1/1 Running 0 5m32s
pod/elasticsearch-data-0 1/1 Running 0 5m32s
pod/elasticsearch-data-1 1/1 Running 0 2m56s
pod/elasticsearch-master-0 1/1 Running 0 5m32s
pod/elasticsearch-master-1 1/1 Running 0 3m29s
pod/elasticsearch-master-2 1/1 Running 0 2m46s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/elasticsearch-client NodePort 10.96.29.121 <none> 9200:30920/TCP 5m33s
service/elasticsearch-discovery ClusterIP None <none> 9300/TCP 5m33s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/elasticsearch-client 2/2 2 2 5m33s
NAME DESIRED CURRENT READY AGE
replicaset.apps/elasticsearch-client-65bfdd647c 2 2 2 5m33s
NAME READY AGE
statefulset.apps/elasticsearch-data 2/2 5m33s
statefulset.apps/elasticsearch-master 3/3 5m33s
# once all pods are running, query any k8s worker node
[root@ops1 test]# curl http://127.0.0.1:30920/
{
"name" : "elasticsearch-client-65bfdd647c-kl9zb",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "a5Bf3MkTSXWPGYn_mIC1bg",
"version" : {
"number" : "6.7.0",
"build_flavor" : "oss",
"build_type" : "docker",
"build_hash" : "8453f77",
"build_date" : "2019-03-21T15:32:29.844721Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
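Optionally confirm that all seven nodes (2 client, 3 master, 2 data) have joined the cluster:
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/nodes?v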
3.2 Deploy Fluentd with Helm
Inspect the fluentd-elasticsearch chart:
[root@ops1 test]# helm search Fluent
stable/fluentd-elasticsearch 2.0.7 2.3.2 DEPRECATED! - A Fluentd Helm chart for Kubernetes with El...
[root@ops1 test]# helm fetch stable/fluentd-elasticsearch --version 2.0.7
[root@ops1 test]# tar -zxf fluentd-elasticsearch-2.0.7.tgz
fluentd-elasticsearch/Chart.yaml
tar: fluentd-elasticsearch/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# cd fluentd-elasticsearch/
[root@ops1 fluentd-elasticsearch]# ls
Chart.yaml OWNERS README.md templates values.yaml
Deploy Fluentd:
# these settings are optional; I added them because my log volume is large and to expose monitoring
[root@ops1 test]# cat <<EOF> fluentd-values.yaml
image:
  repository: gcr.io/google-containers/fluentd-elasticsearch  # the default registry may be unreachable
  # repository: registry.cn-beijing.aliyuncs.com/wangzt/kubernetes/fluentd-elasticsearch
elasticsearch:
  buffer_chunk_limit: 32M  # in-memory buffer
service:  # expose the monitor-agent
  type: NodePort
  ports:
    - name: "monitor-agent"
      port: 24231
env:
  OUTPUT_BUFFER_CHUNK_LIMIT: "32M"  # output buffer chunk size
podAnnotations:  # let Prometheus scrape the monitor-agent
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
tolerations:  # also collect logs from the master nodes
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF
[root@ops1 test]# helm install --name fluentd-elasticsearch -f fluentd-values.yaml \
--namespace=efk --version=2.0.7 stable/fluentd-elasticsearch
NAME: fluentd-elasticsearch
LAST DEPLOYED: Mon Mar 30 21:07:44 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
fluentd-elasticsearch 0s
==> v1/ClusterRoleBinding
NAME AGE
fluentd-elasticsearch 0s
==> v1/ConfigMap
NAME DATA AGE
fluentd-elasticsearch 6 0s
==> v1/DaemonSet
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
fluentd-elasticsearch 4 4 0 4 0 <none> 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
fluentd-elasticsearch-6hw4z 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-9zwnz 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-k69rb 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-ww8t9 0/1 ContainerCreating 0 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluentd-elasticsearch NodePort 10.96.175.162 <none> 24231:23431/TCP 0s
==> v1/ServiceAccount
NAME SECRETS AGE
fluentd-elasticsearch 1 0s
NOTES:
1. To verify that Fluentd has started, run:
kubectl --namespace=efk get pods -l "app.kubernetes.io/name=fluentd-elasticsearch,app.kubernetes.io/instance=fluentd-elasticsearch"
THIS APPLICATION CAPTURES ALL CONSOLE OUTPUT AND FORWARDS IT TO elasticsearch . Anything that might be identifying,
including things like IP addresses, container images, and object names will NOT be anonymized.
2. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services fluentd-elasticsearch)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get pod -n efk | grep fluentd
fluentd-elasticsearch-6hw4z 1/1 Running 0 117s
fluentd-elasticsearch-9zwnz 1/1 Running 0 117s
fluentd-elasticsearch-k69rb 1/1 Running 0 117s
fluentd-elasticsearch-ww8t9 1/1 Running 0 117s
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/indices
green open logstash-2020.03.18 om-LUsRXQUGcBfww4ioa3w 5 1 26071 0 27.9mb 13.9mb
green open logstash-2020.03.16 3RAWut3DQkqlLWgQu9DxSQ 5 1 22269 0 23.7mb 11.8mb
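With the monitor-agent exposed above, the Prometheus metrics endpoint can be checked directly; this assumes the chart's monitor agent serves the standard fluent-plugin-prometheus /metrics path (port 24231 in-cluster, NodePort 23431 as assigned above):
[root@ops1 test]# curl -s http://127.0.0.1:23431/metrics | head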
3.3 Deploy Kibana with Helm
[root@ops1 test]# helm search Kibana
NAME CHART VERSION APP VERSION DESCRIPTION
stable/kibana 3.2.6 6.7.0 Kibana is an open source data visualization plugin for El...
[root@ops1 test]# helm fetch stable/kibana --version 3.2.6
[root@ops1 test]# tar -zxf kibana-3.2.6.tgz
tar: kibana/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
tar: kibana/values.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# cd kibana/
[root@ops1 kibana]# ls
Chart.yaml ci OWNERS README.md templates values.yaml
[root@ops1 test]# cat <<EOF> kibana-values.yaml
files:
  kibana.yml:
    elasticsearch.hosts: http://elasticsearch-client:9200
service:
  type: NodePort
  nodePort: 30922
persistentVolumeClaim:
  enabled: true   # change to false if you only want to test without a PVC
  storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name kibana -f kibana-values.yaml --namespace=efk --version=3.2.6 stable/kibana
NAME: kibana
LAST DEPLOYED: Mon Mar 30 21:25:49 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
kibana 1 1s
kibana-test 1 1s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
kibana 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
kibana-7bf95fb48-nb2z4 0/1 ContainerCreating 0 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kibana NodePort 10.96.128.34 <none> 443:30922/TCP 0s
NOTES:
To verify that kibana has started, run:
kubectl --namespace=efk get pods -l "app=kibana"
Kibana can be accessed:
* From outside the cluster, run these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services kibana)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get pod -n efk | grep kibana
kibana-7bf95fb48-nb2z4 1/1 Running 0 36s
3.4 Once all services are up, open the web UI
http://192.168.70.122:30922/app/kibana#/home?_g=()