Installing EFK on k8s with Helm
You have probably heard of the ELK stack: E for Elasticsearch, L for Logstash, K for Kibana. Logstash's weakness is that it is heavyweight: performance is relatively low, resource consumption is high, there is no built-in message-queue buffering, and data can be lost. Fluentd appeared as an alternative that is easier to use, lighter on resources, faster, and more efficient and reliable at processing data, so it has been widely adopted by companies as a Logstash replacement; Amazon calls this combination, EFK, the best option for log collection.
1. Environment
- OS: CentOS 7.6
- Docker: client 18.09.7, server 18.09.7
- Kubernetes: v1.16.2
- Helm: client v2.13.1, server (Tiller) v2.13.1
Confirm the Helm chart repositories and update them.
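A minimal sketch of checking and refreshing the repositories (the stable repository URL is the one in use at the time; substitute a mirror of your own if it is unreachable):
[root@ops1 test]# helm repo list
[root@ops1 test]# helm repo add stable https://kubernetes-charts.storage.googleapis.com   # or a local mirror
[root@ops1 test]# helm repo update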
2. Quick EFK install
Note that my storage uses the storageClass "nfs2"; change it to match your cluster.
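To check which storage classes are available before editing the values files below:
[root@ops1 test]# kubectl get storageclass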
2.1 Install Elasticsearch with Helm
[root@ops1 test]# cat <<EOF> elasticsearch-values.yaml
image:
  repository: "docker.elastic.co/elasticsearch/elasticsearch-oss"
  # repository: "registry.cn-beijing.aliyuncs.com/wangzt/k8s/elasticsearch-oss:6.7.0"  # personal mirror
  tag: "6.7.0"
client:
  serviceType: "NodePort"
  httpNodePort: 30920
master:
  persistence:
    enabled: false   # set to true to keep elasticsearch-master data on a PVC; false is fine for testing
    storageClass: "nfs2"
data:
  persistence:
    enabled: false   # set to true to keep elasticsearch-data on a PVC; false is fine for testing
    storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name elasticsearch -f elasticsearch-values.yaml --namespace=efk --version=1.32.4 stable/elasticsearch
[root@ops1 test]# kubectl get all -n efk
# once all pods are running, query any k8s worker node
[root@ops1 test]# curl http://127.0.0.1:30920/
{
  "name" : "elasticsearch-client-65bfdd647c-kl9zb",
  "cluster_name" : "elasticsearch",
  ...
  "tagline" : "You Know, for Search"
}
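A quick cluster health check (standard Elasticsearch API, via the NodePort configured above):
[root@ops1 test]# curl http://127.0.0.1:30920/_cluster/health?pretty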
2.2 Install Fluentd with Helm
# these settings are optional; I added them because my log volume is large and to expose monitoring
[root@ops1 test]# cat <<EOF> fluentd-values.yaml
image:
  repository: gcr.io/google-containers/fluentd-elasticsearch  # the default registry may be unreachable
  # repository: registry.cn-beijing.aliyuncs.com/wangzt/kubernetes/fluentd-elasticsearch
elasticsearch:
  buffer_chunk_limit: 32M  # in-memory buffer
service:  # expose the monitor-agent
  type: NodePort
  ports:
    - name: "monitor-agent"
      port: 24231
env:
  OUTPUT_BUFFER_CHUNK_LIMIT: "32M"  # output buffer chunk size
podAnnotations:  # let Prometheus scrape the monitor-agent
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
tolerations:  # also collect logs from the master nodes
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF
[root@ops1 test]# helm install --name fluentd-elasticsearch -f fluentd-values.yaml \
    --namespace=efk --version=2.0.7 stable/fluentd-elasticsearch
[root@ops1 test]# kubectl get pod -n efk | grep fluentd
# once all pods are running, indices should start to appear
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/indices
green open logstash-2020.03.18 om-LUsRXQUGcBfww4ioa3w 5 1 26071 0 27.9mb 13.9mb
green open logstash-2020.03.16 3RAWut3DQkqlLWgQu9DxSQ 5 1 22269 0 23.7mb 11.8mb
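To peek at one collected log record (standard Elasticsearch search API):
[root@ops1 test]# curl 'http://127.0.0.1:30920/logstash-*/_search?size=1&pretty'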
2.3 Install Kibana with Helm
[root@ops1 test]# cat <<EOF> kibana-values.yaml
files:
  kibana.yml:
    elasticsearch.hosts: http://elasticsearch-client:9200
service:
  type: NodePort
  nodePort: 30922
persistentVolumeClaim:
  enabled: true   # change to false if you only want to test without a PVC
  storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name kibana -f kibana-values.yaml --namespace=efk --version=3.2.6 stable/kibana
[root@ops1 test]# kubectl get pod -n efk | grep kibana
kibana-7bf95fb48-nb2z4 1/1 Running 0 36s
2.4 Once all services are up, open the web UI
http://192.168.70.122:30922/app/kibana#/home?_g=()
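If the page does not load, Kibana's status API (available in Kibana 6.x) is a quick check; the IP and NodePort are the ones used above:
[root@ops1 test]# curl http://192.168.70.122:30922/api/status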
Introduction
For a single-node application, when something goes wrong you can simply log in to the server and read the log files. For a distributed application running across many nodes, the logs are scattered over those nodes, and logging in to each server to inspect them one by one is clearly impractical; you need a unified log management platform that collects the logs from all nodes in one place.
I. What logs do Kubernetes and Docker produce?
- The logs described above are the defaults; this is what you get without any logging configuration.
- kubectl logs works the same way as docker logs: both show the logs of the application running inside the container.
- stdout and stderr produced by the application inside a container are always captured by the container engine. If the application additionally writes its logs to files inside the container, you end up with two copies: the raw log written by the application itself, and the processed one, e.g. the json-file driver's copy under /var/lib/docker/containers/<container-id>/<container-id>-json.log (see the commands after this list).
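A quick way to see both views on a node, assuming the default json-file log driver (the pod, namespace, and container names are placeholders):
[root@ops1 test]# kubectl logs <pod-name> -n <namespace> --tail=5
[root@ops1 test]# docker inspect --format '{{.LogPath}}' <container-id>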
II. EFK as the Kubernetes logging best practice (collection)
The recipe for the three EFK components in Kubernetes:
https://github.com/kubernetes/kubernetes/tree/v1.16.1/cluster/addons/fluentd-elasticsearch
In fact you only need to install fluentd-es into your k8s cluster; Elasticsearch and Kibana can perfectly well be deployed separately, or, if you have the budget, bought as a managed service from a cloud provider. In that case all you need to change is the output section of the fluentd-es ConfigMap:
output.conf: |-
<match **>
@id elasticsearch
@type elasticsearch
...
...
host elasticsearch-logging # replace with the external host
port 9200
user elastic # if authentication is enabled
password elastic # if authentication is enabled
...
...
</match>
Reference: www.cnblogs.com/cocowool/p/…
III. Troubleshooting
1. The fluentd-es buffer
Looking at the fluentd-es pod log, there is a [warn]:
[elasticsearch] failed to write data into buffer by buffer overflow action=:block\n
Fluentd error 2: buffer flush took longer time than slow_flush_log_threshold, so increase the buffer size:
2020-03-30 03:27:16 +0000 [warn]: [elasticsearch] failed to flush the buffer. retry_time=18 next_retry_seconds=2020-03-30 03:27:45 +0000 chunk="5a2088ffc6b9665d350a939c7e599407" error_class=Fluent::Plugin::ElasticsearchOutput::ConnectionFailure error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-client\", :port=>9200, :scheme=>\"http\"})! connect_write timeout reached"
2020-03-30 03:27:47 +0000 [info]: [elasticsearch] Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-client", :port=>9200, :scheme=>"http"}
2020-03-30 03:27:47 +0000 [info]: [elasticsearch] Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-client", :port=>9200, :scheme=>"http"}
2020-03-30 03:27:48 +0000 [warn]: [elasticsearch] retry succeeded. chunk_id="5a208901af1d77c4c45817d20e3d640a"
2020-03-30 03:50:05 +0000 [warn]: [elasticsearch] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=29.358517162967473 slow_flush_log_threshold=20.0 plugin_id="elasticsearch"
2020-03-30 03:50:05 +0000 [warn]: [elasticsearch] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=27.023319482104853 slow_flush_log_threshold=20.0 plugin_id="elasticsearch"
Looking at how collection actually flows, we can raise buffer.chunk_limit_size in the fluentd-es ConfigMap:
output.conf: |-
<match **>
@id elasticsearch
@type elasticsearch
@log_level info
type_name _doc
include_tag_key true
host es-cn-36200000.public.elasticsearch.aliyuncs.com
port 9200
user elastic
password elastic
logstash_format true
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
...
chunk_limit_size 100M # raised from the original 2M to 100M; 2M is usually enough, this is just for testing :)
...
overflow_action block # this "block" is the overflow action seen in the warning above
</buffer>
</match>
Kibana error 1: version mismatch
{"statusCode":503,"error":"Service Unavailable","message":"Service Unavailable"}
{"publish_address":"10.100.252.234:9200"},"ip":"10.100.252.234"},{"version":"6.8.6","http":{"publish_address":"10.100.35.208:9200"},"ip":"10.100.35.208"},{"version":"6.8.6","http":{"publish_address":"10.100.252.235:9200"},"ip":"10.100.252.235"},{"version":"6.8.6","http":{"publish_address":"10.100.35.205:9200"},"ip":"10.100.35.205"},{"version":"6.8.6","http":{"publish_address":"10.100.35.211:9200"},"ip":"10.100.35.211"}],"message":"You're running Kibana 6.7.0 with some different versions of Elasticsearch. Update Kibana or Elasticsearch to the same version to prevent compatibility issues: v6.8.6 @ 10.100.35.206:9200 (10.100.35.206), v6.8.6 @ 10.100.252.237:9200 (10.100.252.237), v6.8.6 @ 10.100.252.234:9200 (10.100.252.234), v6.8.6 @ 10.100.35.208:9200 (10.100.35.208), v6.8.6 @ 10.100.252.235:9200 (10.100.252.235), v6.8.6 @ 10.100.35.205:9200 (10.100.35.205), v6.8.6 @ 10.100.35.211:9200 (10.100.35.211)"}
Elasticsearch ingestion error: gc overhead causes data nodes to drop out of the cluster
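A plausible mitigation (not shown in the original) is to inspect JVM memory pressure and then give the data nodes a larger heap. The stats call is a standard Elasticsearch API; the heapSize/resources keys and the extra file name are assumptions about the stable/elasticsearch chart, so verify them against its values.yaml before upgrading:
[root@ops1 test]# curl http://127.0.0.1:30920/_nodes/stats/jvm?pretty    # check heap usage and GC activity per node
[root@ops1 test]# cat <<EOF> es-heap-values.yaml
data:
  heapSize: "2048m"          # assumed chart key; larger JVM heap for data nodes
  resources:
    limits:
      memory: "4Gi"          # keep the container limit roughly 2x the heap
EOF
[root@ops1 test]# helm upgrade elasticsearch stable/elasticsearch -f elasticsearch-values.yaml -f es-heap-values.yaml --version=1.32.4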
3. Detailed EFK installation with Helm
3.1 Detailed Elasticsearch installation with Helm
[root@ops1 test]# helm search elasticsearch
stable/elasticsearch 1.32.4 6.8.6 Flexible and powerful open source, distributed real-time ...
[root@ops1 test]# helm fetch stable/elasticsearch --version 1.32.4
[root@ops1 test]# tar -zxf elasticsearch-1.32.4.tgz
[root@ops1 test]# cd elasticsearch/ # have a look at the chart's configuration files
The -oss suffix denotes the Elasticsearch image without X-Pack; this split was introduced from version 6.0 onwards, and the other two flavors are basic (the default) and platinum.
[root@ops1 test]# cat <<EOF> elasticsearch-values.yaml
image:
  repository: "docker.elastic.co/elasticsearch/elasticsearch-oss"
  # repository: "registry.cn-beijing.aliyuncs.com/wangzt/k8s/elasticsearch-oss:6.7.0"  # personal mirror
  tag: "6.7.0"
client:
  serviceType: "NodePort"
  httpNodePort: 30920
master:
  persistence:
    enabled: true   # PVC-backed persistent storage for elasticsearch-master; change to false for a quick test
    storageClass: "nfs2"
data:
  persistence:
    enabled: true   # PVC-backed persistent storage for elasticsearch-data; change to false for a quick test
    storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name elasticsearch -f elasticsearch-values.yaml --namespace=efk --version=1.32.4 stable/elasticsearch
NAME: elasticsearch
LAST DEPLOYED: Mon Mar 30 20:41:19 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
elasticsearch 4 2s
elasticsearch-test 1 1s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
elasticsearch-client 0/2 2 0 1s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
elasticsearch-client-65bfdd647c-f4mmc 0/1 Init:0/1 0 1s
elasticsearch-client-65bfdd647c-kl9zb 0/1 Init:0/1 0 1s
elasticsearch-data-0 0/1 Pending 0 1s
elasticsearch-master-0 0/1 Pending 0 1s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch-client NodePort 10.96.29.121 <none> 9200:30920/TCP 1s
elasticsearch-discovery ClusterIP None <none> 9300/TCP 1s
==> v1/ServiceAccount
NAME SECRETS AGE
elasticsearch-client 1 1s
elasticsearch-data 1 1s
elasticsearch-master 1 1s
==> v1/StatefulSet
NAME READY AGE
elasticsearch-data 0/2 1s
elasticsearch-master 0/3 1s
NOTES:
The elasticsearch cluster has been installed.
Elasticsearch can be accessed:
* Within your cluster, at the following DNS name at port 9200:
elasticsearch-client.efk.svc
* From outside the cluster, run these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services elasticsearch-client)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get all -n efk
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-client-65bfdd647c-f4mmc 1/1 Running 0 5m32s
pod/elasticsearch-client-65bfdd647c-kl9zb 1/1 Running 0 5m32s
pod/elasticsearch-data-0 1/1 Running 0 5m32s
pod/elasticsearch-data-1 1/1 Running 0 2m56s
pod/elasticsearch-master-0 1/1 Running 0 5m32s
pod/elasticsearch-master-1 1/1 Running 0 3m29s
pod/elasticsearch-master-2 1/1 Running 0 2m46s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/elasticsearch-client NodePort 10.96.29.121 <none> 9200:30920/TCP 5m33s
service/elasticsearch-discovery ClusterIP None <none> 9300/TCP 5m33s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/elasticsearch-client 2/2 2 2 5m33s
NAME DESIRED CURRENT READY AGE
replicaset.apps/elasticsearch-client-65bfdd647c 2 2 2 5m33s
NAME READY AGE
statefulset.apps/elasticsearch-data 2/2 5m33s
statefulset.apps/elasticsearch-master 3/3 5m33s
# once all pods are running, query any k8s worker node
[root@ops1 test]# curl http://127.0.0.1:30920/
{
"name" : "elasticsearch-client-65bfdd647c-kl9zb",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "a5Bf3MkTSXWPGYn_mIC1bg",
"version" : {
"number" : "6.7.0",
"build_flavor" : "oss",
"build_type" : "docker",
"build_hash" : "8453f77",
"build_date" : "2019-03-21T15:32:29.844721Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
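Optionally confirm that all seven nodes (2 client, 3 master, 2 data) have joined the cluster:
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/nodes?v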
3.2 Deploy Fluentd with Helm
Inspect the fluentd-elasticsearch chart:
[root@ops1 test]# helm search Fluent
stable/fluentd-elasticsearch 2.0.7 2.3.2 DEPRECATED! - A Fluentd Helm chart for Kubernetes with El...
[root@ops1 test]# helm fetch stable/fluentd-elasticsearch --version 2.0.7
[root@ops1 test]# tar -zxf fluentd-elasticsearch-2.0.7.tgz
fluentd-elasticsearch/Chart.yaml
tar: fluentd-elasticsearch/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# cd fluentd-elasticsearch/
[root@ops1 fluentd-elasticsearch]# ls
Chart.yaml OWNERS README.md templates values.yaml
Deploy Fluentd:
# these settings are optional; I added them because my log volume is large and to expose monitoring
[root@ops1 test]# cat <<EOF> fluentd-values.yaml
image:
  repository: gcr.io/google-containers/fluentd-elasticsearch  # the default registry may be unreachable
  # repository: registry.cn-beijing.aliyuncs.com/wangzt/kubernetes/fluentd-elasticsearch
elasticsearch:
  buffer_chunk_limit: 32M  # in-memory buffer
service:  # expose the monitor-agent
  type: NodePort
  ports:
    - name: "monitor-agent"
      port: 24231
env:
  OUTPUT_BUFFER_CHUNK_LIMIT: "32M"  # output buffer chunk size
podAnnotations:  # let Prometheus scrape the monitor-agent
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
tolerations:  # also collect logs from the master nodes
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF
[root@ops1 test]# helm install --name fluentd-elasticsearch -f fluentd-values.yaml \
--namespace=efk --version=2.0.7 stable/fluentd-elasticsearch
NAME: fluentd-elasticsearch
LAST DEPLOYED: Mon Mar 30 21:07:44 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
fluentd-elasticsearch 0s
==> v1/ClusterRoleBinding
NAME AGE
fluentd-elasticsearch 0s
==> v1/ConfigMap
NAME DATA AGE
fluentd-elasticsearch 6 0s
==> v1/DaemonSet
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
fluentd-elasticsearch 4 4 0 4 0 <none> 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
fluentd-elasticsearch-6hw4z 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-9zwnz 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-k69rb 0/1 ContainerCreating 0 0s
fluentd-elasticsearch-ww8t9 0/1 ContainerCreating 0 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluentd-elasticsearch NodePort 10.96.175.162 <none> 24231:23431/TCP 0s
==> v1/ServiceAccount
NAME SECRETS AGE
fluentd-elasticsearch 1 0s
NOTES:
1. To verify that Fluentd has started, run:
kubectl --namespace=efk get pods -l "app.kubernetes.io/name=fluentd-elasticsearch,app.kubernetes.io/instance=fluentd-elasticsearch"
THIS APPLICATION CAPTURES ALL CONSOLE OUTPUT AND FORWARDS IT TO elasticsearch . Anything that might be identifying,
including things like IP addresses, container images, and object names will NOT be anonymized.
2. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services fluentd-elasticsearch)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get pod -n efk | grep fluentd
fluentd-elasticsearch-6hw4z 1/1 Running 0 117s
fluentd-elasticsearch-9zwnz 1/1 Running 0 117s
fluentd-elasticsearch-k69rb 1/1 Running 0 117s
fluentd-elasticsearch-ww8t9 1/1 Running 0 117s
[root@ops1 test]# curl http://127.0.0.1:30920/_cat/indices
green open logstash-2020.03.18 om-LUsRXQUGcBfww4ioa3w 5 1 26071 0 27.9mb 13.9mb
green open logstash-2020.03.16 3RAWut3DQkqlLWgQu9DxSQ 5 1 22269 0 23.7mb 11.8mb
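With the monitor-agent exposed above, the Prometheus metrics endpoint can be checked directly; this assumes the chart's monitor agent serves the standard fluent-plugin-prometheus /metrics path (port 24231 in-cluster, NodePort 23431 as assigned above):
[root@ops1 test]# curl -s http://127.0.0.1:23431/metrics | head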
3.3 Deploy Kibana with Helm
[root@ops1 test]# helm search Kibana
NAME CHART VERSION APP VERSION DESCRIPTION
stable/kibana 3.2.6 6.7.0 Kibana is an open source data visualization plugin for El...
[root@ops1 test]# helm fetch stable/kibana --version 3.2.6
[root@ops1 test]# tar -zxf kibana-3.2.6.tgz
tar: kibana/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
tar: kibana/values.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# cd kibana/
[root@ops1 kibana]# ls
Chart.yaml ci OWNERS README.md templates values.yaml
[root@ops1 test]# cat <<EOF> kibana-values.yaml
files:
  kibana.yml:
    elasticsearch.hosts: http://elasticsearch-client:9200
service:
  type: NodePort
  nodePort: 30922
persistentVolumeClaim:
  enabled: true   # change to false if you only want to test without a PVC
  storageClass: "nfs2"
EOF
[root@ops1 test]# helm install --name kibana -f kibana-values.yaml --namespace=efk --version=3.2.6 stable/kibana
NAME: kibana
LAST DEPLOYED: Mon Mar 30 21:25:49 2020
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
kibana 1 1s
kibana-test 1 1s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
kibana 0/1 1 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
kibana-7bf95fb48-nb2z4 0/1 ContainerCreating 0 0s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kibana NodePort 10.96.128.34 <none> 443:30922/TCP 0s
NOTES:
To verify that kibana has started, run:
kubectl --namespace=efk get pods -l "app=kibana"
Kibana can be accessed:
* From outside the cluster, run these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services kibana)
export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
[root@ops1 test]# kubectl get pod -n efk | grep kibana
kibana-7bf95fb48-nb2z4 1/1 Running 0 36s
3.4 Once all services are up, open the web UI
http://192.168.70.122:30922/app/kibana#/home?_g=()