Kubernetes Unified Log Management with Fluentd + Elasticsearch + Kibana: A Pitfall-Filling Guide

 

After initially standing up the Kubernetes cluster and adding some monitoring components, we can already:

  • Monitor the status and resource usage of every node and pod graphically
  • Scale ReplicaSets up and down with kubectl scale
  • View each Pod's runtime logs with kubectl logs or the dashboard

However, in a distributed architecture the number of nodes tends to be large; a typical production environment may have tens or even hundreds of minion nodes, so a centralized log monitoring and management system becomes necessary. In my early thinking I also considered mounting an external volume so that WebLogic logs would land on shared storage, but that approach has problems:

  • We scale the Docker service using a single WebLogic domain, which means every instance uses exactly the same log path and file name inside its container (/u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log). If that path is mapped to shared storage via a volumeMount, the files collide.
  • Pod and container information cannot be captured.
  • Information about the other nodes in the cluster cannot be captured.

So a platform-level architecture is still needed. In the official Kubernetes documentation, https://kubernetes.io/docs/concepts/cluster-administration/logging/

Kubernetes describes several logging approaches and gives a reference architecture for cluster-level logging:

In other words, the container processes inside the Pods we run stream their logs to the minion host; a logging-agent pod running on the same host picks them up and ships them to a backend. The backend can have different implementations, for example Elasticsearch (elasticsearch-logging), with Kibana on top for visualization.

 

Kubernetes recommends this node-level logging agent approach and packages two of them: Stackdriver Logging for Google Cloud Platform, and Elasticsearch. Both use fluentd as the agent running on each node:

Using a node-level logging agent is the most common and encouraged approach for a Kubernetes cluster, because it creates only one agent per node, and it doesn’t require any changes to the applications running on the node. However, node-level logging only works for applications’ standard output and standard error.

Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node.

OK, the pitfall-filling guide starts below.

1. Preparation

  • First, the environment

Operating system: CentOS 7.3

Kubernetes version: 1.5

[root@k8s-master fluentd-elasticsearch]# kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

  • Clone the Kubernetes source from GitHub to the master node.
git clone https://github.com/kubernetes/kubernetes
  • Configure the ServiceAccount. The fluentd image downloaded later connects to the API Server over SSL, so unless you intend to modify and rebuild the image, this has to be set up properly. See the guide at
http://www.cnblogs.com/ericnie/p/6894688.html
  • Configure DNS. The Kibana component locates the elasticsearch-logging Service through DNS; without DNS you would have to change the address in kibana-controller.yaml to a fixed service IP. See the guide at
http://www.cnblogs.com/ericnie/p/6897142.html
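
A quick sanity check that cluster DNS actually works; this is only a sketch, assuming the busybox image can be pulled on your nodes and using an arbitrary throwaway pod name dns-test:

kubectl run dns-test --image=busybox --restart=Never -- nslookup kubernetes.default
kubectl logs dns-test      # should resolve to the kubernetes service cluster IP
kubectl delete pod dns-test

Once the elasticsearch-logging Service exists (section 3 below), the same trick can confirm that elasticsearch-logging.kube-system resolves as well.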

 

  • Download the images

Go to the /root/kubernetes/cluster/addons/fluentd-elasticsearch directory, where all the yaml files live.

fluentd-es-ds.yaml builds the fluentd DaemonSet that runs on every node and plays the logging-agent role; es-controller.yaml and es-service.yaml build elasticsearch-logging, which aggregates the logs as the logging backend; kibana-controller.yaml and kibana-service.yaml provide the display layer.

Pull the images referenced in the controller yaml files onto every minion node:

docker pull gcr.io/google_containers/elasticsearch:v2.4.1-2
docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22
docker pull gcr.io/google_containers/kibana:v4.6.1-1
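
Since the DaemonSet and controllers get scheduled onto the minions, the images must be present on each node. A small sketch, assuming passwordless ssh and hypothetical node names (adjust to your own cluster):

# hypothetical node names; adjust to your cluster
for node in k8s-node-1 k8s-node-2; do
  for img in gcr.io/google_containers/elasticsearch:v2.4.1-2 \
             gcr.io/google_containers/fluentd-elasticsearch:1.22 \
             gcr.io/google_containers/kibana:v4.6.1-1; do
    ssh root@$node "docker pull $img"
  done
done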

 

2. Start the fluentd DaemonSet

  • Label the nodes (pitfall 1)

Fluentd is supposed to run on every minion node. Create the DaemonSet with:

# kubectl create -f fluentd-es-ds.yaml 
daemonset "fluentd-es-v1.22" created

Then I went to a minion node to check with tail -f /var/log/fluentd.log, only to find there was no fluentd.log file on the node at all!

I ran

kubectl get pods -n kube-system

and found no fluentd-related Pod running or even pending! :(

Then with

kubectl get -f fluentd-es-ds.yaml
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR                               AGE
fluentd-es-v1.22   0         0         0       0            0           beta.kubernetes.io/fluentd-ds-ready=true    2m

I could see that the DaemonSet has a NODE-SELECTOR (the normal value, as shown above): beta.kubernetes.io/fluentd-ds-ready=true.

 

Then

kubectl describe nodes k8s-node-1

showed that my minion node did not carry this label at all. Add it with:

kubectl label node k8s-node-1 beta.kubernetes.io/fluentd-ds-ready=true

After re-creating the DaemonSet, the /var/log/fluentd.log file showed up on k8s-node-1.
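
To double-check that the label took and that the DaemonSet now schedules pods, something like the following should do (the DaemonSet name matches fluentd-es-ds.yaml):

kubectl get nodes -l beta.kubernetes.io/fluentd-ds-ready=true
kubectl get ds fluentd-es-v1.22 -n kube-system
kubectl get pods -n kube-system | grep fluentd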

  • Create the ConfigMap (pitfall 2)
# tail -f /var/log/fluentd.log

2017-03-02 02:27:01 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2017-03-02 02:27:01 +0000 [info]: starting fluentd-0.12.31
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.5.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kafka' version '0.4.1'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.16'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-s3' version '0.8.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-03-02 02:27:01 +0000 [info]: adding match pattern="fluent.**" type="null"
2017-03-02 02:27:01 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-03-02 02:27:02 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Invalid Kubernetes API v1 endpoint https://192.168.0.105:443/api: 401 Unauthorized"
2017-03-02 02:27:02 +0000 [info]: process finished code=256
2017-03-02 02:27:02 +0000 [warn]: process died within 1 second. exit.

So the fluentd image connects to my API Server on port 443. The API Server has security enabled, which means ca_file, client_cert, client_key and so on must be configured. If you do not want to rebuild the image, Kubernetes offers a powerful weapon for exactly this: ConfigMap. We can turn the updated td-agent.conf into a Kubernetes ConfigMap resource and mount it into the fluentd pod at the right location, replacing the default td-agent.conf baked into the image.

td-agent.conf lives in

/root/kubernetes/cluster/addons/fluentd-elasticsearch/fluentd-es-image

After adding the ca and client settings it looks like:

// td-agent.conf
... ...
<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>
... ...

Note that:

  • Before creating the ConfigMap resource from td-agent.conf, delete all the comment lines in td-agent.conf; keeping a backup copy first is a good idea (it paid off later).
  • The fluentd pod is created in kube-system, so the ConfigMap resource must also be created in the kube-system namespace, otherwise the pod cannot find the corresponding ConfigMap.
# kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system
configmap "td-agent-config" created

# kubectl get configmaps td-agent-config -o yaml
apiVersion: v1
data:
  td-agent.conf: |
    <match fluent.**>
      type null
    </match>

    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
... ...

fluentd-es-ds.yaml also needs a few changes, mainly two additional mounts:

one mounts the td-agent-config ConfigMap created above,

the other mounts the host path /srv/kubernetes so the client-side certificates are available:

[root@k8s-master fluentd-elasticsearch]# cat fluentd-es-ds.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-es-v1.22
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.22
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: fluentd-es
        image: gcr.io/google_containers/fluentd-elasticsearch:1.22
        command:
          - '/bin/sh'
          - '-c'
          - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: td-agent-config
          mountPath: /etc/td-agent
        - name: tls-files
          mountPath: /srv/kubernetes
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: td-agent-config
        configMap:
          name: td-agent-config
      - name: tls-files
        hostPath:
          path: /srv/kubernetes
[root@k8s-master fluentd-elasticsearch]# 

 

Re-create fluentd-es-ds.yaml, then check /var/log/fluentd.log on the minion again.
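
The re-create is simply a delete followed by a create of the same manifest, roughly:

kubectl delete -f fluentd-es-ds.yaml
kubectl create -f fluentd-es-ds.yaml

This time the tail of /var/log/fluentd.log ends with the full td-agent configuration being loaded: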

......
    client_cert /srv/kubernetes/kubecfg.crt
    client_key /srv/kubernetes/kubecfg.key
  </filter>
  <match **>
    type elasticsearch
    log_level info
    include_tag_key true
    host elasticsearch-logging
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 32
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit 
    num_threads 8
  </match>
</ROOT>

 

Seeing this output basically counts as success. It looks fine, but there is actually still a pitfall hiding here; for now we can move on and configure Elasticsearch logging.

 

3. Configure Elasticsearch

Create Elasticsearch:

# kubectl create -f es-controller.yaml
replicationcontroller "elasticsearch-logging-v1" created

# kubectl create -f es-service.yaml
service "elasticsearch-logging" created

get pods:

kube-system                  elasticsearch-logging-v1-3bzt6          1/1       Running    0          7s        172.16.57.8    10.46.181.146
kube-system                  elasticsearch-logging-v1-nvbe1          1/1       Running    0          7s        172.16.99.10   10.47.136.60

 

Check the logs:

# kubectl logs -f elasticsearch-logging-v1-3bzt6 -n kube-system
F0302 03:59:41.036697       8 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: the server has asked for the client to provide credentials (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x19a8100, 0xc400000000, 0xc2, 0x186)
... ...
main.main()
    elasticsearch_logging_discovery.go:60 +0xb53

[2017-03-02 03:59:42,587][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] version[2.4.1], pid[16], build[c67dc32/2016-09-27T18:57:55Z]
[2017-03-02 03:59:42,588][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initializing ...
[2017-03-02 03:59:44,396][INFO ][plugins                  ] [elasticsearch-logging-v1-3bzt6] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
... ...
[2017-03-02 03:59:44,441][INFO ][env                      ] [elasticsearch-logging-v1-3bzt6] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initialized
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] starting ...
[2017-03-02 03:59:48,507][INFO ][transport                ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9300}, bound_addresses {[::]:9300}
[2017-03-02 03:59:48,547][INFO ][discovery                ] [elasticsearch-logging-v1-3bzt6] kubernetes-logging/7_f_M2TKRZWOw4NhBc4EqA
[2017-03-02 04:00:18,552][WARN ][discovery                ] [elasticsearch-logging-v1-3bzt6] waited for 30s and no initial state was set by the discovery
[2017-03-02 04:00:18,562][INFO ][http                     ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 04:00:18,562][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] started

There is an error: the pod cannot present valid credentials. After consulting Tony Bai's technical write-up online, it turned out to be a problem with the default ServiceAccount; the underlying mechanics still deserve further study.

Let's get it running first. The workaround is as follows.

Create a new ServiceAccount in the kube-system namespace:

// serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-efk

# kubectl create -f serviceaccount.yaml -n kube-system
serviceaccount "k8s-efk" created

# kubectl get serviceaccount -n kube-system
NAME      SECRETS   AGE
default   1         139d
k8s-efk   1         17s
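
To confirm that the new ServiceAccount has a token secret attached (which is what the Elasticsearch discovery code ends up using), a quick check:

kubectl describe serviceaccount k8s-efk -n kube-system
kubectl get secrets -n kube-system | grep k8s-efk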

 

Modify es-controller.yaml to use the service account "k8s-efk":

[root@k8s-master fluentd-elasticsearch]# cat es-controller.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: k8s-efk
      containers:
      - image: gcr.io/google_containers/elasticsearch:v2.4.1-2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}

 

After re-creating the elasticsearch-logging pods, we look at the elasticsearch-logging pod's logs again. They look OK, but this is actually yet another pitfall, which I'll come back to in a moment.
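
The re-create here is again just a delete and create of the controller (the Service created earlier can stay as it is):

kubectl delete -f es-controller.yaml
kubectl create -f es-controller.yaml

With the pods back up, the logs read: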

[2017-05-22 06:09:12,155][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] version[2.4.1], pid[1], build[c67dc32/2016-09-27T18:57:55Z]
[2017-05-22 06:09:12,156][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initializing ...
[2017-05-22 06:09:13,657][INFO ][plugins                  ] [elasticsearch-logging-v1-9jjf1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-22 06:09:13,733][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] using [1] data paths, mounts [[/data (/dev/mapper/cl-root)]], net usable_space [25gb], net total_space [37.2gb], spins? [possibly], types [xfs]
[2017-05-22 06:09:13,738][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] heap size [1015.6mb], compressed ordinary object pointers [true]
[2017-05-22 06:09:21,946][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initialized
[2017-05-22 06:09:21,980][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] starting ...
[2017-05-22 06:09:22,442][INFO ][transport                ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9300}, bound_addresses {[::]:9300}
[2017-05-22 06:09:22,560][INFO ][discovery                ] [elasticsearch-logging-v1-9jjf1] kubernetes-logging/RY_IOcwSSSeuJNtC2E0W7A
[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

 

OK, moving on...

 

4. Configure Kibana

 

Following the experience of those who went before, explicitly assign the newly created serviceaccount k8s-efk in kibana-controller.yaml as well:

[root@k8s-master fluentd-elasticsearch]# cat kibana-controller.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      serviceAccount: k8s-efk
      containers:
      - name: kibana-logging
        image: gcr.io/google_containers/kibana:v4.6.1-1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
          - name: "ELASTICSEARCH_URL"
            value: "http://elasticsearch-logging:9200"
          - name: "KIBANA_BASE_URL"
            value: "/api/v1/proxy/namespaces/kube-system/services/kibana-logging"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
[root@k8s-master fluentd-elasticsearch]# 

Start Kibana and watch the pod's logs:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}

Kibana takes ten-plus minutes to start (sorry, I am running all of this in VirtualBox VMs on an 8 GB laptop). After that you will see logs like these:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["info","optimize"],"pid":6,"message":"Optimization of bundles for kibana and statusPage complete in 588.60 seconds"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["status","plugin:kibana@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:kbn_vislib_vis_types@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:markdown_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:metric_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:spyModes@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:statusPage@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:table_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["listening","info"],"pid":6,"message":"Server running at http://0.0.0.0:5601"}
{"type":"log","@timestamp":"2017-03-02T08:40:11Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2017-03-02T08:40:14Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Kibana index ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}

Note (this is a pitfall too):

  • DNS must be configured, otherwise http://elasticsearch-logging:9200 will be unreachable.
  • If DNS is not configured, the only option is to edit the controller file and replace elasticsearch-logging with the actual cluster IP of the elasticsearch-logging Service (see the command below).
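
The cluster IP to substitute can be read straight from the Service:

kubectl get svc elasticsearch-logging -n kube-system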

Running

kubectl cluster-info

gives you the address of the Kibana service, which is essentially:

https://{API Server external IP}:{API Server secure port}/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana#/settings/indices/
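
Before opening this in a browser, you can check from the master that the proxy path answers at all; a sketch assuming the insecure local API port 8080, the same endpoint that kubectl cluster-info prints further below:

curl -s -o /dev/null -w "%{http_code}\n" \
  http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kibana-logging/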

On the page below, no matter what I tried, the index pattern never appeared and no index could be added. The only exception was typing * directly, which could be created, but once inside there was no pod information at all. Big problem!!!

 

 

 

5. Tracking down the problem

I compared everything carefully against Tony Bai's setup guide and read countless posts from countless predecessors; I even considered starting over on CentOS 6.5, but since I had not managed to install a Kubernetes cluster on CentOS 6.5 either, I gave that idea up.

Comparing the logs, the problem most likely lies in one of these places:

  • there are no logs at all
  • the fluentd service is not running correctly
  • elasticsearch-logging is not collecting the logs

 

  • Change the elasticsearch-logging image (another pitfall)

Looking carefully at the fluentd log /var/log/fluentd.log, there was no log output being shipped at all, which rules out a mere failure of fluentd to reach elasticsearch-logging:9200.

It felt like a problem inside elasticsearch-logging itself. Comparing with tonybai's Elasticsearch logs, mine ended with only

[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

and then nothing more, whereas tonybai's continued with:

[2017-03-02 08:26:56,955][INFO ][http                     ] [elasticsearch-logging-v1-dklui] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 08:26:56,956][INFO ][node                     ] [elasticsearch-logging-v1-dklui] started
[2017-03-02 08:26:57,157][INFO ][gateway                  ] [elasticsearch-logging-v1-dklui] recovered [0] indices into cluster_state
[2017-03-02 08:27:05,378][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:06,360][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:07,163][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-2017.03.01][3], [logstash-2017.03.01][3]] ...]).
[2017-03-02 08:27:07,354][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] create_mapping [fluentd]
[2017-03-02 08:27:07,988][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] create_mapping [fluentd]
[2017-03-02 08:27:09,578][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2017.03.02][4]] ...]).

The differences are:

  • the "recovered ... indices into cluster_state" message
  • the logstash index creation messages

This strongly suggested an image problem, so I changed the image to match tonybai's, replacing the official v2.4.1-2 in es-controller.yaml with

bigwhite/elasticsearch:v2.4.1-1

After starting it up again, the cluster recovery message did indeed appear, but there were still no logstash messages.
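
For reference, the swap amounts to pre-pulling the alternative image on each minion and pointing the controller at it; this is only a sketch, and the sed pattern assumes the stock es-controller.yaml shown above:

# on every minion node, pre-pull the alternative image
docker pull bigwhite/elasticsearch:v2.4.1-1

# on the master, rewrite the image reference, then delete/create the controller as before
sed -i 's#gcr.io/google_containers/elasticsearch:v2.4.1-2#bigwhite/elasticsearch:v2.4.1-1#' es-controller.yaml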

 

  • Locate the log problem (another big pitfall)

This circles back to the first possibility, yet kubectl logs clearly shows screenfuls of output:

[root@k8s-master fluentd-elasticsearch]# kubectl logs helloworld-service-4d72j
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
<May 24, 2017 2:27:39 AM GMT> <Info> <Security> <BEA-090905> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.> 
<May 24, 2017 2:27:41 AM GMT> <Info> <Security> <BEA-090906> <Changing the default Random Number Generator in RSA CryptoJ from ECDRBG128 to FIPS186PRNG. To disable this change, specify -Dweblogic.security.allowCryptoJDefaultPRNG=true.> 
<May 24, 2017 2:27:44 AM GMT> <Info> <WebLogicServer> <BEA-000377> <Starting WebLogic Server with Java HotSpot(TM) 64-Bit Server VM Version 25.101-b13 from Oracle Corporation.> 
<May 24, 2017 2:27:47 AM GMT> <Info> <Management> <BEA-141107> <Version: WebLogic Server 12.1.3.0.0  Wed May 21 18:53:34 PDT 2014 1604337 > 
<May 24, 2017 2:27:59 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.> 
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool.> 
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002942> <CMM memory level becomes 0. Setting standby thread pool size to 256.> 
<May 24, 2017 2:28:02 AM GMT> <Notice> <Log Management> <BEA-170019> <The server log file /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log is opened. All server side log events will be written to this file.> 
<May 24, 2017 2:28:18 AM GMT> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.> 
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STANDBY.> 
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.> 
May 24, 2017 2:28:31 AM weblogic.wsee.WseeCoreMessages logWseeServiceStarting
INFO: The Wsee Service is starting
<May 24, 2017 2:28:34 AM GMT> <Warning> <Munger> <BEA-2156203> <A version attribute was not found in element "web-app" in the deployment descriptor /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/upload/HelloWorld.war/WEB-INF/web.xml. A version attribute is required, but this version of the WebLogic Server will assume that the latest version is used. Future versions of WebLogic Server will reject descriptors that do not specify the Java EE version. To eliminate this warning, add an appropriate "version=" to element "web-app" in the deployment descriptor.> 
<May 24, 2017 2:28:39 AM GMT> <Notice> <Log Management> <BEA-170027> <The server has successfully established a connection with the Domain level Diagnostic Service.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to ADMIN.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RESUMING.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[3]" is now listening on 127.0.0.1:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default" is now listening on 192.168.53.3:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[2]" is now listening on 0:0:0:0:0:0:0:1:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000329> <Started the WebLogic Server Administration Server "AdminServer" for domain "base_domain" running in production mode.> 
<May 24, 2017 2:28:41 AM GMT> <Error> <Server> <BEA-002606> <The server is unable to create a server socket for listening on channel "Default[1]". The address fe80:0:0:0:42:c0ff:fea8:3503 might be incorrect or another process is using port 7001: java.net.BindException: Cannot assign requested address> 
<May 24, 2017 2:28:41 AM GMT> <Warning> <Server> <BEA-002611> <The hostname "localhost", maps to multiple IP addresses: 127.0.0.1, 0:0:0:0:0:0:0:1.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000360> <The server started in RUNNING mode.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING.> 
[root@k8s-master fluentd-elasticsearch]# 

 

And on the minion, docker logs <container id> also shows screenfuls of logs!

[root@k8s-node-1 ~]# docker logs bec3e02b2490
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
...........

 

Because /var/log/fluentd.log never moved past the configuration dump, I suspected a ConfigMap configuration problem, so I dug out the backed-up td-agent.conf and had a look:

# The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
# record & add labels to the log record if properly configured. This enables users
# to filter & search logs on any metadata.
# For example a Docker container's logs might be in the directory:
#
#  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
#
# and in the file:
#
#  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
#
#    synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

Finally the problem surfaced: fluentd tails logs from the /var/lib/docker/containers/ directory, yet under my Docker installation the containers had no log files there at all.

A closer look at Docker revealed that all Docker logs were being journaled into the system log /var/log/messages. Why? Because people often complain that heavy Docker logging makes containers grow quickly, so everything had been routed through the system journal for centralized handling.
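
You can confirm which log driver Docker is actually using before changing anything:

docker info | grep -i 'logging driver'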

Edit the /etc/sysconfig/docker configuration file and switch the log driver from journald back to the json-file mode:

#OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false'
OPTIONS='--selinux-enabled --log-driver=json-file --signature-verification=false'
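
A restart of the Docker daemon is needed on each minion for the new option to take effect; note that this restarts the containers on that node:

systemctl restart docker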

After the change, plenty of log files appeared under /var/lib/docker/containers/.

Back in /var/log/fluentd.log, the log finally started scrolling and the output looked normal:

 

2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-aca728523bc307598917d78b2526e718e6c7fdbb38b70c05900d2439399efa10.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_POD-ca013e9ab31b825cd4b85ab4700fad2fcaafd5f39c572778d10d438012ea4435.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-2eb78ece8c2b5c222313ab4cfb53ea6ec32f54e1b7616f729daf48b01d393b65.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_POD-1dcbbc2ef71f7f542018069a1043a122117a97378c19f03ddb95b8a71dab4637.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_weblogichelloworld-d7229e5c23c6bf7582ed6559417ba24d99e33e44a68a6079159b4792fe05a673.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_weblogichelloworld-71d1d7252dd7504fd45351d714d21c3c615facc5e2650553c68c0bf359e8434a.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_kube2sky-c77121c354459f22712b0a99623eff1590f4fdb1a5d3ad2db09db000755f9c2c.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_skydns-f3c0fbf4ea5cd840c968a807a40569042c90de08f7722e7344282845d5782a20.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_fluentd-es-93795904ff4870758441dd7288972d4967ffac18f2f25272b12e99ea6b692d44.log
2017-05-24 05:45:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-05-24 05:44:24 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-logging\", :port=>9200, :scheme=>\"http\"})!" plugin_id="object:3f986d0a5150"
  2017-05-24 05:45:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:122:in `client'

 

Start all the components:

[root@k8s-master fluentd-elasticsearch]# kubectl get pods -n kube-system
NAME                              READY     STATUS    RESTARTS   AGE
elasticsearch-logging-v1-1xwnq    1/1       Running   0          29s
elasticsearch-logging-v1-gx6lc    1/1       Running   0          29s
fluentd-es-v1.22-351lz            1/1       Running   1          3h
kibana-logging-3659310023-gcwrn   1/1       Running   0          15s
kube-dns-v11-x0vr3                4/4       Running   28         1d

 

Access Kibana:

[root@k8s-master fluentd-elasticsearch]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
Elasticsearch is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Kibana is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

 

Finally the green Create button appears.

Once created, go in.

 

Expand the kubernetes pod name field and you can see the logs of every pod in the cluster.

If we only care about helloworld-service, click the + next to it and every WebLogic log line shows up.

OK, at this point the setup is complete.
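
As a final check you can also ask Elasticsearch directly for its logstash indices through the API server proxy (the same localhost:8080 endpoint shown by kubectl cluster-info above):

curl 'http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_cat/indices?v'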

 

6. Pitfall summary:

  • Setting up EFK forced me to configure DNS and ServiceAccounts end to end, which counts as a win in itself.
  • It also deepened my understanding of how a Kubernetes cluster works internally, which should gradually help us solve problems faster.

 

Finally, sincere thanks to the masters online whom I have never met:

http://tonybai.com/2017/03/03/implement-kubernetes-cluster-level-logging-with-fluentd-and-elasticsearch-stack/

http://rootsongjc.github.io/blogs/kubernetes-fluentd-elasticsearch-installation/

http://www.tothenew.com/blog/how-to-install-kubernetes-on-centos/

http://blog.csdn.net/wenwst/article/details/53908144
