Below we describe how to build a monitoring environment for Kubernetes with the Elastic Stack. The goal of observability is to give operations teams the tools to detect unavailability in production (such as outages, errors, or slow responses) and to retain enough troubleshooting information to help us locate problems. Overall this covers three areas: metrics, logs, and traces.
In this article we will monitor the environment with an Elastic Stack made up of ElasticSearch, Kibana, Filebeat, Metricbeat, and APM-Server, deployed in a Kubernetes cluster. To better understand how these components are configured, we will install them by hand-writing the resource manifests; of course, tools such as Helm can also be used to install and configure them quickly.
Let's now learn how to build a Kubernetes monitoring stack with Elastic. Our test environment is a Kubernetes v1.16.3 cluster (already set up). For easier management, we deploy all the resource objects into a namespace named elastic:
$ kubectl create ns elastic
namespace/elastic created
First we deploy a sample application built with SpringBoot and MongoDB. Start with the MongoDB application; the corresponding resource manifest is shown below:
# mongo.yml
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: elastic
  labels:
    app: mongo
spec:
  ports:
  - port: 27017
    protocol: TCP
  selector:
    app: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elastic
  name: mongo
  labels:
    app: mongo
spec:
  serviceName: "mongo"
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rook-ceph-block  # use a StorageClass that supports RWO
      resources:
        requests:
          storage: 1Gi
Here we use a StorageClass named rook-ceph-block to provision PVs automatically; replace it with any StorageClass in your own cluster that supports RWO. The storage follows the rook-ceph setup. Create the resources with the manifest above:
$ kubectl apply -f mongo.yml
service/mongo created
statefulset.apps/mongo created
$ kubectl get pods -n elastic -l app=mongo
NAME      READY   STATUS    RESTARTS   AGE
mongo-0   1/1     Running   0          34m
Once the Pod reaches the Running state, MongoDB is deployed successfully. Next, deploy the SpringBoot API application and expose it through a NodePort Service; the manifest is shown below:
# spring-boot-simple.yml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  type: NodePort
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: spring-boot-simple
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-boot-simple
  template:
    metadata:
      labels:
        app: spring-boot-simple
    spec:
      containers:
      - image: cnych/spring-boot-simple:0.0.1-SNAPSHOT
        name: spring-boot-simple
        env:
        - name: SPRING_DATA_MONGODB_HOST  # MongoDB service host
          value: mongo
        ports:
        - containerPort: 8080
Create the application the same way:
$ kubectl apply -f spring-boot-simple.yml
service/spring-boot-simple created
deployment.apps/spring-boot-simple created
$ kubectl get pods -n elastic -l app=spring-boot-simple
NAME                                  READY   STATUS    RESTARTS   AGE
spring-boot-simple-64795494bf-hqpcj   1/1     Running   0          24m
$ kubectl get svc -n elastic -l app=spring-boot-simple
NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
spring-boot-simple   NodePort   10.109.55.134   <none>        8080:31847/TCP   84s
Once the application is deployed, we can access it at http://&lt;node-ip&gt;:31847 and run a quick test with the following commands:
$ curl -X GET http://k8s.qikqiak.com:31847/
Greetings from Spring Boot!
Send a POST request:
$ curl -X POST http://k8s.qikqiak.com:31847/message -d 'hello world'
{"id":"5ef55c130d53190001bf74d2","message":"hello+world=","postedAt":"2020-06-26T02:23:15.860+0000"}
Get all the message data:
$ curl -X GET http://k8s.qikqiak.com:31847/message
[{"id":"5ef55c130d53190001bf74d2","message":"hello+world=","postedAt":"2020-06-26T02:23:15.860+0000"}]
To build the Elastic monitoring stack, we first need to deploy ElasticSearch, the database that stores all the metrics, logs, and traces. Here we form a cluster out of three scalable nodes with different roles.
The first node of the cluster is the master node, responsible for controlling the whole cluster. First create a ConfigMap object describing the cluster configuration, so that the ElasticSearch master node can be configured into the cluster with security authentication enabled. The corresponding manifest is shown below:
# elasticsearch-master.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-master-config
  labels:
    app: elasticsearch
    role: master
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: true
      data: false
      ingest: false

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
Then create a Service object. The master node only needs port 9300, which is used for inter-node cluster communication. The manifest is shown below:
# elasticsearch-master.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: master
---
Finally, define the master node with a Deployment object; the manifest is shown below:
# elasticsearch-master.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
  template:
    metadata:
      labels:
        app: elasticsearch
        role: master
    spec:
      containers:
      - name: elasticsearch-master
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-master
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms512m -Xmx512m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-master-config
      - name: "storage"
        emptyDir:
          medium: ""
---
Create the three resource objects above:
$ kubectl apply -f elasticsearch-master.configmap.yaml \
    -f elasticsearch-master.service.yaml \
    -f elasticsearch-master.deployment.yaml
configmap/elasticsearch-master-config created
service/elasticsearch-master created
deployment.apps/elasticsearch-master created
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                   READY   STATUS    RESTARTS   AGE
elasticsearch-master-6f666cbbd-r9vtx   1/1     Running   0          111m
Once the Pod reaches the Running state, the master node is installed successfully.
Now we install the cluster's data node, which is responsible for hosting the data and executing queries. As with the master node, we use a ConfigMap object to configure the data node:
# elasticsearch-data.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-data-config
  labels:
    app: elasticsearch
    role: data
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: false
      data: true
      ingest: false

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
As you can see, this is very similar to the master configuration above; the important difference is the property node.data: true.
Again, it only needs port 9300 to communicate with the other nodes:
# elasticsearch-data.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: data
---
Finally, create a StatefulSet controller: there may be multiple data nodes, and each node's data is distinct and must be stored separately, so we use volumeClaimTemplates to create a storage volume per node. The manifest is shown below:
# elasticsearch-data.statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elastic
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: "elasticsearch-data"
  selector:
    matchLabels:
      app: elasticsearch
      role: data
  template:
    metadata:
      labels:
        app: elasticsearch
        role: data
    spec:
      containers:
      - name: elasticsearch-data
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-data
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms1024m -Xmx1024m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: elasticsearch-data-persistent-storage
          mountPath: /data/db
      volumes:
      - name: config
        configMap:
          name: elasticsearch-data-config
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rook-ceph-block
      resources:
        requests:
          storage: 50Gi
---
Create the resource objects above:
$ kubectl apply -f elasticsearch-data.configmap.yaml \
    -f elasticsearch-data.service.yaml \
    -f elasticsearch-data.statefulset.yaml
configmap/elasticsearch-data-config created
service/elasticsearch-data created
statefulset.apps/elasticsearch-data created
Once the Pod reaches the Running state, the node has started successfully:
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                   READY   STATUS    RESTARTS   AGE
elasticsearch-data-0                   1/1     Running   0          90m
elasticsearch-master-6f666cbbd-r9vtx   1/1     Running   0          111m
Finally, install and configure the ElasticSearch client node, which exposes an HTTP API and passes queries on to the data nodes.
Again, use a ConfigMap object to configure this node:
# elasticsearch-client.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-client-config
  labels:
    app: elasticsearch
    role: client
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: false
      data: false
      ingest: true

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
The client node needs to expose two ports: 9300 for communicating with the other cluster nodes and 9200 for the HTTP API. The corresponding Service object is shown below:
# elasticsearch-client.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  ports:
  - port: 9200
    name: client
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: client
---
Describe the client node with a Deployment object:
# elasticsearch-client.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  selector:
    matchLabels:
      app: elasticsearch
      role: client
  template:
    metadata:
      labels:
        app: elasticsearch
        role: client
    spec:
      containers:
      - name: elasticsearch-client
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-client
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms256m -Xmx256m"
        ports:
        - containerPort: 9200
          name: client
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-client-config
      - name: "storage"
        emptyDir:
          medium: ""
---
Again, create the resources above to deploy the client node:
$ kubectl apply -f elasticsearch-client.configmap.yaml \
    -f elasticsearch-client.service.yaml \
    -f elasticsearch-client.deployment.yaml
configmap/elasticsearch-client-config created
service/elasticsearch-client created
deployment.apps/elasticsearch-client created
Once all the nodes are deployed, the cluster is installed successfully:
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                    READY   STATUS    RESTARTS   AGE
elasticsearch-client-788bffcc98-hh2s8   1/1     Running   0          83m
elasticsearch-data-0                    1/1     Running   0          91m
elasticsearch-master-6f666cbbd-r9vtx    1/1     Running   0          112m
You can watch the cluster status changes with the following command:
$ kubectl logs -f -n elastic \
    $(kubectl get pods -n elastic | grep elasticsearch-master | sed -n 1p | awk '{print $1}') \
    | grep "Cluster health status changed from"
{"type": "server", "timestamp": "2020-06-26T03:31:21,353Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master", "message": "Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.monitoring-es-7-2020.06.26][0]]]).", "cluster.uuid": "SS_nyhNiTDSCE6gG7z-J4w", "node.id": "BdVScO9oQByBHR5rfw-KDA" }
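ElasticSearch server logs are structured JSON, one object per line, so instead of eyeballing the grep output you can pull the health transition out programmatically. A minimal sketch (the log line is copied from the output above; the field names are standard ES 7.x server-log fields):

```python
import json
import re

# A single ES server log line, as produced by the master node above.
log_line = ('{"type": "server", "level": "INFO", '
            '"message": "Cluster health status changed from [RED] to [GREEN] '
            '(reason: [shards started [[.monitoring-es-7-2020.06.26][0]]])."}')

entry = json.loads(log_line)
# Extract the previous and current health colors from the message text.
match = re.search(r"from \[(\w+)\] to \[(\w+)\]", entry["message"])
previous, current = match.groups()
print(previous, "->", current)  # RED -> GREEN
```

The same pattern works for any line matched by the `grep` filter above, e.g. to alert when the cluster falls back from GREEN.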
We enabled the xpack security module to protect the cluster, so we need initial passwords. We can run the bin/elasticsearch-setup-passwords command inside the client node container to generate default usernames and passwords:
$ kubectl exec $(kubectl get pods -n elastic | grep elasticsearch-client | sed -n 1p | awk '{print $1}') \
    -n elastic \
    -- bin/elasticsearch-setup-passwords auto -b
Changed password for user apm_system
PASSWORD apm_system = 3Lhx61s6woNLvoL5Bb7t
Changed password for user kibana_system
PASSWORD kibana_system = NpZv9Cvhq4roFCMzpja3
Changed password for user kibana
PASSWORD kibana = NpZv9Cvhq4roFCMzpja3
Changed password for user logstash_system
PASSWORD logstash_system = nNnGnwxu08xxbsiRGk2C
Changed password for user beats_system
PASSWORD beats_system = fen759y5qxyeJmqj6UPp
Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = mCP77zjCATGmbcTFFgOX
Changed password for user elastic
PASSWORD elastic = wmxhvsJFeti2dSjbQEAH
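The `elasticsearch-setup-passwords auto -b` output above is plain text with lines of the form `PASSWORD <user> = <password>`, which is easy to parse if you want to script the next step (for example, feeding the elastic password into a Kubernetes Secret). A small sketch, using a shortened copy of the output above:

```python
# Parse the setup-passwords output into a {user: password} dict.
output = """\
Changed password for user apm_system
PASSWORD apm_system = 3Lhx61s6woNLvoL5Bb7t
Changed password for user elastic
PASSWORD elastic = wmxhvsJFeti2dSjbQEAH
"""

passwords = {}
for line in output.splitlines():
    # Only the "PASSWORD <user> = <value>" lines carry credentials.
    if line.startswith("PASSWORD"):
        _, user, _, password = line.split()
        passwords[user] = password

print(passwords["elastic"])  # wmxhvsJFeti2dSjbQEAH
```

In a real pipeline you would capture the `kubectl exec` output instead of the literal string, then pass `passwords["elastic"]` to `kubectl create secret`.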
Note that we also need to store the elastic username and password in a Kubernetes Secret object (it will be referenced later):
$ kubectl create secret generic elasticsearch-pw-elastic \
    -n elastic \
    --from-literal password=wmxhvsJFeti2dSjbQEAH
secret/elasticsearch-pw-elastic created
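As an aside on what the command above actually creates: a Secret stores each value base64-encoded under its `data` field, and consumers such as a `secretKeyRef` env var get the decoded value back. A minimal sketch of that round trip:

```python
import base64

# What `kubectl create secret ... --from-literal password=...` stores:
password = "wmxhvsJFeti2dSjbQEAH"
encoded = base64.b64encode(password.encode()).decode()

# `kubectl get secret elasticsearch-pw-elastic -o yaml` would show this
# encoded string under data.password.
print(encoded)

# A Pod consuming the Secret via secretKeyRef sees the decoded value:
decoded = base64.b64decode(encoded).decode()
print(decoded)
```

This also explains why the raw `data` field in a Secret looks like gibberish: base64 is an encoding, not encryption.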
With the ElasticSearch cluster installed, we can now deploy Kibana, the data visualization tool for ElasticSearch, which provides various features for managing ElasticSearch clusters and visualizing data.
Again, we first use a ConfigMap object to provide a configuration file, including the ElasticSearch access details (host, username, and password), all supplied through environment variables. The manifest is shown below:
# kibana.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: kibana-config
  labels:
    app: kibana
data:
  kibana.yml: |-
    server.host: 0.0.0.0

    elasticsearch:
      hosts: ${ELASTICSEARCH_HOSTS}
      username: ${ELASTICSEARCH_USER}
      password: ${ELASTICSEARCH_PASSWORD}
---
Then expose Kibana through a NodePort Service:
# kibana.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: kibana
  labels:
    app: kibana
spec:
  type: NodePort
  ports:
  - port: 5601
    name: webinterface
  selector:
    app: kibana
---
Finally, deploy Kibana with a Deployment. Since the password must be provided through an environment variable, we reference the Secret object created above:
# kibana.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: kibana
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.8.0
        ports:
        - containerPort: 5601
          name: webinterface
        env:
        - name: ELASTICSEARCH_HOSTS
          value: "http://elasticsearch-client.elastic.svc.cluster.local:9200"
        - name: ELASTICSEARCH_USER
          value: "elastic"
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:  # reference the Secret created earlier and expose it as a variable
              name: elasticsearch-pw-elastic
              key: password
        volumeMounts:
        - name: config
          mountPath: /usr/share/kibana/config/kibana.yml
          readOnly: true
          subPath: kibana.yml
      volumes:
      - name: config
        configMap:
          name: kibana-config
---
Again, deploy by creating the manifests above:
$ kubectl apply -f kibana.configmap.yaml \
    -f kibana.service.yaml \
    -f kibana.deployment.yaml
configmap/kibana-config created
service/kibana created
deployment.apps/kibana created
After the deployment succeeds, check the Pod logs to find out Kibana's status:
$ kubectl logs -f -n elastic $(kubectl get pods -n elastic | grep kibana | sed -n 1p | awk '{print $1}') \
    | grep "Status changed from yellow to green"
{"type":"log","@timestamp":"2020-06-26T04:20:38Z","tags":["status","plugin:elasticsearch@7.8.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
Once the status turns green, we can access the Kibana service in a browser through NodePort port 30474:
$ kubectl get svc kibana -n elastic
NAME     TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kibana   NodePort   10.101.121.31   <none>        5601:30474/TCP   8m18s
As shown below, log in with the elastic user and the generated password we stored in the Secret object above:
After logging in you are redirected to the Kibana home page:
You can also create a new superuser of your own: Management → Stack Management → Create User:
Create the new user with a username and password, selecting the superuser role:
Once created, you can log in to Kibana with the new user. Finally, the Management → Stack Monitoring page shows the health of the whole cluster:
At this point we have installed ElasticSearch and Kibana; they will store and visualize our application data (metrics, logs, and traces).
With the ElasticSearch cluster installed and configured, we will now use Metricbeat to monitor the Kubernetes cluster. Metricbeat is a lightweight agent that runs on the servers and periodically collects host and service metrics. This is the first part of our full-stack Kubernetes monitoring.
By default Metricbeat collects system metrics, but it also ships a large number of modules for collecting service metrics, such as Nginx, Kafka, MySQL, Redis, and so on. The full list of supported modules is available on the Elastic website: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-modules.html.
First, we need to install kube-state-metrics, a service that listens to the Kubernetes API and exposes metrics about the state of each resource object.
Installing kube-state-metrics is straightforward; the installation manifests are available in its GitHub repository:
$ git clone https://github.com/kubernetes/kube-state-metrics.git
$ cd kube-state-metrics
# run the install command
$ kubectl apply -f examples/standard/
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured
$ kubectl get pods -n kube-system -l app.kubernetes.io/name=kube-state-metrics
NAME                                  READY   STATUS    RESTARTS   AGE
kube-state-metrics-6d7449fc78-mgf4f   1/1     Running   0          88s
Once the Pod reaches the Running state, the installation has succeeded.
Since we need to monitor every node, we install Metricbeat with a DaemonSet controller.
First, configure Metricbeat with a ConfigMap and mount it into the container at /etc/metricbeat.yml through a Volume. The configuration file contains the ElasticSearch address, username, and password, the Kibana configuration, and the modules to enable along with the scrape interval.
# metricbeat.settings.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: metricbeat-config
  labels:
    app: metricbeat
data:
  metricbeat.yml: |-
    # module configuration
    metricbeat.modules:
    - module: system
      period: ${PERIOD}  # scrape interval
      metricsets: ["cpu", "load", "memory", "network", "process", "process_summary", "core", "diskio", "socket"]
      processes: ['.*']
      process.include_top_n:
        by_cpu: 5      # top 5 processes by CPU
        by_memory: 5   # top 5 processes by memory
    - module: system
      period: ${PERIOD}
      metricsets: ["filesystem", "fsstat"]
      processors:
      - drop_event.when.regexp:  # exclude some system mount points from monitoring
          system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'
    - module: docker  # collect Docker metrics (containerd is not supported)
      period: ${PERIOD}
      hosts: ["unix:///var/run/docker.sock"]
      metricsets: ["container", "cpu", "diskio", "healthcheck", "info", "memory", "network"]
    - module: kubernetes  # scrape kubelet metrics
      period: ${PERIOD}
      node: ${NODE_NAME}
      hosts: ["https://${NODE_NAME}:10250"]  # the kubelet metrics port; to also monitor kube-apiserver/controller-manager etc., connect to their ports as well
      metricsets: ["node", "system", "pod", "container", "volume"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
    - module: kubernetes  # scrape kube-state-metrics data
      period: ${PERIOD}
      node: ${NODE_NAME}
      metricsets: ["state_node", "state_deployment", "state_replicaset", "state_pod", "state_container"]
      hosts: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]

    # enable a service-specific module (mongo) via Kubernetes autodiscovery
    metricbeat.autodiscover:
      providers:
      - type: kubernetes
        node: ${NODE_NAME}
        templates:
        - condition.equals:
            kubernetes.labels.app: mongo
          config:
          - module: mongodb
            period: ${PERIOD}
            hosts: ["mongo.elastic:27017"]
            metricsets: ["dbstats", "status", "collstats", "metrics", "replstatus"]

    # ElasticSearch connection
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    # Kibana connection
    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'

    # import the pre-built dashboards
    setup.dashboards.enabled: true

    # indice lifecycle configuration
    setup.ilm:
      policy_file: /etc/indice-lifecycle.json
---
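The drop_event processor in the system/filesystem module above keeps virtual filesystems out of the disk metrics. A quick sketch of how that mount-point regex behaves (the example paths are illustrative, not taken from a real host):

```python
import re

# The same pattern used by drop_event.when.regexp in the Metricbeat config:
# drop any event whose mount point is /sys, /cgroup, /proc, /dev, /etc,
# /host, or /lib, or any path nested under them.
pattern = re.compile(r"^/(sys|cgroup|proc|dev|etc|host|lib)($|/)")

assert pattern.search("/proc") is not None           # dropped
assert pattern.search("/sys/fs/cgroup") is not None  # dropped (nested under /sys)
assert pattern.search("/data/db") is None            # kept
assert pattern.search("/") is None                   # kept: the root fs is still monitored
print("mount-point filter behaves as expected")
```

The `($|/)` tail is what makes `/proc` match while leaving a hypothetical `/procdata` mount unmatched.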
An ElasticSearch indice lifecycle is a set of rules applied to your indices based on their size or age. For example, you can roll an indice over every day or whenever it exceeds 1GB, and configure different phases based on such rules. Since monitoring generates a large amount of data, easily tens of gigabytes per day, we can use the indice lifecycle to configure data retention and keep storage under control; Prometheus has a similar mechanism. In the file below, we roll the indice over every day or whenever it exceeds 5GB, and delete all indices older than 10 days; keeping 10 days of monitoring data is more than enough here.
# metricbeat.indice-lifecycle.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: metricbeat-indice-lifecycle
  labels:
    app: metricbeat
data:
  indice-lifecycle.json: |-
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_size": "5GB",
                "max_age": "1d"
              }
            }
          },
          "delete": {
            "min_age": "10d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
---
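The lifecycle policy above is plain JSON, so it can be sanity-checked before deploying, for example to confirm the hot-phase rollover thresholds and the delete-phase retention agree with what we intend:

```python
import json

# The same policy JSON embedded in the ConfigMap above.
policy = json.loads("""
{
  "policy": {
    "phases": {
      "hot":    {"actions": {"rollover": {"max_size": "5GB", "max_age": "1d"}}},
      "delete": {"min_age": "10d", "actions": {"delete": {}}}
    }
  }
}
""")

rollover = policy["policy"]["phases"]["hot"]["actions"]["rollover"]
retention = policy["policy"]["phases"]["delete"]["min_age"]
print(rollover["max_size"], rollover["max_age"], retention)  # 5GB 1d 10d
```

A check like this is handy in CI, where a typo in the JSON would otherwise only surface when Metricbeat fails to install the policy.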
Next we can write the Metricbeat DaemonSet resource manifest, as shown below:
# metricbeat.daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: elastic
  name: metricbeat
  labels:
    app: metricbeat
spec:
  selector:
    matchLabels:
      app: metricbeat
  template:
    metadata:
      labels:
        app: metricbeat
    spec:
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:7.8.0
        args: [
          "-c", "/etc/metricbeat.yml",
          "-e",
          "-system.hostfs=/hostfs"
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:  # reference the Secret created earlier
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: PERIOD
          value: "10s"
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/metricbeat.yml
          readOnly: true
          subPath: metricbeat.yml
        - name: indice-lifecycle
          mountPath: /etc/indice-lifecycle.json
          readOnly: true
          subPath: indice-lifecycle.json
        - name: dockersock
          mountPath: /var/run/docker.sock
        - name: proc
          mountPath: /hostfs/proc
          readOnly: true
        - name: cgroup
          mountPath: /hostfs/sys/fs/cgroup
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: config
        configMap:
          defaultMode: 0600
          name: metricbeat-config
      - name: indice-lifecycle
        configMap:
          defaultMode: 0600
          name: metricbeat-indice-lifecycle
      - name: data
        hostPath:
          path: /var/lib/metricbeat-data
          type: DirectoryOrCreate
---
Note that the two ConfigMaps above are mounted into the container. Since Metricbeat needs information about the host, we also mount some host paths into the container, namely the proc directory, the cgroup directory, and the dockersock file.
Because Metricbeat needs to fetch resource object information from the Kubernetes cluster, it needs the corresponding RBAC permissions. Since these are cluster-scoped, we declare them with a ClusterRole:
# metricbeat.permissions.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
- kind: ServiceAccount
  name: metricbeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: metricbeat
  labels:
    app: metricbeat
rules:
- apiGroups: [""]
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: elastic
  name: metricbeat
  labels:
    app: metricbeat
---
Create the resource objects above:
$ kubectl apply -f metricbeat.settings.configmap.yml \
    -f metricbeat.indice-lifecycle.configmap.yml \
    -f metricbeat.daemonset.yml \
    -f metricbeat.permissions.yml
configmap/metricbeat-config configured
configmap/metricbeat-indice-lifecycle configured
daemonset.extensions/metricbeat created
clusterrolebinding.rbac.authorization.k8s.io/metricbeat created
clusterrole.rbac.authorization.k8s.io/metricbeat created
serviceaccount/metricbeat created
$ kubectl get pods -n elastic -l app=metricbeat
NAME               READY   STATUS    RESTARTS   AGE
metricbeat-2gstq   1/1     Running   0          18m
metricbeat-99rdb   1/1     Running   0          18m
metricbeat-9bb27   1/1     Running   0          18m
metricbeat-cgbrg   1/1     Running   0          18m
metricbeat-l2csd   1/1     Running   0          18m
metricbeat-lsrgv   1/1     Running   0          18m
Once the Metricbeat Pods reach the Running state, we should be able to see the corresponding monitoring information in Kibana.
In Kibana, go to Observability → Metrics in the left menu to open the metrics page, where you should see some monitoring data:
You can also filter as needed, for example grouping the view by Kubernetes Namespace:
Since we set setup.dashboards.enabled=true in the configuration file, Kibana imports a number of pre-built Dashboards. Go to Kibana → Dashboard in the left menu; you will see a list of around 50 Metricbeat Dashboards, which you can filter as needed. For example, to look at cluster node information, open the [Metricbeat Kubernetes] Overview ECS Dashboard:
We also enabled the mongodb module separately, so we can use the [Metricbeat MongoDB] Overview ECS Dashboard to view its monitoring data:
We enabled the docker module as well; the [Metricbeat Docker] Overview ECS Dashboard shows its monitoring data:
This completes monitoring the Kubernetes cluster with Metricbeat. Next we will learn how to collect logs with Filebeat to monitor the Kubernetes cluster.
We will now install and configure Filebeat to collect log data from the Kubernetes cluster and send it to ElasticSearch. Filebeat is a lightweight log-shipping agent that can also be configured with specific modules to parse and visualize the log formats of applications such as databases or Nginx.
Similar to Metricbeat, Filebeat needs a configuration file that sets up the ElasticSearch connection, the Kibana connection, and the way logs are collected and parsed.
The ConfigMap object below holds the log-collection configuration we use here (the full set of options is available on the official website):
# filebeat.settings.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: filebeat-config
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
      - /var/log/containers/*.log
      processors:
      - add_kubernetes_metadata:
          in_cluster: true
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"

    filebeat.autodiscover:
      providers:
      - type: kubernetes
        templates:
        - condition.equals:
            kubernetes.labels.app: mongo
          config:
          - module: mongodb
            enabled: true
            log:
              input:
                type: docker
                containers.ids:
                - ${data.kubernetes.container.id}

    processors:
    - drop_event:
        when.or:
        - and:
          - regexp:
              message: '^\d+\.\d+\.\d+\.\d+ '
          - equals:
              fileset.name: error
        - and:
          - not:
              regexp:
                message: '^\d+\.\d+\.\d+\.\d+ '
          - equals:
              fileset.name: access
    - add_cloud_metadata:
    - add_kubernetes_metadata:
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
    - add_docker_metadata:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'

    setup.dashboards.enabled: true
    setup.template.enabled: true

    setup.ilm:
      policy_file: /etc/indice-lifecycle.json
---
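The drop_event processor in the configuration above routes lines by whether they start with a client IP address: IP-prefixed lines are dropped from the `error` fileset, and non-IP lines are dropped from the `access` fileset. A small sketch of that decision logic (the sample log messages are illustrative, not real mongo output):

```python
import re

# Same regex as the Filebeat config: a line beginning with a dotted-quad IP.
ip_prefixed = re.compile(r"^\d+\.\d+\.\d+\.\d+ ")

def dropped(message: str, fileset: str) -> bool:
    """Mirror the when.or conditions of the drop_event processor above."""
    is_ip = ip_prefixed.match(message) is not None
    # Drop IP lines from `error`, and non-IP lines from `access`.
    return (is_ip and fileset == "error") or (not is_ip and fileset == "access")

assert dropped("10.244.1.12 GET /message", "error")       # IP line is not an error
assert not dropped("10.244.1.12 GET /message", "access")  # IP line kept for access
assert dropped("uncaught exception in query", "access")   # non-IP line is not access
assert not dropped("uncaught exception in query", "error")
print("routing logic behaves as expected")
```

The net effect is that each line survives in exactly one fileset, so access-style and error-style entries never mix.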
We collect every log file under /var/log/containers/ and use the inCluster mode to access the Kubernetes APIServer and enrich each log with its metadata, then send the logs directly to Elasticsearch. We also define the indice retention policy through policy_file:
# filebeat.indice-lifecycle.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: filebeat-indice-lifecycle
  labels:
    app: filebeat
data:
  indice-lifecycle.json: |-
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_size": "5GB",
                "max_age": "1d"
              }
            }
          },
          "delete": {
            "min_age": "30d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
---
Again, to collect the log data on every node, we use a DaemonSet controller with the configuration above.
# filebeat.daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: elastic
  name: filebeat
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.8.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: filebeat-indice-lifecycle
          mountPath: /etc/indice-lifecycle.json
          readOnly: true
          subPath: indice-lifecycle.json
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: dockersock
          mountPath: /var/run/docker.sock
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: filebeat-config
      - name: filebeat-indice-lifecycle
        configMap:
          defaultMode: 0600
          name: filebeat-indice-lifecycle
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: data
        hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
Our cluster was set up with Kubeadm, where master nodes are tainted by default, so if you also want to collect logs from the master nodes you must add the corresponding tolerations; we do not collect them here, so none are added. In addition, since Filebeat needs the Kubernetes metadata of each log, such as the Pod name and its namespace, it must access the APIServer and therefore needs the corresponding RBAC permissions, which we declare as follows:
# filebeat.permission.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    app: filebeat
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: elastic
  name: filebeat
  labels:
    app: filebeat
---
Then install the resource objects above:
$ kubectl apply -f filebeat.settings.configmap.yml \
    -f filebeat.indice-lifecycle.configmap.yml \
    -f filebeat.daemonset.yml \
    -f filebeat.permissions.yml
configmap/filebeat-config created
configmap/filebeat-indice-lifecycle created
daemonset.apps/filebeat created
clusterrolebinding.rbac.authorization.k8s.io/filebeat created
clusterrole.rbac.authorization.k8s.io/filebeat created
serviceaccount/filebeat created
Once all the Filebeat Pods reach the Running state, the deployment is complete. We can now view the logs in Kibana: left menu, Observability → Logs.
You can also view a Pod's logs from the Metrics page mentioned in the previous section:
Click Kubernetes Pod logs to get the logs of the Pod you want to inspect:
If the volume of logs to collect in the cluster is very large, sending the data straight to ElasticSearch puts considerable pressure on ES. In that case, a common approach is to add a middleware such as Kafka as a buffer, or to collect the Filebeat output through Logstash.
This completes collecting Kubernetes cluster logs with Filebeat. In the next part, we continue with tracing Kubernetes applications using Elastic APM.
Elastic APM is the application performance monitoring tool of the Elastic Stack. It lets us monitor application performance in real time by collecting incoming requests, database queries, cache calls, and so on, making it much easier and faster to locate performance problems.
Elastic APM is OpenTracing compliant, so we can use a large number of existing libraries to trace application performance.
For example, we can trace a request through a distributed environment (a microservice architecture) and easily find potential performance bottlenecks.
Elastic APM provides its service through a component named APM-Server, which collects trace data from the agents running alongside the applications and forwards it to ElasticSearch.
First we install APM-Server on the Kubernetes cluster to receive the trace data from the agents and forward it to ElasticSearch; again we use a ConfigMap for the configuration:
# apm.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: apm-server-config
  labels:
    app: apm-server
data:
  apm-server.yml: |-
    apm-server:
      host: "0.0.0.0:8200"

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'
---
APM-Server needs to expose port 8200 so that the agents can forward their trace data; create the corresponding Service object:
# apm.service.yml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: apm-server
  labels:
    app: apm-server
spec:
  ports:
  - port: 8200
    name: apm-server
  selector:
    app: apm-server
---
Then manage it with a Deployment resource object:
# apm.deployment.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: apm-server
  labels:
    app: apm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apm-server
  template:
    metadata:
      labels:
        app: apm-server
    spec:
      containers:
      - name: apm-server
        image: docker.elastic.co/apm/apm-server:7.8.0
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        ports:
        - containerPort: 8200
          name: apm-server
        volumeMounts:
        - name: config
          mountPath: /usr/share/apm-server/apm-server.yml
          readOnly: true
          subPath: apm-server.yml
      volumes:
      - name: config
        configMap:
          name: apm-server-config
---
Deploy the resources above:
$ kubectl apply -f apm.configmap.yml \
    -f apm.service.yml \
    -f apm.deployment.yml
configmap/apm-server-config created
service/apm-server created
deployment.extensions/apm-server created
Once the Pod is in the Running state, it is working:
$ kubectl get pods -n elastic -l app=apm-server
NAME                          READY   STATUS    RESTARTS   AGE
apm-server-667bfc5cff-zj8nq   1/1     Running   0          12m
Next we can install an agent on the Spring-Boot application deployed in the first section. We configure an Elastic APM Java agent on the sample application spring-boot-simple. First we need to bundle the elastic-apm-agent-1.8.0.jar into the application container; add a line like the following to the image's Dockerfile to download the JAR:
RUN wget -O /apm-agent.jar https://search.maven.org/remotecontent?filepath=co/elastic/apm/elastic-apm-agent/1.8.0/elastic-apm-agent-1.8.0.jar
The complete Dockerfile is shown below:
FROM openjdk:8-jdk-alpine
ENV ELASTIC_APM_VERSION "1.8.0"
RUN wget -O /apm-agent.jar https://search.maven.org/remotecontent?filepath=co/elastic/apm/elastic-apm-agent/$ELASTIC_APM_VERSION/elastic-apm-agent-$ELASTIC_APM_VERSION.jar

COPY target/spring-boot-simple.jar /app.jar

CMD java -jar /app.jar
Then add the following dependencies to the sample application, so we can integrate the open-tracing libraries or instrument manually with the Elastic APM API:
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>apm-agent-api</artifactId>
    <version>${elastic-apm.version}</version>
</dependency>
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>apm-opentracing</artifactId>
    <version>${elastic-apm.version}</version>
</dependency>
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-cloud-mongo-starter</artifactId>
    <version>${opentracing-spring-cloud.version}</version>
</dependency>
Then modify the Spring-Boot Deployment from the first part to enable the Java agent and connect it to APM-Server:
# spring-boot-simple.deployment.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  selector:
    matchLabels:
      app: spring-boot-simple
  template:
    metadata:
      labels:
        app: spring-boot-simple
    spec:
      containers:
      - image: cnych/spring-boot-simple:0.0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-boot-simple
        command:
        - "java"
        - "-javaagent:/apm-agent.jar"
        - "-Delastic.apm.active=$(ELASTIC_APM_ACTIVE)"
        - "-Delastic.apm.server_urls=$(ELASTIC_APM_SERVER)"
        - "-Delastic.apm.service_name=spring-boot-simple"
        - "-jar"
        - "app.jar"
        env:
        - name: SPRING_DATA_MONGODB_HOST
          value: mongo
        - name: ELASTIC_APM_ACTIVE
          value: "true"
        - name: ELASTIC_APM_SERVER
          value: http://apm-server.elastic.svc.cluster.local:8200
        ports:
        - containerPort: 8080
---
Then redeploy the sample application:
$ kubectl apply -f spring-boot-simple.yml
$ kubectl get pods -n elastic -l app=spring-boot-simple
NAME                                 READY   STATUS    RESTARTS   AGE
spring-boot-simple-fb5564885-tf68d   1/1     Running   0          5m11s
$ kubectl get svc -n elastic -l app=spring-boot-simple
NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
spring-boot-simple   NodePort   10.109.55.134   <none>        8080:31847/TCP   9d
Once the sample application is redeployed, run the following requests:
get messages — fetch all posted messages:
$ curl -X GET http://k8s.qikqiak.com:31847/message
get messages (slow request) — use sleep=&lt;ms&gt; to simulate a slow request:
$ curl -X GET http://k8s.qikqiak.com:31847/message?sleep=3000
get messages (error) — use error=true to trigger an exception:
$ curl -X GET http://k8s.qikqiak.com:31847/message?error=true
Now go to the APM page in Kibana; we should see data from the spring-boot-simple application.
Click the application to view its various performance trace data:
You can view the current error data:
You can also view JVM monitoring data:
Beyond that, we can add alerting so we learn about the application's performance as early as possible.
This completes full-stack monitoring of a Kubernetes environment with the Elastic Stack: metrics, logs, and performance traces tell us how our applications behave in every respect and speed up locating and fixing problems.
One note about Kibana reading the password from the Secret: inside the Kibana container, the password environment variable showed up as a garbled value. So far we have only observed this with the Kibana container.
Workaround: set the container environment variable to the password directly instead of referencing the Secret.
When ES auto-creates indices it applies an index template, and the default template generates too many fields/tags; you can reduce the fields created by modifying the index template.