This series is split into three parts: Filebeat, Logstash, and Elasticsearch. The goal is to design, deploy, and tune a logging system sized for tens of billions of log lines per day, making the most of the available resources while reaching the best possible performance. This article covers the Filebeat part.

Version: filebeat-7.12.0

This is about log collection for Kubernetes. Filebeat is deployed as a DaemonSet, logs are classified by the cluster's namespaces, and a separate Kafka topic is created for each namespace name.

Normally, when a container writes its logs to standard output (stdout), Docker saves them as *-json.log files under /var/lib/docker/containers. If the Docker data directory has been changed, the logs live under that modified directory instead, for example:
# tree /data/docker/containers
/data/docker/containers
├── 009227c00e48b051b6f5cb65128fd58412b845e0c6d2bec5904f977ef0ec604d
│ ├── 009227c00e48b051b6f5cb65128fd58412b845e0c6d2bec5904f977ef0ec604d-json.log
│ ├── checkpoints
│ ├── config.v2.json
│ ├── hostconfig.json
│ └── mounts
As shown above, each container has a log file at /data/docker/containers/<container id>/*-json.log. By default, Kubernetes also creates symlinks to these log files in the /var/log/containers and /var/log/pods directories, as shown below:
cattle-node-agent-tvhlq_cattle-system_agent-8accba2d42cbc907a412be9ea3a628a90624fb8ef0b9aa2bc6ff10eab21cf702.log
etcd-k8s-master01_kube-system_etcd-248e250c64d89ee6b03e4ca28ba364385a443cc220af2863014b923e7f982800.log
This directory therefore contains the logs of every container running on the host, with file names of the form:
[podName]_[nameSpace]_[deploymentName]-[containerId].log
The name above is for a Deployment; other workload types such as DaemonSet and StatefulSet differ slightly, but they all share one common pattern:
*_[nameSpace]_*.log
Knowing this naming convention, we can move on to the deployment and configuration of Filebeat.

Filebeat is deployed as a DaemonSet. There is not much to say about this; just follow the official documentation:
---
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_test-1_*log
      fields:
        namespace: test-1
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_test-2_*log
      fields:
        namespace: test-2
        env: dev
        k8s: cluster-dev

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

    output.kafka:
      hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true
kind: ConfigMap
metadata:
  name: filebeat-daemonset-config-test
  namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.12.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          # must match the name of the ConfigMap created above (filebeat-daemonset-config-test) and be in the DaemonSet's namespace
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: varlog
        hostPath:
          path: /var/log
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
To start it, simply run kubectl apply -f on the manifest. Deployment is not the focus of this article, so I will not go into more detail.
Official deployment reference: raw.githubusercontent.com/elastic/bea…
First, a quick look at the structure of a Filebeat configuration:
filebeat.inputs:
filebeat.config.modules:
processors:
output.xxxxx:
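To make the skeleton concrete, here is a minimal sketch of a complete filebeat.yml that fills in each section, reusing the namespace, custom fields, and Kafka brokers from the examples in this article (treat it as an illustration, not a drop-in configuration):

filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_test-1_*log   # only logs from the test-1 namespace
  fields:
    namespace: test-1
    env: dev
    k8s: cluster-dev

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

processors:
  - drop_fields:          # trim fields we do not need downstream
      fields: ["host", "agent", "ecs"]
      ignore_missing: true

output.kafka:
  hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
  topic: '%{[fields.k8s]}-%{[fields.namespace]}'   # e.g. cluster-dev-test-1
  partition.round_robin:
    reachable_only: true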
That is the overall structure. The complete data flow, in short, is: container stdout → /var/log/containers on the node → Filebeat (DaemonSet) → Kafka → Logstash → Elasticsearch.
As mentioned earlier, I classify logs by namespace, with one topic per namespace. If you collect from multiple clusters, you still classify by namespace, but the topic name also has to include the cluster name so the clusters can be told apart. Since the logs are grouped by namespace, the inputs section needs a glob pattern that picks out only the log files belonging to a given namespace, for example:
filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_test-1_*log
  fields:
    namespace: test-1
    env: dev
    k8s: cluster-dev
Here the namespace is test-1, and the pattern *_test-1_*log matches only the log files that contain that namespace name. I also added a few custom fields, which will come in handy later when creating the topic.
This covers a single namespace; if there are several, just list them one after another, as shown below:
filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_test-1_*log
  fields:
    namespace: test-1
    env: dev
    k8s: cluster-dev
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_test-2_*log
  fields:
    namespace: test-2
    env: dev
    k8s: cluster-dev
The downside of this approach is that with many namespaces the configuration grows quickly. Don't worry, a more concise way is shown further down in this article.
Note: the input type must be set to container.
As mentioned above, topics are created per namespace, and the custom field namespace I added holds the future topic name. But there are many namespaces, so how do we create the topics dynamically on output?
output.kafka:
  hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
  topic: '%{[fields.namespace]}'
  partition.round_robin:
    reachable_only: true
Note the format string %{[fields.namespace]}: Filebeat expands it per event, so an event carrying fields.namespace: test-1 ends up in the topic test-1.
The complete configuration then looks like this:
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_test-1_*log
      fields:
        namespace: test-1
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_test-2_*log
      fields:
        namespace: test-2
        env: dev
        k8s: cluster-dev

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

    output.kafka:
      hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true
kind: ConfigMap
metadata:
  name: filebeat-daemonset-config-test
  namespace: default
If you do not need to process the logs at all, you could stop here. But something still feels missing when you actually look at the logs. Right: at this point you only know the log content and which namespace it came from, but not which service or pod produced it, let alone details such as the image address of that service. None of this is available with the configuration above, so we need to build it up a bit further.
This is where the configuration option called processors comes in. Here is the official explanation:
You can define processors in your configuration to process events before they are sent to the configured output
In short, it processes the log events.
Let's focus on this part; it is very useful and important.
With the configuration shown so far, the collected Kubernetes logs carry no pod-related information, for example the pod name, the container image, or the node the pod runs on. To add this information we use one of the processors, called add_kubernetes_metadata, which, as the name says, adds Kubernetes metadata to each event. Here is an example of how to use it:
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
      - logs_path:
          logs_path: "/var/log/containers/"
host: the node Filebeat runs on, in case it cannot be detected reliably, for example when Filebeat runs with host networking
matchers: the matchers build the lookup key that is matched against the identifiers created by the indexers
logs_path: the base path of the container logs; if not set, the default log path of the platform Filebeat runs on is used
With this Kubernetes metadata added, the events now carry Kubernetes information. Here is what a log event looks like afterwards:
{
  "@timestamp": "2021-04-19T07:07:36.065Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.11.2"
  },
  "log": {
    "offset": 377708,
    "file": {
      "path": "/var/log/containers/test-server-85545c868b-6nsvc_test-1_test-server-885412c0a8af6bfa7b3d7a341c3a9cb79a85986965e363e87529b31cb650aec4.log"
    }
  },
  "fields": {
    "env": "dev",
    "k8s": "cluster-dev",
    "namespace": "test-1"
  },
  "host": {
    "name": "filebeat-fv484"
  },
  "agent": {
    "id": "7afbca43-3ec1-4cee-b5cb-1de1e955b717",
    "name": "filebeat-fv484",
    "type": "filebeat",
    "version": "7.11.2",
    "hostname": "filebeat-fv484",
    "ephemeral_id": "8fd29dee-da50-4c88-88d5-ebb6bbf20772"
  },
  "ecs": {
    "version": "1.6.0"
  },
  "stream": "stdout",
  "message": "2021-04-19 15:07:36.065 INFO 23 --- [trap-executor-0] c.n.d.s.r.aws.ConfigClusterResolver : Resolving eureka endpoints via configuration",
  "input": {
    "type": "container"
  },
  "container": {
    "image": {
      "name": "hub.test.com/test/test-server:3.3.1-ent-release-SNAPSHOT.20210402191241_87c9b1f841c"
    },
    "id": "885412c0a8af6bfa7b3d7a341c3a9cb79a85986965e363e87529b31cb650aec4",
    "runtime": "docker"
  },
  "kubernetes": {
    "labels": {
      "pod-template-hash": "85545c868b",
      "app": "geip-gateway-test"
    },
    "container": {
      "name": "test-server",
      "image": "hub.test.com/test/test-server:3.3.1-ent-release-SNAPSHOT.20210402191241_87c9b1f841c"
    },
    "node": {
      "uid": "511d9dc1-a84e-4948-b6c8-26d3f1ba2e61",
      "labels": {
        "kubernetes_io/hostname": "k8s-node-09",
        "kubernetes_io/os": "linux",
        "beta_kubernetes_io/arch": "amd64",
        "beta_kubernetes_io/os": "linux",
        "cloudt-global": "true",
        "kubernetes_io/arch": "amd64"
      },
      "hostname": "k8s-node-09",
      "name": "k8s-node-09"
    },
    "namespace_uid": "4fbea846-44b8-4d4a-b03b-56e43cff2754",
    "namespace_labels": {
      "field_cattle_io/projectId": "p-lgxhz",
      "cattle_io/creator": "norman"
    },
    "pod": {
      "name": "test-server-85545c868b-6nsvc",
      "uid": "1e678b63-fb3c-40b5-8aad-892596c5bd4d"
    },
    "namespace": "test-1",
    "replicaset": {
      "name": "test-server-85545c868b"
    }
  }
}
You can see that the value of the kubernetes key now contains pod information, node information, namespace information, and more; essentially all the key Kubernetes details are there, and they are very complete.
But that raises a new problem: this single log event has become quite large, and more than half of it is information we do not want, so we need to remove the fields that are useless to us.
processors:
  - drop_fields:
      # redundant fields to delete
      fields:
        - host
        - ecs
        - log
        - agent
        - input
        - stream
        - container
      ignore_missing: true
Note: the metadata field @metadata cannot be deleted.
Looking at the log event above, there is no dedicated field for the log's own timestamp. There is @timestamp, but it is not Beijing time, and what we want is the time written in the log line itself. The message does contain that time, but how do we extract it into its own field? This is where the script processor comes in: we write a small JavaScript snippet to do it.
processors:
  - script:
      lang: javascript
      id: format_time
      tag: enable
      source: >
        function process(event) {
            var str = event.Get("message");
            var time = str.split(" ").slice(0, 2).join(" ");
            event.Put("time", time);
        }
  - timestamp:
      field: time
      timezone: Asia/Shanghai
      layouts:
        - '2006-01-02 15:04:05'
        - '2006-01-02 15:04:05.999'
      test:
        - '2019-06-22 16:33:51'
After this, the events contain an extra time field: for the sample message above, the script joins the first two space-separated tokens, so time becomes "2021-04-19 15:07:36.065". This field can then be used later on.
Strictly speaking, all of our requirements are now met. However, adding the Kubernetes metadata introduced a lot of useless fields, and if we want to strip those we can also use drop_fields, for example like this:
processors:
  - drop_fields:
      # redundant fields to delete
      fields:
        - kubernetes.pod.uid
        - kubernetes.namespace_uid
        - kubernetes.namespace_labels
        - kubernetes.node.uid
        - kubernetes.node.labels
        - kubernetes.replicaset
        - kubernetes.labels
        - kubernetes.node.name
      ignore_missing: true
This also removes the useless fields, but the structure does not change: everything is still nested several levels deep, and the result ends up looking like this:
{
  "@timestamp": "2021-04-19T07:07:36.065Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.11.2"
  },
  "fields": {
    "env": "dev",
    "k8s": "cluster-dev",
    "namespace": "test-1"
  },
  "message": "2021-04-19 15:07:36.065 INFO 23 --- [trap-executor-0] c.n.d.s.r.aws.ConfigClusterResolver : Resolving eureka endpoints via configuration",
  "kubernetes": {
    "container": {
      "name": "test-server",
      "image": "hub.test.com/test/test-server:3.3.1-ent-release-SNAPSHOT.20210402191241_87c9b1f841c"
    },
    "node": {
      "hostname": "k8s-node-09"
    },
    "pod": {
      "name": "test-server-85545c868b-6nsvc"
    },
    "namespace": "test-1"
  }
}
When we later create an index template in Elasticsearch, this deep nesting carries over and makes querying inconvenient. So let's optimize the hierarchy, again using the script processor.
processors:
  - script:
      lang: javascript
      id: format_k8s
      tag: enable
      source: >
        function process(event) {
            var k8s = event.Get("kubernetes");
            var newK8s = {
                podName: k8s.pod.name,
                nameSpace: k8s.namespace,
                imageAddr: k8s.container.image,   // the container image address
                hostName: k8s.node.hostname
            }
            event.Put("k8s", newK8s);
        }
This creates a separate k8s field containing the key information: podName, nameSpace, imageAddr, and hostName. After that, the original kubernetes field just needs to be dropped, as sketched below.
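A minimal sketch of that final drop (the field name comes from the event shown above; combine it with the earlier drop_fields list as needed):

processors:
  - drop_fields:
      # remove the nested kubernetes object once the flat k8s field exists
      fields:
        - kubernetes
      ignore_missing: true

With that in place, the final result looks like this: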
{
  "@timestamp": "2021-04-19T07:07:36.065Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.11.2"
  },
  "fields": {
    "env": "dev",
    "k8s": "cluster-dev",
    "namespace": "test-1"
  },
  "time": "2021-04-19 15:07:36.065",
  "message": "2021-04-19 15:07:36.065 INFO 23 --- [trap-executor-0] c.n.d.s.r.aws.ConfigClusterResolver : Resolving eureka endpoints via configuration",
  "k8s": {
    "podName": "test-server-85545c868b-6nsvc",
    "nameSpace": "test-1",
    "imageAddr": "hub.test.com/test/test-server:3.3.1-ent-release-SNAPSHOT.20210402191241_87c9b1f841c",
    "hostName": "k8s-node-09"
  }
}
This looks much cleaner. But it is still somewhat tedious: every time a new namespace is added, the configuration has to be changed again. Is there a better way? Yes, there is.
Since output.kafka can build the topic name from any field, we can take advantage of that and configure it like this:
output.kafka:
  hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
  topic: '%{[fields.k8s]}-%{[k8s.nameSpace]}'   # the namespace comes from the injected Kubernetes metadata
  partition.round_robin:
    reachable_only: true
Earlier we also created a k8s field under fields to tell logs from different Kubernetes clusters apart. Putting all of this together, the configuration file can be optimized into its final form:
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*.log
      multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}|^[1-9]\d*\.[1-9]\d*\.[1-9]\d*\.[1-9]\d*'
      multiline.negate: true
      multiline.match: after
      multiline.timeout: 10s

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

    processors:
      - drop_event:
          when:
            or:
              - regexp:
                  kubernetes.pod.name: "filebeat.*"
              - regexp:
                  kubernetes.pod.name: "external-dns.*"
              - regexp:
                  kubernetes.pod.name: "coredns.*"
              - regexp:
                  kubernetes.pod.name: "eureka.*"
              - regexp:
                  kubernetes.pod.name: "zookeeper.*"
      - script:
          lang: javascript
          id: format_time
          tag: enable
          source: >
            function process(event) {
                var str = event.Get("message");
                var time = str.split(" ").slice(0, 2).join(" ");
                event.Put("time", time);
            }
      - timestamp:
          field: time
          timezone: Asia/Shanghai
          layouts:
            - '2006-01-02 15:04:05'
            - '2006-01-02 15:04:05.999'
          test:
            - '2019-06-22 16:33:51'
      # The following script can be ignored; the idea was to take @timestamp, convert it to a Unix timestamp, and then to local (Beijing) time
      - script:
          lang: javascript
          id: format_time_utc
          tag: enable
          source: >
            function process(event) {
                var utc_time = event.Get("@timestamp");
                var T_pos = utc_time.indexOf('T');
                var Z_pos = utc_time.indexOf('Z');
                var year_month_day = utc_time.substr(0, T_pos);
                var hour_minute_second = utc_time.substr(T_pos+1, Z_pos-T_pos-1);
                var new_time = year_month_day + " " + hour_minute_second;
                timestamp = new Date(Date.parse(new_time));
                timestamp = timestamp.getTime();
                timestamp = timestamp/1000;
                var timestamp = timestamp + 8*60*60;
                var bj_time = new Date(parseInt(timestamp) * 1000 + 8 * 3600 * 1000);
                var bj_time = bj_time.toJSON().substr(0, 19).replace('T', ' ');
                event.Put("time_utc", bj_time);
            }
      - timestamp:
          field: time_utc
          layouts:
            - '2006-01-02 15:04:05'
            - '2006-01-02 15:04:05.999'
          test:
            - '2019-06-22 16:33:51'
      - add_fields:
          target: ''
          fields:
            env: prod
      - add_kubernetes_metadata:
          default_indexers.enabled: true
          default_matchers.enabled: true
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - script:
          lang: javascript
          id: format_k8s
          tag: enable
          source: >
            function process(event) {
                var k8s = event.Get("kubernetes");
                var newK8s = {
                    podName: k8s.pod.name,
                    nameSpace: k8s.namespace,
                    imageAddr: k8s.container.image,
                    hostName: k8s.node.hostname,
                    k8sName: "sg-saas-pro-hbali"
                }
                event.Put("k8s", newK8s);
            }
      - drop_fields:
          # redundant fields to delete
          fields:
            - host
            - tags
            - ecs
            - log
            - prospector
            - agent
            - input
            - beat
            - offset
            - stream
            - container
            - kubernetes
          ignore_missing: true

    output.kafka:
      hosts: ["10.127.91.90:9092","10.127.91.91:9092","10.127.91.92:9092"]
      topic: '%{[k8s.k8sName]}-%{[k8s.nameSpace]}'
      partition.round_robin:
        reachable_only: true
kind: ConfigMap
metadata:
  name: filebeat-daemonset-config
  namespace: default
A few adjustments were made here: events from noisy system pods (filebeat, external-dns, coredns, eureka, zookeeper) are dropped, lines that do not start with a timestamp or an IP address are merged into the previous event via the multiline settings, and an env field is added. Keep in mind the way the topic is built here, %{[k8s.k8sName]}-%{[k8s.nameSpace]}; it will be needed again later when configuring Logstash.
In my opinion, doing this kind of processing in Filebeat, at the first stage of collection, shortens the overall processing time, because the bottlenecks are mostly in Elasticsearch and Logstash. So push the expensive operations into Filebeat where possible, and only fall back to Logstash for what Filebeat cannot handle. Another point that is very easy to overlook is trimming the log content itself, which noticeably reduces log volume. In my tests, the same number of log lines took up 20 GB before trimming and less than 10 GB after, which is a huge win for the whole Elasticsearch cluster.
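Besides drop_fields, one possible way to trim events is Filebeat's include_fields processor, which keeps only a whitelist of fields. A minimal sketch (the field list is illustrative, based on the fields used in this article; @timestamp and type are always kept by this processor):

processors:
  - include_fields:
      # keep only what downstream consumers actually query
      fields: ["message", "time", "fields", "k8s"]

Whether you whitelist with include_fields or blacklist with drop_fields, the goal is the same: send Kafka and Elasticsearch only the fields you will actually use.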
Feel free to follow my WeChat public account, and let's learn and improve together.