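Host monitoring tracks the performance of the machines on which the Kafka cluster's Brokers run. Common host metrics include:
(1) Machine load (Load)
(2) CPU usage
(3) Memory usage, including free memory (Free Memory) and used memory (Used Memory)
(4) Disk I/O usage, including read usage and write usage
(5) Network I/O usage
(6) Number of TCP connections
(7) Number of open files
(8) Inode usage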
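A Kafka Broker process is an ordinary Java process, so every JVM monitoring technique also applies to it.
(1) Full GC frequency and duration, used to assess the impact of Full GC on the Broker process. Long pauses cause the Broker to throw various timeout exceptions.
(2) Live object size, which is the main basis for setting the heap size and helps fine-tune the size of each JVM generation.
(3) Total number of application threads, which indicates how the Broker process uses the CPU.
A sample line from a Broker GC log:
2019-07-30T09:13:03.809+0800: 552.982: [GC cleanup 827M->645M(1024M), 0.0019078 secs]
The Broker JVM uses the G1 collector by default. In this line, once the cleanup step finished, the live objects on the heap shrank from 827 MB to 645 MB. G1 has been the default collector since Kafka 0.9.0.0, and a G1 Full GC runs on a single thread (in JDK 9 and earlier), which is very slow. The Broker GC logs, that is, the files whose names start with kafkaServer-gc.log, therefore need to be monitored. If the Broker process runs Full GC frequently, enable G1's -XX:+PrintAdaptiveSizePolicy switch so that the JVM reports what triggered the Full GC.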
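To track live object size over time, the cleanup lines can be pulled out of the GC log with a small script. The following is a minimal sketch, assuming the log uses exactly the line format shown above with sizes in megabytes; the function name parse_cleanup is illustrative, and real logs may also use K or G units that this pattern does not handle.

```python
import re

# Matches G1 cleanup lines such as:
# 2019-07-30T09:13:03.809+0800: 552.982: [GC cleanup 827M->645M(1024M), 0.0019078 secs]
# Assumption: all sizes are reported in megabytes (suffix "M").
CLEANUP_RE = re.compile(
    r"\[GC cleanup (?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\), "
    r"(?P<secs>[\d.]+) secs\]"
)


def parse_cleanup(line):
    """Return heap figures (in MB) from a G1 cleanup log line, or None."""
    match = CLEANUP_RE.search(line)
    if match is None:
        return None
    return {
        "before_mb": int(match.group("before")),
        "after_mb": int(match.group("after")),
        "heap_mb": int(match.group("heap")),
        "pause_secs": float(match.group("secs")),
    }


sample = ("2019-07-30T09:13:03.809+0800: 552.982: "
          "[GC cleanup 827M->645M(1024M), 0.0019078 secs]")
info = parse_cleanup(sample)
print(info)
```

The after_mb value is the live object size after cleanup, which is the figure the heap size should be based on.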
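(1) Check that the Broker process has started and that its port is listening. In containerized Kafka environments, where the Kafka Broker is started with Docker, the container can start successfully while a misconfigured network leaves the process running without a listening port.
(2) Check the key Broker logs: the server log server.log, the controller log controller.log, and the topic-partition state change log state-change.log.
(3) Check the state of the key Broker threads. A Kafka Broker process starts a dozen or even several dozen threads. In production, the Log Compaction threads have names starting with kafka-log-cleaner-thread and perform log compaction. The replica fetcher threads usually have names starting with ReplicaFetcherThread and run the logic by which Follower replicas fetch messages from the Leader replica.
(4) Check the key Broker JMX metrics.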
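BytesIn/BytesOut: the number of bytes entering and leaving the Broker per second. If the value approaches the network bandwidth, packet loss becomes likely.
NetworkProcessorAvgIdlePercent: the average idle ratio of the threads in the network thread pool. Its value should stay above 30% over the long term. A value below 30% means the network thread pool is very busy, and the Broker's load should be reduced by adding network threads or moving load to other servers.
RequestHandlerAvgIdlePercent: the average idle ratio of the threads in the I/O thread pool. If the value stays below 30% for long periods, adjust the number of I/O threads or reduce the load on the Broker.
UnderReplicatedPartitions: the number of under-replicated partitions. A partition is under-replicated when not all of its Follower replicas are in sync with the Leader replica.
ISRShrink/ISRExpand: how often the ISR shrinks and expands. If replicas frequently enter and leave the ISR in production, these values will be high. Diagnose why the replicas move in and out of the ISR and take appropriate action.
ActiveControllerCount: the number of currently active controllers. Normally the value is 1 on the Broker that hosts the Controller and 0 on every other Broker. If several Brokers report 1, the Kafka cluster has a split brain, which must be handled as soon as possible, mainly by checking network connectivity. Split brain is a very serious distributed-system failure. Kafka currently relies on ZooKeeper to prevent it, and once it occurs Kafka cannot guarantee correct operation.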
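The thresholds above can be turned into a simple health check. The following is a minimal sketch, assuming the metric values have already been collected into plain dictionaries and that the idle metrics are expressed as fractions between 0 and 1; the function names and the dictionary layout are illustrative, not part of Kafka. A single snapshot is checked here, whereas in practice the idle ratios matter only when they stay low over time.

```python
def check_broker(metrics, idle_threshold=0.3):
    """Return a list of warnings for one Broker's metric snapshot.

    `metrics` maps metric names to values; idle ratios are assumed to be
    fractions in the range 0..1.
    """
    warnings = []
    for name in ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent"):
        value = metrics.get(name)
        if value is not None and value < idle_threshold:
            warnings.append(f"{name} is {value:.0%}, below {idle_threshold:.0%}")
    if metrics.get("UnderReplicatedPartitions", 0) > 0:
        warnings.append("under-replicated partitions present")
    return warnings


def check_controllers(active_counts):
    """Classify the cluster from each Broker's ActiveControllerCount.

    `active_counts` maps a Broker name to its ActiveControllerCount value.
    """
    total = sum(active_counts.values())
    if total == 1:
        return "ok"
    if total == 0:
        return "no active controller"
    return "possible split brain"


snapshot = {
    "NetworkProcessorAvgIdlePercent": 0.2,
    "RequestHandlerAvgIdlePercent": 0.8,
    "UnderReplicatedPartitions": 3,
}
print(check_broker(snapshot))
print(check_controllers({"broker-1": 1, "broker-2": 1}))
```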
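(5) Monitor the Kafka clients. Start with the network round-trip time (Round-Trip Time, RTT) between the client machines and the Kafka Broker machines. For producers, the thread whose name starts with kafka-producer-network-thread does the actual message sending. If it dies, the Producer stops working, yet the Producer process itself does not exit. For consumers, the heartbeat thread whose name starts with kafka-coordinator-heartbeat-thread is critical to Rebalance.
For the Producer, the JMX metric to watch is request-latency, the latency of produce requests, which most directly reflects the Producer's TPS. For the Consumer, records-lag and records-lead are the two important JMX metrics. When a Consumer Group is used, also watch the join rate and sync rate metrics, which show how often Rebalance occurs.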
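The two consumer metrics are simple offset differences: lag is how far the consumer is behind the newest message in a partition, and lead is how far it is ahead of the oldest message still retained. A minimal sketch of that arithmetic follows; the function and parameter names are illustrative and the offsets are made-up sample values.

```python
def records_lag(log_end_offset, position):
    """Messages the consumer has not yet read in a partition."""
    return max(log_end_offset - position, 0)


def records_lead(position, log_start_offset):
    """Distance between the consumer and the oldest retained message."""
    return max(position - log_start_offset, 0)


# Sample partition: retained offsets 100..1000, consumer positioned at 950.
lag = records_lag(log_end_offset=1000, position=950)
lead = records_lead(position=950, log_start_offset=100)
print(lag, lead)
```

A growing lag means the consumer is falling behind, while a lead close to 0 means the consumer is about to lose messages to log retention.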
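JMX (Java Management Extensions) can manage and monitor running Java programs. It is used to manage threads, memory, log levels, service restarts, the system environment, and so on.
There are three ways to open the JMX port:
(1) Set JMX_PORT when starting Kafka:
export JMX_PORT=9999
kafka-server-start.sh -daemon config/server.properties
(2) Modify kafka-run-class.sh
Add the following line at the beginning of kafka-run-class.sh:
JMX_PORT=9999
Restart the Kafka cluster after modifying kafka-run-class.sh.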
(3) Enable JMX for a Kafka Docker container service
Add the KAFKA_JMX_OPTS and JMX_PORT environment variables to the Kafka service in docker-compose.yml:
KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=192.168.0.105 -Dcom.sun.management.jmxremote.rmi.port=9999"
JMX_PORT: 9999
Expose the corresponding JMX port:
ports:
  - "9999:9999" # expose the port externally
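To monitor Broker and Topic data, Kafka needs JMX_PORT to be set. It is common to define the JMX_PORT variable in kafka-run-class.sh, but once it is defined there, the script tools in the bin directory fail with an error. The reason is that kafka-run-class.sh is the script that the other tools call, so each tool's Java process also tries to bind JMX_PORT and finds the port already in use.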
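The conflict is the ordinary rule that only one process can listen on a given port. The following sketch illustrates it with plain TCP sockets rather than with JMX itself; the helper try_bind is hypothetical, and the first socket stands in for the Broker that already holds JMX_PORT.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        sock.close()
        return None


first = try_bind(0)              # port 0 lets the OS pick a free port
port = first.getsockname()[1]    # the port the first listener now owns
second = try_bind(port)          # a second listener on the same port fails
print(port, second is None)
first.close()
```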
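The fix is to specify JMX_PORT only when starting Kafka.
(1) When Kafka is started by supervisor, add environment=JMX_PORT=9999 to the supervisor service configuration file.
(2) When Kafka is started with kafka-server-start.sh, run export JMX_PORT=9999 before starting, or set it inside kafka-server-start.sh.
(3) Modify the kafka-run-class.sh script
Modify bin/kafka-run-class.sh in the Kafka installation directory: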
JmxTool is a tool from the Kafka community for viewing Kafka JMX metrics in real time:
kafka-run-class.sh kafka.tools.JmxTool
--attributes: the JMX attribute names to query, as a comma-separated list.
--date-format: the date format used in the output.
--jmx-url: the JMX endpoint to connect to. The default format is service:jmx:rmi:///jndi/rmi://:<JMX port>/jmxrmi.
--object-name: the name of the JMX MBean to query.
--reporting-interval: the polling interval in milliseconds. The default is 2 s.
The following command queries, once per second, the Broker's inbound bytes per second (BytesInPerSec) averaged over the past minute:
kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes OneMinuteRate --reporting-interval 1000
The following command shows the ActiveControllerCount JMX metric:
kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --reporting-interval 1000
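When several metrics are polled this way, the command lines differ only in the MBean name and attributes, so they can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not run it; the helper names jmx_url and jmxtool_command are illustrative, and only options described above are used.

```python
import shlex


def jmx_url(port, host=""):
    """Build a JMX service URL in the format JmxTool expects."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def jmxtool_command(object_name, port, host="", attributes=None, interval_ms=1000):
    """Return the JmxTool invocation as a list of arguments."""
    cmd = [
        "kafka-run-class.sh", "kafka.tools.JmxTool",
        "--object-name", object_name,
        "--jmx-url", jmx_url(port, host),
        "--reporting-interval", str(interval_ms),
    ]
    if attributes:
        # JmxTool takes the attribute names as one comma-separated value.
        cmd += ["--attributes", ",".join(attributes)]
    return cmd


cmd = jmxtool_command(
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
    9999,
    attributes=["OneMinuteRate"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where kafka-run-class.sh is on the PATH.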
Kafka Manager is a Kafka monitoring framework open-sourced by Yahoo in 2015. It is written in Scala and is mainly used to manage and monitor Kafka clusters.
Kafka Manager has since been renamed CMAK (Cluster Manager for Apache Kafka).
GitHub:
https://github.com/yahoo/CMAK
Kafka Manager Docker image: kafkamanager/kafka-manager
To enable basic authentication for Kafka Manager, set the following environment variables:
KAFKA_MANAGER_AUTH_ENABLED: "true"
KAFKA_MANAGER_USERNAME: username
KAFKA_MANAGER_PASSWORD: password
The docker-compose.yml file for deploying the Kafka Manager service is as follows:
# Define the kafka-manager service
kafka-manager-test:
  image: kafkamanager/kafka-manager # kafka-manager image
  restart: always
  container_name: kafka-manager-test
  hostname: kafka-manager-test
  ports:
    - "9000:9000" # expose the port for web access
  depends_on:
    - kafka-test # dependency
  environment:
    ZK_HOSTS: zookeeper-test:2181 # ZooKeeper host and port
    KAFKA_BROKERS: kafka-test:9090 # Kafka broker
    KAFKA_MANAGER_AUTH_ENABLED: "true"
    KAFKA_MANAGER_USERNAME: admin
    KAFKA_MANAGER_PASSWORD: password
Start the Kafka Manager service and log in to the Kafka Manager web UI.
Web address: http://127.0.0.1:9000
Add the Kafka Broker nodes to be managed by Kafka Manager:
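A monitoring framework can typically be built from the combination JMXTrans + InfluxDB + Grafana. Because Grafana supports monitoring JMX metrics, Kafka's JMX metrics are easy to integrate. Companies that already use a JMXTrans + InfluxDB + Grafana monitoring stack can reuse it directly, which greatly reduces operations cost.
Control Center can monitor Kafka clusters in real time and also helps operate and build Kafka-based real-time stream processing applications. Control Center is not free; it is available only with the enterprise edition of the Confluent Kafka Platform.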
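Jconsole (Java Monitoring and Management Console) is a JMX-based graphical monitoring and management tool. It provides Overview, Memory, Threads, Classes, VM Summary, and MBeans views.
Run jconsole in a Linux terminal and, in the Remote Process field of the window that opens, enter service:jmx:rmi:///jndi/rmi://192.168.0.105:9999/jmxrmi or 192.168.0.105:9999.
Select the MBeans tab.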
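JMXTrans is a collector that gathers data from Java applications over JMX. Any Java application with an open JMX port can be collected from.
JMXTrans runs as a background daemon and collects data once a minute.
GitHub: https://github.com/jmxtrans/jmxtrans
To download the JMXTrans Docker image:
docker pull jmxtrans/jmxtrans
By default JMXTrans reads all data source configuration files (JSON files) in the /var/lib/jmxtrans directory, fetches data from those sources in real time, parses it, and stores it in InfluxDB.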
A JMXTrans JSON configuration file looks like this:
{
  "servers": [{
    "port": "9901",
    "host": "192.168.0.105",
    "queries": [{
      "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
      "attr": ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"],
      "resultAlias": "kafkaServer",
      "outputWriters": [{
        "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url": "http://192.168.0.105:8086/",
        "username": "admin",
        "password": "123456",
        "database": "jmx",
        "tags": {
          "application": "kafka_server"
        }
      }]
    }]
  }]
}
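servers: array, the data source configuration.
port: string, the JMX port of the data source.
host: string, the IP address of the data source.
queries: array, the metrics to monitor, listed as JSON entries. The available metrics can be looked up with jconsole (a tool shipped with the JDK).
obj: string, the name of the metric.
attr: array, the metric fields to store. These become the field names in the target table.
resultAlias: string, the table name in InfluxDB.
outputWriters: array, the data destinations.
@class: string, the class of the data destination.
url: string, the URL of the data destination (InfluxDB).
username: string, the InfluxDB login name.
password: string, the InfluxDB login password.
database: string, the InfluxDB database name (must be created in advance).
tags: JSON object, used to keep fields of different metrics in the same InfluxDB table from colliding.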
Kafka's JMX metrics can be looked up with jconsole.
For the BytesInPerSec metric, find BytesInPerSec on the MBeans tab of jconsole.
The ObjectName value is the value to use for "obj".
The attributes of that ObjectName are the values for "attr"; select one or more.
The metric name is the value of "resultAlias", which becomes the MEASUREMENT name in InfluxDB.
"tags" maps to InfluxDB's tag feature and distinguishes different metrics stored in the same MEASUREMENT.
{
  "obj": "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
  "attr": ["Count", "EventType", "RateUnit", "OneMinuteRate"],
  "resultAlias": "BytesInPerSec",
  "outputWriters": [{
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://192.168.0.105:8086/",
    "username": "admin",
    "password": "123456",
    "database": "jmx",
    "tags": {
      "application": "BytesInPerSec"
    }
  }]
}
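For global monitoring, each metric maps to one InfluxDB MEASUREMENT, and all Kafka nodes write the same metric to the same MEASUREMENT. For Topic metrics, all Kafka nodes write the data of one Topic to the same MEASUREMENT, which is named after the Topic.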
{ "servers" : [ { "port" : "9999", "host" : "192.168.0.105", "queries" : [ { "obj" : "java.lang:type=Memory", "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ], "resultAlias":"jvmMemory", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : 
"http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"kafkaServer", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", "attr" : [ "Value" ], "resultAlias":"underReplicated", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.controller:type=KafkaController,name=ActiveControllerCount", "attr" : [ "Value" ], "resultAlias":"activeController", "outputWriters" : [ { "@class" : 
"com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "java.lang:type=OperatingSystem", "attr" : [ "FreePhysicalMemorySize","SystemCpuLoad","ProcessCpuLoad","SystemLoadAverage" ], "resultAlias":"jvmMemory", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] } ,{ "obj" : "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent", "attr" : [ "Value" ], "resultAlias":"network", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent", "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ], "resultAlias":"network", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] },{ "obj" : "java.lang:type=GarbageCollector,name=G1 Young Generation", "attr" : [ "CollectionCount","CollectionTime" ], "resultAlias":"gc", "outputWriters" : [ { "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory", "url" : "http://192.168.0.105:8086/", "username" : "admin", "password" : "123456", "database" : "jmx", "tags" : {"application" : "kafka_server"} } ] }] } ] }
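JMX works over a network connection, so JMXTrans can be deployed in two ways:
(1) Centralized. JMXTrans is deployed on one server, connects to every Kafka Broker instance, and writes the data to InfluxDB. To reduce network traffic it is usually deployed on the server that hosts InfluxDB.
(2) Distributed. One JMXTrans instance is deployed for each Kafka Broker instance.
JMXTrans configuration files are split into global metrics (per Kafka node) and Topic metrics. Global metrics use one configuration file per node, named kafka-brokerxx.json. Topic metrics use one configuration file per Topic, named TopicName.json.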
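Because the per-node configuration files differ only in host and port, and each query repeats the same outputWriters block, the files can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the "xx" in kafka-brokerxx.json is taken to be a two-digit node index, the second host 192.168.0.106 is a made-up example, only a few of the queries are included, and the helper names are illustrative.

```python
import json

# Writer settings copied from the configuration shown in this article.
INFLUX_WRITER = {
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://192.168.0.105:8086/",
    "username": "admin",
    "password": "123456",
    "database": "jmx",
    "tags": {"application": "kafka_server"},
}
RATE_ATTRS = ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"]


def query(obj, attrs, alias):
    """Build one JMXTrans query entry that writes to InfluxDB."""
    return {
        "obj": obj,
        "attr": list(attrs),
        "resultAlias": alias,
        "outputWriters": [dict(INFLUX_WRITER)],
    }


def broker_config(host, port):
    """Build the global-metrics configuration for one Kafka node."""
    names = ["MessagesInPerSec", "BytesInPerSec", "BytesOutPerSec"]
    queries = [
        query(f"kafka.server:type=BrokerTopicMetrics,name={name}",
              RATE_ATTRS, "kafkaServer")
        for name in names
    ]
    queries.append(query(
        "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
        ["Value"], "underReplicated"))
    return {"servers": [{"port": str(port), "host": host, "queries": queries}]}


def config_files(hosts, port=9999):
    """Map file names (kafka-brokerNN.json) to configuration dictionaries."""
    return {
        f"kafka-broker{index:02d}.json": broker_config(host, port)
        for index, host in enumerate(hosts, start=1)
    }


files = config_files(["192.168.0.105", "192.168.0.106"])
for name, config in files.items():
    print(name, len(json.dumps(config)), "bytes")
```

Writing each dictionary with json.dump into the directory that JMXTrans reads would complete the deployment step.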
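A monitoring system architecture usually has three parts: data collection, analysis and transformation, and data presentation (visualization).
(1) Data collection
Data collection usually means first writing a collection program and then using monitoring software such as Nagios or Zabbix to schedule it and report the collected data. For Java programs, JMXTrans can do the collection.
(2) Analysis and transformation
Kafka is a Java application, and the performance metrics it exposes are already very complete: histograms, counts, maximums and minimums, and standard deviations are all precomputed. No further analysis or processing is therefore needed, and the MBeans data is stored directly in InfluxDB.
(3) Data visualization
Grafana is an open-source visualization dashboard that supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.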
InfluxDB is an open-source distributed time-series, event, and metrics database written in Go, with no external dependencies. It is mainly used to store large volumes of timestamped data such as DevOps monitoring data, app metrics, IoT sensor data, and real-time analytics data.
docker pull influxdb
The influxdb.yml file:
version: '2'
services:
  influxdb:
    image: influxdb
    container_name: influxdb
    volumes:
      - /data/influxdb/conf:/etc/influxdb
      - /data/influxdb/data:/var/lib/influxdb/data
      - /data/influxdb/meta:/var/lib/influxdb/meta
      - /data/influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086"
    restart: always
To view the results:
docker exec -it influxdb influx
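JMXTrans is a collector that gathers data from Java applications over JMX. Any Java application with an open JMX port can be collected from.
docker pull jmxtrans/jmxtrans
By default JMXTrans reads all data source configuration files (JSON files) in the /var/lib/jmxtrans directory, fetches data from those sources in real time, parses it, and stores it in InfluxDB.
version: '2'
services:
  # JMXTrans service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans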
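Grafana is a visualization dashboard with attractive charts and layouts, a full-featured metrics dashboard and graph editor, and support for Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.
Grafana's main features are:
(1) Visualization: fast, flexible client-side charts. Panel plugins offer many ways to visualize metrics and logs, and the official library has a rich set of dashboard plugins such as heatmaps, line charts, and graphs.
(2) Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, and others.
(3) Alerting: define alert rules visually for the most important metrics. Grafana evaluates them continuously and sends notifications through Slack, PagerDuty, and similar channels when data reaches a threshold.
(4) Mixed display: mix different data sources in one graph. A data source can be specified per query, including custom data sources.
(5) Annotations: annotate graphs with rich events from different data sources. Hovering over an event shows its full metadata and tags.
(6) Filters: Ad-hoc filters allow new key/value filters to be created dynamically. They are applied automatically to all queries that use the data source.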
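GitHub: https://github.com/grafana/grafana
To download the Grafana container image:
docker pull grafana/grafana:6.5.0
To start the Grafana container:
docker run -d --name=grafana -p 3000:3000 grafana/grafana:6.5.0
Web login: 192.168.0.105:3000
The first login uses admin/admin, after which you are required to change the password.
Add a data source:
Import the Dashboard template:
The Dashboard template JSON file is as follows: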
{ "__inputs": [ { "name": "DS_KAFKAMONITOR", "label": "KafkaMonitor", "description": "", "type": "datasource", "pluginId": "influxdb", "pluginName": "InfluxDB" } ], "__requires": [ { "type": "grafana", "id": "grafana", "name": "Grafana", "version": "6.7.3" }, { "type": "panel", "id": "graph", "name": "Graph", "version": "" }, { "type": "datasource", "id": "influxdb", "name": "InfluxDB", "version": "1.0.0" } ], "annotations": { "list": [ { "$$hashKey": "object:318", "builtIn": 1, "datasource": "-- Grafana --", "enable": true, "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", "type": "dashboard" } ] }, "editable": true, "gnetId": null, "graphTooltip": 0, "id": null, "links": [], "panels": [ { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "java.lang:type=OperatingSystem", "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 0, "y": 0 }, "hiddenSeries": false, "id": 6, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "ProcessCpuLoad" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "进程CPU使用率" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka进程CPU使用率", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" 
}, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:1134", "format": "percentunit", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:1135", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "服务器CPU使用率", "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 8, "y": 0 }, "hiddenSeries": false, "id": 2, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "SystemCpuLoad" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "CPU使用率" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "CPU使用率", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:369", "format": "percentunit", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:370", "format": "short", "label": null, "logBase": 1, "max": null, "min": 
null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "java.lang:type=OperatingSystem\nLinux系统负载", "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 16, "y": 0 }, "hiddenSeries": false, "id": 4, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": true, "min": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "SystemLoadAverage" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "系统负载" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "系统负载", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:656", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:657", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "Kafka每一个broker每秒中的数据量,包括__consumer_offsets topic", "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 0, 
"y": 12 }, "hiddenSeries": false, "id": 34, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" } ], "hide": false, "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "D", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "平均每秒" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=MessagesInPerSec" } ] }, { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" } ], "hide": false, "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "sum" }, { "params": [ "全部broker平均每秒" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=MessagesInPerSec" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka Topic 每秒数据量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:2118", "format": "none", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:2119", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, 
{ "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "java.lang:type=OperatingSystem\n服务器可用物理内存", "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 8, "y": 12 }, "hiddenSeries": false, "id": 32, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": false, "min": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "FreePhysicalMemorySize" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "系统剩余物理内存" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "可用物理内存", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:2324", "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:2325", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "cacheTimeout": null, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "kafka.controller:type=KafkaController,name=ActiveControllerCount\n\nKafka控制器数量,每一个集群只有一台机器为1,为1的机器是Kafka控制器Crontroller", "fill": 1, "fillGradient": 0, 
"gridPos": { "h": 12, "w": 8, "x": 16, "y": 12 }, "hiddenSeries": false, "id": 26, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": false, "min": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pluginVersion": "6.7.3", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" } ], "measurement": "activeController", "orderByTime": "ASC", "policy": "default", "query": "SELECT sum(\"Value\") AS \"获取控制器数量\" FROM \"activeController\" WHERE $timeFilter GROUP BY time($__interval), \"hostname\"", "rawQuery": false, "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "Value" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "获取控制器数量" ], "type": "alias" } ] ], "tags": [], "tz": "" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka控制器数量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:4446", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:4447", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "监控 kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec 指标", "fill": 1, "fillGradient": 0, "gridPos": { "h": 9, "w": 8, "x": 0, "y": 24 }, "hiddenSeries": false, "id": 16, "legend": { "alignAsTable": true, "avg": 
true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "FiveMinuteRate" ], "type": "field" }, { "params": [], "type": "mean" }, { "params": [ "每秒拉取字节数" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=BytesOutPerSec" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka每秒拉取流量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:77", "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:78", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "监控 kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec 指标", "fill": 1, "fillGradient": 0, "gridPos": { "h": 9, "w": 8, "x": 8, "y": 24 }, "hiddenSeries": false, "id": 14, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, 
"percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "F", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "平均每秒进入字节数" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=BytesInPerSec" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka每秒进入流量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:77", "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:78", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "监控 kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec 和 kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec 指标", "fill": 1, "fillGradient": 0, "gridPos": { "h": 9, "w": 8, "x": 16, "y": 24 }, "hiddenSeries": false, "id": 20, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], 
"spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "每秒Fetch(获取)的请求数量" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec" } ] }, { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "D", "resultFormat": "time_series", "select": [ [ { "params": [ "MeanRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "每秒Producer发送的请求数量" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka生产、消费每秒请求数量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:77", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:78", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "java.lang:type=Memory", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 0, "y": 33 }, "hiddenSeries": 
false, "id": 8, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "E", "resultFormat": "time_series", "select": [ [ { "params": [ "HeapMemoryUsage_used" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "堆内存使用" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka使用堆内存", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:1850", "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:1851", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "java.lang:type=Memory", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 8, "y": 33 }, "hiddenSeries": false, "id": 30, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", 
"seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "jvmMemory", "orderByTime": "ASC", "policy": "default", "refId": "E", "resultFormat": "time_series", "select": [ [ { "params": [ "NonHeapMemoryUsage_used" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "堆外内存使用" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka使用堆外内存", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:1850", "format": "decbytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:1851", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions\n不为0则说明有的副本跟不上leader", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 16, "y": 33 }, "hiddenSeries": false, "id": 24, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pluginVersion": "6.7.3", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" 
], "type": "fill" } ], "measurement": "underReplicated", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "Value" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "未充分备份的分区数" ], "type": "alias" } ] ], "tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "未充分备份的分区数监控", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:11235", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:11236", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "cacheTimeout": null, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 0, "y": 46 }, "hiddenSeries": false, "id": 12, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pluginVersion": "6.7.3", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "5m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "network", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "Value" ], "type": "field" }, { "params": [], "type": "mean" }, { "params": [ "网络线程池空闲比例" ], "type": "alias" } ] ], 
"tags": [] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka网络线程池线程平均的空闲比例", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:13734", "format": "percentunit", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:13735", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "cacheTimeout": null, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 8, "y": 46 }, "hiddenSeries": false, "id": 22, "legend": { "alignAsTable": true, "avg": false, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "links": [], "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pluginVersion": "6.7.3", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" } ], "measurement": "network", "orderByTime": "ASC", "policy": "default", "refId": "A", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "IO空闲比例" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": " I/O 线程池线程平均的空闲比例", 
"tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:13517", "format": "percentunit", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:13518", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "${DS_KAFKAMONITOR}", "description": "监控 kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec 和 kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec 指标", "fill": 1, "fillGradient": 0, "gridPos": { "h": 13, "w": 8, "x": 16, "y": 46 }, "hiddenSeries": false, "id": 18, "legend": { "alignAsTable": true, "avg": true, "current": true, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "dataLinks": [] }, "percentage": false, "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": "kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "H", "resultFormat": "time_series", "select": [ [ { "params": [ "OneMinuteRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "每秒Fetch(获取)异常的请求" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec" } ] }, { "alias": "", "groupBy": [ { "params": [ "1m" ], "type": "time" }, { "params": [ "hostname" ], "type": "tag" }, { "params": [ "null" ], "type": "fill" } ], "measurement": 
"kafkaServer", "orderByTime": "ASC", "policy": "default", "refId": "J", "resultFormat": "time_series", "select": [ [ { "params": [ "MeanRate" ], "type": "field" }, { "params": [], "type": "last" }, { "params": [ "每秒Producer异常的请求" ], "type": "alias" } ] ], "tags": [ { "key": "typeName", "operator": "=", "value": "type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec" } ] } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Kafka生产、消费请求失败数量", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:77", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:78", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } } ], "refresh": false, "schemaVersion": 22, "style": "dark", "tags": [], "templating": { "list": [] }, "time": { "from": "now-1h", "to": "now" }, "timepicker": { "refresh_intervals": [ "5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h", "2h", "1d" ] }, "timezone": "", "title": "Kafka集群监控模板", "uid": "PkULDneZkALL", "variables": { "list": [] }, "version": 27 }
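Each entry in a panel's "targets" array in the template above is a structured description of one InfluxDB query. The following Python sketch shows roughly how such a target maps to InfluxQL, which can help when checking a panel against the data JMXTrans actually writes. The function name render_influxql is made up for this illustration, and the output only approximates what Grafana's query builder produces (Grafana may add parentheses or a retention-policy qualifier).

```python
def render_influxql(target: dict) -> str:
    """Approximate the InfluxQL behind one Grafana InfluxDB target (illustrative only)."""
    columns = []
    for parts in target["select"]:
        expr, alias = "", None
        for part in parts:
            kind, params = part["type"], part["params"]
            if kind == "field":
                expr = '"%s"' % params[0]
            elif kind == "alias":
                alias = params[0]
            else:
                # Functions such as last() or mean() wrap the field expression.
                expr = "%s(%s)" % (kind, expr)
        columns.append(expr + (' AS "%s"' % alias if alias else ""))

    conditions = [
        "\"%s\" %s '%s'" % (tag["key"], tag["operator"], tag["value"])
        for tag in target.get("tags", [])
    ]
    # Grafana replaces $timeFilter with the dashboard's selected time range.
    conditions.append("$timeFilter")

    group_by, fill = [], ""
    for part in target.get("groupBy", []):
        kind, param = part["type"], part["params"][0]
        if kind == "time":
            group_by.append("time(%s)" % param)
        elif kind == "tag":
            group_by.append('"%s"' % param)
        elif kind == "fill":
            fill = " fill(%s)" % param

    query = 'SELECT %s FROM "%s" WHERE %s' % (
        ", ".join(columns), target["measurement"], " AND ".join(conditions))
    if group_by:
        query += " GROUP BY " + ", ".join(group_by) + fill
    return query


# The BytesInPerSec target, copied from the template above.
BYTES_IN_TARGET = {
    "groupBy": [
        {"params": ["1m"], "type": "time"},
        {"params": ["hostname"], "type": "tag"},
        {"params": ["null"], "type": "fill"},
    ],
    "measurement": "kafkaServer",
    "select": [[
        {"params": ["OneMinuteRate"], "type": "field"},
        {"params": [], "type": "last"},
        {"params": ["平均每秒进入字节数"], "type": "alias"},
    ]],
    "tags": [{"key": "typeName", "operator": "=",
              "value": "type=BrokerTopicMetrics,name=BytesInPerSec"}],
}
```

Calling render_influxql(BYTES_IN_TARGET) returns SELECT last("OneMinuteRate") AS "平均每秒进入字节数" FROM "kafkaServer" WHERE "typeName" = 'type=BrokerTopicMetrics,name=BytesInPerSec' AND $timeFilter GROUP BY time(1m), "hostname" fill(null). Running the equivalent query directly against InfluxDB, with $timeFilter replaced by a concrete time condition, is a quick way to confirm that a measurement and its tags exist before debugging the panel itself.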
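InfluxDB, JMXTrans, and Grafana are deployed together with Docker Compose. Create a KafkaMonitor directory; inside it, create an influxdb directory, a jmxtrans directory, and a docker-compose.yml file, then place the jmxtrans.json file in the jmxtrans directory.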
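The directory layout described above can be prepared and checked with a few lines of Python. This is a minimal sketch: the helper names create_layout and missing_files are invented for the example, and the script only creates empty placeholders; the contents of docker-compose.yml and jmxtrans.json still have to be supplied by hand.

```python
from pathlib import Path


def create_layout(base: Path) -> Path:
    """Create KafkaMonitor/ with its influxdb/ and jmxtrans/ subdirectories."""
    root = base / "KafkaMonitor"
    for sub in ("influxdb", "jmxtrans"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Create an empty compose file if none exists; existing content is left untouched.
    (root / "docker-compose.yml").touch(exist_ok=True)
    return root


def missing_files(root: Path) -> list:
    """Return the files the deployment expects that are not present yet."""
    expected = [root / "docker-compose.yml", root / "jmxtrans" / "jmxtrans.json"]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

For example, missing_files(create_layout(Path("."))) reports jmxtrans/jmxtrans.json until that file has been copied into place.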
The docker-compose.yml file is as follows:
version: '2'
services:
  # JMXTrans service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans
  # InfluxDB service
  influxdb:
    image: influxdb
    container_name: influxdb
    volumes:
      - ./influxdb/conf:/etc/influxdb
      - ./influxdb/data:/var/lib/influxdb/data
      - ./influxdb/meta:/var/lib/influxdb/meta
      - ./influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086" # exposed so that Grafana can reach InfluxDB
    restart: always
  # Grafana service
  grafana:
    image: grafana/grafana:6.5.0 # newer versions may have bugs
    container_name: grafana
    ports:
      - "3000:3000" # exposed for web access
Start the monitoring services from the KafkaMonitor directory: docker-compose -f docker-compose.yml up -d
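After the containers start, it can be useful to confirm that the two published ports (8086 for InfluxDB and 3000 for Grafana) actually accept connections before moving on. The sketch below uses only the Python standard library; wait_for_port is a name made up for this example, and it checks TCP reachability only, not that the service behind the port is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

For instance, wait_for_port("localhost", 8086) and wait_for_port("localhost", 3000) should both return True once the stack is up. The jmxtrans container publishes no port in the compose file, so its state is better checked with docker-compose ps or its logs.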
Finally, log in to the Grafana web UI, configure the InfluxDB data source, and import the dashboard template.
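The data source can be added through the web UI, or alternatively through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the request and leaves sending it to the caller. Several values are assumptions for illustration: the data source name "KafkaMonitor", the InfluxDB URL http://influxdb:8086 (the Compose service name as seen from the Grafana container), and the function name build_datasource_request. The database argument must match whatever database jmxtrans.json writes to, which is not shown in this section.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             database: str,
                             influx_url: str = "http://influxdb:8086"):
    """Build (but do not send) a request that registers InfluxDB in Grafana."""
    payload = {
        "name": "KafkaMonitor",   # assumed name; pick it when importing the template
        "type": "influxdb",
        "access": "proxy",        # the Grafana server, not the browser, queries InfluxDB
        "url": influx_url,
        "database": database,     # must match the database used in jmxtrans.json
    }
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. When the dashboard JSON is then imported, Grafana asks which data source to bind to the ${DS_KAFKAMONITOR} placeholder used by the panels.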