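First, an architecture diagram:
Flink App: sends its metrics out through a metrics reporter
Pushgateway: an important tool in the Prometheus ecosystem; it accepts pushed metrics and exposes them for Prometheus to scrape
Prometheus: an open-source system monitoring and alerting framework (see "Prometheus: Getting Started and Practice")
Grafana: a cross-platform, open-source metrics analysis and visualization tool that queries the collected data, visualizes it, and sends timely notifications (see "Visualization Tool Grafana: Introduction and Installation")
Node_exporter: like Pushgateway, a Prometheus component; it collects host metrics such as CPU, memory, and disk usage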
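To make the Flink App -> Pushgateway -> Prometheus path more concrete, here is a minimal Python sketch of what a push to Pushgateway looks like at the HTTP level: a PUT of text-format samples to /metrics/job/<job>, optionally followed by grouping label/value pairs. The host venn and port 9091 come from the setup in this post; the function names and the metric demo_records_total are hypothetical, and this is an illustration of the request shape, not how Flink's reporter is implemented internally. It assumes label values contain no "/" characters.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def pushgateway_url(host: str, port: int, job: str, grouping: Optional[dict] = None) -> str:
    """Build the push URL: /metrics/job/<job>{/<label>/<value>}."""
    path = "/metrics/job/" + quote(job, safe="")
    for label, value in (grouping or {}).items():
        # Assumption: values contain no '/', so plain percent-quoting is enough here.
        path += "/" + quote(label, safe="") + "/" + quote(value, safe="")
    return f"http://{host}:{port}{path}"


def exposition_body(samples: dict) -> bytes:
    """Render samples as 'name value' lines in the Prometheus text format."""
    lines = [f"{name} {value}" for name, value in samples.items()]
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_push_request(host: str, port: int, job: str, samples: dict,
                       grouping: Optional[dict] = None) -> Request:
    """Prepare (but do not send) a PUT that replaces the metrics of one group."""
    return Request(
        pushgateway_url(host, port, job, grouping),
        data=exposition_body(samples),
        method="PUT",
    )
```

Passing the prepared request to urllib.request.urlopen would perform the push; Prometheus then picks the samples up the next time it scrapes Pushgateway.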
The installation below largely follows this blog post: http://www.javashuo.com/article/p-bydptqfy-hb.html
1. Pull the Docker images
docker pull prom/node-exporter
docker pull prom/pushgateway
docker pull prom/prometheus
docker pull grafana/grafana
Check the downloaded images:
$ docker images
REPOSITORY           TAG      IMAGE ID       CREATED        SIZE
prom/prometheus      latest   d5b9d7ed160a   2 weeks ago    138MB
grafana/grafana      latest   a6e14b4109af   2 weeks ago    253MB
prom/pushgateway     latest   20e6dcae675f   4 weeks ago    19.2MB
prom/node-exporter   latest   e5a616e4b9cf   2 months ago   22.9MB
2. Edit prometheus.yml and create the Grafana data storage directory
$ mkdir /opt/grafana-storage             # Grafana data storage directory
$ cat /opt/prometheus/prometheus.yml     # Prometheus configuration
global:
  scrape_interval: 60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['venn:9100']
        labels:
          instance: localhost
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['venn:9091']
        labels:
          instance: 'pushgateway'
3. Start the components
docker run -d -p 3000:3000 --name=grafana -v /opt/grafana-storage:/var/lib/grafana grafana/grafana
docker run -d -p 9100:9100 -v "/proc:/host/proc:ro" -v "/sys:/host/sys:ro" -v "/:/rootfs:ro" --net="host" prom/node-exporter
docker run -d -p 9090:9090 -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker run -d -p 9091:9091 prom/pushgateway
Note: node-exporter runs with --net="host", so Docker discards its -p 9100:9100 mapping and the exporter listens on the host's port 9100 directly.
Check the running containers:
$ docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED      STATUS      PORTS                    NAMES
4a689cf48e10   prom/pushgateway     "/bin/pushgateway"       5 days ago   Up 5 days   0.0.0.0:9091->9091/tcp   infallible_goldstine
fcc40433bf75   grafana/grafana      "/run.sh"                5 days ago   Up 5 days   0.0.0.0:3000->3000/tcp   grafana
8ba942d0cf35   prom/prometheus      "/bin/prometheus --c…"   5 days ago   Up 5 days   0.0.0.0:9090->9090/tcp   quizzical_colden
b84b0f4be2b2   prom/node-exporter   "/bin/node_exporter"     5 days ago   Up 5 days                            fervent_poitras
Check the ports:
$ netstat -apn | grep -E '9091|3000|9090|9100'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp    0  0 172.17.0.1:39028        172.17.0.4:9091        ESTABLISHED -
tcp6   0  0 :::9100                 :::*                   LISTEN      -
tcp6   0  0 :::3000                 :::*                   LISTEN      -
tcp6   0  0 :::9090                 :::*                   LISTEN      -
tcp6   0  0 :::9091                 :::*                   LISTEN      -
tcp6   0  0 192.168.229.129:45864   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45856   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45824   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45874   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45854   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45836   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45814   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.128:9100    192.168.229.1:13405    ESTABLISHED -
tcp6   0  0 192.168.229.129:45826   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45844   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.128:9091    172.17.0.2:53930       ESTABLISHED -
tcp6   0  0 192.168.229.129:45846   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.128:9100    172.17.0.2:54776       ESTABLISHED -
tcp6   0  0 192.168.229.129:45816   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45876   192.168.229.128:9091   ESTABLISHED 40846/java
tcp6   0  0 192.168.229.129:45834   192.168.229.128:9091   TIME_WAIT   -
tcp6   0  0 192.168.229.129:45866   192.168.229.128:9091   TIME_WAIT   -
4. Check each component's page
node_exporter: ip:9100/metrics
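The /metrics page served by node_exporter is plain text in the Prometheus exposition format: comment lines starting with #, followed by sample lines of the form name{labels} value. As a rough aid for reading that page, the sketch below parses such lines in Python. The function name parse_metrics is hypothetical, and the parser is deliberately simplified (it ignores timestamps and does not unescape label values), so treat it as an illustration rather than a complete implementation of the format.

```python
import re

# Metric name, optional {labels}, then the sample value (timestamps are ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label="value" pair inside the braces.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a # HELP / # TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)
```

For example, feeding it the text fetched from ip:9100/metrics and filtering on names that start with node_ gives a quick view of the host metrics without going through Prometheus.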
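Prometheus: ip:9090/targets
If a target's state is not UP, wait a moment and it will come up (with scrape_interval set to 60s, the first scrape can take up to a minute).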
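The same information shown on the targets page is also available from the Prometheus HTTP API at /api/v1/targets, which is handy for checking target health from a script. The following Python sketch lists the targets that are not yet up. The function names are hypothetical, and the base URL you pass in (for example http://venn:9090, matching this post's setup) is an assumption about where Prometheus is reachable.

```python
import json
from urllib.request import urlopen


def down_targets(payload: dict) -> list:
    """Return (job, scrapeUrl, lastError) for each active target whose health is not 'up'."""
    result = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "")
            result.append((job, target.get("scrapeUrl", ""), target.get("lastError", "")))
    return result


def fetch_targets(base_url: str) -> dict:
    """GET <base_url>/api/v1/targets and decode the JSON body."""
    with urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=5) as response:
        return json.load(response)
```

Calling down_targets(fetch_targets("http://venn:9090")) should return an empty list once all three jobs from prometheus.yml have been scraped successfully.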
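Grafana: ip:3000
Default username/password: admin/admin
Configuring the data source and creating a system-load dashboard are not repeated here; see the blog post: http://www.javashuo.com/article/p-bydptqfy-hb.html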
5. Configure the Flink metrics reporter
Add the following to the Flink configuration file flink-conf.yaml:
##metrics
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: venn
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
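All reporter options share the prefix metrics.reporter.<name>., where <name> is promgateway in the configuration above. A typo in that prefix is an easy mistake to make, so a small check can help before starting a job. The Python sketch below reads flat key: value lines and collects the options of one reporter. Both function names are hypothetical, and the parser only handles the simple flat form used in this post, not full YAML.

```python
def parse_flat_conf(text: str) -> dict:
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:  # keep only lines that really contain 'key: value'
            conf[key.strip()] = value.strip()
    return conf


def reporter_settings(conf: dict, name: str) -> dict:
    """Collect one reporter's options, keyed by the part after 'metrics.reporter.<name>.'."""
    prefix = f"metrics.reporter.{name}."
    return {key[len(prefix):]: value for key, value in conf.items() if key.startswith(prefix)}
```

With the configuration above, reporter_settings(parse_flat_conf(text), "promgateway") yields the six options class, host, port, jobName, randomJobNameSuffix, and deleteOnShutdown; host and port together should match the Pushgateway address that Prometheus scrapes (venn:9091 here).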
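Start a job (the late-data-processing example from the previous post):
flink run -m yarn-cluster -ynm LateDataProcess -yn 1 -c com.venn.stream.api.sideoutput.lateDataProcess.LateDataProcess jar/flinkDemo-1.0.jar
Check the job in the web UI:
PS: the job had already been running for a while.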
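6. Configure Flink monitoring in Grafana
Because the Flink reporter, Pushgateway, and Prometheus have already been configured above, and the Prometheus data source has already been added in Grafana, Grafana can retrieve the Flink job's metrics automatically.
On the Grafana home page, click New dashboard to create a new dashboard.
Once a metric is selected, the corresponding monitoring data appears.
At this point, the Flink metrics are displayed in Grafana.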
Flink metric names are rather long. You can configure what is displayed in the Legend field: replace key in {{key}} with the label you want to show, for example {{job_name}} or {{operator_name}}.
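The Legend substitution described above can be pictured as a simple template expansion over a series' labels. The Python sketch below mimics that idea. The function name render_legend is hypothetical, the label values in the usage note are made up for illustration, and rendering an unknown label as an empty string is a choice made in this sketch, not a statement about Grafana's internals.

```python
import re

# Matches {{label_name}}, allowing optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}")


def render_legend(template: str, labels: dict) -> str:
    """Replace each {{label}} with its value; unknown labels become an empty string."""
    return _PLACEHOLDER.sub(lambda match: labels.get(match.group(1), ""), template)
```

For a series carrying the labels job_name="LateDataProcess" and operator_name="Map", the template "{{job_name}} - {{operator_name}}" expands to a short legend entry instead of the full metric name with all of its labels.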
With that legend format, the panel displays as follows:
Save, and you're done.