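Environment: all services run in Docker.
This post is somewhat rough. Better write-ups can be found here; the following are particularly good:
http://www.javashuo.com/article/p-xikbiroq-dh.html
This one is a series of articles and is worth reading: https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/use-prometheus-monitor-container
Note: adapt each step to your own situation. This post probably contains quite a few mistakes as well; it is only meant to give a line of thinking that makes the setup easier to understand.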
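Prometheus collects the state of the current host through node-exporter. Everything in this environment runs in containers, node-exporter included, yet what it has to monitor is the state of the host, so the relevant host directories must be mapped into the node-exporter container.
After that, an alertmanager container is deployed and published on port 9093 of the host. Prometheus periodically evaluates the alerting rules; when a rule's trigger condition is met, it sends an alert to alertmanager, and alertmanager forwards it to the corresponding receivers (already defined in its configuration file).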
$ docker pull prom/prometheus      # pull the prometheus image
$ docker pull prom/node-exporter   # pull the node-exporter image
$ docker pull grafana/grafana      # pull the grafana image
$ cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']        # port Prometheus listens on

  - job_name: 'linux'
    static_configs:
    - targets: ['172.21.71.50:9100']     # node-exporter port on the node
      labels:
        instance: node
Run the containers
$ docker run -d \
    --net="host" \
    --pid="host" \
    -v "/:/host:ro,rslave" \
    prom/node-exporter \
    --path.rootfs /host
# Run node-exporter. Its options are a bit unusual; until you understand them well, just run it this way.

$ sudo docker run -d \
    -p 9090:9090 \
    -v /usr/local/prometheus/file/prometheus.yml:/usr/local/prometheus/file/prometheus.yml \
    prom/prometheus \
    --config.file=/usr/local/prometheus/file/prometheus.yml --web.enable-lifecycle
# Run the Prometheus container.

$ git clone https://github.com/grafana/piechart-panel.git   # pie chart plugin

$ docker run -d --name=grafana -v /usr/local/prometheus/grafana/plugin/:/var/lib/grafana/plugins/ -p 3333:3000 grafana/grafana
# Run Grafana. The default Grafana account/password is admin/admin.
Pull the image
$ docker pull google/cadvisor
Run
cadvisor has to run on the Docker host itself (similar to node_exporter); Prometheus then fetches its data over HTTP.
$ docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=9101:8080 \
    --detach=true \
    --name=cadvisor \
    google/cadvisor:latest
# cadvisor is also somewhat unusual; until you are familiar with it, just follow the steps here.
Note: this maps port 8080 of the container to port 9101 on the host. cadvisor has a web UI at http://IP:9101
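The post does not show the step that tells Prometheus to scrape cadvisor, so here is a minimal sketch of it. The `cadvisor_job` helper is hypothetical (it is not part of any tool); it only prints a scrape job stanza in the same style as the jobs already in prometheus.yml. The job name `cadvisor` is an assumption, and appending the output to the file only works if `scrape_configs` is the last top-level section, as it is in the prometheus.yml shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print a scrape job stanza for cadvisor.
# $1 = host:port where cadvisor is published (9101 on the host in this post)
# $2 = job name (optional; "cadvisor" is an assumed default)
cadvisor_job() {
  target=$1
  job=${2:-cadvisor}
  cat <<EOF

  - job_name: '$job'
    static_configs:
    - targets: ['$target']
EOF
}
```

For example, `cadvisor_job 172.21.71.50:9101 | sudo tee -a /usr/local/prometheus/file/prometheus.yml` appends the stanza. Since Prometheus was started with `--web.enable-lifecycle`, `curl -X POST http://localhost:9090/-/reload` should then make it re-read the configuration.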
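Connect Grafana to display the container monitoring data
Here we go to the Grafana website and look for a dashboard template that someone else has already built: https://grafana.com/dashboards/4170. Download the template JSON file and import it into the local Grafana. For how to import a dashboard template, see https://www.cnblogs.com/tchua/p/11115146.html
The next step is to modify one variable in the template file, because the template was originally tailored for cadvisor;
Change it to match mine (until you understand it well, follow the document first and adapt later).
If everything goes smoothly, the result will look like the figure below.
This is still not quite enough. Because of version differences (the template was not developed against the latest Node_exporter), some values do not apply and have to be changed. The right values can be determined through the Prometheus query interface.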
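One way to see which metric names your exporter version really exposes, before editing the dashboard queries, is to list them from its `/metrics` endpoint. The sketch below is only an illustration: `metric_names` is a hypothetical helper name, not part of Prometheus or cadvisor, and it assumes the usual text exposition format.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print each distinct metric name once.
metric_names() {
  grep -Ev '^(#|$)' |     # drop HELP/TYPE comment lines and blank lines
    sed 's/[{ ].*//' |    # keep only the name before the labels or the value
    sort -u
}
```

With the port mapping used above, `curl -s http://IP:9101/metrics | metric_names | grep '^container_'` lists the container metrics, which can then be compared with the names used in the dashboard's queries.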
$ docker pull prom/alertmanager   # pull the alertmanager image (linuxtips/alertmanager_alpine is an alternative image)
$ cat /usr/local/prometheus/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'ying.qiao'   # the key under route is "receiver" (singular); it must match a name under receivers
receivers:
- name: 'ying.qiao'
  webhook_configs:
  - url: 'https://hook.bearychat.com/=bwD9B/prometheus/2e31f72d81f31d322db49e85d22e1cee'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Add alerting rules to Prometheus
$ sudo mkdir /usr/local/prometheus/rules
$ sudo vim /usr/local/prometheus/rules/node_alerts.yml
groups:
- name: node_alerts
  rules:
  - alert: InstanceDown            ## alert name
    expr: up{job='linux'} == 0     ## alert condition; the job value must be a job_name from prometheus.yml ('linux' in the file shown earlier)
    for: 1m                        ## once the condition has held for more than 1 minute, Prometheus sends the alert to alertmanager
    labels:
      severity: "warning"
    annotations:
      summary: Host {{ $labels.instance }} of {{ $labels.job }} is Down!
There is a nasty pitfall here: the value after job inside the braces must match, exactly, a job name defined in prometheus.yml.
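A mismatch here fails silently (the expression simply returns no series, so the alert never fires), so it can help to compare the two files mechanically. The following is a rough sketch, not an official tool: the function names are made up for this example, and it only recognises the simple `job='...'` matchers and one-line `job_name:` entries used in this post.

```shell
#!/bin/sh
# Print the job names declared in a Prometheus config file ($1).
list_jobs() {
  grep -E '^[[:space:]]*-?[[:space:]]*job_name:' "$1" |
    sed -e 's/.*job_name:[[:space:]]*//' -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' |
    tr -d "'\""
}

# Print the distinct job values used in job='...' matchers in a rule file ($1).
rule_jobs() {
  grep -o "job=['\"][^'\"]*['\"]" "$1" | sed 's/^job=//' | tr -d "'\"" | sort -u
}

# For every job referenced in the rule file ($2), report whether the
# config file ($1) declares it. Returns non-zero if any job is missing.
check_rule_jobs() {
  status=0
  for job in $(rule_jobs "$2"); do
    if list_jobs "$1" | grep -qxF "$job"; then
      echo "ok: $job"
    else
      echo "missing: $job"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_rule_jobs /usr/local/prometheus/file/prometheus.yml /usr/local/prometheus/rules/node_alerts.yml` prints one line per job used in the rules; any `missing:` line points at a rule whose job is not scraped under that name.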
$ sudo vim /usr/local/prometheus/file/prometheus.yml
rule_files:
  - /usr/local/prometheus/rules/node_alerts.yml   # the rule file to load

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.21.71.50:9093   ## alertmanager address

## Also let Prometheus monitor the alertmanager service itself.
## Mind where each of these fragments goes in the file; this job belongs under scrape_configs.
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['172.21.71.50:9093']
Restart Prometheus and start alertmanager
$ docker rm -f c1473106d0f0   # remove the old Prometheus container (use your own container ID)
$ docker run -d -p 9090:9090 \
    -v /usr/local/prometheus/file/prometheus.yml:/usr/local/prometheus/file/prometheus.yml \
    -v "/usr/local/prometheus/rules/node_alerts.yml:/usr/local/prometheus/rules/node_alerts.yml:ro" \
    prom/prometheus \
    --config.file=/usr/local/prometheus/file/prometheus.yml \
    --web.enable-lifecycle
# The rule file is mounted at the same path that rule_files in prometheus.yml points to; otherwise Prometheus cannot find it inside the container.
$ docker run -d -p 9093:9093 \
    -v /usr/local/prometheus/alertmanager/:/usr/local/prometheus/alertmanager/ \
    -v /var/lib/alertmanager:/alertmanager \
    --name alertmanager prom/alertmanager \
    --config.file="/usr/local/prometheus/alertmanager/alertmanager.yml" \
    --storage.path=/alertmanager
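To check the path from alertmanager to the webhook receiver without waiting for a real outage, a hand-made alert can be posted to alertmanager's v2 HTTP API. This is a sketch under assumptions: it presumes alertmanager is reachable on port 9093 as configured above, and the function names and the label values in the payload are made up for the test.

```shell
#!/bin/sh
# Build a minimal JSON payload containing one alert.
# $1 = alertname, $2 = instance label (both arbitrary test values)
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},' "$1" "$2"
  printf '"annotations":{"summary":"manual test alert"}}]\n'
}

# POST the payload to alertmanager's /api/v2/alerts endpoint.
# $1 = base URL, e.g. http://172.21.71.50:9093
send_test_alert() {
  test_alert_payload "TestAlert" "manual-test" |
    curl -sS -X POST -H 'Content-Type: application/json' --data-binary @- "$1/api/v2/alerts"
}
```

Calling `send_test_alert http://172.21.71.50:9093` should make the alert show up in the alertmanager web UI, and with the `group_wait: 10s` setting above a notification should reach the webhook shortly afterwards.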