For installation, see http://www.javashuo.com/article/p-bfwlieic-do.html . It is best to use the matching component versions; otherwise the steps may differ.
(1) prometheus + grafana + alertmanager: configuring host monitoring
(2) prometheus + grafana + alertmanager: configuring MySQL monitoring
(3) prometheus + grafana + alertmanager: configuring Redis monitoring
(4) prometheus + grafana + alertmanager: configuring Kafka monitoring
(5) prometheus + grafana + alertmanager: configuring ES monitoring
(3) prometheus + grafana + alertmanager: configuring Redis monitoring
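I. Installing redis_exporter
If you run your own Redis servers, you can also follow the previous chapter.
Here we install and configure it for cloud Redis, but the same method also works for self-hosted Redis servers. (Log in to the Prometheus server; Prometheus, Grafana and Alertmanager run on the same server.)
A. Download the redis_exporter package (download: https://pan.baidu.com/s/1k-KcATTpxux9qiPFDt_6hQ ) and extract it to /data/monitor/.
B. Under /data/monitor/redis_exporter/scripts, create one start script per Redis instance to be monitored. For example, cat ba_10.8.100.140_6379_18019.sh shows the following; the others are similar.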
nohup /data/monitor/redis_exporter/bin/redis_exporter -web.listen-address=':18019' -redis.addr=10.8.100.140:6379 >> /data/monitor/redis_exporter/log/18019_10.8.100.140_6379.log 2>&1 &
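With many Redis instances, writing each script by hand gets tedious. The function below is a hypothetical helper (not part of redis_exporter) that writes one script per instance. It assumes the naming pattern `<alias>_<ip>_<port>_<listen_port>.sh` inferred from the example above, and the single-dash flags of the redis_exporter version used in this article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write one redis_exporter start script per Redis instance.
# Assumption: file names follow <alias>_<ip>_<port>_<listen_port>.sh as in the example.
BASE=${BASE:-/data/monitor/redis_exporter}

# gen_script ALIAS REDIS_IP REDIS_PORT LISTEN_PORT -> prints the path of the script written
gen_script() {
  local name=$1 ip=$2 port=$3 listen=$4
  local file="$BASE/scripts/${name}_${ip}_${port}_${listen}.sh"
  # The log directory must exist, or the >> redirection in the script fails at runtime.
  mkdir -p "$BASE/scripts" "$BASE/log"
  # Unquoted EOF: $BASE, ${listen}, ${ip} and ${port} are expanded now;
  # nohup, >>, 2>&1 and & are written literally into the script.
  cat > "$file" <<EOF
nohup $BASE/bin/redis_exporter -web.listen-address=':${listen}' -redis.addr=${ip}:${port} >> $BASE/log/${listen}_${ip}_${port}.log 2>&1 &
EOF
  echo "$file"
}
```

For example, `gen_script openapi 10.8.194.27 6379 18047` would write `openapi_10.8.194.27_6379_18047.sh`. Each instance needs its own listen port, because that port is what Prometheus scrapes.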
C. Then cd /data/monitor/redis_exporter and run sh start.sh. Check that the ports are listening or that the processes are running.
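The contents of start.sh are not shown in this article. A minimal sketch, assuming it simply runs every script under scripts/ (the function name `start_all` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical start.sh: run every per-instance start script in the scripts directory.
start_all() {
  local dir=${1:-/data/monitor/redis_exporter/scripts}
  local s
  for s in "$dir"/*.sh; do
    # If the directory has no .sh files, the glob stays literal; skip it.
    [ -e "$s" ] || continue
    echo "starting $s"
    # Each script backgrounds redis_exporter with nohup, so this returns immediately.
    sh "$s"
  done
}
```

After starting, `ss -lnt | grep 18019` should show the listening port, and `curl -s http://localhost:18019/metrics | grep '^redis_up'` should report `redis_up 1` when the exporter can reach Redis.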
II. Prometheus configuration
1. Add the redis_exporter configuration to prometheus.yml: vim /data/monitor/prometheus/conf/prometheus.yml
global:
  # How often the server scrapes targets
  scrape_interval: 1m
  # How often alerting rules are evaluated
  evaluation_interval: 1m
  # Scrape timeout
  scrape_timeout: 20s
  # Global labels
  #external_labels:
  #  monitor: "hk"
# Connect to Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
# Alerting rules
rule_files:
  - /data/monitor/prometheus/conf/rule/*.yml
# Scrape configurations; the first job scrapes Prometheus itself.
scrape_configs:
  # Monitor the Prometheus server itself
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['10.8.53.218:9090']
  # Monitor the specified hosts
  - job_name: 'node_resources'
    scrape_interval: 1m
    static_configs:
    file_sd_configs:
      - files:
          - /data/monitor/prometheus/conf/node_conf/node_host_info.json
    honor_labels: true
  # MySQL collector
  - job_name: 'mysql_global_status'
    scrape_interval: 60s
    static_configs:
    file_sd_configs:
      - files:
          - /data/monitor/prometheus/conf/node_conf/node_mysql_info.json
  # Redis collector
  - job_name: 'redis_resources'
    scrape_interval: 60s
    static_configs:
    file_sd_configs:
      - files:
          - /data/monitor/prometheus/conf/node_conf/node_redis_info.json
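Because each job reads its targets from a JSON file, a missing or misspelled file path is an easy mistake. The hypothetical helper below lists every `.json` path mentioned in prometheus.yml and reports any that do not exist. Run it once the JSON files (including node_redis_info.json from the next step) are in place. It is a plain text scan, not a YAML parser, so it assumes paths contain no spaces.

```bash
#!/usr/bin/env bash
# Hypothetical pre-reload check: every .json file referenced in prometheus.yml must exist.
check_sd_files() {
  local conf=${1:-/data/monitor/prometheus/conf/prometheus.yml}
  local missing=0 f
  # Extract absolute paths ending in .json (the "- /path/file.json" list items).
  for f in $(grep -oE '/[^[:space:]]+\.json' "$conf"); do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For a full syntax check, Prometheus ships `promtool`: `promtool check config /data/monitor/prometheus/conf/prometheus.yml`.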
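2. Create the target file referenced by the redis_resources job: vim /data/monitor/prometheus/conf/node_conf/node_redis_info.json. Each entry points Prometheus at one redis_exporter instance (targets) and attaches the addr, alias and desc labels that the alert descriptions below use.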
[
  {
    "labels": {
      "addr": "10.8.100.140:6379",
      "alias": "ba_redis",
      "desc": "ba_redis_10.8.100.140:6379"
    },
    "targets": [
      "localhost:18019"
    ]
  },
  {
    "labels": {
      "addr": "10.8.194.27:6379",
      "alias": "openapi_redis",
      "desc": "openapi_redis_10.8.194.27:6379"
    },
    "targets": [
      "localhost:18047"
    ]
  }
]
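The entries follow a fixed pattern (desc is alias and addr joined by an underscore; the target is the exporter's local listen port), so the file can be generated from a simple list. The functions below are a hypothetical sketch of that, one `addr alias listen_port` line per instance on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical generator for node_redis_info.json (file_sd format).

# sd_entry ADDR ALIAS LISTEN_PORT -> one JSON object on a single line
sd_entry() {
  local addr=$1 name=$2 listen=$3
  printf '{"labels": {"addr": "%s", "alias": "%s", "desc": "%s_%s"}, "targets": ["localhost:%s"]}\n' \
    "$addr" "$name" "$name" "$addr" "$listen"
}

# Reads "addr alias listen_port" lines and prints the whole JSON array.
build_sd_file() {
  local first=1 addr name listen
  echo "["
  while read -r addr name listen; do
    [ -n "$addr" ] || continue      # skip blank lines
    [ "$first" -eq 1 ] || echo ","  # comma between entries, not after the last
    sd_entry "$addr" "$name" "$listen"
    first=0
  done
  echo "]"
}
```

Usage: `build_sd_file < redis_list.txt > node_redis_info.json.tmp && mv node_redis_info.json.tmp node_redis_info.json` (redis_list.txt is a hypothetical input file). Prometheus watches file_sd files and re-reads them when they change, so writing to a temporary file and renaming it avoids Prometheus reading a half-written file.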
3. Then cd /data/monitor/prometheus and run sh reload.sh.
Note: with Redis 3.0 and earlier, some charts have no data. In our case, that data is generated by scripts that call the UCloud API directly.
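The reload.sh used in step 3 is not shown in the article. Prometheus re-reads its configuration when it receives SIGHUP, so a minimal sketch of such a script could look like the function below. The name `reload_prometheus` and the `pgrep` lookup are assumptions. If Prometheus was started with `--web.enable-lifecycle`, `curl -X POST http://localhost:9090/-/reload` does the same thing.

```bash
#!/usr/bin/env bash
# Hypothetical reload.sh: make a running Prometheus re-read prometheus.yml and rule files.
# reload_prometheus [PID]  (defaults to the first process named exactly "prometheus")
reload_prometheus() {
  local pid=${1:-$(pgrep -x prometheus | head -n1)}
  if [ -z "$pid" ]; then
    echo "prometheus is not running" >&2
    return 1
  fi
  # SIGHUP triggers a configuration reload in Prometheus (the process keeps running).
  kill -HUP "$pid"
}
```

If the new configuration is invalid, Prometheus keeps the old one and logs an error, so check its log after reloading.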
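III. Configure Grafana
1. Download the Redis monitoring dashboard template: https://pan.baidu.com/s/151lqEyDFObEs7BDFiBr1bw
2. For how to import it, see steps h to l of "2. Configure Grafana" in the host monitoring article ( http://www.javashuo.com/article/p-ybzkorax-mn.html ).
IV. Configure Alertmanager
A. Configure the alerting rules in Prometheus. The contents of /data/monitor/prometheus/conf/rule/redis.yml are shown below (the file is picked up by the rule_files glob in prometheus.yml). Then reload Prometheus: cd /data/monitor/prometheus && sh reload.sh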
groups:
  - name: redis_alert
    rules:
      ### Memory ###
      # Default memory alert policy: instances with at most 4 GB of memory alert at 95%
      - alert: redis内存95%   # "redis memory 95%"
        expr: ((floor(redis_memory_used_rss_bytes / redis_memory_max_bytes * 100) >= 95) or (floor(redis_mem_use_ratio) >= 95)) and ((redis_memory_max_bytes <= 1024 * 1024 * 1024 * 4) or (redis_mem_total_size <= 4))
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.alias }}], address: [{{ $labels.addr }}], value: [{{ $value }}%], condition has held for 3 minutes."
      # Instances with more than 4 GB of memory alert at 98%
      - alert: redis内存98%   # "redis memory 98%"
        expr: ((floor(redis_memory_used_rss_bytes / redis_memory_max_bytes * 100) >= 98) or (floor(redis_mem_use_ratio) >= 98)) and ((redis_memory_max_bytes > 1024 * 1024 * 1024 * 4) or (redis_mem_total_size > 4))
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.alias }}], address: [{{ $labels.addr }}], value: [{{ $value }}%], condition has held for 3 minutes."
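The two rules split instances by size: up to 4 GiB (1024 * 1024 * 1024 * 4 = 4294967296 bytes) alert at 95%, larger ones at 98%. The hypothetical function below mirrors only the redis_exporter half of each expression (RSS over maxmemory, floored) for a single instance, which makes it easy to check which alert a given pair of values would trigger. It ignores the redis_mem_use_ratio / redis_mem_total_size branch and, unlike PromQL (which would compute +Inf), simply skips instances whose maxmemory is 0.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the redis.yml memory thresholds for one instance.
# which_mem_alert USED_RSS_BYTES MAX_BYTES -> prints the alert name, or nothing
which_mem_alert() {
  local used=$1 max=$2
  local four_gib=$((1024 * 1024 * 1024 * 4))   # 4294967296 bytes
  [ "$max" -gt 0 ] || return 0                 # maxmemory unset: no ratio, skip
  local pct=$((used * 100 / max))              # integer division acts as floor()
  if [ "$max" -le "$four_gib" ] && [ "$pct" -ge 95 ]; then
    echo "redis内存95%"
  elif [ "$max" -gt "$four_gib" ] && [ "$pct" -ge 98 ]; then
    echo "redis内存98%"
  fi
}
```

To validate the rule file itself before reloading, use `promtool check rules /data/monitor/prometheus/conf/rule/redis.yml`.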
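B. Configure Alertmanager: cat /data/prometheus/alertmanager/conf/alertmanager.yml. If the alerts go to the same recipients, you can simply append the new resources to the existing route. If the recipients differ, define a new receiver first, then define a routing rule for the resources and bind it to the new receiver.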
global:
  resolve_timeout: 2m
  smtp_auth_password: q5AYahvxi3WLDap3 # sender mailbox password
  smtp_auth_username: itliuqs@163.com # sender mailbox
  smtp_from: itliuqs@163.com # sender address
  smtp_require_tls: false
  smtp_smarthost: smtp.163.com:465 # SMTP server
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/ # WeChat Work API URL
inhibit_rules:
  - equal:
      - instance
    source_match:
      alertname: "主机CPU90%" # host CPU 90%
    target_match:
      alertname: "主机负载太高" # host load too high
  - equal:
      - instance
    source_match:
      alertname: "mysql运行进程数5分钟增加数>150" # MySQL running threads grew by >150 in 5 minutes
    target_match:
      alertname: "mysql慢查询5分钟100条" # 100 MySQL slow queries in 5 minutes
  - equal:
      - instance
    source_match:
      severity: error
    target_match:
      severity: warning
  - equal:
      - instance
    source_match:
      severity: fatal
    target_match:
      severity: error
  - equal:
      - service_name
    source_match:
      severity: error
    target_match:
      severity: warning
receivers:
  - email_configs: # define the "test" receiver
      - html: '{{ template "email.default.html" . }}' # template to use
        send_resolved: true
        to: liuqs@126.com # mailbox that receives the alerts; separate multiple addresses with commas
    name: test # receiver name
    wechat_configs: # for receiving these alerts in WeChat Work, see the WeChat Work notes at the bottom
      - agent_id: 1000002 # application ID
        api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa # application secret
        corp_id: wwd397231fa801beaa # WeChat Work corp ID
        send_resolved: true
        to_user: LiuQingShan|liuqs # WeChat Work user IDs; separate multiple users with |
  - email_configs: # define the default receiver
      - html: '{{ template "email.default.html" . }}'
        send_resolved: true
        to: liuqs@126.com
    name: default_group
    wechat_configs:
      - agent_id: 1000002
        api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa
        corp_id: wwd397231fa801beaa
        send_resolved: true
        to_user: LiuQingShan
route: # routing rules for alert resources
  group_by:
    - monitor
  group_interval: 2m
  group_wait: 30s
  receiver: default_group
  repeat_interval: 6h
  routes:
    - continue: true
      match_re:
        instance: 10.8.46.117:9100|10.8.80.126:9100|10.8.32.67:9100|10.8.9.35:9100|10.8.69.81:9100|localhost:15050|localhost:15221|localhost:15052|localhost:15053|localhost:15049|localhost:15051|localhost:15060|localhost:18019|localhost:18047 # resources that use this route
      receiver: test # send to the "test" receiver
templates:
  - /data/monitor/alertmanager/template/*.tmpl # path of the alert message templates
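When adding a new Redis exporter port to the match_re list, it is worth checking which receiver an instance label will reach. Alertmanager anchors match_re patterns, so the whole label value must match. The hypothetical helper below imitates that with `grep -Ex` for this single-child route; it is a sketch of the routing outcome under that assumption, not Alertmanager's own matcher.

```bash
#!/usr/bin/env bash
# Hypothetical check: which receiver would an alert with this instance label reach?
# The pattern is copied from the match_re above; grep -x requires a whole-value match.
TEST_INSTANCES='10.8.46.117:9100|10.8.80.126:9100|10.8.32.67:9100|10.8.9.35:9100|10.8.69.81:9100|localhost:15050|localhost:15221|localhost:15052|localhost:15053|localhost:15049|localhost:15051|localhost:15060|localhost:18019|localhost:18047'

route_for() {
  if printf '%s\n' "$1" | grep -Eqx "$TEST_INSTANCES"; then
    echo test            # matched the child route
  else
    echo default_group   # no child matched: the root route's receiver
  fi
}
```

Alertmanager's own `amtool` can validate the file with `amtool check-config /data/prometheus/alertmanager/conf/alertmanager.yml`, and newer amtool versions can also test routing directly with `amtool config routes test --config.file=... instance=localhost:18019`.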