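At work, the VMs in our OpenStack cluster needed monitoring of basic performance metrics. Manually installing node_exporter on every VM as it boots, and then editing prometheus.yml by hand, would be a nightmare for lazy programmers like us. So I set out to design a monitoring solution based on Prometheus + Consul: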
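1. Bake node_exporter into the image so that it is deployed on every VM automatically.
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter.
3. Configure Prometheus to discover node_exporter targets through Consul.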
OS | Hostname | IP |
---|---|---|
Centos-7.7 | compute-7-1 | 172.16.100.71 |
Centos-7.7 | compute-7-2 | 172.16.100.72 |
Centos-7.7 | compute-7-3 | 172.16.100.73 |
Consul v1.7.2
Install Consul on every node:
```shell
$ wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
$ unzip consul_1.7.2_linux_amd64.zip          # extracts a single "consul" binary
$ mv consul /usr/bin/
$ mkdir -p /data/consul /etc/consul.d
$ useradd consul
$ chown -R consul:consul /data/consul          # the service runs as user consul
```
Create `/etc/consul.d/consul_config.json` on each node. Only `node_name` and `bind_addr` differ between nodes:
```
$ vim /etc/consul.d/consul_config.json
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "node_name": "compute-7-1",
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "bind_addr": "172.16.100.71"
}
```
```
$ vim /etc/consul.d/consul_config.json
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "node_name": "compute-7-2",
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "bind_addr": "172.16.100.72"
}
```
```
$ vim /etc/consul.d/consul_config.json
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "node_name": "compute-7-3",
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "bind_addr": "172.16.100.73"
}
```
On every node, create a systemd unit, then start Consul and enable it at boot:
```
$ vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```
```shell
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
```
Bootstrap the ACL system on compute-7-1. The returned ID is the master token:
```shell
$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap
{"ID":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}
```
Generate the gossip encryption key:
```shell
$ consul keygen
gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=
```
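The key is just random bytes in base64; the value above decodes to 32 bytes. The Go sketch below generates an equivalent key and sanity-checks it, which can help if a provisioning tool produces the key instead of the CLI. The accepted lengths (16, 24 or 32 bytes) are the AES key sizes supported by Consul's gossip layer (memberlist). Whichever way the key is produced, every node must use the same value for `encrypt`.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newGossipKey returns 32 random bytes encoded as standard base64,
// the same shape as the key printed by `consul keygen` above.
func newGossipKey() (string, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key), nil
}

// validGossipKey reports whether s is base64 that decodes to 16, 24 or 32 bytes.
func validGossipKey(s string) bool {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return false
	}
	switch len(raw) {
	case 16, 24, 32:
		return true
	}
	return false
}

func main() {
	key, err := newGossipKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key, validGossipKey(key))
}
```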
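Next, rewrite `/etc/consul.d/consul_config.json` on every node so that the servers form one cluster with gossip encryption and ACLs enabled.

compute-7-1: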
```json
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "start_join": ["172.16.100.72", "172.16.100.73"],
  "retry_join": ["172.16.100.72", "172.16.100.73"],
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-1",
  "bind_addr": "172.16.100.71",
  "advertise_addr": "172.16.100.71",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
```
compute-7-2:
```json
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-2",
  "bind_addr": "172.16.100.72",
  "advertise_addr": "172.16.100.72",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
```
compute-7-3:
```json
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-3",
  "bind_addr": "172.16.100.73",
  "advertise_addr": "172.16.100.73",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
```
Restart Consul on all three nodes.
First on the slave nodes (compute-7-2 and compute-7-3):
```shell
# run on compute-7-2 and on compute-7-3
$ systemctl restart consul
```
Then on the master (compute-7-1):
```shell
$ systemctl restart consul
```
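After startup, the server logs show permission-related errors. According to the official documentation, this happens because no agent token is configured, so an agent token has to be created as well: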
```shell
$ curl --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data '{ "Name": "Agent Token", "Type": "client", "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }" }' \
    http://172.16.100.71:8500/v1/acl/create
```
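The call returns an `ID` (here `883efc94-0c59-c46f-67cf-4644ac4adad2`). Add it to every node's configuration as the `agent` token.

compute-7-1: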
```json
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "start_join": ["172.16.100.72", "172.16.100.73"],
  "retry_join": ["172.16.100.72", "172.16.100.73"],
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-1",
  "bind_addr": "172.16.100.71",
  "advertise_addr": "172.16.100.71",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
```
compute-7-2:
```json
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-2",
  "bind_addr": "172.16.100.72",
  "advertise_addr": "172.16.100.72",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
```
compute-7-3:
```json
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": {
    "enabled": true
  },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-3",
  "bind_addr": "172.16.100.73",
  "advertise_addr": "172.16.100.73",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
```
Restart Consul on all three nodes again.
First on the slave nodes (compute-7-2 and compute-7-3):
```shell
# run on compute-7-2 and on compute-7-3
$ systemctl restart consul
```
Then on the master (compute-7-1):
```shell
$ systemctl restart consul
```
Once the cluster has stabilized, the UI is available at http://172.16.100.71:8500.
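Then add a job to Prometheus that discovers its targets through Consul:

```
$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      # Keep only services whose tags contain OpenStack-vms; this must match
      # the Service.Tag that consulR registers (see consulR.yaml).
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        action: keep
      # Attach a fixed env label (a keep rule ignores target_label/replacement).
      - target_label: env
        replacement: OpenStack-vms
        action: replace
      # Copy Consul service metadata into target labels.
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...
```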
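The Go sketch below shows which targets a `keep` rule lets through. It reproduces two Prometheus behaviours: how `__meta_consul_tags` is built (tags joined with the default `,` separator, which is also added at both ends) and how the regex is applied (fully anchored). It illustrates that behaviour and is not Prometheus code. It shows that a VM registered only with the `node-exporter` tag would be dropped by `.*OpenStack-vms.*`, so the registered tag and the regex have to agree.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// consulTagsLabel builds __meta_consul_tags the way Prometheus does:
// tags joined by the separator, with the separator also at both ends.
func consulTagsLabel(tags []string, sep string) string {
	return sep + strings.Join(tags, sep) + sep
}

// keepTarget mimics a relabel rule with action: keep. Prometheus anchors
// the regex, so it must match the whole label value.
func keepTarget(tags []string, regex string) (bool, error) {
	re, err := regexp.Compile("^(?:" + regex + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(consulTagsLabel(tags, ",")), nil
}

func main() {
	for _, tags := range [][]string{{"node-exporter"}, {"OpenStack-vms"}} {
		kept, err := keepTarget(tags, ".*OpenStack-vms.*")
		if err != nil {
			panic(err)
		}
		fmt.Println(tags, "kept:", kept)
	}
}
```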
```shell
$ sudo systemctl restart prometheus
```
After the restart, the job_name configured above shows up in the Prometheus UI.
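Problem: none of the native components offers a good way to register services automatically. At first I registered node_exporter with curl from a shell script called in rc.local. However, registration occasionally did not happen, and CentOS 7's parallel startup made this approach unreliable. I also found that Consul does not prevent the same service ID from being registered on different nodes at the same time, because service IDs only have to be unique per agent, not across the cluster.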
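So I wrote a small Go program that registers node_exporter automatically. systemd starts it at boot, and it uses an algorithm to avoid duplicate registrations and to spread registrations evenly across the Consul nodes.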
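The post does not describe consulR's algorithm. The Go sketch below shows one simple way to get both properties, using FNV-1a hash-mod selection; it is hypothetical and not necessarily what consulR does. It derives a stable service ID from the VM's identity, then always registers that ID through the same Consul agent, chosen by hashing the ID. Because service IDs only need to be unique per agent, pinning each ID to one agent is what keeps it from appearing on several nodes. Hashing many different IDs spreads the VMs roughly evenly over the agents.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// serviceID derives a stable ID from the VM's identity, so re-registering
// after a reboot overwrites the existing entry instead of adding a new one.
func serviceID(hostname, ip string, port int) string {
	return fmt.Sprintf("node-exporter-%s-%s-%d", hostname, ip, port)
}

// pickAgent deterministically maps a service ID onto one Consul address:
// the same ID always lands on the same agent, and different IDs spread
// roughly evenly over the list.
func pickAgent(id string, agents []string) string {
	if len(agents) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(id)) // Write on a hash.Hash never returns an error
	return agents[h.Sum32()%uint32(len(agents))]
}

func main() {
	// Same comma-separated format as Consul.Address in consulR.yaml.
	agents := strings.Split("172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500", ",")
	id := serviceID("vm-01", "172.16.100.120", 9100) // hypothetical VM
	fmt.Println(id, "->", pickAgent(id, agents))
}
```

Plain modulo hashing has a drawback: adding or removing a Consul address remaps most VMs. Consistent hashing would limit that, and a real tool would also need a fallback when the chosen agent is down.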
Inside the VM that will become the image template, install node_exporter:

```shell
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter
```
```
$ vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/

[Service]
User=nobody
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
```
```shell
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter
```
Install the consulR program:
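```shell
$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ cd consulR
$ chmod +x consulR
$ mv consulR /usr/local/
$ mkdir -p /data/consul/logs /etc/consul
$ chown -R nobody:nobody /data/consul          # consulR runs as nobody and writes its log here
```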
Configuration file:
```
$ vim /etc/consul/consulR.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # IP:port used to find the IP address of the outbound NIC
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  # Consul master token
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  # What Consul does when a VM is deleted or goes down:
  # keep the service (in critical state) or deregister it automatically.
  # Consul enforces a minimum of 1 minute for the deregistration timeout.
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound NIC's IP is detected via FindAddress
  Address:
  Port: 9100
```
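`FindAddress` presumably relies on a well-known Go idiom: "dialing" UDP to an external address makes the kernel pick a route and a source address without sending any packet. The socket's local address is then the outbound NIC's IP. The sketch below shows that idiom. How consulR actually does this is an assumption; this is not its code.

```go
package main

import (
	"fmt"
	"net"
)

// outboundIP returns the local IP the kernel would use to reach findAddr.
// Connecting a UDP socket only selects the route; no packet is sent.
func outboundIP(findAddr string) (string, error) {
	conn, err := net.Dial("udp", findAddr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).IP.String(), nil
}

func main() {
	// 8.8.8.8:80 is the FindAddress value from consulR.yaml.
	ip, err := outboundIP("8.8.8.8:80")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip)
}
```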
```shell
$ chown -R nobody.nobody /etc/consul/consulR.yaml
```
Manage it with systemd:
```
$ vim /usr/lib/systemd/system/consulR.service
[Unit]
Description=consulR: register node_exporter with Consul
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
ExecStart=/usr/local/consulR --confpath=/etc/consul/consulR.yaml
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target
```
Start it and enable it at boot:
```shell
$ systemctl daemon-reload && systemctl start consulR && systemctl enable consulR
```
Power off the VM:
```shell
$ poweroff
```
Build the image (`disk` is the powered-off VM's disk file; `-c` compresses the qcow2 output):
```shell
$ qemu-img convert -c -O qcow2 disk centos-fantasy.qcow2
$ openstack image create "CentOS7-Fantasy" --file centos-fantasy.qcow2 --disk-format qcow2 --container-format bare --public
```
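Once the image is created, every VM booted from it automatically registers its node_exporter (port 9100) with the Consul cluster, and Prometheus then discovers it automatically.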
Import dashboard 8919 into Grafana, and you can browse the details of every auto-discovered host by `instance`. Pretty simple, right?