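Consul can be monitored with many tools. Here we use Prometheus.
We have a Consul server cluster plus agents. For cluster setup and configuration, see "Consul installation, backup, and upgrade".
The telemetry option must be set in the configuration file, as shown below.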

    ~]# cat /usr/local/consul/consul.d/consul.json
    {
        "datacenter": "dc1",
        "client_addr": "0.0.0.0",
        "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
        "data_dir": "/usr/local/consul/data",
        "retry_interval": "20s",
        "retry_join": ["10.111.67.1","10.111.67.2","10.111.67.3","10.111.67.4","10.111.67.5"],
        "enable_local_script_checks": true,
        "log_file": "/usr/local/consul/logs/",
        "log_level": "debug",
        "enable_debug": true,
        "pid_file": "/var/run/consul.pid",
        "performance": {
            "raft_multiplier": 1
        },
        "telemetry": {
            "prometheus_retention_time": "120s",
            "disable_hostname": true
        }
    }

After Consul starts successfully, test the metrics endpoint with the following command.

    ~]# curl 127.0.0.1:8500/v1/agent/metrics?format=prometheus
    # HELP consul_fsm_register consul_fsm_register
    # TYPE consul_fsm_register summary
    consul_fsm_register{quantile="0.5"} NaN
    consul_fsm_register{quantile="0.9"} NaN
    consul_fsm_register{quantile="0.99"} NaN
    consul_fsm_register_sum 3.396029010415077
    consul_fsm_register_count 8
    # HELP consul_http_GET_v1_agent_metrics consul_http_GET_v1_agent_metrics
    # TYPE consul_http_GET_v1_agent_metrics summary
    consul_http_GET_v1_agent_metrics{quantile="0.5"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics{quantile="0.9"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics{quantile="0.99"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics_sum 366820.44427236915
    consul_http_GET_v1_agent_metrics_count 349523
    # HELP consul_http_GET_v1_catalog_service__ consul_http_GET_v1_catalog_service__
    # TYPE consul_http_GET_v1_catalog_service__ summary
    consul_http_GET_v1_catalog_service__{quantile="0.5"} 31258.423828125
    consul_http_GET_v1_catalog_service__{quantile="0.9"} 306137.71875
    consul_http_GET_v1_catalog_service__{quantile="0.99"} 306137.71875
    consul_http_GET_v1_catalog_service___sum 4.0220439955034314e+11
    consul_http_GET_v1_catalog_service___count 2.388023e+06
    …………………………

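To sanity-check this output without a Prometheus server, the text can be parsed directly. The following is a minimal sketch, not an official parser: `parse_metrics` and `summary_mean` are hypothetical helper names, the regular expressions handle only simple sample lines (no escaped quotes in label values, no timestamps), and `summary_mean` assumes the `_sum` and `_count` series carry no labels, as in the output above.

```python
import re

# Simplified patterns for one sample line, e.g. name{label="value"} 1.5
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

# A shortened copy of the curl output shown above.
SAMPLE_OUTPUT = """\
# HELP consul_fsm_register consul_fsm_register
# TYPE consul_fsm_register summary
consul_fsm_register{quantile="0.5"} NaN
consul_fsm_register_sum 3.396029010415077
consul_fsm_register_count 8
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified pattern cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def summary_mean(samples, metric):
    """Mean of a summary metric: <metric>_sum divided by <metric>_count."""
    totals = {name: value for name, labels, value in samples if not labels}
    total = totals.get(f"{metric}_sum")
    count = totals.get(f"{metric}_count")
    if total is None or not count:
        return None
    return total / count


samples = parse_metrics(SAMPLE_OUTPUT)
print(summary_mean(samples, "consul_fsm_register"))
```

The quantile series report NaN when no observations fall inside the retention window, so the `_sum` and `_count` pair is often the more useful signal for a quick check.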
For server monitoring we use Prometheus file-based service discovery (file_sd_configs); static configuration (static_configs) would also work.
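Because we want to alert on Consul and the alerts need a host name, we use file-based service discovery (file_sd_configs) and attach a consul_node_name label to each host. With static configuration (static_configs) it is not convenient to label every host individually: labels apply to a whole targets list, so per-host labels would require one list entry per host inside prometheus.yml itself.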
The configuration is as follows. It is written as a Kubernetes ConfigMap.

    ~]# cat prometheus-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config-consul
      namespace: prometheus
      labels:
        app: prometheus-consul
        environment: prod
        release: release
    data:
      prometheus.yml: |
        global:
          external_labels:
            region: cn-hangzhou
            monitor: consul
            replica: A
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
        - job_name: consul-server
          # scrape interval
          scrape_interval: 60s
          # scrape timeout
          scrape_timeout: 10s
          # path to scrape on each target
          metrics_path: "/v1/agent/metrics"
          scheme: http
          params:
            format: ['prometheus']
          file_sd_configs:
          - files:
            - /etc/config/consul-server.json
            refresh_interval: 1m
      consul-server.json: |
        [
          {
            "targets": [ "10.111.67.1:8500" ],
            "labels": { "consul_node_name": "Consul-Server-1" }
          },
          {
            "targets": [ "10.111.67.2:8500" ],
            "labels": { "consul_node_name": "Consul-Server-2" }
          },
          {
            "targets": [ "10.111.67.3:8500" ],
            "labels": { "consul_node_name": "Consul-Server-3" }
          },
          {
            "targets": [ "10.111.67.4:8500" ],
            "labels": { "consul_node_name": "Consul-Server-4" }
          },
          {
            "targets": [ "10.111.67.5:8500" ],
            "labels": { "consul_node_name": "Consul-Server-5" }
          }
        ]

At this point Prometheus can scrape the Consul server metrics, and they can be queried in the Prometheus built-in UI.
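The consul-server.json content is repetitive, so it could be generated rather than written by hand. The sketch below is one way to do that; `build_file_sd_targets` is a hypothetical helper name, and the node names and addresses simply mirror the ConfigMap above.

```python
import json


def build_file_sd_targets(servers, port=8500):
    """Build file_sd entries, one per host, each with its own consul_node_name."""
    return [
        {
            "targets": [f"{address}:{port}"],
            "labels": {"consul_node_name": node_name},
        }
        for node_name, address in servers
    ]


# (node name, address) pairs matching the five servers in the ConfigMap.
servers = [(f"Consul-Server-{i}", f"10.111.67.{i}") for i in range(1, 6)]
print(json.dumps(build_file_sd_targets(servers), indent=2))
```

Each host gets its own entry because labels in a file_sd entry apply to every target in that entry's targets list.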
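Consul clients are a different matter: there are hundreds or even thousands of them. Labeling each host through file-based discovery (file_sd_configs) would make the file too costly to maintain as hosts are added and removed. We therefore use Consul-based service discovery (consul_sd_configs) to monitor the clients.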
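For Prometheus, or any other service, to discover a service, that service must be registered in Consul. We therefore use a script to generate a simple service registration.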

    ~]# cat create-consul-registration.sh
    #!/bin/bash
    # IPv4 address of the eth*/em* interface (assumes a single matching address).
    ADDR=$(ip addr show | awk -F '[ /]+' '/eth[0-9]|em[0-9]/ && /inet/ {print $3}')
    CONSUL_CONF_DIR='/usr/local/consul/consul.d'
    CONSUL_REGISTER_FILE="$CONSUL_CONF_DIR/consul-members-registration.json"

    if [[ -n "$ADDR" && -d "$CONSUL_CONF_DIR" ]]; then
    # Write the service definition; the heredoc expands ${ADDR}.
    cat > "${CONSUL_REGISTER_FILE}" <<-EOF
    {
      "service": {
        "id": "consul-${ADDR}",
        "name": "consul-members",
        "tags": [
          "prometheus",
          "client",
          "consul-client"
        ],
        "address": "${ADDR}",
        "port": 8500,
        "check": {
          "http": "http://127.0.0.1:8500",
          "interval": "60s"
        }
      }
    }
    EOF
    else
        echo "ip address is empty or the $CONSUL_CONF_DIR does not exist"
    fi

Running this script creates the service registration file consul-members-registration.json under /usr/local/consul/consul.d/.

    ~]# cat /usr/local/consul/consul.d/consul-members-registration.json
    {
      "service": {
        "id": "consul-10.111.74.8",
        "name": "consul-members",
        "tags": [
          "prometheus",
          "client",
          "consul-client"
        ],
        "address": "10.111.74.8",
        "port": 8500,
        "check": {
          "http": "http://127.0.0.1:8500",
          "interval": "60s"
        }
      }
    }

Then run consul reload to load the new configuration.

    ~]# consul reload

The service is now registered in Consul, with the service name consul-members and the service ID consul-10.111.74.8. We can verify this with curl or a browser.

    ~]# curl -s 127.0.0.1:8500/v1/agent/services|python -m json.tool
    {
        "consul-10.111.74.8": {
            "Address": "10.111.74.8",
            "EnableTagOverride": false,
            "ID": "consul-10.111.74.8",
            "Meta": {},
            "Port": 8500,
            "Service": "consul-members",
            "Tags": [
                "prometheus",
                "client",
                "consul-client"
            ],
            "Weights": {
                "Passing": 1,
                "Warning": 1
            }
        }
    }

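The same check can be scripted, which helps when verifying many clients. This is a rough sketch under stated assumptions: `fetch_agent_services` and `find_service_ids` are hypothetical helper names, the agent is assumed to listen on 127.0.0.1:8500 without ACL tokens, and the sample dictionary is a trimmed copy of the response shown above.

```python
import json
from urllib.request import urlopen


def fetch_agent_services(base_url="http://127.0.0.1:8500"):
    """Fetch /v1/agent/services from a local agent (needs a running Consul)."""
    with urlopen(f"{base_url}/v1/agent/services", timeout=5) as response:
        return json.load(response)


def find_service_ids(services, service_name, required_tag=None):
    """Return sorted IDs of services with the given name and, optionally, tag."""
    ids = []
    for service_id, service in services.items():
        if service.get("Service") != service_name:
            continue
        if required_tag is not None and required_tag not in (service.get("Tags") or []):
            continue
        ids.append(service_id)
    return sorted(ids)


# Trimmed copy of the response shown above, so the filter runs offline.
sample_services = {
    "consul-10.111.74.8": {
        "Address": "10.111.74.8",
        "ID": "consul-10.111.74.8",
        "Port": 8500,
        "Service": "consul-members",
        "Tags": ["prometheus", "client", "consul-client"],
    }
}
print(find_service_ids(sample_services, "consul-members", required_tag="prometheus"))
```

Against a live agent, `find_service_ids(fetch_agent_services(), "consul-members")` would perform the same check on real data.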
The Prometheus configuration for the clients is as follows:

    ~]# cat prometheus-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config-consul
      namespace: prometheus
      labels:
        app: prometheus-consul
        environment: prod
        release: release
    data:
      prometheus.yml: |
        global:
          external_labels:
            region: cn-hangzhou
            monitor: consul
            replica: A
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
        - job_name: consul-client
          # scrape interval
          scrape_interval: 60s
          # scrape timeout
          scrape_timeout: 10s
          # path to scrape on each target
          metrics_path: "/v1/agent/metrics"
          scheme: http
          params:
            format: ['prometheus']
          consul_sd_configs:
          - server: "10.111.67.1:8500"
            services:
            - consul-members
          relabel_configs:
          - action: replace
            source_labels:
            - __meta_consul_dc
            target_label: consul_dc
          - action: replace
            source_labels:
            - __meta_consul_node
            target_label: consul_node_name
          - action: replace
            source_labels:
            - __meta_consul_service
            target_label: consul_service
          - action: replace
            source_labels:
            - __meta_consul_service_id
            target_label: consul_service_id

Because we want to alert on Consul, and the alerts need the host name, service name, service ID, datacenter, and similar information, we relabel the discovered meta labels into regular labels. The meta labels available for relabeling are:
- __meta_consul_address: the address of the target
- __meta_consul_dc: the datacenter name for the target
- __meta_consul_tagged_address_<key>: each node tagged address key value of the target
- __meta_consul_metadata_<key>: each node metadata key value of the target
- __meta_consul_node: the node name defined for the target
- __meta_consul_service_address: the service address of the target
- __meta_consul_service_id: the service ID of the target
- __meta_consul_service_metadata_<key>: each service metadata key value of the target
- __meta_consul_service_port: the service port of the target
- __meta_consul_service: the name of the service the target belongs to
- __meta_consul_tags: the list of tags of the target joined by the tag separator
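To make the effect of the four replace rules concrete, the sketch below imitates them in plain Python. It is a simplified model, not Prometheus code: `apply_replace_rules` is a hypothetical helper, it covers only `action: replace` with the default regex and replacement, it does not model the automatic `instance` and `job` labels, and the node name `client-node-1` is invented for illustration.

```python
# (source meta label, target label) pairs from the relabel_configs above.
RELABEL_RULES = [
    ("__meta_consul_dc", "consul_dc"),
    ("__meta_consul_node", "consul_node_name"),
    ("__meta_consul_service", "consul_service"),
    ("__meta_consul_service_id", "consul_service_id"),
]


def apply_replace_rules(discovered_labels, rules=RELABEL_RULES):
    """Copy each source label to its target, then drop labels starting with "__"."""
    labels = dict(discovered_labels)
    for source, target in rules:
        value = labels.get(source, "")
        if value:
            labels[target] = value
        else:
            # An empty replacement value removes the target label.
            labels.pop(target, None)
    # Labels with a double-underscore prefix do not survive relabeling.
    return {name: value for name, value in labels.items() if not name.startswith("__")}


# Labels as Consul discovery might present them; the node name is made up.
discovered = {
    "__address__": "10.111.74.8:8500",
    "__meta_consul_dc": "dc1",
    "__meta_consul_node": "client-node-1",
    "__meta_consul_service": "consul-members",
    "__meta_consul_service_id": "consul-10.111.74.8",
}
print(apply_replace_rules(discovered))
```

Without these rules the `__meta_consul_*` values would be discarded after relabeling, so alerts could not reference the node name, service, or datacenter.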