Monitoring MySQL with Prometheus

Date: 2019-11-10

1. What is Prometheus

An open-source systems monitoring and alerting toolkit, used to monitor real-time data such as a project's traffic, memory usage and load.

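It collects metrics from monitored jobs, either directly or, for short-lived jobs, through an intermediary gateway. It stores all collected samples locally and can run predefined rules over this data, either to record new time series derived from existing data or to generate alerts. Other APIs can be used to visualize the collected data.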

2. How monitoring works

In short, there are three main steps:

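1. Prepare each server to be monitored so that its metrics are collected and exposed for scraping. If the metrics come from a third-party system and are not already Prometheus time series, you need an exporter to export them as Prometheus time series. Many official and third-party exporters already exist, and some software exposes its metrics directly in the Prometheus format.
2. Deploy the Prometheus server on one machine, then edit its configuration file to set the IP address and port of each monitoring target. Once started, Prometheus periodically pulls (scrapes) data from every target.
3. Analyze the data. Prometheus provides a powerful query language, PromQL, for working with the collected data, and shows query results in its built-in browser UI; you can also configure a third-party visualization platform.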

Deploying MySQL monitoring

Let's walk through the deployment with an example.

Installing and running Prometheus

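There are many ways to install it; here I use the precompiled binary. Download it from the official download page, extract it, then run ./prometheus in a terminal and press Enter to start the Prometheus server.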

Monitoring Prometheus itself

In the extracted prometheus directory you will find a prometheus.yml file, the configuration file that defines the monitoring targets and other settings. Open it; a minimal initial configuration looks like this:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

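Here the monitoring target is localhost:9090, that is, Prometheus itself. Open localhost:9090 in a browser to reach the Prometheus web UI; localhost:9090/metrics exposes all of the metrics Prometheus reports about itself.

One of them, prometheus_target_interval_length_seconds, is the actual interval between scrapes. Type it into the expression box on the Prometheus home page and press Enter to see a set of series with different quantile labels, from 0.01 to 0.99. A quantile tells you what fraction of the observed values lie at or below that value: the 0.99 series, for example, is the interval that 99% of scrapes did not exceed. If you only care about the 0.99 quantile, query prometheus_target_interval_length_seconds{quantile="0.99"}. Queries also support functions; for example, count(prometheus_target_interval_length_seconds) returns the number of matching series.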
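To make the definition of a quantile concrete, the sketch below computes exact nearest-rank quantiles over a hypothetical list of scrape intervals. This is only an illustration of the definition: Prometheus summaries such as prometheus_target_interval_length_seconds estimate their quantiles differently, with a streaming algorithm over a sliding time window.

```python
# Illustrates the *definition* of a quantile using exact nearest-rank on a
# hypothetical list of scrape intervals. Prometheus summaries estimate
# quantiles differently (streaming, over a sliding time window).
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Smallest observed value v such that at least a fraction q of values are <= v."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based position in sorted order
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical measured scrape intervals (seconds) for a 5 s job.
    intervals = [5.0, 5.001, 4.999, 5.002, 5.0, 4.998, 5.003, 5.0, 5.001, 5.2]
    for q in (0.5, 0.99):
        print(q, nearest_rank_quantile(intervals, q))
```

With these ten values the 0.5 quantile is 5.0 and the 0.99 quantile is 5.2: a single slow scrape is enough to dominate the high quantile, which is why the 0.99 series is the one to watch for outliers.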
To have that count available directly as its own series, create a prometheus.rules file that defines it as a rule, then reference the rules file from prometheus.yml:

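# prometheus.rules (Prometheus 2.x YAML format; 1.x used the one-line form "name = expr")
groups:
  - name: test
    rules:
      - record: test:prometheus_target_interval_length_seconds:count
        expr: count(prometheus_target_interval_length_seconds)

# prometheus.yml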
# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "prometheus.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
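Conceptually, Prometheus evaluates each rule once per evaluation_interval: it runs the expression against the latest data and stores the result as a new time series under the rule's name. The toy sketch below mimics a single evaluation of the count() rule above over a hand-made, hypothetical list of series; it says nothing about how Prometheus actually stores or evaluates data.

```python
# Toy model of one evaluation of the recording rule that defines
# test:prometheus_target_interval_length_seconds:count as
# count(prometheus_target_interval_length_seconds).
# The in-memory series list is hypothetical, not Prometheus's storage.
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]  # (label set incl. __name__, latest value)


def evaluate_count_rule(record: str, metric: str, current: List[Series]) -> Series:
    """Count the series named `metric` and return the result under the name `record`."""
    matching = [labels for labels, _ in current if labels.get("__name__") == metric]
    # count() without "by" aggregates all labels away, leaving only the new name.
    return {"__name__": record}, float(len(matching))


if __name__ == "__main__":
    quantiles = ("0.01", "0.05", "0.5", "0.9", "0.99")
    data = [({"__name__": "prometheus_target_interval_length_seconds",
              "quantile": q}, 5.0) for q in quantiles]
    data.append(({"__name__": "up", "job": "prometheus"}, 1.0))
    print(evaluate_count_rule("test:prometheus_target_interval_length_seconds:count",
                              "prometheus_target_interval_length_seconds", data))
```

The point is that the rule's output is an ordinary series: after each evaluation it can be queried like any scraped metric.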

After that, you can query test:prometheus_target_interval_length_seconds:count directly. This rule is simple, but any frequently used, more complex expression can be precomputed with a rule in the same way.
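The recorded series can also be read programmatically. The sketch below sends an instant query to Prometheus's HTTP API endpoint /api/v1/query using only the standard library; it assumes the server runs at http://localhost:9090 as configured above, and the helper names are hypothetical.

```python
# Query Prometheus over its HTTP API (/api/v1/query) using only the stdlib.
# Assumes a server at http://localhost:9090; helper names are hypothetical.
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base: str, promql: str) -> str:
    """URL for an instant query of `promql` against the server at `base`."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got " + data["resultType"])
    # Each item looks like {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base: str, promql: str) -> List[Tuple[Dict[str, str], float]]:
    with urllib.request.urlopen(build_query_url(base, promql), timeout=5) as resp:
        return parse_vector(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        rows = instant_query("http://localhost:9090",
                             "test:prometheus_target_interval_length_seconds:count")
        for labels, value in rows:
            print(labels, value)
    except OSError as err:  # URLError is a subclass of OSError
        print("could not reach Prometheus:", err)
```

The API returns sample values as strings, which is why parse_vector converts them with float().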

Monitoring MySQL

Edit prometheus.yml and add the following job at the end of the file, as another entry under scrape_configs:

  - job_name: 'mysql'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9104']
        labels:
          instance: db1

Restart the Prometheus service:

$ ./prometheus --config.file=prometheus.yml   # Prometheus 1.x used -config.file

Open localhost:9090 again and go to the Status -> Targets page. It lists the two configured targets: Prometheus itself with State UP, and mysql with State DOWN, because nothing is serving MySQL metrics on port 9104 yet.
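The information on the Status -> Targets page is also available from the /api/v1/targets endpoint. The sketch below summarises the health of each active target; it assumes Prometheus 2.x at http://localhost:9090, and the function names are hypothetical.

```python
# Summarise target health from Prometheus's /api/v1/targets endpoint.
# Assumes Prometheus 2.x at http://localhost:9090; names are hypothetical.
import json
import urllib.request
from typing import Dict


def target_health(body: str) -> Dict[str, str]:
    """Map 'job/instance' to the target's health ('up', 'down' or 'unknown')."""
    payload = json.loads(body)
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        result[f"{labels.get('job')}/{labels.get('instance')}"] = target["health"]
    return result


def fetch_target_health(base: str = "http://localhost:9090") -> Dict[str, str]:
    url = base.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return target_health(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        for name, health in sorted(fetch_target_health().items()):
            print(name, health)
    except OSError as err:  # URLError is a subclass of OSError
        print("could not reach Prometheus:", err)
```

Because the mysql job sets the label instance: db1, that target shows up as mysql/db1 rather than mysql/localhost:9104.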

Installing and running the MySQL exporter

Download and extract mysqld_exporter from its release page, or use Docker directly:

$ docker pull prom/mysqld-exporter

mysqld_exporter has to connect to MySQL with suitable privileges, so first create a user for it and grant the required permissions:

CREATE USER 'mysqlexporter'@'localhost' IDENTIFIED BY 'mysqlexporter' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqlexporter'@'localhost';

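Then run the exporter in Docker. DATA_SOURCE_NAME is the environment variable that tells the exporter how to connect to the database. (Recent mysqld_exporter releases have dropped DATA_SOURCE_NAME in favour of a my.cnf-style credentials file, so check the README of the version you pull.)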

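# --network host (Linux only): "localhost" in the DSN then reaches MySQL on the host,
# and the exporter listens on the host's port 9104 directly (no -p needed).
$ docker run -d \
  --network host \
  -e DATA_SOURCE_NAME="mysqlexporter:mysqlexporter@(localhost:3306)/data_store" \
  prom/mysqld-exporter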
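DATA_SOURCE_NAME follows the Go MySQL driver's DSN layout, roughly user:password@protocol(address)/dbname, where the protocol may be left empty as in the command above. To make the parts explicit, the hypothetical helper below splits a simplified DSN of that shape; it ignores the optional ?param=value suffix and other edge cases the real driver handles, so treat it as an illustration rather than a validator.

```python
# Split a simplified Go-MySQL-driver-style DSN into its parts.
# Hypothetical helper for illustration; the real driver accepts more forms.
import re
from typing import Dict

_DSN = re.compile(
    r"^(?P<user>[^:@]+)(?::(?P<password>[^@]*))?@"
    r"(?P<protocol>[a-z]*)\((?P<address>[^)]*)\)/(?P<dbname>[^?]*)$"
)


def parse_dsn(dsn: str) -> Dict[str, str]:
    m = _DSN.match(dsn)
    if m is None:
        raise ValueError(f"unrecognised DSN: {dsn!r}")
    parts = m.groupdict()
    parts["password"] = parts["password"] or ""
    # An empty protocol is assumed to mean TCP, matching the Go driver's default.
    parts["protocol"] = parts["protocol"] or "tcp"
    return parts


if __name__ == "__main__":
    print(parse_dsn("mysqlexporter:mysqlexporter@(localhost:3306)/data_store"))
```

Checking that user and password here match the CREATE USER statement is the quickest way to rule out the most common cause of the mysql target staying DOWN.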

Refresh localhost:9090/targets and the mysql target's State changes to UP, which means MySQL is now being monitored successfully.
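You can also check the exporter directly: mysqld_exporter exposes a mysql_up gauge that is 1 when it can reach MySQL. The sketch below fetches http://localhost:9104/metrics and reads that value with a deliberately minimal line parser that only handles what this check needs; the function names are hypothetical.

```python
# Check mysqld_exporter's mysql_up gauge at http://localhost:9104/metrics.
# The parser is deliberately minimal: it only finds unlabelled samples.
import urllib.request
from typing import Optional


def read_gauge(exposition: str, name: str) -> Optional[float]:
    """Return the value of the first unlabelled sample called `name`, if any."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])
    return None


def mysql_is_up(url: str = "http://localhost:9104/metrics") -> bool:
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    return read_gauge(text, "mysql_up") == 1.0


if __name__ == "__main__":
    try:
        print("MySQL reachable by exporter:", mysql_is_up())
    except OSError as err:  # URLError is a subclass of OSError
        print("could not reach the exporter:", err)
```

If the exporter answers but mysql_up is 0, the exporter itself is running and the problem lies in the MySQL credentials or address.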

Summary

The core points:

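Data collection
1. Short-lived jobs are supported through an intermediary gateway (the Pushgateway).
2. Time series are collected by pulling over HTTP.
3. Custom rules can produce new time series. A rule takes existing monitoring data as input and computes processed data from it; once the rule is added, that processed data is available directly whenever you query, as in the example on the official site.
4. Monitoring targets come from service discovery or static configuration. Data is collected per service rather than per server.
Data storage
1. No reliance on distributed storage; each server node works autonomously.
2. A multi-dimensional data model (time series identified by a metric name and key/value label pairs) makes data from distributed deployments easy to combine. Suppose your project is spread across many containers (for example, one per server): to see the whole project, you have to monitor all of them. A tool such as cAdvisor can pull metrics from each container, so the data arrives scattered, but with the multi-dimensional model and the query language you can still query the traffic of the whole project.
Data querying
1. A flexible query language that takes advantage of this dimensional data model.
Visualization
1. Various dashboards: the built-in expression browser (no configuration needed), Grafana, and others.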
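The storage point about combining scattered per-container data rests on labels. The sketch below shows the idea behind a PromQL aggregation such as sum by (job) (...): series sharing the same value of the kept labels are summed, and all other labels are dropped. The label names and values are made up for illustration, and real PromQL evaluation is far richer.

```python
# Idea behind PromQL's "sum by (...)": group series by selected labels and
# sum their values, dropping every other label. Data here is hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Labels = Dict[str, str]


def sum_by(series: Iterable[Tuple[Labels, float]], *keep: str) -> Dict[Tuple[str, ...], float]:
    """Sum values grouped by the given label names; a missing label counts as ""."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        group = tuple(labels.get(name, "") for name in keep)
        totals[group] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical per-container request rates reported by three containers.
    per_container = [
        ({"job": "shop", "container": "c1", "host": "node1"}, 120.0),
        ({"job": "shop", "container": "c2", "host": "node2"}, 80.0),
        ({"job": "search", "container": "c3", "host": "node1"}, 45.0),
    ]
    print(sum_by(per_container, "job"))  # {('shop',): 200.0, ('search',): 45.0}
```

Grouping by host instead of job answers a different question from the same data, which is the practical payoff of the multi-dimensional model.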

Limitations and use cases

Limitations:

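Single-node design. Each Prometheus server works as a standalone node that stores its own monitoring data, so the amount of data a node can hold is limited by its local storage.
High memory usage (can be reduced through configuration). Because Prometheus 1.x embeds LevelDB (a database designed for fast inserts), IO usage is high even on SSDs.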
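To get a feel for the single-node storage limit, the Prometheus 2.x storage documentation gives a rough sizing formula: needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with roughly 1 to 2 bytes per sample after compression. The sketch below simply evaluates that formula; the workload figures are assumptions, and the formula describes the 2.x storage engine.

```python
# Rough disk estimate for Prometheus 2.x local storage:
#   retention_time_seconds * ingested_samples_per_second * bytes_per_sample
# bytes_per_sample defaults to the pessimistic end of the documented 1-2 range.


def needed_disk_bytes(retention_days: float, samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Estimated bytes of local storage needed for the given retention and ingestion rate."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Assumed workload: 10,000 series scraped every 5 s, kept for 15 days
    # (15 days is the 2.x default retention).
    gib = needed_disk_bytes(15, 10_000 / 5) / 2**30
    print(f"about {gib:.1f} GiB")  # about 4.8 GiB
```

Since the result grows linearly with both retention and ingestion rate, doubling either doubles the disk a single node needs.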

It is suited to monitoring any project whose data can be expressed as time series.