Environment:
Kubernetes 1.11 cluster, deployed with kubeadm
Docker 17.3.2
CentOS 7
Alibaba Cloud servers
Download Prometheus Operator from its repository:
$ git clone https://github.com/coreos/kube-prometheus.git
$ cd kube-prometheus/manifests
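Change into the manifests directory, which contains all of our resource manifests. We need to make a small change to prometheus-serviceMonitorKubelet.yaml. By default this ServiceMonitor scrapes node data from the kubelet on port 10250, but as we said earlier, for security reasons the metrics data has been moved to the read-only port 10255. So we only need to change https-metrics to http-metrics in that file. The port names are defined in the Prometheus Operator code that syncs the node endpoints; if you are interested, see the full source: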
Subsets: []v1.EndpointSubset{
    {
        Ports: []v1.EndpointPort{
            {
                Name: "https-metrics",
                Port: 10250,
            },
            {
                Name: "http-metrics",
                Port: 10255,
            },
            {
                Name: "cadvisor",
                Port: 4194,
            },
        },
    },
},
Note that the insecureSkipVerify parameter must be set to false for the http scheme to take effect:
insecureSkipVerify: false
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  honorLabels: true
  interval: 30s
  port: http-metrics
  scheme: http
  tlsConfig:
    insecureSkipVerify: false
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  honorLabels: true
  interval: 30s
  metricRelabelings:
  - action: drop
    regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
    sourceLabels:
    - __name__
  path: /metrics/cadvisor
  port: http-metrics
  scheme: http
  tlsConfig:
    insecureSkipVerify: false
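The port rename described above can be scripted instead of edited by hand. The following is a minimal sketch, not the author's procedure: it runs sed against a stand-in file created in a temporary directory, and the stand-in content is a simplified assumption rather than the full manifest.

```shell
#!/bin/sh
# Work in a scratch directory; in practice, run the sed command inside
# kube-prometheus/manifests against the real file instead.
workdir=$(mktemp -d)
manifest="$workdir/prometheus-serviceMonitorKubelet.yaml"

# Stand-in for the relevant part of the manifest (simplified assumption).
cat > "$manifest" <<'EOF'
endpoints:
- port: https-metrics
  scheme: https
- path: /metrics/cadvisor
  port: https-metrics
  scheme: https
EOF

# Switch the port name and the scheme; -i.bak keeps a backup copy.
sed -i.bak -e 's/port: https-metrics/port: http-metrics/' \
           -e 's/scheme: https$/scheme: http/' "$manifest"

cat "$manifest"
```

Reviewing the result with cat (or diffing against the .bak copy) before applying the manifest is a cheap way to confirm that both endpoints were changed.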
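Configure the DingTalk routing file, create it as a Secret object, and reference it from prometheus-prometheus.yaml. Prometheus data also needs persistent storage here, so a StorageClass or PVC must be defined and mounted as well.
alertmanager-main.yaml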
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://prometheus-webhook-dingtalk.monitors.svc.cluster.local:8060/dingtalk/ops_dingding/send'
Create the Secret object for the Alertmanager configuration file:
kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager-main.yaml
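Create a StorageClass object to provide persistent storage for Prometheus. Here we use the cloud disk or NAS service provided by Alibaba Cloud and create a custom StorageClass object.
We choose the alicloud-disk-ssd storage class, which is backed by cloud disks.
Configure service discovery for Prometheus by creating a Secret object from the prometheus-additional.yaml file.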
prometheus-additional.yaml
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-services'
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.example.com:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name

- job_name: 'kubernetes-ingresses'
  kubernetes_sd_configs:
  - role: ingress
  relabel_configs:
  - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
    regex: (.+);(.+);(.+)
    replacement: ${1}://${2}${3}
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.example.com:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_ingress_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_ingress_name]
    target_label: kubernetes_name

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
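The __address__ rewrite used by the 'kubernetes-service-endpoints' and 'kubernetes-pods' jobs is easier to follow with a concrete input. Below is a minimal sketch that emulates it with sed. Prometheus joins the source_labels values with ';' and anchors the regex at both ends; since sed -E supports neither \d nor non-capturing groups, the pattern is translated to an equivalent ERE here, which is an adaptation for illustration. The function name and the sample addresses are made up.

```shell
#!/bin/sh
# Emulate the relabel rule
#   regex: ([^:]+)(?::\d+)?;(\d+)   replacement: $1:$2
# In the ERE translation the optional port becomes capture group 2,
# so the annotation port is \3 instead of $2.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

# Input format: "<__address__>;<value of the prometheus.io/port annotation>"
rewrite_address '10.244.1.7:8080;9102'   # address with a discovered port
rewrite_address '10.244.1.7;9102'        # address without a port
```

In both cases the host part is kept and the port is replaced by the one from the annotation, which is how a Pod or Service can steer Prometheus to a dedicated metrics port.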
Create the Secret object additional-configs:
kubectl -n monitoring create secret generic additional-configs --from-file=prometheus-additional.yaml
Write the custom configuration and the storage class from the previous steps into the Prometheus resource to customize the monitoring setup:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  storage:  # persistent storage configuration
    volumeClaimTemplate:
      spec:
        storageClassName: alicloud-disk-ssd  # use the alicloud-disk-ssd storage class
        resources:
          requests:
            storage: 50Gi
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:  # Secret holding the etcd certificates
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:  # service discovery configuration
    name: additional-configs  # name of the Secret object
    key: prometheus-additional.yaml  # key inside the Secret
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
The certificates used by etcd are all located under /etc/kubernetes/pki/etcd on the node, so we first store the certificates we need in the cluster as a Secret object (run this on the node where etcd is running):
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/ca.pem --from-file=/etc/kubernetes/pki/etcd/etcd-client.pem --from-file=/etc/kubernetes/pki/etcd/etcd-client-key.pem
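Creating the Secret fails if one of the certificate paths is wrong, so a quick pre-flight check can save a round trip. This is a minimal sketch; the function name check_etcd_certs is hypothetical, and the file names are taken from the command above, so adjust them if your setup names the certificates differently.

```shell
#!/bin/sh
# Return 0 only if every certificate file expected by the Secret is readable.
# The directory to check is passed as the first argument.
check_etcd_certs() {
  dir=$1
  missing=0
  for f in ca.pem etcd-client.pem etcd-client-key.pem; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example run against the path named in the text; prints a hint on failure.
check_etcd_certs /etc/kubernetes/pki/etcd || echo "fix the paths before creating the Secret"
```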
The certificates Prometheus needs to access the etcd cluster are now in place, so next we create the ServiceMonitor object (prometheus-serviceMonitorEtcd.yaml):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.pem
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
The ServiceMonitor is created, but there is no matching Service object yet, so we need to create one by hand (prometheus-etcdService.yaml):
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.16.23.231
    nodeName: etcd-master
  ports:
  - name: port
    port: 2379
    protocol: TCP
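The Service created here does not match Pods through a label selector as before. As we said earlier, the etcd cluster often runs outside the Kubernetes cluster, and in that case we have to define a custom Endpoints object. Note that its metadata must match the Service, and that the Service's clusterIP is set to None. If you are not familiar with this, see our earlier discussion of Services.
Fill in the addresses of the etcd cluster in the subsets of the Endpoints object. Ours is a single-node etcd, so one address is enough. If the etcd configuration file sets the listen address to 127.0.0.1, monitoring may fail; change it to 0.0.0.0.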
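For an etcd cluster with several members, the subsets section simply lists one address per member. Here is a minimal sketch that prints such an Endpoints manifest from a list of IPs. The function name etcd_endpoints is hypothetical, the second and third addresses are placeholders rather than real hosts, and the optional nodeName field is left out.

```shell
#!/bin/sh
# Print an Endpoints manifest for etcd-k8s with one address per etcd member.
# The member IPs are passed as arguments.
etcd_endpoints() {
  cat <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
EOF
  for ip in "$@"; do
    printf '  - ip: %s\n' "$ip"
  done
  cat <<'EOF'
  ports:
  - name: port
    port: 2379
    protocol: TCP
EOF
}

# 172.16.23.231 is the node from the text; the other two are placeholders.
etcd_endpoints 172.16.23.231 172.16.23.232 172.16.23.233
```

The output can be written to a file or piped into kubectl apply -f - once the addresses are correct.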
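The kube-scheduler and kube-controller-manager components both bind to 127.0.0.1. Edit their configuration files and change the address to 0.0.0.0 so that the ports can be reached and scraped.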
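The bind-address change can be made with a one-line sed. The sketch below runs against a stand-in file. It assumes the component is started from a kubeadm static Pod manifest such as /etc/kubernetes/manifests/kube-scheduler.yaml and that the flag is spelled --address=127.0.0.1; both are assumptions, so check the actual file on your master node first.

```shell
#!/bin/sh
# Stand-in for a static Pod manifest (path, flag name, and content are
# assumptions; the real file has many more fields).
workdir=$(mktemp -d)
manifest="$workdir/kube-scheduler.yaml"
cat > "$manifest" <<'EOF'
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --leader-elect=true
EOF

# Rewrite the bind address so the metrics port is reachable from other hosts.
sed -i.bak 's/--address=127\.0\.0\.1/--address=0.0.0.0/' "$manifest"

# Show the resulting flag.
grep -- '--address' "$manifest"
```

When editing the real file, keep the backup copy outside the manifests directory, since the kubelet watches that directory for Pod definitions. The same substitution applies to the kube-controller-manager manifest.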
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  clusterIP: None
  ports:
  - name: http-metrics
    targetPort: 10251
    port: 10251
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  clusterIP: None
  ports:
  - name: http-metrics
    targetPort: 10252
    port: 10252
## kubelet-service.yaml omitted
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
kubectl create -f .
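At this point prometheus-operator is deployed for the production environment. The Grafana dashboard configuration and the Alertmanager notification templates still need to be completed.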
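1. Create a StorageClass object to give Prometheus persistent storage, and reference it in prometheus-prometheus.yaml.
2. Modify the configuration data in the alertmanager-secret.yaml Secret object to use a custom DingTalk alert route or email account settings.
3. Create the Service objects for etcd, the scheduler, and the controller manager.
4. Configure the alerting rules in prometheus-etcdRules.yaml, or add them to the original rules file.
5. Create the Secret holding the Prometheus service-discovery configuration, and reference it in prometheus-prometheus.yaml.
6. Create the etcd certificate Secret and the serviceMonitorEtcd object file.
7. Modify the permissions in prometheus-clusterRole.yaml.
8. Run the deployment.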