A First Look at Jaeger, a Distributed Tracing Tool

Official documentation

Jaegertracing frontend

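Introduction to Jaeger

Jaeger is an open-source, end-to-end distributed tracing system for monitoring and troubleshooting transactions in complex distributed systems.
The figure below compares commonly used open-source full-link tracing solutions. SkyWalking and Pinpoint are currently the most widely used, but Jaeger's client libraries cover more languages, C++ in particular, so Jaeger was chosen for this test.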
[Figure: comparison of open-source distributed tracing solutions]

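Problems Jaeger solves

  • Distributed transaction monitoring
  • Performance and latency optimization
  • Root cause analysis
  • Service dependency analysis
  • Distributed context propagation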

Jaeger architecture diagram

[Figure: Jaeger architecture]

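Jaeger components

  • Jaeger Agent: communicates with the client and forwards the collected trace data to the Jaeger Collector
  • Jaeger Collector: writes the received data to a database or another storage backend
  • Jaeger Query: serves queries against the stored trace data
  • Jaeger Ingester: a service that reads from a Kafka topic and writes to another storage backend (Cassandra, Elasticsearch)
  • Jaeger UI: handles user interaction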

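Jaeger port summary

Agent
5775 UDP, accepts zipkin.thrift over the compact Thrift protocol (legacy clients only)
6831 UDP, accepts jaeger.thrift over the compact Thrift protocol
6832 UDP, accepts jaeger.thrift over the binary Thrift protocol
5778 HTTP, serves configuration such as sampling strategies

Collector
14267 TCP, receives jaeger.thrift data from the agent (legacy TChannel transport)
14250 TCP, receives proto-format data from the agent (gRPC)
14268 HTTP, accepts data directly from clients
14269 HTTP, health check

Query
16686 HTTP, the Jaeger frontend, the interface exposed to users
16687 HTTP, health check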
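
The health-check ports listed above can be probed with a plain HTTP request. The following is a minimal sketch, assuming `curl` is installed and the admin ports are reachable from where the script runs. The function names `health_url` and `check_health` are made up for illustration, and the `/` route is taken from the admin server log lines shown later in this article.

```shell
#!/bin/sh
# Map a component name to its health-check port (values from the port list)
# and print the URL to probe. The host is whatever address is reachable.
health_url() {
  case "$2" in
    collector) port=14269 ;;
    query)     port=16687 ;;
    *) echo "unknown component: $2" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/\n' "$1" "$port"
}

# Print the HTTP status code returned by the health endpoint
# (curl prints 000 when the connection fails).
check_health() {
  url=$(health_url "$1" "$2") || return 1
  curl -s -o /dev/null -m 3 -w '%{http_code}\n' "$url"
}
```

Calling `check_health 127.0.0.1 query` prints the status code returned by the query service's admin port.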

Deploying Jaeger

1. Create the namespace

[root@VM-0-123-centos jaeger]# kubectl create namespace jaeger

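2. Deploy the Jaeger Operator
Jaeger Operator: the Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
The Jaeger Operator is an implementation of a Kubernetes operator. An operator is a piece of software that reduces the operational complexity of running another piece of software. Technically, an operator is a method of packaging, deploying, and managing a Kubernetes application.
Each version of the Jaeger Operator tracks one version of the Jaeger components (query, collector, agent). When a new version of the Jaeger components is released, a new version of the operator is released that knows how to upgrade running instances from the previous version to the new one.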

[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml 
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
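
The five `kubectl create` commands above can be wrapped in a loop that stops at the first failure and then waits for the operator to come up. This is only a sketch: the base URL and file names are copied from the commands above (the layout of the `master` branch may have changed since), and `manifest_urls` and `install_operator` are illustrative names.

```shell
#!/bin/sh
# Base URL and manifest paths are copied from the commands in this article.
BASE=https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy

# Print one manifest URL per line, in the same order as the original commands.
manifest_urls() {
  for f in crds/jaegertracing.io_jaegers_crd.yaml \
           service_account.yaml role.yaml role_binding.yaml operator.yaml; do
    printf '%s/%s\n' "$BASE" "$f"
  done
}

# Create every manifest in the jaeger namespace, stopping at the first failure,
# then wait for the jaeger-operator Deployment to finish rolling out.
install_operator() {
  for url in $(manifest_urls); do
    kubectl create -n jaeger -f "$url" || return 1
  done
  kubectl rollout status deployment/jaeger-operator -n jaeger --timeout=120s
}
```

Running `manifest_urls` on its own prints the URLs without touching the cluster, which is a cheap way to check them before calling `install_operator`.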

Check the status

[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME                                         READY   STATUS        RESTARTS   AGE
pod/jaeger-operator-6ff67bdd4b-4nffk         1/1     Running       0          14d
pod/simple-prod-collector-59fc47bf5c-h26mq   0/1     Terminating   0          9d

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/jaeger-operator-metrics   ClusterIP   172.20.253.138   <none>        8383/TCP,8686/TCP   14d

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator   1/1     1            1           14d

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-6ff67bdd4b   1         1         1       14d

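3. Create a Jaeger instance
Create a jaeger.yaml file that configures the ES cluster and limits the CPU and memory of the Deployment/simple-prod-collector containers. The collector can scale out to at most 10 pods.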

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://10.0.16.3:9200
        index-prefix: zhjt
  collector:
    maxReplicas: 10
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
[root@VM-0-123-centos jaeger]# kubectl apply -f  jaeger.yaml  -n jaeger
jaeger.jaegertracing.io/simple-prod created
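
The query Service created for this instance is a ClusterIP Service on port 16686, so the UI is not reachable from outside the cluster by default. One way to reach it from a workstation is a port-forward. The sketch below assumes `kubectl` access to the cluster; `query_service` and `forward_ui` are illustrative names, and the `<instance>-query` naming follows the simple-prod-query Service shown in the resource listing in this article.

```shell
#!/bin/sh
# The operator names the query Service "<instance>-query".
query_service() {
  printf '%s-query\n' "$1"
}

# Forward local port 16686 to the query Service; blocks until interrupted.
forward_ui() {
  instance=${1:-simple-prod}
  namespace=${2:-jaeger}
  kubectl port-forward "svc/$(query_service "$instance")" 16686:16686 -n "$namespace"
}
```

With `forward_ui` running, the UI should be available at http://localhost:16686.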

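List the Jaeger objects
Note: with the all-in-one example from the official site the status appears to be a normal Running. Here the status is Failed, but this does not affect usage.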

[root@VM-0-123-centos jaeger]# kubectl get jaegers -n jaeger
NAME          STATUS   VERSION   STRATEGY     STORAGE         AGE
simple-prod   Failed   1.22.0    production   elasticsearch   9d
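
When an instance reports a status other than Running, the resource's events and the operator log are the usual places to look. The following sketch filters the `kubectl get jaegers` output for such instances and describes each one. The helper names `non_running` and `diagnose` are hypothetical, and the column layout is assumed to match the output above.

```shell
#!/bin/sh
# Read `kubectl get jaegers` output on stdin and print the names of
# instances whose STATUS column is anything other than Running.
non_running() {
  awk 'NR > 1 && $2 != "Running" { print $1 }'
}

# Describe each such instance, then show recent operator logs for context.
diagnose() {
  namespace=${1:-jaeger}
  for name in $(kubectl get jaegers -n "$namespace" | non_running); do
    kubectl describe jaeger "$name" -n "$namespace"
  done
  kubectl logs deployment/jaeger-operator -n "$namespace" --tail=50
}
```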

Get the pod names

[root@VM-0-123-centos jaeger]# kubectl get pods -l app.kubernetes.io/instance=simple-prod -n jaeger
NAME                                              READY   STATUS      RESTARTS   AGE
simple-prod-collector-59fc47bf5c-h26mq            1/1     Running     0          9d
simple-prod-query-85689b7bbd-g5jw9                2/2     Running     0          9d

Get the pod logs

[root@VM-0-123-centos jaeger]# kubectl  logs simple-prod-query-85689b7bbd-g5jw9 jaeger-agent  -n jaeger
2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585734.2081811,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585734.2082183,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585734.2083232,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585734.2083883,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1619585734.2084124,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1619585734.2089527,"caller":"grpc/builder.go:70","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1619585734.2089992,"caller":"grpc@v1.29.1/clientconn.go:243","msg":"parsed scheme: \"dns\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.21038,"caller":"command-line-arguments/main.go:84","msg":"Starting agent"}
{"level":"info","ts":1619585734.2104166,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585734.2108943,"caller":"grpc/builder.go:108","msg":"Checking connection to collector"}
{"level":"info","ts":1619585734.210908,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"IDLE"}
{"level":"info","ts":1619585734.211061,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1619585734.3344934,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.88:14250  <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345578,"caller":"grpc@v1.29.1/clientconn.go:667","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345697,"caller":"grpc@v1.29.1/clientconn.go:682","msg":"Channel switches to new LB policy \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3346283,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.33467,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.334736,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3347983,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619585734.335669,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357751,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002f5ea0:{{172.20.0.88:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357947,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.335807,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
{"level":"info","ts":1619592172.4516647,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517512,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517596,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517772,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517884,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"warn","ts":1619592172.4523218,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {172.20.0.88:14250  <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.20.0.88:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523551,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.452386,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523947,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1619592172.6118224,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.178:14250  <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118581,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118758,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.611892,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6119003,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619592172.6119049,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.178:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.612726,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127572,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0003df970:{{172.20.0.178:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127682,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127849,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
[root@VM-0-123-centos jaeger]# kubectl  logs simple-prod-query-85689b7bbd-g5jw9 jaeger-query   -n jaeger
2021/04/28 04:55:29 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585729.8951077,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585729.8951416,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585729.8952546,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585729.8953054,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1619585729.8953238,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
{"level":"info","ts":1619585729.9169888,"caller":"config/config.go:183","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1619585729.9174955,"caller":"app/static_handler.go:181","msg":"UI config path not provided, config file will not be watched"}
{"level":"info","ts":1619585729.9175768,"caller":"app/server.go:170","msg":"Query server started"}
{"level":"info","ts":1619585729.9175944,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585729.9176183,"caller":"app/server.go:249","msg":"Starting GRPC server","port":16685,"addr":":16685"}
{"level":"info","ts":1619585729.9176335,"caller":"app/server.go:230","msg":"Starting HTTP server","port":16686,"addr":":16686"}

4. View the Jaeger resources

[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME                                                  READY   STATUS      RESTARTS   AGE
pod/jaeger-operator-6ff67bdd4b-4nffk                  1/1     Running     0          14d
pod/simple-prod-collector-59fc47bf5c-h26mq            1/1     Running     0          8d
pod/simple-prod-query-85689b7bbd-g5jw9                2/2     Running     0          8d

NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                  AGE
service/jaeger-operator-metrics          ClusterIP   172.20.253.138   <none>        8383/TCP,8686/TCP                        14d
service/simple-prod-collector            ClusterIP   172.20.255.184   <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   8d
service/simple-prod-collector-headless   ClusterIP   None             <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   8d
service/simple-prod-query                ClusterIP   172.20.254.102   <none>        16686/TCP                                8d

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator         1/1     1            1           14d
deployment.apps/simple-prod-collector   1/1     1            1           8d
deployment.apps/simple-prod-query       1/1     1            1           8d

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-6ff67bdd4b         1         1         1       14d
replicaset.apps/simple-prod-collector-59fc47bf5c   1         1         1       8d
replicaset.apps/simple-prod-query-85689b7bbd       1         1         1       8d

NAME                                                        REFERENCE                          TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/simple-prod-collector   Deployment/simple-prod-collector   1457m/90, 137m/90   1         10        1          8d

If traffic is heavy and the load on ES needs to be reduced, a Kafka cluster can be placed in front of it by modifying jaeger.yaml:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-streaming
spec:
  strategy: streaming
  collector:
    options:
      kafka:
        producer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092   # change to your Kafka address
  ingester:
    options:
      kafka:
        consumer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092  # change to your Kafka address
      ingester:
        deadlockInterval: 5s
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200   # change to your ES address
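
After editing the manifest, it can be applied the same way as before and the resulting pods listed by instance label. This is a sketch under the assumption that the file is still called jaeger.yaml and the namespace is still jaeger. `instance_selector` and `apply_and_list` are illustrative names, and the label key is the one used in the `kubectl get pods -l` command earlier in this article.

```shell
#!/bin/sh
# Label selector used earlier in this article to find an instance's pods.
instance_selector() {
  printf 'app.kubernetes.io/instance=%s\n' "$1"
}

# Apply the manifest, then list the pods that belong to the named instance.
apply_and_list() {
  manifest=$1
  instance=$2
  namespace=${3:-jaeger}
  kubectl apply -f "$manifest" -n "$namespace" || return 1
  kubectl get pods -l "$(instance_selector "$instance")" -n "$namespace"
}
```

For the manifest above the call would be `apply_and_list jaeger.yaml simple-streaming`. If the operator accepts the spec, the listing should include an ingester pod next to the collector and query pods.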

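5. Deploy the agent

The agent is a proxy for the Jaeger client: the client sends the trace data it collects to the agent, and the agent forwards it to the collector. Because the client talks to the agent over UDP, the agent is usually deployed close to the client.

The agent can be installed in several ways

1). Docker installation

Download: jaegertracing/jaeger-agent Tags (docker.com)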

docker run -d -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778/tcp jaegertracing/jaeger-agent:1.12 --reporter.grpc.host-port=xx.xx.xx.xx:14250

2). Kubernetes installation, which has two variants

Sidecar mode

DaemonSet mode

Reference: Operator for Kubernetes — Jaeger documentation (jaegertracing.io)
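
For sidecar mode, the operator watches Deployments for the `sidecar.jaegertracing.io/inject` annotation and adds a jaeger-agent container to their pods. A minimal sketch of setting that annotation from the command line follows. The helper names are hypothetical, and the deployment name and namespace are supplied by the caller.

```shell
#!/bin/sh
# Build the annotation the operator looks for. The value is either "true"
# or the name of a specific Jaeger instance.
sidecar_annotation() {
  printf 'sidecar.jaegertracing.io/inject=%s\n' "${1:-true}"
}

# Annotate a Deployment so the operator injects the agent sidecar.
# Arguments: deployment name, namespace, optional Jaeger instance name.
inject_sidecar() {
  deployment=$1
  namespace=$2
  kubectl annotate deployment "$deployment" \
    "$(sidecar_annotation "$3")" -n "$namespace" --overwrite
}
```

DaemonSet mode is configured on the Jaeger resource itself rather than on the application; the operator documentation referenced above covers that variant.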

3). Binary installation

Download: Jaeger – Download Jaeger (jaegertracing.io)

nohup ./jaeger-agent --collector.host-port=xxxx:14267 1>1.log 2>2.log &
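
The Docker example earlier points the agent at the collector's gRPC port 14250 via `--reporter.grpc.host-port`. The same flag can be used with the binary, as sketched below. Whether `--collector.host-port` on 14267 is still accepted depends on the agent version, so `./jaeger-agent --help` is worth checking first. The function names and log file names here are illustrative.

```shell
#!/bin/sh
# Build the agent argument for the gRPC reporter; 14250 is the collector's
# gRPC port from the port list earlier in this article.
agent_args() {
  printf '%s%s:14250\n' '--reporter.grpc.host-port=' "$1"
}

# Start the agent in the background with stdout and stderr in separate files.
start_agent() {
  nohup ./jaeger-agent "$(agent_args "$1")" 1>agent.out.log 2>agent.err.log &
}
```

`start_agent` takes the collector host as its only argument and expects the jaeger-agent binary in the current directory.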
