Because of some "new features" in how the new Spring Boot (2.1.0) and Spring Cloud (Greenwich.M1) releases implement Sleuth + Zipkin distributed tracing, I stepped into quite a few pits getting it working.
Back in Spring Boot 1.x, implementing a tracing service meant building both the client and the server yourself, and the server side usually pulled in all sorts of packages (spring-cloud-sleuth-stream, various Zipkin support dependencies, and so on);
but in the new Spring Cloud releases you no longer implement a server (integrating Sleuth + Zipkin) yourself. Instead, Zipkin officially provides a ready-made zipkin-server.jar, also available as a Docker image, which you download and start from the command line. Configuration decides whether the data Sleuth collects travels to Zipkin over HTTP or through RabbitMQ/Kafka. Under the new versions you only need to care about which transport the sleuth client uses (HTTP or MQ, i.e. Rabbit/Kafka): for HTTP, set base-url in the configuration; for MQ, configure the broker's host/port/username/password and so on. As for where Zipkin stores the data, that is entirely zipkin-server's responsibility and is controlled by parameters you pass when starting zipkin-server.jar (details below).
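For a sense of how little is involved, starting the official server is a one-liner; a sketch, assuming you've downloaded the jar from the Zipkin releases page (the Docker image name is the one the OpenZipkin project publishes):

```
# run the official server jar; the UI/collector listens on port 9411 by default
java -jar zipkin-server.jar

# or the equivalent Docker image from the OpenZipkin project
docker run -d -p 9411:9411 openzipkin/zipkin
```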
PS: this is not a tutorial post. It mainly records how I solved a few problems, so there's no detailed step-by-step implementation, but I'll paste enough code to keep things clear.
I recently started an internship. My team lead told me to teach myself Spring Cloud; fine, not a big deal. After I'd gone through the whole Spring Cloud family, he said to focus on its distributed tracing service, since some upcoming tasks would involve it. So I concentrated on that and decided to build a demo to practice. I read the tutorials online and, feeling cocky, picked the newest versions. And that's exactly how I fell into the pit...
As usual, let's get the Spring Boot and Spring Cloud versions out of the way first:
Spring Boot: 2.1.0
Spring Cloud: Greenwich.M1
My personal advice for beginners is not to chase the newest releases too hard; the older ones are perfectly adequate. Spring Boot 2.0.6 paired with Spring Cloud Finchley SR2, for instance, is quite stable. If you really want to explore the bleeding edge, you'll find the pits are endless; climbing out of each one takes a day or two, and falling in that often makes it very easy for a newcomer to give up~~~
PS: don't ask me how I know...
Enough chit-chat; on to the main topic.
There are four services in total:
eureka-server
zipkin-server: the new-style Zipkin server. It receives the data Sleuth sends over; handles processing, storage, and indexing; and provides a visual UI for analyzing the data.
Anyone who needs it can download it straight from GitHub: https://github.com/openzipkin...
Yep, those are the first two.
The two below are the actual business services:
eureka-server is the service registry. I won't go over its implementation; there are plenty of write-ups online and it's basically identical across versions, with no dramatic breaking changes. I also packaged it as a jar here, so whenever it's needed I start it directly with java -jar XXX.jar.
As for product and order (standing in for the various services A, B, C... of a real scenario):
The order service has a single endpoint, /test, which calls into the product service.
Its ProductClient uses Feign to call product's /product/list endpoint.
The product service has a single endpoint, /product/list, which looks up the full list of products.
Simply put, the scenario here is: the order service --(calls)--> the product service.
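For reference, a minimal sketch of what that call chain might look like (the class names and the List&lt;String&gt; return type here are placeholders of mine, not the demo's real code, and the application class would also need @EnableFeignClients):

```java
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Feign client inside the order service; "product" is the service name registered in Eureka
@FeignClient(name = "product")
interface ProductClient {

    // maps to product's /product/list endpoint
    @GetMapping("/product/list")
    List<String> productList();
}

// the order service's /test endpoint, which triggers the cross-service call
@RestController
class TestController {

    @Autowired
    private ProductClient productClient;

    @GetMapping("/test")
    public List<String> test() {
        // this Feign hop is exactly what sleuth + zipkin will record
        return productClient.productList();
    }
}
```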
With the scenario covered, here's the relevant configuration for the two services (order's and product's configs are essentially the same).
application.yml
```yaml
spring:
  application:
    # service name
    name: product
  # the business logic touches a database, hence these MySQL settings
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    username: root
    password: 123456
    url: jdbc:mysql://127.0.0.1:3306/sc_sell?characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai
  jpa:
    show-sql: true
  # the important part
  zipkin:
    # base-url: where the sleuth client reports to when shipping its data to zipkin-server over HTTP
    base-url: http://localhost:9411
    enabled: true
  sleuth:
    sampler:
      # fraction of traces to collect: 0.1 records only 10% of the trace data; set 1 to trace everything
      # (not recommended in real scenarios -- it carries a noticeable performance cost)
      probability: 1
eureka:
  client:
    service-url:
      # registry address
      defaultZone: http://localhost:8999/eureka/
logging:
  level:
    # feign log level, set in key: value form
    org.springframework.cloud.openfeign: debug
```
With configuration out of the way, dependencies. Very simple: for tracing, a client only needs to add one dependency, spring-cloud-starter-zipkin. This one:
```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
```
其实这些都是基础操做,是吧,那么来点进阶的。
从上面的例子上来看,其实仍是有几个问题须要考虑一下。
因此对于以上的问题,仍是须要去考虑,值得欣慰的是,zipkin在这两个方面也做了很nice的解决方案,在实现过程当中只须要稍做配置便可。
If you're new to Spring Cloud and asked to build this, your first move will surely be to open a browser and search for tutorials.
What you'll find is that most blog posts describe the way older versions did it; some of the really dated ones have you implement a zipkin-server yourself (I suspect they were on 1.x). Which is disheartening, because it looks nothing like what you expected.
Keep digging and, buried in the pile of posts, you finally find a tracing tutorial for Spring Boot 2.0.x. Now you're excited; at last something more credible! Overjoyed, right? But it's not over yet: it tells you to add the following client dependencies:
```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin-stream</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-stream</artifactId>
</dependency>
```
Then you discover you can't actually depend on them. Why not? Version incompatibility. What, you say your pom file shows no errors? Open the Maven panel on the right side of IDEA and take another look.
This is a truly enormous pit. I never understood what was going on until the day I opened that panel, after spending a whole day trying to figure out why the RabbitMQ integration kept failing. I was thoroughly outplayed, and in the end I realized this road was a dead end.
Finally, out of leads, I kept searching the web for Spring Boot 2.x tracing tutorials. After a whole afternoon it suddenly hit me: wait, I should just go read the official docs... they're all in English, but worst case Chrome's built-in translator could handle that. So I opened the Spring site, picked the latest version, dug around, and actually found it. And it turned out to be remarkably simple!!!
Link: https://cloud.spring.io/sprin...
Here is roughly what the official documentation says:
If you want to use RabbitMQ or Kafka instead of HTTP, add the spring-rabbit or spring-kafka dependency; the default destination (queue/topic) name is zipkin. If you use Kafka, you must also set the property spring.zipkin.sender.type: kafka.
In other words, all you need to pull in are the two dependencies below!!!
```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.amqp</groupId>
    <artifactId>spring-rabbit</artifactId>
</dependency>
```
Read a little further and you'll see a notice:
spring-cloud-sleuth-stream is deprecated and no longer compatible with these releases...
So looking back now, you can see why pulling in spring-cloud-sleuth-stream got you nowhere in the earlier attempt.
Then adjust application.yml: comment out base-url, set zipkin.sender.type: rabbit, and fill in the RabbitMQ connection details (both keys still sit under the spring: section shown earlier). Done.
```yaml
zipkin:
  # only needed for the HTTP sender; can be left out now
  # base-url: http://localhost:9411/
  sender:
    type: rabbit
rabbitmq:
  host: localhost
  port: 5672
  username: guest
  password: guest
```
At this point, the tracing side of order/product is done.
We said above that Sleuth collects the data and Zipkin receives whatever Sleuth sends over, then processes, stores, and indexes it and serves the UI. So the next step is to get zipkin-server to pull the trace data off the RabbitMQ queue and store it in MySQL.
As for how zipkin-server does this: the Zipkin project has already built all of the functionality in; you only need to set parameters at startup. Let me walk through it.
On which parameters suit which scenarios, I won't go into specifics; I've only just started with Spring Cloud myself and many scenarios are unfamiliar to me too. What I will explain is how to find the parameters we need.
First, unpack zipkin-server.jar with an archive tool. Inside there are three directories, mostly full of .class files.
Go into the BOOT-INF/classes directory and you'll find two .yml files; sure enough, these are the yml configuration files.
zipkin-server.yml is the main configuration file, but open it and you'll find it contains just one line, spring.profiles.include: shared,
which pulls in the shared file; so the one we really care about is zipkin-server-shared.yml.
Open zipkin-server-shared.yml:
```yaml
zipkin:
  self-tracing:
    # Set to true to enable self-tracing.
    enabled: ${SELF_TRACING_ENABLED:false}
    # percentage to self-traces to retain
    sample-rate: ${SELF_TRACING_SAMPLE_RATE:1.0}
    # Timeout in seconds to flush self-tracing data to storage.
    message-timeout: ${SELF_TRACING_FLUSH_INTERVAL:1}
  collector:
    # percentage to traces to retain
    sample-rate: ${COLLECTOR_SAMPLE_RATE:1.0}
    http:
      # Set to false to disable creation of spans via HTTP collector API
      enabled: ${HTTP_COLLECTOR_ENABLED:true}
    kafka:
      # Kafka bootstrap broker list, comma-separated host:port values. Setting this activates the
      # Kafka 0.10+ collector.
      bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS:}
      # Name of topic to poll for spans
      topic: ${KAFKA_TOPIC:zipkin}
      # Consumer group this process is consuming on behalf of.
      group-id: ${KAFKA_GROUP_ID:zipkin}
      # Count of consumer threads consuming the topic
      streams: ${KAFKA_STREAMS:1}
    rabbitmq:
      # RabbitMQ server address list (comma-separated list of host:port)
      addresses: ${RABBIT_ADDRESSES:}
      concurrency: ${RABBIT_CONCURRENCY:1}
      # TCP connection timeout in milliseconds
      connection-timeout: ${RABBIT_CONNECTION_TIMEOUT:60000}
      password: ${RABBIT_PASSWORD:guest}
      queue: ${RABBIT_QUEUE:zipkin}
      username: ${RABBIT_USER:guest}
      virtual-host: ${RABBIT_VIRTUAL_HOST:/}
      useSsl: ${RABBIT_USE_SSL:false}
      uri: ${RABBIT_URI:}
  query:
    enabled: ${QUERY_ENABLED:true}
    # 1 day in millis
    lookback: ${QUERY_LOOKBACK:86400000}
    # The Cache-Control max-age (seconds) for /api/v2/services and /api/v2/spans
    names-max-age: 300
    # CORS allowed-origins.
    allowed-origins: "*"
  storage:
    strict-trace-id: ${STRICT_TRACE_ID:true}
    search-enabled: ${SEARCH_ENABLED:true}
    type: ${STORAGE_TYPE:mem}
    mem:
      # Maximum number of spans to keep in memory. When exceeded, oldest traces (and their spans) will be purged.
      # A safe estimate is 1K of memory per span (each span with 2 annotations + 1 binary annotation), plus
      # 100 MB for a safety buffer. You'll need to verify in your own environment.
      # Experimentally, it works with: max-spans of 500000 with JRE argument -Xmx600m.
      max-spans: 500000
    cassandra:
      # Comma separated list of host addresses part of Cassandra cluster. Ports default to 9042 but you can also specify a custom port with 'host:port'.
      contact-points: ${CASSANDRA_CONTACT_POINTS:localhost}
      # Name of the datacenter that will be considered "local" for latency load balancing. When unset, load-balancing is round-robin.
      local-dc: ${CASSANDRA_LOCAL_DC:}
      # Will throw an exception on startup if authentication fails.
      username: ${CASSANDRA_USERNAME:}
      password: ${CASSANDRA_PASSWORD:}
      keyspace: ${CASSANDRA_KEYSPACE:zipkin}
      # Max pooled connections per datacenter-local host.
      max-connections: ${CASSANDRA_MAX_CONNECTIONS:8}
      # Ensuring that schema exists, if enabled tries to execute script /zipkin-cassandra-core/resources/cassandra-schema-cql3.txt.
      ensure-schema: ${CASSANDRA_ENSURE_SCHEMA:true}
      # 7 days in seconds
      span-ttl: ${CASSANDRA_SPAN_TTL:604800}
      # 3 days in seconds
      index-ttl: ${CASSANDRA_INDEX_TTL:259200}
      # the maximum trace index metadata entries to cache
      index-cache-max: ${CASSANDRA_INDEX_CACHE_MAX:100000}
      # how long to cache index metadata about a trace. 1 minute in seconds
      index-cache-ttl: ${CASSANDRA_INDEX_CACHE_TTL:60}
      # how many more index rows to fetch than the user-supplied query limit
      index-fetch-multiplier: ${CASSANDRA_INDEX_FETCH_MULTIPLIER:3}
      # Using ssl for connection, rely on Keystore
      use-ssl: ${CASSANDRA_USE_SSL:false}
    cassandra3:
      # Comma separated list of host addresses part of Cassandra cluster. Ports default to 9042 but you can also specify a custom port with 'host:port'.
      contact-points: ${CASSANDRA_CONTACT_POINTS:localhost}
      # Name of the datacenter that will be considered "local" for latency load balancing. When unset, load-balancing is round-robin.
      local-dc: ${CASSANDRA_LOCAL_DC:}
      # Will throw an exception on startup if authentication fails.
      username: ${CASSANDRA_USERNAME:}
      password: ${CASSANDRA_PASSWORD:}
      keyspace: ${CASSANDRA_KEYSPACE:zipkin2}
      # Max pooled connections per datacenter-local host.
      max-connections: ${CASSANDRA_MAX_CONNECTIONS:8}
      # Ensuring that schema exists, if enabled tries to execute script /zipkin2-schema.cql
      ensure-schema: ${CASSANDRA_ENSURE_SCHEMA:true}
      # how many more index rows to fetch than the user-supplied query limit
      index-fetch-multiplier: ${CASSANDRA_INDEX_FETCH_MULTIPLIER:3}
      # Using ssl for connection, rely on Keystore
      use-ssl: ${CASSANDRA_USE_SSL:false}
    elasticsearch:
      # host is left unset intentionally, to defer the decision
      hosts: ${ES_HOSTS:}
      pipeline: ${ES_PIPELINE:}
      max-requests: ${ES_MAX_REQUESTS:64}
      timeout: ${ES_TIMEOUT:10000}
      index: ${ES_INDEX:zipkin}
      date-separator: ${ES_DATE_SEPARATOR:-}
      index-shards: ${ES_INDEX_SHARDS:5}
      index-replicas: ${ES_INDEX_REPLICAS:1}
      username: ${ES_USERNAME:}
      password: ${ES_PASSWORD:}
      http-logging: ${ES_HTTP_LOGGING:}
      legacy-reads-enabled: ${ES_LEGACY_READS_ENABLED:true}
    mysql:
      jdbc-url: ${MYSQL_JDBC_URL:}
      host: ${MYSQL_HOST:localhost}
      port: ${MYSQL_TCP_PORT:3306}
      username: ${MYSQL_USER:}
      password: ${MYSQL_PASS:}
      db: ${MYSQL_DB:zipkin}
      max-active: ${MYSQL_MAX_CONNECTIONS:10}
      use-ssl: ${MYSQL_USE_SSL:false}
  ui:
    enabled: ${QUERY_ENABLED:true}
    ## Values below here are mapped to ZipkinUiProperties, served as /config.json
    # Default limit for Find Traces
    query-limit: 10
    # The value here becomes a label in the top-right corner
    environment:
    # Default duration to look back when finding traces.
    # Affects the "Start time" element in the UI. 1 hour in millis
    default-lookback: 3600000
    # When false, disables the "find a trace" screen
    search-enabled: ${SEARCH_ENABLED:true}
    # Which sites this Zipkin UI covers. Regex syntax. (e.g. http:\/\/example.com\/.*)
    # Multiple sites can be specified, e.g.
    # - .*example1.com
    # - .*example2.com
    # Default is "match all websites"
    instrumented: .*
    # URL placed into the <base> tag in the HTML
    base-path: /zipkin

server:
  port: ${QUERY_PORT:9411}
  use-forward-headers: true
  compression:
    enabled: true
    # compresses any response over min-response-size (default is 2KiB)
    # Includes dynamic json content and large static assets from zipkin-ui
    mime-types: application/json,application/javascript,text/css,image/svg

spring:
  jmx:
    # reduce startup time by excluding unexposed JMX service
    enabled: false
  mvc:
    favicon:
      # zipkin has its own favicon
      enabled: false
  autoconfigure:
    exclude:
      # otherwise we might initialize even when not needed (ex when storage type is cassandra)
      - org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration

info:
  zipkin:
    version: "2.11.8"

logging:
  pattern:
    level: "%clr(%5p) %clr([%X{traceId}/%X{spanId}]){yellow}"
  level:
    # Silence Invalid method name: '__can__finagle__trace__v3__'
    com.facebook.swift.service.ThriftServiceProcessor: 'OFF'
    # # investigate /api/v2/dependencies
    # zipkin2.internal.DependencyLinker: 'DEBUG'
    # # log cassandra queries (DEBUG is without values)
    # com.datastax.driver.core.QueryLogger: 'TRACE'
    # # log cassandra trace propagation
    # com.datastax.driver.core.Message: 'TRACE'
    # # log reason behind http collector dropped messages
    # zipkin2.server.ZipkinHttpCollector: 'DEBUG'
    # zipkin2.collector.kafka.KafkaCollector: 'DEBUG'
    # zipkin2.collector.kafka08.KafkaCollector: 'DEBUG'
    # zipkin2.collector.rabbitmq.RabbitMQCollector: 'DEBUG'
    # zipkin2.collector.scribe.ScribeCollector: 'DEBUG'

management:
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    health:
      show-details: always
  # Disabling auto time http requests since it is added in Undertow HttpHandler in Zipkin autoconfigure
  # Prometheus module. In Zipkin we use different naming for the http requests duration
  metrics:
    web:
      server:
        auto-time-requests: false
```
This is plainly just a configuration file. For whichever component you need, you only modify its corresponding settings. Say I want storage to save the trace data into MySQL; then I only change the matching entries:
```yaml
storage:
  # the rest needs no changes and is omitted
  mysql:
    # note: this must be a MySQL JDBC URL
    jdbc-url: jdbc:mysql://localhost:3306/zipkin?XXX=xxx
    host: localhost
    port: 3306
    username: root
    password: 123456
    db: zipkin
    # maximum number of connections
    max-active: ${MYSQL_MAX_CONNECTIONS:10}
    # whether to use ssl
    use-ssl: ${MYSQL_USE_SSL:false}
```
Method one: once the configuration is edited, re-pack everything into a jar and start it directly.
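A rough sketch of that workflow, assuming standard JDK tooling (keep the jar's internal layout exactly as it was):

```
# unpack the jar, edit the shared config, write it back, then start
unzip zipkin-server.jar -d zipkin-server
vim zipkin-server/BOOT-INF/classes/zipkin-server-shared.yml
cd zipkin-server
jar uf ../zipkin-server.jar BOOT-INF/classes/zipkin-server-shared.yml
cd .. && java -jar zipkin-server.jar
```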
Method two: skip the re-packing and pass everything on the command line:

```
java -jar zipkin-server.jar --zipkin.storage.mysql.username=root --zipkin.storage.mysql.password=123456 --zipkin.storage.mysql.host=localhost --zipkin.storage.mysql.port=3306 ...
```
What follows the jar name are its settings; to see which ones exist, refer back to the yml file from method one, where each property maps one-to-one onto an environment variable. The advantage of this method is that the jar stays untouched; the drawback is the rather long tail of arguments.
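Since every entry in the shared yml declares an environment-variable form (${STORAGE_TYPE:mem}, ${RABBIT_ADDRESSES:}, and so on), here's a sketch of an equivalent startup driven purely by environment variables; the values are this demo's, so adjust them to your own setup:

```
STORAGE_TYPE=mysql MYSQL_HOST=localhost MYSQL_TCP_PORT=3306 \
MYSQL_USER=root MYSQL_PASS=123456 MYSQL_DB=zipkin \
RABBIT_ADDRESSES=localhost:5672 RABBIT_USER=guest RABBIT_PASSWORD=guest \
java -jar zipkin-server.jar
```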
And that's basically it. All the configuration works on the same principle; once you can generalize from these examples, the other settings won't be a problem.
I was busy while writing this and published it before it was finished, so some content was missing; thankfully I finally used the weekend to fill it in. This is my first article, mainly a record of my own pitfall experiences rather than a professional tutorial, and mostly informal notes. If anything is unclear, feel free to leave your question; likewise, if I've written anything wrong, please don't hold back, point out my mistakes. Thank you.