Kafka Consumer Lag Monitoring

Date: 2019-11-16

Sematext Monitoring is one of the most comprehensive Kafka monitoring solutions, capturing some 200 Kafka metrics, including Kafka Broker, Producer, and Consumer metrics. While many of those metrics are useful, there is one particular metric everyone wants to monitor: Consumer Lag.

What is Kafka Consumer Lag?

Kafka Consumer Lag is the indicator of how much lag there is between Kafka Producers and Consumers. When people talk about Kafka, they are typically referring to Kafka Brokers. You can think of a Kafka Broker as a Kafka server. A Broker is what actually stores and serves Kafka messages. Kafka Producers are applications that write messages into Kafka (Brokers). Kafka Consumers are applications that read messages from Kafka (Brokers).

Inside Brokers, data is stored in one or more Topics, and each Topic consists of one or more Partitions. When writing data, a Broker actually writes it into a specific Partition. As it writes data, it keeps track of the last "write position" in each Partition. This is called the Latest Offset, also known as the Log End Offset. Each Partition has its own independent Latest Offset.
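To make the Topic, Partition, and Latest Offset relationship concrete, here is a minimal in-memory sketch. It is a toy model and not Kafka's storage code; the `Topic` class and its method names are hypothetical. It assumes offsets start at 0 and that the Latest Offset is the offset the next message would receive, which matches Kafka's Log End Offset convention.

```python
class Topic:
    """Toy model of a Topic: a list of append-only Partition logs (hypothetical)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One independent, append-only list of messages per Partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Write a message to one Partition and return the offset it was given."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def latest_offset(self, partition):
        """Latest (Log End) Offset: the offset the next message would receive."""
        return len(self.partitions[partition])


topic = Topic("logs", num_partitions=2)
topic.append(0, "first")
topic.append(0, "second")
topic.append(1, "third")
print(topic.latest_offset(0), topic.latest_offset(1))  # prints: 2 1
```

Each Partition advances on its own, which is why the two Latest Offsets differ after three writes.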

Just like Brokers keep track of their write position in each Partition, each Consumer keeps track of its "read position" in each Partition whose data it is consuming. That is, it keeps track of which data it has read. This is known as the Consumer Offset. The Consumer Offset is periodically persisted (to ZooKeeper or a special Topic in Kafka itself) so that it can survive Consumer crashes or unclean shutdowns and avoid re-consuming too much old data.
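The effect of persisting the Consumer Offset only periodically can be illustrated with a small simulation. This is a sketch under assumptions, not a real Kafka client: the `SimulatedConsumer` class and the `commit_interval` parameter are hypothetical, and the "persisted" offset is just a field standing in for ZooKeeper or Kafka's offsets Topic.

```python
class SimulatedConsumer:
    """Toy Consumer for one Partition that persists its offset every N messages."""

    def __init__(self, commit_interval):
        self.commit_interval = commit_interval  # hypothetical tuning knob
        self.position = 0   # in-memory read position (lost on a crash)
        self.committed = 0  # last persisted Consumer Offset (survives a crash)

    def consume(self, count):
        """Read `count` messages, persisting the offset every commit_interval."""
        for _ in range(count):
            self.position += 1
            if self.position % self.commit_interval == 0:
                self.committed = self.position

    def crash_and_restart(self):
        """Resume from the persisted offset; return how many messages are re-read."""
        reconsumed = self.position - self.committed
        self.position = self.committed
        return reconsumed


consumer = SimulatedConsumer(commit_interval=100)
consumer.consume(250)
print(consumer.crash_and_restart())  # prints: 50
```

In this model a shorter commit interval means fewer re-consumed messages after a crash, at the cost of more frequent offset writes.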

Kafka Consumer Lag and Read/Write Rates

In our diagram above we can see yellow bars, which represent the rate at which Brokers are writing messages created by Producers. The orange bars represent the rate at which Consumers are consuming messages from Brokers. The rates look roughly equal, and they need to be, otherwise the Consumers will fall behind. However, there is always going to be some delay between the moment a message is written and the moment it is consumed. Reads are always going to lag behind writes, and that is what we call Consumer Lag. The Consumer Lag is simply the delta between the Latest Offset and the Consumer Offset.
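The delta described above can be sketched in a few lines. This example works on plain dictionaries rather than a live cluster; the function name `consumer_lag` and the sample offsets are made up for illustration. It assumes lag is only reported for Partitions that have a Consumer Offset, and it clamps negative values to zero because the two offsets are usually sampled at slightly different moments.

```python
def consumer_lag(latest_offsets, consumer_offsets):
    """Return per-Partition lag: Latest Offset minus Consumer Offset.

    Both arguments map (topic, partition) -> offset.
    """
    lag = {}
    for topic_partition, consumer_offset in consumer_offsets.items():
        if topic_partition not in latest_offsets:
            continue  # no Broker-side offset known for this Partition
        delta = latest_offsets[topic_partition] - consumer_offset
        lag[topic_partition] = max(delta, 0)
    return lag


# Illustrative numbers only.
latest = {("logs", 0): 1500, ("logs", 1): 980}
consumed = {("logs", 0): 1420, ("logs", 1): 980}

lag = consumer_lag(latest, consumed)
print(lag[("logs", 0)], lag[("logs", 1)])  # prints: 80 0
```

Here the Consumer is 80 messages behind on Partition 0 and fully caught up on Partition 1.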

Why is Consumer Lag Important?

Many applications today are based on being able to process (near) real-time data. Think about a performance monitoring system like Sematext Monitoring or a log management service like Sematext Logs. They continuously process infinite streams of near real-time data. If they were to show you metrics or logs with too much delay, that is, if the Consumer Lag were too big, they'd be nearly useless. Consumer Lag tells us how far behind each Consumer (Group) is in each Partition. The smaller the lag, the closer to real time the data consumption.

Monitoring Read and Write Rates

Kafka Consumer Lag and Broker Offset Changes

As we just learned, the delta between the Latest Offset and the Consumer Offset is what gives us the Consumer Lag. In the above chart from Sematext you may have noticed a few other metrics:

Broker Write Rate
Consume Rate
Broker Earliest Offset Changes

The rate metrics are derived metrics. If you look at Kafka's own metrics, you won't find them there. Under the hood, the open source Sematext agent collects a few Kafka metrics with various offsets, from which these rates are computed. In addition, it charts Broker Earliest Offset Changes, which tracks changes in the earliest known offset in each Broker's Partition. Put another way, the earliest offset is the offset of the oldest message in a Partition. While this offset alone may not be very useful, knowing how it is changing can be handy when things go awry. Data in Kafka has a certain TTL (Time To Live) to allow for easy purging of old data. This purging is performed by Kafka itself. Every time such purging kicks in, the offset of the oldest data changes. Sematext's Broker Earliest Offset Changes metric surfaces this information for monitoring. It gives you an idea of how often purges happen and how many messages each one removed.
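One plausible way to derive such metrics from raw offsets is to compare two samples taken some seconds apart. The sketch below is an assumption about the general technique, not the Sematext agent's actual implementation; `OffsetSample`, `derive_metrics`, and the metric keys are hypothetical names.

```python
from collections import namedtuple

# One observation of a single Partition at a point in time (hypothetical shape).
OffsetSample = namedtuple(
    "OffsetSample",
    ["timestamp", "earliest_offset", "latest_offset", "consumer_offset"],
)


def derive_metrics(previous, current):
    """Derive rates (messages per second) and purge size from two samples."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        # How fast the Broker's write position advanced.
        "broker_write_rate": (current.latest_offset - previous.latest_offset) / elapsed,
        # How fast the Consumer's read position advanced.
        "consume_rate": (current.consumer_offset - previous.consumer_offset) / elapsed,
        # How many of the oldest messages were purged between the samples.
        "earliest_offset_change": current.earliest_offset - previous.earliest_offset,
    }


# Illustrative numbers only: two samples taken 10 seconds apart.
before = OffsetSample(timestamp=0.0, earliest_offset=100, latest_offset=1000, consumer_offset=950)
after = OffsetSample(timestamp=10.0, earliest_offset=300, latest_offset=1500, consumer_offset=1400)

print(derive_metrics(before, after))
```

With these numbers the write rate (50 messages per second) exceeds the consume rate (45 messages per second), so the lag grows from 50 to 100 messages, and an earliest offset change of 200 indicates that a purge removed 200 messages during the interval.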

Kafka Monitoring Tools

There are several Kafka monitoring tools out there, such as LinkedIn's Burrow, whose Kafka Offset monitoring and Consumer Lag monitoring approach is used in Sematext. We've written about various open source monitoring tools in Kafka Open Source Monitoring Tools. If you need a good Kafka monitoring solution, give Sematext a go. Ship your Kafka and other logs into Sematext Logs and you've got yourself a DevOps solution that will make troubleshooting easy instead of dreadful.