The latest version at the time of writing is 2.3 (updated 2019-08-08).
In terms of usage, 0.9 is the dividing line: starting with 0.9 there is no longer a split between the high-level consumer API (comparable to MySQL binlog's GTID: you only deal with topics, while the server manages offsets and load balancing automatically) and the low-level consumer API (comparable to MySQL binlog's file + position: you work directly with partitions and offsets).
In terms of compatibility, 0.8.x is the dividing line: 0.8.x is not compatible with earlier versions.
From the above we can conclude:
1. Producers do not need to talk to ZooKeeper (the 0.8.x Kafka consumer connects to ZooKeeper directly for offset information, while later versions get it from the cluster itself, which is why the two generations of the API are incompatible; the figure above shows the 0.8.x structure and is slightly off for 0.9.x and later).
2. Consumers fetch messages from, and producers publish messages to, the leader node only, never a follower (replication is coordinated between brokers, not between broker and client).
3. As in RocketMQ, each topic is divided into partitions to scale performance linearly (the same idea as database sharding: transparent to the business, a technical rather than a business strategy). Each partition can be consumed by only one member of a given consumer group, so if the messages of a topic do not require ordered consumption, partitioning is the key mechanism for raising throughput under heavy traffic. The number of partitions per topic (default 1) can be changed with ./kafka-topics.sh --zookeeper localhost:2181 --alter --partitions 5 --topic userService; for guidance on choosing a sensible value see https://blog.csdn.net/kwengelie/article/details/51150114.
4. Kafka 0.8.x stores each consumer group's offset for every topic partition in ZooKeeper; from 0.9 on it is stored in a dedicated topic named "__consumer_offsets", which uses log compaction, i.e. only the latest value per key is retained rather than all of them.
5. Each topic partition is backed by a local log on disk that the broker keeps appending to sequentially.
6. Each message may carry a key. If present, the key determines which partition the message goes to; otherwise round-robin is used (the Java client hashes the key; in practice a key is usually set to cope with re-consumption). Records within a partition are guaranteed to be ordered, so choosing a good key can turn serial processing into parallel processing. This requires a solid understanding of the business logic: strict global ordering is often unnecessary (especially in OLTP, where you can key by product, customer, merchant, or even a single campaign); it is merely the simplest to implement. Remember: the producer, not the broker, decides which partition a message goes to.
7. With replication, consistency follows an all-replicas (ISR) model rather than a majority-quorum model, as follows:
ISRs are described further below. Different partitions of the same topic can have their leaders on different brokers, which improves performance since the leader handles all reads and writes. The cut-off position of committed records is called the high watermark ("High Watermark"). Although as a user you rarely deal with partitions directly, they are where HA and scalability actually materialize.
Also worth mentioning here is the controller: one of the brokers acts as the controller, handling cluster-wide events such as leader election, topic changes, tracking partition replica counts, and broker membership changes, mostly by talking to ZooKeeper.
The Kafka controller architecture is shown below:
Its responsibilities are documented at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internals. For an analysis of how the controller works, see https://www.cnblogs.com/huxi2b/p/6980045.html, which sums it up rather well.
Message sending process
So how does a client know which broker is the leader? Every broker caches the cluster metadata, so when the connection is first established the client can fetch each topic's metadata from any broker, as follows:
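The original figure is lost; as a stand-in, a minimal producer sketch illustrating both points (broker address and topic reuse this article's examples, the keys are made up): the client only needs one reachable broker in bootstrap.servers to discover the partition leaders, and the optional record key decides the target partition:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any reachable broker will do: the client fetches the full cluster metadata from it.
        props.put("bootstrap.servers", "10.20.30.11:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With a key: records sharing a key hash to the same partition and stay ordered there.
            producer.send(new ProducerRecord<>("userService", "customer-42", "order-created"));
            // Without a key: the producer spreads records across partitions itself.
            producer.send(new ProducerRecord<>("userService", "order-created"));
        }
    }
}
```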
Message consumption process
Core high-level API classes (the user cannot control the consumer-broker interaction):
ConsumerConnector
KafkaStream
ConsumerConfig
The low-level API, by contrast, allows control over each step of the interaction, such as where to start reading and maintaining offsets on the client side. RocketMQ's implementation effectively combines the high-level and low-level styles, which is what Kafka's merged API has looked like since 0.9.
The main interface of the low-level API is SimpleConsumer.
Each consumer group records its own consumption progress per partition (stored in the dedicated topic log mentioned above). A partition can be consumed by only one member of each consumer group. Since microservices are usually deployed as clusters, the topic listeners on N-1 of the N service nodes will sit idle, which is worth keeping in mind. If the service hosting the active consumer dies, Kafka automatically picks one of the remaining consumers; but if a message was already processed while the ack never reached Kafka, the consumer taking over will consume it again, so make processing idempotent. If you want a topic to be consumed in parallel by the members of a consumer group, configure at least as many partitions as there are cluster members. In short: the management granularity is consumer group (called subscriber in other MQs) plus topic; the underlying delivery granularity is partition plus consumer.
Not only clustered microservices benefit from multiple partitions: a single JVM can too, by starting several independent threads, each acting as a consumer of the topic (see the sketch below). This mainly helps on SMP servers, so whenever message processing takes noticeable time or the message TPS is high, use multiple partitions.
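A sketch of the single-JVM case, assuming a 2.x client (with 0.11.x use poll(long) instead of poll(Duration)); group id and topic are illustrative. Each thread owns its own KafkaConsumer, since the class is not thread-safe; with the same group.id the partitions are divided among the threads:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelConsumers {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) { // no point exceeding the topic's partition count
            new Thread(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "10.20.30.11:9092");
                props.put("group.id", "order-service"); // same group => partitions are shared out
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                // KafkaConsumer is not thread-safe, so each thread owns its own instance.
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("userService"));
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                        for (ConsumerRecord<String, String> record : records)
                            System.out.println(Thread.currentThread().getName() + " -> " + record.value());
                    }
                }
            }, "consumer-" + i).start();
        }
    }
}
```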
An ISR is an in-sync replica. If a leader fails, an ISR is picked to be a new leader.
Kafka's topic logs grow continuously, so to stay healthy they should be reclaimed periodically. This involves two cases, depending on whether message keys repeat; the strategies differ. Log compaction is meant for the case where keys do repeat, i.e. the key is not a UUID, otherwise compaction is pointless. Based on message retention time or file size, it deletes the historical values of records sharing the same key, as shown below:
As can be seen, the historical versions are cleaned away. With compaction enabled, the topic log is split into a head and a tail; only the tail is compacted, and actual deletion additionally depends on other settings, as follows.
The parameter min.compaction.lag.ms controls how old a message must be before it can be compacted, delete.retention.ms controls when it is deleted, and log.cleanup.policy=compact enables compaction. A consumer that keeps up within this window is therefore guaranteed to see at least the latest record per key (the producer may have written again in the meantime, hence "at least" the latest, possibly more).
Each topic can declare message retention based on time or on topic log size, governed by the following parameters:
Property | Meaning | Default |
---|---|---|
log.cleanup.policy | Log cleanup strategy; only delete and compact are supported | delete |
log.retention.hours | How long logs are retained; hours, minutes, and ms variants exist | 168 (7 days) |
log.retention.bytes | Maximum size a log may reach before deletion (deletion triggers when either limit is hit) | -1 |
log.segment.delete.delay.ms | How long a log file is retained before it is physically deleted | 60000 |
log.cleanup.interval.mins | Interval between cleanup runs (used by older versions) | 10 |
log.retention.check.interval.ms | Interval of the periodic check for logs that qualify for deletion (used by newer versions) | 300000 |
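These broker-level defaults also exist as topic-level overrides (without the log. prefix). A sketch, under stated assumptions, of creating a compacted topic with per-topic overrides via AdminClient (available since 0.11); the topic name and values are made up:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "10.20.30.11:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            configs.put("cleanup.policy", "compact");        // topic-level form of log.cleanup.policy
            configs.put("min.compaction.lag.ms", "60000");   // keep at least 1 min of history uncompacted
            configs.put("delete.retention.ms", "86400000");  // tombstones survive one day
            // 5 partitions, replication factor 1
            NewTopic topic = new NewTopic("user-profile", 5, (short) 1).configs(configs);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```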
Producers can express their data-consistency requirement via acks (similar in spirit to MySQL's replication ack settings); in interviews nowadays, "how do you guarantee a sent message is not lost" is a standard question. acks=0: no ack, at-most-once; acks=all: leader and all ISR followers must persist the write; acks=1: the leader alone suffices (the default in these versions).
This is set in the producer properties, as shown below:
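The original screenshot is gone; a minimal sketch of such properties (broker address and topic reuse this article's examples, values are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NoLossProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "10.20.30.11:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");  // "0" = at-most-once, "1" = leader only, "all" = leader + ISR
        props.put("retries", "3"); // retry transient broker errors instead of dropping the record
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("bomp", "must-not-be-lost"));
        }
    }
}
```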
Early producer versions had no notion of "exactly once"; Kafka 0.11.0 introduced exactly-once delivery, implemented through producer idempotence plus atomic transactions; see https://dzone.com/articles/exactly-once-semantics-with-apache-kafka-1.
On the consumer side, Kafka supports two modes: at-most-once and at-least-once.
To implement "at-most-once" the consumer reads a message, then saves its offset in the partition by sending it to the broker, and finally processes the message. The issue with "at-most-once" is that a consumer could die after saving its position but before processing the message. Then the consumer that takes over or gets restarted would leave off at the last position, and the message in question is never processed.
To implement "at-least-once" the consumer reads a message, processes it, and finally saves the offset to the broker. The issue with "at-least-once" is that a consumer could crash after processing a message but before saving the last offset position. Then if the consumer is restarted or another consumer takes over, it could receive the message that was already processed. "At-least-once" is the most common setup for messaging, and it is your responsibility to make the messages idempotent, which means getting the same message twice will not cause a problem (e.g. two debits).
To implement “exactly once” on the consumer side, the consumer would need a two-phase commit between storage for the consumer position, and storage of the consumer’s message process output. Or, the consumer could store the message process output in the same location as the last offset.
Out of the box, Kafka supports only the first two consumer ack modes; the third you must implement yourself. In practice most people use the second plus idempotency, i.e. consumer-side consistency guaranteed by idempotency + acks, so there is no need to elaborate further.
Manual ack (offset) commits can be ensured as follows:
props.put("enable.auto.commit", "false"); try { while (running) { ConsumerRecords<String, String> records = consumer.poll(1000); for (ConsumerRecord<String, String> record : records) System.out.println(record.offset() + ": " + record.value()); try { consumer.commitSync(); } catch (CommitFailedException e) { // application specific failure handling } } } finally { consumer.close(); }
In auto-commit mode, the commit interval is set by auto.commit.interval.ms. For the various commit modes, see https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/.
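Besides commitSync(), the client also offers commitAsync(). A sketch of the common pattern from that tutorial, async commits inside the loop plus one final synchronous commit on shutdown (the process() helper is hypothetical; consumer as in the snippet above):

```java
try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        process(records); // hypothetical application logic
        // Non-blocking commit; a failed attempt is normally "covered" by a later commit.
        consumer.commitAsync((offsets, exception) -> {
            if (exception != null)
                System.err.println("async commit failed for " + offsets + ": " + exception);
        });
    }
} finally {
    try {
        consumer.commitSync(); // final blocking commit so processed offsets are not lost
    } finally {
        consumer.close();
    }
}
```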
MirrorMaker is the component for mirroring between Kafka clusters; in essence it is a producer plus a consumer, as follows:
As shown above.
For display convenience I set the chroot to localKafka, as follows:
The meaning of each zk node is shown in the diagram below, where kafka01 is the chroot; it is configured in Kafka's server.properties by appending it to zookeeper.connect, e.g. zookeeper.connect=localhost:2181/localKafka.
The three mainstream Kafka monitoring tools are:
Take KafkaOffsetMonitor, the one we use, as an example. KafkaOffsetMonitor is a client-side consumption monitoring tool for Kafka that watches, in real time, the consumers of a Kafka service and their offsets within each partition. You can browse the current consumer groups and see the consumption state of every partition of every topic at a glance. KafkaOffsetMonitor is hosted on GitHub and can be downloaded from https://github.com/quantifind/KafkaOffsetMonitor/releases, or from a Baidu netdisk mirror (on an intranet use the latter, otherwise the JS loaded from a CDN will be missing).
It can be started with java -cp KafkaOffsetMonitor-assembly-0.2.0.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --zk 10.20.30.10:2181 --port 8088 --refresh 10.seconds --retain 2.days; see GitHub for the meaning of each option.
How do you get all topics via the Java API?
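Presumably the lost screenshot showed something like this sketch (imports and the consumer/props setup from the earlier snippets are assumed):

```java
// Via an existing consumer: topic name -> partition metadata.
Map<String, List<PartitionInfo>> topics = consumer.listTopics();
System.out.println(topics.keySet());

// Or via AdminClient (0.11+):
try (AdminClient admin = AdminClient.create(props)) {
    System.out.println(admin.listTopics().names().get());
}
```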
How does a consumer subscribe to multiple topics at once?
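A sketch using this article's global and test topics (java.util.Arrays and java.util.regex.Pattern imports assumed; the regex overload without a rebalance listener exists in clients 2.0+):

```java
// Subscribe to several topics with one call; their partitions are shared within the group.
consumer.subscribe(Arrays.asList("global", "test"));
// Or subscribe by regex, matching current and future topics:
consumer.subscribe(Pattern.compile("uft_.*"));
```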
How do you list all topics?
[root@hs-test-10-20-30-11 kafka]# bin/kafka-topics.sh --zookeeper 10.20.30.10:2181 --list
global
test
How do you view a specific topic's configuration?
[root@hs-test-10-20-30-11 kafka]# bin/kafka-topics.sh --zookeeper 10.20.30.10:2181 --topic global --describe
Topic:global PartitionCount:1 ReplicationFactor:1 Configs:
Topic: global Partition: 0 Leader: 0 Replicas: 0 Isr: 0
The producer reported the following error when connecting:
WARN [Producer clientId=console-producer] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
There are two possible causes: 1. Kafka is not started; 2. the connection string does not match the listeners value in conf/server.properties.
How do you list all consumers?
The new way, i.e. for clients that are not ZooKeeper-based (unlike kafka.consumer.Consumer.createJavaConsumerConnector) but bootstrap-based internally:
[root@hs-test-10-20-30-11 kafka]# bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server 10.20.30.11:9092 --list
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
The old way: the ZooKeeper-based client (kafka.javaapi.consumer.ZookeeperConsumerConnector, now deprecated):
[root@hs-test-10-20-30-11 kafka]# bin/kafka-consumer-groups.sh --zookeeper 10.20.30.10:2181 --list
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
AAA
TA50-Aggr-Logger-ConsumerGroup
console-consumer-23104
console-consumer-37858
log4j-kafka configuration
Add the jar dependency:
```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>0.11.0.3</version>
</dependency>
```
Configure log4j2.xml as follows:
Reference the Kafka appender from the logger:
<Root level="INFO" additivity="false"> <AppenderRef ref="Console"/> <AppenderRef ref="KAFKA"/> <AppenderRef ref="app_error" /> </Root>
Add the Kafka appender:
```xml
<Appenders>
    <!-- Send error logs to Kafka -->
    <Kafka name="KAFKA" topic="bomp">
        <ThresholdFilter level="error" onMatch="DENY" onMismatch="NEUTRAL"/>
        <ThresholdFilter level="trace" onMatch="ACCEPT" onMismatch="DENY"/>
        <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss,SSS}:%4p %t (%F:%L) - %m%n"/>
        <Property name="bootstrap.servers">10.20.30.11:9092</Property>
    </Kafka>
</Appenders>
```
That completes the log4j-to-Kafka setup. For C++ there is the librdkafka library, https://docs.confluent.io/2.0.0/clients/librdkafka/index.html; a dedicated article will follow.
Related issues
The consumer reports:
2018-09-17 14:10:07.768 WARN 130400 --- [r-finder-thread] kafka.client.ClientUtils$ : Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,10.20.30.11,9092)] failed
java.nio.channels.ClosedChannelException: null
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.doSend(SyncProducer.scala:79) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.producer.SyncProducer.send(SyncProducer.scala:124) ~[kafka_2.12-0.11.0.3.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:61) [kafka_2.12-0.11.0.3.jar:na]
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:96) [kafka_2.12-0.11.0.3.jar:na]
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:72) [kafka_2.12-0.11.0.3.jar:na]
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64) [kafka_2.12-0.11.0.3.jar:na]
Fix: set advertised.host.name in server.properties and restart. See https://stackoverflow.com/questions/30606447/kafka-consumer-fetching-metadata-for-topics-failed
The zk log reports:
2018-10-08 14:13:28,297 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100147743c10000 type:setData cxid:0xc8 zxid:0x53 txntype:-1 reqpath:n/a Error Path:/config/topics/uft_trade Error:KeeperErrorCode = NoNode for /config/topics/uft_trade
2018-10-08 14:13:28,302 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100147743c10000 type:create cxid:0xc9 zxid:0x54 txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics
Fix: still under investigation.
The Spring Boot Kafka client, after running for a while on one particular VM (it never happened on a physical machine), suddenly drove system CPU to 80-90%, with masses of the following logs:
2018-10-09 13:54:57,713 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2682ms for sessionid 0x100175687960002 2018-10-09 13:54:57,904 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2672ms for sessionid 0x100175687960004 2018-10-09 13:54:58,621 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960003 2018-10-09 13:54:57,232 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2700ms for sessionid 0x100175687960007 2018-10-09 13:55:09,812 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2672ms for sessionid 0x100175687960004, closing socket connection and attempting reconn ect 2018-10-09 13:55:02,942 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2702ms for sessionid 0x100175687960008 2018-10-09 13:55:09,755 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960003, closing socket connection and attempting reconn ect 2018-10-09 13:55:09,789 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2682ms for sessionid 0x100175687960002, closing socket connection and attempting reconn ect 2018-10-09 13:55:18,677 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960005 2018-10-09 13:55:11,752 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 20016ms for sessionid 0x100175687960001 2018-10-09 13:55:17,709 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 2678ms for sessionid 0x100175687960006 2018-10-09 13:55:12,779 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2700ms for sessionid 0x100175687960007, closing socket connection and attempting reconn ect 2018-10-09 13:55:20,634 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2702ms for sessionid 0x100175687960008, closing socket connection and attempting reconn ect 2018-10-09 13:55:22,178 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 20016ms for sessionid 0x100175687960001, closing socket connection and attempting recon nect 2018-10-09 13:58:10,244 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:10,240 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:10,241 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:10,240 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2675ms for sessionid 0x100175687960005, closing socket connection and attempting reconn ect 2018-10-09 13:58:10,243 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 2678ms for sessionid 0x100175687960006, closing socket connection and attempting reconn ect 2018-10-09 13:58:11,107 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:40,384 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:58:40,383 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:58:40,379 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:58:40,378 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:40,378 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:58:40,377 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:59:22,082 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:22,084 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:22,099 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:22,108 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:22,130 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:59:23,382 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:59:23,412 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 13:59:23,412 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 13:59:23,443 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@8646db9 2018-10-09 13:59:23,411 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960001 has expired 2018-10-09 13:59:32,474 INFO ZkClient:713 - zookeeper state changed (Disconnected) 2018-10-09 13:59:23,404 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960007 has expired 2018-10-09 13:59:23,390 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 13:59:32,477 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@4671e53b 2018-10-09 13:59:23,390 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960008 has expired 2018-10-09 13:59:23,390 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 13:59:32,477 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@6a1aab78 2018-10-09 13:59:23,389 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960004 has expired 2018-10-09 13:59:32,417 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:59:23,380 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:59:23,446 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=30000 watcher=org.I0Itec.zkclient.ZkClient@dc24521 2018-10-09 13:59:41,829 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960004 has expired, closing socket connection 2018-10-09 13:59:41,832 INFO ZkClient:936 - Waiting for keeper state SyncConnected 2018-10-09 13:59:41,829 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960008 has expired, closing socket connection 2018-10-09 13:59:41,831 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:41,830 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960007 has expired, closing socket connection 2018-10-09 13:59:41,830 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960001 has expired, closing socket connection 2018-10-09 13:59:41,860 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:42,585 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 13:59:42,810 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 13:59:42,835 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:31,813 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 48978ms for sessionid 0x100175687960002 2018-10-09 14:00:31,825 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 49644ms for sessionid 0x100175687960005 2018-10-09 14:00:31,825 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 49644ms for sessionid 0x100175687960005, closing socket connection and attempting recon nect 2018-10-09 14:00:31,827 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 49968ms for sessionid 0x100175687960006 2018-10-09 14:00:31,827 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 49968ms for sessionid 0x100175687960006, closing socket connection and attempting recon nect 2018-10-09 14:00:31,842 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 50011ms for sessionid 0x100175687960003 2018-10-09 14:00:31,868 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 50011ms for sessionid 0x100175687960003, closing socket connection and attempting recon nect 2018-10-09 14:00:31,853 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 48978ms for sessionid 0x100175687960002, closing socket connection and attempting recon nect 2018-10-09 14:00:31,885 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:31,886 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:31,887 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:31,887 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:31,907 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960001 2018-10-09 14:00:31,907 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960008 2018-10-09 14:00:31,908 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960004 2018-10-09 14:00:31,944 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960007 2018-10-09 14:00:33,391 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:33,396 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:33,424 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 1336ms for sessionid 0x0 2018-10-09 14:00:33,430 INFO ClientCnxn:1299 - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10017568796000b, negotiated timeout = 30000 2018-10-09 14:00:33,517 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:33,516 INFO ZkClient:713 - zookeeper state changed (SyncConnected) 2018-10-09 14:00:34,399 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:34,354 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 1336ms for sessionid 0x0, closing socket connection and attempting reconnect 2018-10-09 14:00:34,433 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:34,475 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:34,476 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:34,485 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 968ms for sessionid 0x0 2018-10-09 14:00:34,488 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 968ms for sessionid 0x0, closing socket connection and attempting reconnect 2018-10-09 14:00:37,472 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:37,484 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:37,487 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:37,488 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:37,489 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:37,479 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960006 has expired 2018-10-09 14:00:37,495 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960006 has expired, closing socket connection 2018-10-09 14:00:37,447 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 14:00:37,479 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 14:00:37,519 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@69b0fd6f 2018-10-09 14:00:37,519 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@4a87761d 2018-10-09 14:00:37,446 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960005 has expired 2018-10-09 14:00:37,519 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960005 has expired, closing socket connection 2018-10-09 14:00:37,765 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 14:00:37,780 INFO ZkClient:713 - zookeeper state changed (Expired) 2018-10-09 14:00:37,780 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960003 has expired 2018-10-09 14:00:37,791 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960003 has expired, closing socket connection 2018-10-09 14:00:38,194 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@3aeaafa6 2018-10-09 14:00:37,995 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 507ms for sessionid 0x0 2018-10-09 14:00:52,148 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 507ms for sessionid 0x0, closing socket connection and attempting reconnect 2018-10-09 14:00:38,198 INFO ZooKeeper:438 - Initiating client connection, connectString= sessionTimeout=500 watcher=org.I0Itec.zkclient.ZkClient@491cc5c9 2018-10-09 14:00:52,141 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960006 2018-10-09 14:00:52,128 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:52,154 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:52,126 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:00:52,179 INFO ClientCnxn:876 - Socket connection established to localhost/127.0.0.1:2181, initiating session 2018-10-09 14:00:38,010 WARN ClientCnxn:1285 - Unable to reconnect to ZooKeeper service, session 0x100175687960002 has expired 2018-10-09 14:00:52,231 INFO ClientCnxn:1154 - Unable to reconnect to ZooKeeper service, session 0x100175687960002 has expired, closing socket connection 2018-10-09 14:00:52,683 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 504ms for sessionid 0x0 2018-10-09 14:05:12,238 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:05:12,176 INFO ClientCnxn:1032 - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2018-10-09 14:08:21,078 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960002 2018-10-09 14:05:12,113 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 259911ms for sessionid 0x10017568796000b 2018-10-09 14:08:21,107 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 259911ms for sessionid 0x10017568796000b, closing socket connection and attempting reco nnect 2018-10-09 14:05:12,098 INFO ClientCnxn:519 - EventThread shut down for session: 0x100175687960003 2018-10-09 14:00:52,677 WARN ClientCnxn:1108 - Client session timed out, have not heard from server in 501ms for sessionid 0x0 2018-10-09 14:08:21,107 INFO ClientCnxn:1156 - Client session timed out, have not heard from server in 501ms for sessionid 0x0, closing socket connection and attempting reconnect
After a quick read of the post at https://blog.csdn.net/xjping0794/article/details/77784171, I checked system IO for that window; it was indeed very high, up to 50%, as follows:
14时00分28秒 sda 3062.38 922268.58 670.77 301.38 5.17 1.71 0.16 49.44 14时00分28秒 ol-root 3111.77 922266.41 495.79 296.54 5.29 1.70 0.16 49.43 14时00分28秒 ol-swap 22.04 2.09 174.24 8.00 0.13 5.80 0.15 0.33 14时11分16秒 sda 5432.75 1537105.34 768.61 283.07 19.06 3.53 0.17 91.53 14时11分16秒 ol-root 5513.26 1537106.56 731.82 278.93 19.55 3.54 0.17 91.52 14时11分16秒 ol-swap 5.07 4.68 35.87 8.00 0.01 2.27 0.19 0.10 14时11分16秒 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 14时20分01秒 sda 2784.00 795332.59 462.60 285.85 10.89 3.93 0.18 50.09 14时20分01秒 ol-root 2827.44 795311.85 414.30 281.43 11.18 3.95 0.18 50.07 14时20分01秒 ol-swap 6.96 12.98 42.72 8.00 0.05 7.80 0.18 0.12 14时30分01秒 sda 3.13 12.42 59.59 23.04 0.00 0.57 0.44 0.14
Yet nothing special was running during that period, which is odd, and in the moment I forgot to check with iotop which process was responsible. The points that post raises are:
On where to store ZK logs, the official docs advise:
Having a dedicated log device has a large impact on throughput and stable latencies. It is highly recommended to dedicate a log device and set dataLogDir to point to a directory on that device, and then make sure to point dataDir to a directory not residing on that device.
Add to zoo.cfg:
forceSync=no
It is on by default: to avoid sync lag, ZK flushes its state to the on-disk log as soon as it receives data and only replies once the flush completes. With it off, client connections get fast responses (this is less of an issue on servers with a battery-backed cache).
Looking next at the zk server's own log, large numbers of CancelledKeyException begin at roughly the same time:
2018-10-09 13:56:36,712 [myid:] - INFO [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14926 which had sessionid 0x100175687960008 2018-10-09 13:56:43,857 [myid:] - INFO [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14924 which had sessionid 0x100175687960006 2018-10-09 13:56:49,783 [myid:] - INFO [SyncThread:0:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:14919 which had sessionid 0x100175687960001 2018-10-09 13:56:49,816 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205) at java.lang.Thread.run(Thread.java:748) 2018-10-09 13:58:54,331 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23459 2018-10-09 13:58:54,377 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23459, probably expired 2018-10-09 13:58:54,401 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23485 2018-10-09 13:58:54,441 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23494 2018-10-09 13:58:56,314 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23459 which had sessionid 0x10017 5687960000 2018-10-09 13:58:56,336 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23485 2018-10-09 13:58:56,392 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23485, probably expired 2018-10-09 13:58:57,890 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23497 2018-10-09 13:58:59,480 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23485 which had sessionid 0x10017 5687960000 2018-10-09 13:59:00,383 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23494 2018-10-09 13:59:00,910 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23494, probably expired 2018-10-09 13:59:02,140 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23507 2018-10-09 13:59:03,286 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23497 2018-10-09 13:59:03,671 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23494 which had sessionid 0x10017 5687960000 2018-10-09 13:59:03,905 
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23497, probably expired 2018-10-09 13:59:05,341 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205) at java.lang.Thread.run(Thread.java:748) 2018-10-09 13:59:06,862 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23511 2018-10-09 13:59:10,044 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23507 2018-10-09 13:59:10,267 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23497 which had sessionid 0x10017 5687960000 2018-10-09 13:59:10,285 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23507, probably expired 2018-10-09 13:59:10,286 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205) at java.lang.Thread.run(Thread.java:748) 2018-10-09 13:59:10,287 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23507 which had sessionid 0x10017 5687960000 2018-10-09 13:59:10,287 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23511 2018-10-09 13:59:10,287 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23511, probably expired 2018-10-09 13:59:10,313 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23519 2018-10-09 13:59:10,313 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23511 which had sessionid 0x10017 5687960000 2018-10-09 13:59:10,314 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.223.137:23524 2018-10-09 13:59:10,314 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23519 2018-10-09 13:59:10,314 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23519, probably expired 2018-10-09 13:59:10,315 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@941] - Client attempting to renew session 0x100175687960000 at /192.168.223.137:23524 2018-10-09 13:59:10,315 [myid:] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.223.137:23519 which had sessionid 0x10017 5687960000 2018-10-09 13:59:10,316 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@686] - Invalid session 0x100175687960000 for client /192.168.223.137:23524, probably expired 2018-10-09 13:59:10,321 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@236] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:205) at java.lang.Thread.run(Thread.java:748)
The post above says this was fixed in 3.4.8, but we run 3.4.12. Digging further, some reports mention very large write-log latencies, e.g. "fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide", but no such warning appears in our log. I decided to try adding forceSync=no; see https://www.jianshu.com/p/73eec030db86.
As for why the timeouts in the log vary in length, that is related to tickTime and is explainable, so I won't go into detail.
The zk log is full of the following errors:
id:0x9d zxid:0x42 txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics 2018-10-09 12:01:07,918 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx id:0xa5 zxid:0x45 txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_individual/partitions/0 Error:KeeperErrorCode = NoNode for /brokers/topics/uft_individual/partitions/0 2018-10-09 12:01:07,921 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx id:0xa6 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_individual/partitions Error:KeeperErrorCode = NoNode for /brokers/topics/uft_individual/partitions 2018-10-09 12:01:17,740 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:setData c xid:0xaf zxid:0x4a txntype:-1 reqpath:n/a Error Path:/config/topics/uft_splitter Error:KeeperErrorCode = NoNode for /config/topics/uft_splitter 2018-10-09 12:01:17,741 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx id:0xb0 zxid:0x4b txntype:-1 reqpath:n/a Error Path:/config/topics Error:KeeperErrorCode = NodeExists for /config/topics 2018-10-09 12:01:17,753 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx id:0xb8 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_splitter/partitions/0 Error:KeeperErrorCode = NoNode for /brokers/topics/uft_splitter/partitions/0 2018-10-09 12:01:17,754 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:create cx id:0xb9 zxid:0x4f txntype:-1 reqpath:n/a Error Path:/brokers/topics/uft_splitter/partitions Error:KeeperErrorCode = NoNode for /brokers/topics/uft_splitter/partitions 2018-10-09 12:01:35,671 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100175687960000 type:setData c xid:0xc2 zxid:0x53 txntype:-1 reqpath:n/a Error Path:/config/topics/cres_global Error:KeeperErrorCode = NoNode for /config/topics/cres_global
See https://github.com/mesos/kafka/issues/136, yet the Kafka service had been up and running the whole time (comparing startup logs confirms it really had started). https://stackoverflow.com/questions/34393837/zookeeper-kafka-error-keepererrorcode-nodeexists gives another cause, zk data not having been deleted, but we hit the problem on a completely fresh install too. Finally, https://stackoverflow.com/questions/43559328/got-user-level-keeperexception-when-processing explains it, as follows:
The message you see is not an error yet. It is a potential exception raised by Zookeeper that the original object making a request has to handle.
When you start a fresh Kafka, it gets a bunch of NoNode messages. It's normal because some paths don't exist yet. At the same time, you also get NodeExists messages as the path exists already.
Example: Error:KeeperErrorCode = NoNode for /config/topics/test
It's because Kafka sends a request to Zookeeper for this path. But it doesn't exist. That's OK, because you are trying to create it. So, you see "INFO" from Zookeeper but no error from Kafka. Once Kafka gets this message, it tries to create your topic. To do so, it needs to access a path in Zookeeper for topics. So, it sends a request and gets an error NodeExists for /config/topics. Again, it's normal and Kafka ignores the message.
Long story short, these are all non-issue messages and you should skip them. If it bothers you, change the logging configuration of Zookeeper (it's not recommended though).
In short, these are purely informational messages; ignore them, and Kafka does the same.
Compilation succeeds, but startup fails with the following error:
java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe
Cause: the Kafka client version used at compile time differs from the one on the runtime classpath, e.g. 0.9.1 vs 0.11.0, typically via a transitive dependency; compare the compile-time version against the version printed at runtime.
By default, running ./kafka-server-start.sh ../config/server.properties keeps the process in the foreground, which means Kafka stops as soon as you close the console. Add the -daemon option to start it in background mode, like so:
./kafka-server-start.sh -daemon ../config/server.properties
Using Kafka's Java API, we call KafkaConsumer's poll method:
That method calls pollOnce:
which in turn calls ConsumerCoordinator's poll method:
and at its end that method invokes the automatic offset-commit logic:
The crux is that this commit only ever runs inside poll; if processing the fetched data takes longer than the maximum poll interval, you get the error described at the beginning of this article, and the offsets cannot be committed.
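Two consumer settings that, to my understanding, mitigate this (values are illustrative; props is the consumer configuration): shrink the batch a single poll() returns, and widen the allowed gap between polls (max.poll.interval.ms exists since 0.10.1):

```java
props.put("max.poll.records", "50");         // fewer records per poll() => less work per iteration
props.put("max.poll.interval.ms", "600000"); // allow up to 10 minutes between polls before a rebalance
```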
Kafka's most central idea is to use the disk rather than memory. Like probably everyone, I assumed memory must be faster than disk.
After studying Kafka's design, reading the relevant material and running my own tests, it turns out sequential disk I/O (the strategy Cassandra, LevelDB, and RocksDB also follow) is on par with memory.
Moreover, Linux optimizes disk access in many ways, including read-ahead, write-behind, and the disk cache.
Doing the same in memory has two problems: Java objects carry a large memory overhead, and GC pauses grow as heap data accumulates; by relying on the OS page cache instead, much of the GC cost is saved (could JNI achieve a similar effect??? at least Netty's ByteBuffer and Unsafe largely do).
Working against the disk brings the following benefits:
For deletion, messages are tombstoned in real time rather than removed.
For sending, batched sends replace per-message sends.
On the JVM side, Kafka defaults to the CMS GC; consider switching to the G1 collector: -server -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true
Kafka uses tombstones instead of deleting records right away.
❖ Kafka producers support record batching, bounded by the size of the record batch and auto-flushed based on time (see the sketch below).
❖ Batching is good for network IO throughput.
❖ Batching speeds up throughput drastically.
With Kafka, consumers pull data from brokers.
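A sketch of the producer settings that control batching (values are illustrative, not tuned; props is the producer configuration):

```java
props.put("batch.size", "65536");     // flush a batch once it reaches 64 KB...
props.put("linger.ms", "10");         // ...or once 10 ms have passed, whichever comes first
props.put("compression.type", "lz4"); // whole-batch compression amplifies the IO win
```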
When a follower lags behind for more than replica.lag.time.max.ms, the leader kicks it out of the ISR.
If all replicas of a partition are down, Kafka chooses the first replica that comes back alive (not necessarily one from the ISR set) as the leader.
❖ unclean.leader.election.enable=true used to be the default (since 0.11.0 it defaults to false).
❖ If unclean.leader.election.enable=false and all replicas of a partition are down, Kafka waits for an ISR member to come back alive as the new leader.
Outside of using a single ensemble (the coordinator, i.e. ZooKeeper) for multiple Kafka clusters, it is not recommended to share the ensemble with other applications, if it can be avoided. Kafka is sensitive to Zookeeper latency and timeouts, and an interruption in communications with the ensemble will cause the brokers to behave unpredictably. This can easily cause multiple brokers to go offline at the same time, should they lose Zookeeper connections, which will result in offline partitions. It also puts stress on the cluster controller, which can show up as subtle errors long after the interruption has passed, such as when trying to perform a controlled shutdown of a broker. Other applications that can put stress on the Zookeeper ensemble, either through heavy usage or improper operations, should be segregated to their own ensemble.
Unlike RabbitMQ and ActiveMQ, Kafka does not support the MQTT protocol out of the box. If you want existing applications that talk to RabbitMQ over MQTT to switch over seamlessly, you either write your own gateway or use a third-party plugin; the more canonical options are https://www.confluent.io/connector/kafka-connect-mqtt/ and https://www.infoq.cn/article/fdbcrh6I*9ajCWLvippC
https://kafka.apache.org/documentation (single-page mode)
http://cloudurable.com/blog/kafka-architecture/index.html
https://cwiki.apache.org/confluence/display/KAFKA/
http://cloudurable.com/ppt/4-kafka-detailed-architecture.pdf
Learning Apache Kafka, Second Edition (covers 0.8.x)
Kafka: The Definitive Guide (covers 0.9.x)
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations