1. Introduction to the basic architecture
Flume has a three-tier architecture: agent, collector, and storage. Each tier can be scaled horizontally.
The agent is the data-collection side; the collector consolidates the data; storage is where the data finally lands, such as HDFS.
The first two tiers are each built from a source and a sink: the source is the component that reads data, and the sink is the component that passes it on.
Agents and collectors are managed uniformly by the master as different node types, and can be reconfigured dynamically from the master shell or web UI, as sketched below.
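As a rough illustration of how the tiers fit together, here is a minimal sketch in the old Flume OG master syntax; the node names, port, log path, and HDFS URL are made up for the example:

agent1     : tail("/var/log/app.log") | agentSink("collector1", 35853) ;
collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/", "events-") ;

The agent node tails a local file and forwards events to the collector node, which writes them out to HDFS (the storage tier).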
2. Built-in sources
text — reads a file and sends it line by line
tail — detects newly produced data and sends it line by line
syslogTcp(5140) — listens on this port
tailDir("dirname"[, fileregex=".*"[, startFromEnd=false[, recurseDepth=0]]]) — watches the tail of every file in a directory, using the regex to select which files to watch (directories are excluded); recurseDepth is the depth to which subdirectories are watched recursively. A usage sketch follows this list.
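For example, wiring tailDir into a node might look like this in the OG master syntax; the node name, collector name, directory, and port are assumptions for illustration:

node1 : tailDir("/var/log/myapp", fileregex=".*\\.log", startFromEnd=true, recurseDepth=1) | agentSink("collector1", 35853) ;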
3. I wanted to collect log files from a Windows server, so I looked into how to deploy Flume on Windows.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -f D:\hadoopResouce\flume\logs\log_exec_tail.txt

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The only thing that really needs changing is the full path of the file to collect. Note that tail is not a native Windows command, so this assumes a tail.exe (for example from GnuWin32 or Cygwin) is available on the PATH.
set FLUME_HOME=D:\hadoopResouce\flume
set JAVA_HOME=D:\jdk1.8
set JAVA="%JAVA_HOME%\bin\java.exe"
set JAVA_OPTS=-Xmx1024m
set CONF=%FLUME_HOME%\conf\flume-conf.properties
set AGENT=agent

%JAVA% %JAVA_OPTS% -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 -Dlog4j.configuration=file:\\\%FLUME_HOME%\conf\log4j.properties -cp "%FLUME_HOME%\lib\*" org.apache.flume.node.Application -f %FLUME_HOME%\conf\flume-conf.properties -n %AGENT%
Pay attention to the path settings in several places. Because the launch command enables HTTP monitoring (-Dflume.monitoring.type=http -Dflume.monitoring.port=34545), the running agent reports metrics like the following:
{
  "SOURCE.seqGenSrc": {
    "EventReceivedCount": "0",
    "Type": "SOURCE",
    "AppendBatchAcceptedCount": "0",
    "EventAcceptedCount": "2532",
    "AppendReceivedCount": "0",
    "StartTime": "1468487063825",
    "AppendAcceptedCount": "0",
    "OpenConnectionCount": "0",
    "AppendBatchReceivedCount": "0",
    "StopTime": "0"
  },
  "CHANNEL.memoryChannel": {
    "ChannelCapacity": "100",
    "ChannelFillPercentage": "99.0",
    "Type": "CHANNEL",
    "EventTakeSuccessCount": "2423",
    "ChannelSize": "99",
    "StartTime": "1468487063801",
    "EventTakeAttemptCount": "2424",
    "EventPutAttemptCount": "2524",
    "EventPutSuccessCount": "2523",
    "StopTime": "0"
  }
}
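To pull these metrics yourself, query the HTTP monitoring endpoint directly (host and port as set in the launcher above):

curl http://localhost:34545/metrics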
4. Configuration for reading the contents of files newly added to a directory
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/master/yang/flume/logs
a1.sources.r1.fileHeader = true
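One caveat worth noting: the spooling-directory source expects a file to be complete and immutable once it appears in the directory (by default Flume renames ingested files with a .COMPLETED suffix), so write the file elsewhere first and move it in atomically. A minimal sketch, with a made-up file name:

# write the file outside the spool directory first
echo "some log line" > /tmp/app-20160714.log
# then move it in atomically so Flume never sees a half-written file
mv /tmp/app-20160714.log /home/master/yang/flume/logs/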
5. Configuration for writing to Kafka
I hit quite a few pitfalls here. Some were probably configurations for older versions; others were snippets people posted without ever testing them. Take it as a warning: test things yourself before posting, stay rigorous, and avoid misleading others.
# Check that the fully qualified class name is right; some posts online use
# org.apache.flume.plugins.SinglePartition, which deserves contempt
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# No idea why so many people write this key as a1.sinks.k1.metadata.broker.list;
# perhaps an earlier version used that name
a1.sinks.k1.brokerList = master:9092,slave1:9092,slave2:9092
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
# Another trap: some posts write this as a1.sinks.k1.custom.topic.name
a1.sinks.k1.topic = kafka-storm-cluster
a1.sinks.k1.channel = c1
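To check that events are actually arriving, point a Kafka console consumer at the topic. This sketch assumes a 0.8/0.9-era Kafka whose console consumer still takes --zookeeper (newer releases use --bootstrap-server instead), with ZooKeeper assumed to be on master:2181:

bin/kafka-console-consumer.sh --zookeeper master:2181 --topic kafka-storm-cluster --from-beginning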
6. Configuration for reading content sent over telnet
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
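With the agent running, you can feed it events interactively from another terminal:

telnet localhost 44444
# type a line and press Enter; the netcat source replies "OK"
# and the event shows up at the logger sink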
7. Common commands:
Start:
bin/flume-ng agent -c ./conf/ -f conf/spool.conf -Dflume.root.logger=DEBUG,console -n a1
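For reference, a sketch of what each flag does; note the agent name passed with -n must match the property prefix (a1) used in the config file:

# -c / --conf       : directory holding flume-env.sh and log4j.properties
# -f / --conf-file  : the agent configuration file to load
# -n / --name       : agent name; must match the a1.* prefix in the config
# -Dflume.root.logger=DEBUG,console : log at DEBUG level to the console for troubleshooting
bin/flume-ng agent -c ./conf/ -f conf/spool.conf -n a1 -Dflume.root.logger=DEBUG,console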