Spooling Directory Source
This source lets you ingest data by placing files to be ingested into a “spooling” directory on disk.
This source will watch the specified directory for new files, and will parse events out of new files as they appear.
The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).
Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated: if a file is written to after being placed into the spooling directory, or if a file name is reused later, Flume logs an error and stops processing.
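Because the source expects every file it sees to be complete and immutable, a common pattern is to write the file somewhere else and then move it into the spooling directory in one atomic step. A minimal sketch, assuming a staging directory on the same filesystem (the staging path and file names are hypothetical; /root/log/ is the spoolDir configured below):

# Stage the file outside the spool directory, then move it in atomically;
# on the same filesystem, mv is a rename, so Flume never sees a half-written file.
STAGE=/root/log_staging
TS=$(date +%Y%m%d%H%M%S)                       # gives each drop a unique name
cp /var/log/app/access.log "$STAGE/access_${TS}.log"
mv "$STAGE/access_${TS}.log" /root/log/access_${TS}.log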
The following agent configuration (dir-hdfs.conf, started by the command further below) reads files from the spooling directory and writes them to HDFS through a memory channel.
# Define the names of the three components
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1
# Source component configuration
ag1.sources.source1.type = spooldir
ag1.sources.source1.spoolDir = /root/log/
ag1.sources.source1.fileSuffix = .FINISHED
ag1.sources.source1.deserializer.maxLineLength = 5120
# Sink component configuration
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path = hdfs://hdp-01:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
ag1.sinks.sink1.hdfs.batchSize = 100
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
## Roll: rules controlling when the sink rolls over to a new file
ag1.sinks.sink1.hdfs.rollSize = 512000 ## roll by file size (bytes)
ag1.sinks.sink1.hdfs.rollCount = 1000000 ## roll by number of events
ag1.sinks.sink1.hdfs.rollInterval = 60 ## roll by time interval (seconds)
## Rules controlling how output directories are generated
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Channel component configuration
ag1.channels.channel1.type = memory
ag1.channels.channel1.capacity = 500000 ## capacity in events
ag1.channels.channel1.transactionCapacity = 600 ## buffer capacity required for Flume transaction control: 600 events
# Bind the source and the sink to the channel
ag1.sources.source1.channels = channel1
ag1.sinks.sink1.channel = channel1
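With useLocalTimeStamp = true, the %y-%m-%d/%H-%M escapes in hdfs.path are filled in from the local time of each event, and round/roundValue/roundUnit bucket that time down to the nearest 10 minutes. A worked illustration (the date and time are hypothetical):

# An event written at 2020-06-15 10:37 local time is rounded down to 10:30,
# so it lands under the directory:
#   hdfs://hdp-01:9000/access_log/20-06-15/10-30/
# in a file named from the prefix and suffix above, e.g. app_log.<timestamp>.log;
# a new file is started whenever rollSize, rollCount, or rollInterval is reached first.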
Start the agent:
bin/flume-ng agent -c conf/ -f dir-hdfs.conf -n ag1 -Dflume.root.logger=INFO,console
-c : the configuration directory
-f : the agent (collection) configuration file
-n : the name of the agent to start
-Dflume.root.logger=INFO,console : print the log to the console
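In practice the agent is usually left running in the background rather than in the foreground; a minimal sketch, reusing the flags above (the log file name is hypothetical):

nohup bin/flume-ng agent -c conf/ -f dir-hdfs.conf -n ag1 \
  -Dflume.root.logger=INFO,console > flume-ag1.log 2>&1 &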
The same command pattern starts other agents; for example, an agent named a2 that uses a Kafka channel configuration:
bin/flume-ng agent -n a2 -f /usr/local/devtools/flume/apache-flume-1.7.0-bin/conf/flume-kafkaChannel.properties -Dflume.root.logger=INFO,console
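Once an agent is running, ingestion can be checked from both ends. A minimal sketch, assuming the dir-hdfs.conf agent above is the one running:

ls /root/log/                                   # files already ingested are renamed with the .FINISHED suffix
hdfs dfs -ls hdfs://hdp-01:9000/access_log/     # dated subdirectories (e.g. 20-06-15/10-30) appear as events arrive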