Event Serializers; Flume's No-Data-Loss Guarantee, Channels and Transactions

Summary:

1. Flume's durability guarantee depends on the durability guarantee of the Channel in use.

 

 

Event serializers convert Flume events into the format used by the external storage system.

The main event serializers (a configuration sketch follows the list):

1. Text

2. Text with headers

3. Avro
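As a rough sketch of how a serializer is selected (the agent name a1, sink name k1, and the HDFS path are placeholders, not from the original text), the HDFS sink picks its event serializer via the serializer property; text, header_and_text and avro_event are the built-in aliases:

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = DataStream
# Built-in aliases: text, header_and_text, avro_event; a fully qualified EventSerializer.Builder class name also works
a1.sinks.k1.serializer = text
# Property of the text serializer: append a newline after each event body (default true)
a1.sinks.k1.serializer.appendNewline = true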

 

Flume 1.8.0 User Guide — Apache Flume http://flume.apache.org/FlumeUserGuide.html

deserializer (default: LINE) - Specify the deserializer used to parse the file into events. Defaults to parsing each line as an event. The class specified must implement EventDeserializer.Builder.
deserializer.* (no default) - Varies per event deserializer.
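These two rows describe the deserializer properties of the Spooling Directory Source; a hedged example (agent, source and channel names and the spool directory are placeholders):

a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /var/log/flume-spool
# LINE (the default) turns each line of a spooled file into one Flume event
a1.sources.r1.deserializer = LINE
# deserializer.* properties vary per deserializer; maxLineLength belongs to LINE
a1.sources.r1.deserializer.maxLineLength = 2048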

 

 

The following serializers are provided for the Hive sink:

JSON: Handles UTF8 encoded Json (strict syntax) events and requires no configuration. Object names in the JSON are mapped directly to columns with the same name in the Hive table. Internally uses org.apache.hive.hcatalog.data.JsonSerDe but is independent of the Serde of the Hive table. This serializer requires HCatalog to be installed.

DELIMITED: Handles simple delimited textual events. Internally uses LazySimpleSerde but is independent of the Serde of the Hive table.

Properties of the DELIMITED serializer (name, default, description):

serializer.delimiter (default: ,) - (Type: string) The field delimiter in the incoming data. To use special characters, surround them with double quotes like "\t".
serializer.fieldnames (no default) - The mapping from input fields to columns in the Hive table. Specified as a comma-separated list (no spaces) of Hive table column names, identifying the input fields in order of their occurrence. To skip fields, leave the column name unspecified. E.g. 'time,,ip,message' indicates that the 1st, 3rd and 4th fields in the input map to the time, ip and message columns in the Hive table.
serializer.serdeSeparator (default: Ctrl-A) - (Type: character) Customizes the separator used by the underlying serde. There can be a gain in efficiency if the fields in serializer.fieldnames are in the same order as the table columns, serializer.delimiter is the same as serializer.serdeSeparator, and the number of fields in serializer.fieldnames is less than or equal to the number of table columns, because the fields in the incoming event body then do not need to be reordered to match the order of the table columns. Use single quotes for special characters like '\t'. Ensure input fields do not contain this character. NOTE: If serializer.delimiter is a single character, preferably set this to the same character.
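A configuration sketch modeled on the Hive sink example in the user guide (the metastore URI, database, table and column names are placeholders):

a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.serializer = DELIMITED
# Field delimiter in the incoming event body (double quotes for special characters)
a1.sinks.k1.serializer.delimiter = ","
# Separator used by the underlying serde; per the NOTE above, match it to the single-character delimiter
a1.sinks.k1.serializer.serdeSeparator = ','
# 1st, 3rd and 4th input fields map to the time, ip and message columns; the 2nd field is skipped
a1.sinks.k1.serializer.fieldnames = time,,ip,message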

 

 

Flume's No-Data-Loss Guarantee, Channels and Transactions - 51CTO.COM http://book.51cto.com/art/201508/487912.htm

From 《Flume:构建高可用、可扩展的海量日志采集系统》, which starts from Flume's basic concepts and design principles and then covers the different kinds of components, how to configure them, and how to run a Flume Agent. The book discusses the three core components, Source, Channel and Sink, in turn: it explains the basic concepts of each component and, with practical programming examples, covers its detailed usage in depth; this material is the heart of the Flume framework. This section introduces Flume's no-data-loss guarantee, Channels and transactions. Translators: 马延辉/史东杰. Source: 电子工业出版社, 2015-08-08.

If configured correctly, Flume provides a guarantee of no data loss. Of course, once the combined capacity of all the Flume Agents in a pipeline has been used up, Flume will no longer accept data from clients. At that point, clients need to buffer the data themselves, or it may be lost. It is therefore very important to configure the pipeline so that it can handle the maximum expected downtime; pipeline configuration is discussed in Chapter 8.

Flume's durability guarantee depends on the durability guarantee of the Channel in use. Flume ships with two kinds of Channel: the Memory Channel and the File Channel. The Memory Channel is an in-memory buffer, so if the Java virtual machine (JVM) or the machine restarts, any data in the buffer is lost. The File Channel, on the other hand, lives on disk: even if the JVM or the machine restarts, the File Channel does not lose data, as long as the data stored on disk remains intact and accessible. Once the machine and the Agent are running again, any data stored in the File Channel will eventually be delivered.

Channels are inherently transactional, though these transactions are different from database transactions. Each Flume transaction represents a batch of events written to, or removed from, a Channel atomically. Whenever a Source writes events to a Channel, or a Sink reads events from a Channel, it must do so within the scope of a transaction.

Flume guarantees that events are delivered to their destination at least once. Flume attempts to write the data only once, and in the absence of any kind of failure each event is written exactly once. However, failures such as network timeouts or partial writes to the storage system can cause events to be written more than once, because Flume retries writes until they succeed completely. A network timeout may indicate that the write failed, or merely that the machine is running slowly; if the machine is only slow, the retry will produce a duplicate. It is therefore usually a good idea to give every event some form of unique identifier, which can later be used to deduplicate the event data if necessary.
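A minimal sketch of the durable setup described above (the agent name, directories, capacities and the interceptor header name are placeholders; the UUID interceptor is one possible way to attach a unique identifier to every event):

agent.channels = fc
agent.channels.fc.type = file
# The checkpoint and data directories must survive restarts for the durability guarantee to hold
agent.channels.fc.checkpointDir = /var/flume/checkpoint
agent.channels.fc.dataDirs = /var/flume/data
agent.channels.fc.capacity = 1000000
agent.channels.fc.transactionCapacity = 10000

# Give every event a unique identifier so duplicates can be removed downstream
agent.sources.src.interceptors = i1
agent.sources.src.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
agent.sources.src.interceptors.i1.headerName = eventId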
