HDFS Node Roles

Secondary NameNode:

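The NameNode stores modifications to the file system by appending them to a log, the edit log (edits).
When the NameNode starts, it reads the HDFS state from fsimage and then replays the modifications recorded in the edit log. It then writes the new state back to fsimage and starts a new, empty edits file.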
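The startup sequence above can be sketched as a toy model. This is not HDFS code: the namespace is reduced to a set of paths, and `apply_edit`, `start_namenode` and the `create`/`delete` record format are hypothetical names chosen only for illustration.

```python
# Toy model of NameNode startup. Not HDFS code; all names are hypothetical.

def apply_edit(namespace, edit):
    """Apply one edit-log record to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown op: {op}")


def start_namenode(fsimage, edit_log):
    """Simulate startup and return (new_fsimage, new_edit_log)."""
    namespace = set(fsimage)            # 1. load the last checkpoint
    for edit in edit_log:               # 2. replay the logged modifications in order
        apply_edit(namespace, edit)
    new_fsimage = frozenset(namespace)  # 3. save the merged state as the new fsimage
    new_edit_log = []                   # 4. start a new, empty edits file
    return new_fsimage, new_edit_log


fsimage = frozenset({"/a", "/b"})
edit_log = [("create", "/c"), ("delete", "/a")]
new_fsimage, new_edit_log = start_namenode(fsimage, edit_log)
print(sorted(new_fsimage))
```

The longer `edit_log` is, the longer step 2 takes, which is why an unmerged edit log slows down the next restart.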

Because the NameNode merges fsimage and the edit log only when it starts, the edit log grows larger and larger over time, and the next NameNode restart becomes extremely slow.

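The Secondary NameNode (SNN) merges fsimage and the edit log at regular intervals, which keeps the size of the edit log within a limit.
The SNN usually runs on a different machine from the NameNode (NN), because its memory requirements are of the same order of magnitude as those of the NN.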

The SNN checkpoint process is triggered by two parameters:

dfs.namenode.checkpoint.period: the time interval between checkpoints. Set to 1 hour by default, it specifies the maximum delay between two consecutive checkpoints.

dfs.namenode.checkpoint.txns: the transaction-count threshold for a checkpoint. Set to 1 million by default, it defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
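How the two parameters combine can be sketched as a simple predicate. The function name `should_checkpoint` and its arguments are hypothetical; only the two defaults (1 hour, expressed here as 3600 seconds, and 1 million transactions) come from the text above.

```python
# Sketch of the two checkpoint triggers. Function and argument names are hypothetical.

DEFAULT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period default (1 hour)
DEFAULT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns default (1 million)


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=DEFAULT_PERIOD_SECONDS, txns=DEFAULT_TXNS):
    """Return True if either trigger has fired."""
    period_elapsed = seconds_since_last >= period
    too_many_txns = uncheckpointed_txns >= txns
    # Either condition alone is enough: the transaction threshold forces a
    # checkpoint even when the period has not yet been reached.
    return period_elapsed or too_many_txns


print(should_checkpoint(600, 2_000_000))
```

In this example only ten minutes have passed, but the transaction threshold has been exceeded, so a checkpoint is due.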

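The SNN stores its checkpoint in the same metadata directory structure as the NN, so the checkpointed image is always ready to be read by the NameNode if necessary.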

Checkpoint Node:

The NN persists its namespace using two files:
1. fsimage, the latest checkpoint of the namespace
2. edits, a journal (log) of changes made since that checkpoint

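When the NameNode restarts, it merges fsimage and the edits journal to produce an up-to-date view of the file system metadata. The NN then overwrites the existing fsimage with the new HDFS state and begins a new edits journal.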

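The Checkpoint Node periodically creates checkpoints of the namespace.
It downloads fsimage and edits from the active NN, merges them locally, and uploads the merged, up-to-date image back to the active NN.
The Checkpoint Node usually runs on a different machine from the NN, because its memory requirements are of the same order of magnitude as those of the NN.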
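The download, merge and upload cycle can be sketched as follows. This is a toy model under stated assumptions, not the HDFS implementation: `ActiveNameNode`, `merge` and `run_checkpoint` are hypothetical names, the image is a plain dict, and an edit with value `None` stands for a delete.

```python
# Toy model of one Checkpoint Node cycle. Not HDFS code; all names are hypothetical.

class ActiveNameNode:
    """Stand-in for the active NN: holds an image and a journal of edits."""

    def __init__(self, fsimage):
        self.fsimage = dict(fsimage)
        self.edits = []

    def log_edit(self, path, value):
        self.edits.append((path, value))    # value None means "delete path"

    def download(self):
        return dict(self.fsimage), list(self.edits)

    def upload(self, new_image, merged_count):
        self.fsimage = dict(new_image)
        # Keep only the edits that were logged after the download.
        self.edits = self.edits[merged_count:]


def merge(image, edits):
    """Apply the edits to a copy of the image and return the merged image."""
    merged = dict(image)
    for path, value in edits:
        if value is None:
            merged.pop(path, None)
        else:
            merged[path] = value
    return merged


def run_checkpoint(nn):
    image, edits = nn.download()        # 1. download fsimage and edits
    new_image = merge(image, edits)     # 2. merge them locally
    nn.upload(new_image, len(edits))    # 3. upload the new image to the active NN
    return new_image


nn = ActiveNameNode({"/a": 1})
nn.log_edit("/b", 2)
nn.log_edit("/a", None)
run_checkpoint(nn)
print(nn.fsimage, nn.edits)
```

The merge runs on the checkpointing machine, which is why that machine needs memory on the same order as the NN.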

The location of the Checkpoint (or Backup) node and its accompanying web interface are configured via the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.

The merge interval and the transaction threshold are controlled by the same parameters as for the SNN: dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
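As an illustration of how these two parameters might be written out, the following sketch renders them as Hadoop-style `<property>` entries. The helper `render_properties` is hypothetical, and the values are simply the defaults quoted earlier (the period is expressed in seconds); whether and where to set them depends on the cluster.

```python
# Sketch: render checkpoint settings as Hadoop-style configuration XML.
# render_properties is a hypothetical helper, not part of Hadoop.
import xml.etree.ElementTree as ET


def render_properties(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = render_properties({
    "dfs.namenode.checkpoint.period": 3600,     # 1 hour, in seconds
    "dfs.namenode.checkpoint.txns": 1000000,    # 1 million transactions
})
print(xml_text)
```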

Starting the Checkpoint Node:
The Checkpoint node is started by bin/hdfs namenode -checkpoint on the node specified in the configuration file.

Backup Node:

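The Backup Node provides the same checkpointing functionality as the Checkpoint Node. It also maintains an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NN. It accepts a journal stream of file system edits from the NN and persists them to disk; it also applies those edits to its own in-memory copy of the namespace, which creates a backup of the namespace.

Unlike a Checkpoint Node or an SNN, the Backup Node does not need to download fsimage and edits from the active NN in order to create a checkpoint, because it already holds an up-to-date namespace state in memory.

Because it already has an in-memory copy of the NameNode's namespace, it can checkpoint faster: it only needs to save the namespace to its local fsimage file and reset edits.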
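The difference from the Checkpoint Node can be sketched with a toy model. This is not HDFS code: the `BackupNode` class below, its method names and the `create`/`delete` record format are hypothetical, and Python lists and sets stand in for on-disk files.

```python
# Toy model of a Backup Node. Not HDFS code; all names are hypothetical.

class BackupNode:
    def __init__(self, fsimage):
        self.fsimage = frozenset(fsimage)   # last saved checkpoint ("on disk")
        self.journal = []                   # edits persisted since that checkpoint
        self.namespace = set(fsimage)       # in-memory copy kept in sync with the NN

    def receive_edit(self, op, path):
        """Handle one record of the journal stream sent by the active NN."""
        if op not in ("create", "delete"):
            raise ValueError(f"unknown op: {op}")
        self.journal.append((op, path))     # persist the edit
        if op == "create":                  # and apply it to the in-memory namespace
            self.namespace.add(path)
        else:
            self.namespace.discard(path)

    def checkpoint(self):
        """No download or merge needed: save the in-memory state and reset edits."""
        self.fsimage = frozenset(self.namespace)
        self.journal = []
        return self.fsimage


bn = BackupNode({"/a"})
bn.receive_edit("create", "/b")
bn.receive_edit("delete", "/a")
print(sorted(bn.checkpoint()))
```

Because every edit has already been applied as it arrived, `checkpoint` does no replay at all, which is where the speed advantage comes from.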

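Standby NN: in the terminology listed at the end of this post, a Standby node is a Backup node that is able to take over the active role if the current active NN fails.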

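A cluster can have one and only one Backup Node. If a Backup Node is in use, no Checkpoint Node may be registered.
The Backup Node is started with bin/hdfs namenode -backup. Its address and web interface are configured with: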

dfs.namenode.backup.address

dfs.namenode.backup.http-address

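Using a Backup Node provides the option of running the NN with no persistent storage, delegating all responsibility for persisting the namespace state to the Backup Node.
To do this, start the NameNode with the -importCheckpoint option.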

Terminology

  • Role of the name-node – defines name-node functionality.
  • Active name-node (NN) – a name-node in “active” role.
    This is the main (traditional) name-node, unique in the cluster.
  • Checkpoint node (CN) – a name-node in “checkpoint” role.
    This node performs only checkpoints. It does not keep an up-to-date namespace
    state.
  • Backup node (BN) – a name-node in “backup” role.
    Includes all the checkpoint responsibilities, plus it maintains an up-to-date
    namespace state, which is always in sync with the active node.
  • Standby node (SN) – a name-node in “standby” state.
    Standby is a backup node, which is able to take over the active role if the current
    active fails.
  • Image – latest checkpoint of the namespace; corresponds to “fsimage” file.
  • Journal – a collection of journal records (edits) reflecting modifications to the
    namespace since the latest checkpoint; corresponds to “edits” file.
  • Image store – a storage resource, which contains namespace image state.
  • Journal store – a storage resource, which contains namespace journal.
  • Journal Spool – a temporary storage on BN that spools journal records until they
    can be picked up and applied to the namespace.
  • Checkpoint Time – the latest time the image was saved; defines the age of the
    namespace state.

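Original article. When reposting, please credit:

Reposted from: the blog of OopsOutOfMemory (盛利), author: OopsOutOfMemory

Link to this article: http://blog.csdn.net/oopsoom/article/details/47278399

Note: this article is licensed under Attribution-NonCommercial-NoDerivs 2.5 China Mainland (CC BY-NC-ND 2.5 CN). You are welcome to repost, share and comment, but please keep the author attribution and the article link. For commercial use or licensing questions, please contact me.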
