Network topology (reference: http://blog.csdn.net/lastsweetop/article/details/9065667.html)
A distributed cluster usually contains a large number of machines. Because of rack slot and switch port limits, a large cluster typically spans several racks, with the machines on those racks together forming one distributed cluster. The network between machines within a rack is generally faster than the network between machines on different racks, and traffic between racks is constrained by the bandwidth of the uplink switches.
Hadoop models the network as a tree: the distance between two nodes is the sum of their distances to their nearest common ancestor. The levels of the tree are not fixed in advance, but levels are usually defined for the data center, the rack, and the running node.
distance(/D1/R1/H1,/D1/R1/H1)=0 — processes on the same node (the same datanode)
distance(/D1/R1/H1,/D1/R1/H2)=2 — different nodes on the same rack (different datanodes under the same rack)
distance(/D1/R1/H1,/D1/R2/H4)=4 — nodes on different racks in the same data center (different datanodes in the same IDC)
distance(/D1/R1/H1,/D2/R3/H7)=6 — nodes in different data centers (datanodes in different IDCs)
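The distance rule above can be sketched as a small shell function. This is my own illustration, not Hadoop's code; Hadoop implements it in Java (org.apache.hadoop.net.NetworkTopology#getDistance):

```shell
#!/bin/sh
# Illustrative sketch of the tree-distance rule (not Hadoop's own code).
# Distance = hops from each node up to their nearest common ancestor.

count() {                          # number of path components in $1
  old_ifs=$IFS; IFS=/
  set -- $1
  IFS=$old_ifs
  echo $#
}

distance() {
  a=${1#/}; b=${2#/}               # drop the leading '/'
  # peel off the shared leading components (the common ancestors)
  while [ -n "$a" ] && [ "${a%%/*}" = "${b%%/*}" ]; do
    if [ "$a" = "${a#*/}" ]; then a=; else a=${a#*/}; fi
    if [ "$b" = "${b#*/}" ]; then b=; else b=${b#*/}; fi
  done
  # what remains is each node's path up to the common ancestor
  echo $(( $(count "$a") + $(count "$b") ))
}

distance /D1/R1/H1 /D1/R1/H1       # 0: same node
distance /D1/R1/H1 /D1/R1/H2       # 2: same rack
distance /D1/R1/H1 /D1/R2/H4       # 4: same data center, different rack
distance /D1/R1/H1 /D2/R3/H7       # 6: different data centers
```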
Hadoop cannot work out the network topology by itself; it needs us to understand the topology and help define it. (If the network is flat, with only a single level, no configuration is needed.)
The HDFS and the Map/Reduce components are rack-aware.
The NameNode and the JobTracker obtain the rack id of the slaves in the cluster by invoking an API resolve in an administrator-configured module. The API resolves the slave's DNS name (or IP address) to a rack id. Which module to use can be configured with the configuration item topology.node.switch.mapping.impl. The default implementation runs a script/command configured via topology.script.file.name. If topology.script.file.name is not set, the rack id /default-rack is returned for any passed IP address. An additional configuration item on the Map/Reduce side is mapred.cache.task.levels, which determines the number of levels (in the network topology) of caches. For example, with the default value of 2, two levels of caches are constructed: one for hosts (host -> task mapping) and another for racks (rack -> task mapping).
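A minimal rack-awareness script of the kind wired up via topology.script.file.name could look like the sketch below. The contract (arguments in, one rack id per line out) matches what Hadoop expects; the IP ranges and rack ids are made-up examples:

```shell
#!/bin/sh
# Hypothetical topology script for topology.script.file.name.
# Hadoop invokes it with one or more datanode IPs/hostnames as arguments
# and expects one rack id per argument on stdout.
map_rack() {
  case "$1" in
    10.1.1.*) echo /D1/R1 ;;            # made-up subnet -> rack mapping
    10.1.2.*) echo /D1/R2 ;;
    *)        echo /default-rack ;;     # same fallback Hadoop uses when unset
  esac
}

for host in "$@"; do
  map_rack "$host"
done
```

In practice such scripts usually read the subnet-to-rack table from a file maintained by the cluster administrator rather than hard-coding it.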
Replica placement:
The process by which the namenode picks datanodes to store a block's replicas is called replica placement. The placement policy is essentially a trade-off between reliability and read/write bandwidth.
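For reference, the policy Hadoop ships with strikes that balance roughly as follows: first replica on the writer's own datanode, second on a different rack, third on the same rack as the second. A toy sketch of that selection over topology paths (my own illustration, not Hadoop's BlockPlacementPolicyDefault):

```shell
#!/bin/sh
# Toy sketch of the default trade-off: replica 1 on the writer (cheap write),
# replica 2 off-rack (survives a rack failure), replica 3 on replica 2's rack
# (avoids a second cross-rack transfer). Node paths look like /dc/rack/host.
place_replicas() {
  writer=$1; shift
  wrack=${writer%/*}
  second=
  for n in "$@"; do                     # first candidate on another rack
    [ "${n%/*}" != "$wrack" ] && { second=$n; break; }
  done
  third=
  for n in "$@"; do                     # another node on replica 2's rack
    [ "$n" != "$second" ] && [ "${n%/*}" = "${second%/*}" ] && { third=$n; break; }
  done
  echo "$writer $second $third"
}

place_replicas /D1/R1/H1 /D1/R1/H2 /D1/R2/H3 /D1/R2/H4
# -> /D1/R1/H1 /D1/R2/H3 /D1/R2/H4
```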
Consider the two extremes:
1.start-dfs.sh

    NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
    echo "Starting namenodes on [$NAMENODES]"
    "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$NAMENODES" \
      --script "$bin/hdfs" start namenode $nameStartOpt

    "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --script "$bin/hdfs" start datanode $dataStartOpt

    "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$SECONDARY_NAMENODES" \
      --script "$bin/hdfs" start secondarynamenode
2.hadoop-daemon.sh
nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
3.hdfs
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
Running these scripts ultimately launches the main functions of four classes: GetConf, NameNode, DataNode, and SecondaryNameNode.
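The four classes are reached through a command-to-class dispatch inside bin/hdfs before the final exec. A simplified sketch of that dispatch (the class names are Hadoop's actual main classes; the script around them is abridged):

```shell
#!/bin/sh
# Simplified sketch of the command -> main-class dispatch in bin/hdfs.
class_for() {
  case "$1" in
    getconf)           echo org.apache.hadoop.hdfs.tools.GetConf ;;
    namenode)          echo org.apache.hadoop.hdfs.server.namenode.NameNode ;;
    datanode)          echo org.apache.hadoop.hdfs.server.datanode.DataNode ;;
    secondarynamenode) echo org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode ;;
    *)                 echo "$1" ;;   # real script prints usage here; simplified
  esac
}

CLASS=$(class_for "$1")
# the real script then ends with:
#   exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
echo "$CLASS"
```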