Hadoop exception notes

Below are problems I have run into, together with some solutions; I hope they help.

1:Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out 
This occurs during the shuffle in the reduce pre-processing phase, when the number of failed attempts to fetch completed map output exceeds the limit (the default is 5). There can be many causes, such as abnormal network connections, connection timeouts, poor bandwidth, blocked ports, and so on; when the cluster network is in good shape this error normally does not occur.
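A minimal connectivity check, offered as a sketch; the slave hostnames are placeholders, and it assumes TaskTrackers serve map output on the default HTTP port 50060:

# Check that this node can resolve and reach every TaskTracker's map-output port.
for host in slave1 slave2 slave3; do
  ping -c 1 "$host" > /dev/null && echo "$host: ping ok" || echo "$host: ping FAILED"
  nc -z -w 5 "$host" 50060 && echo "$host:50060 reachable" || echo "$host:50060 blocked"
done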


2:Too many fetch-failures 
Answer:
This problem is mainly caused by incomplete connectivity between the nodes; an example of the two checks is sketched after this checklist.
1) Check /etc/hosts
   The local IP must map to the server's hostname
   It must contain the IP + hostname of every server
2) Check .ssh/authorized_keys
   It must contain the public keys of all servers (including the node itself)
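A sketch of what the two checks look like in practice; the IPs, hostnames, and user are placeholders:

# Every node's /etc/hosts should carry the same ip-to-hostname mappings, e.g.:
#   192.168.1.10  master
#   192.168.1.11  slave1
#   192.168.1.12  slave2
# And each node's public key should be in every node's ~/.ssh/authorized_keys:
ssh-keygen -t rsa                # only if no key exists yet
ssh-copy-id hadoop@master        # also copy to the node itself
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2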

3: Processing is extremely slow; map finishes quickly but reduce is very slow and repeatedly falls back to reduce=0%
Answer:
Apply the checks from item 2 above, and then
set export HADOOP_HEAPSIZE=4000 in conf/hadoop-env.sh

4: DataNodes can be started but cannot be accessed and cannot be shut down
When re-formatting a new distributed filesystem, you need to delete the local path configured on the NameNode as dfs.name.dir (where the NameNode persists the namespace and the transaction log), and also delete the dfs.data.dir directories on each DataNode (the local paths where DataNodes store block data). In this configuration that meant deleting /home/hadoop/NameData on the NameNode and /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed filesystem, each stored namespace is stamped with the version of its creation time (see the VERSION file under /home/hadoop/NameData/current, which records the version information). When re-formatting a new distributed filesystem, it is best to delete the NameData directory first, and you must delete each DataNode's dfs.data.dir, so that the version information recorded by the NameNode and the DataNodes matches.
Note: deletion is a dangerous operation; do not delete anything you cannot account for, and back up everything before deleting!
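A rough sketch of the sequence, using the paths from this configuration (back up first, and substitute your own dfs.name.dir / dfs.data.dir; the .bak locations are just examples):

bin/stop-all.sh
# NameNode: back up, then remove the old namespace directory (dfs.name.dir)
cp -r /home/hadoop/NameData /home/hadoop/NameData.bak
rm -rf /home/hadoop/NameData
# Each DataNode: back up, then remove the old block directories (dfs.data.dir)
cp -r /home/hadoop/DataNode1 /home/hadoop/DataNode1.bak
cp -r /home/hadoop/DataNode2 /home/hadoop/DataNode2.bak
rm -rf /home/hadoop/DataNode1 /home/hadoop/DataNode2
# NameNode: format the new filesystem and restart
bin/hadoop namenode -format
bin/start-all.sh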

5:java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log 
This is usually caused by a node going down or losing its connection.

6:java.lang.OutOfMemoryError: Java heap space 
This exception clearly means the JVM is out of memory; increase the JVM heap size on all DataNodes.
Java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM's maximum heap should be about half of the total memory; our machines have 8 GB, so we set it to 4096m. This may still not be the optimal value.
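One place to apply this is conf/hadoop-env.sh; a sketch, assuming the per-daemon HADOOP_DATANODE_OPTS hook that the 0.19/0.20-era scripts provide, with the heap values discussed above:

# conf/hadoop-env.sh on every DataNode, then restart the DataNodes
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"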

How to add a node to Hadoop
My actual procedure for adding a node:
1. Set up the environment on the new slave first, including ssh, the JDK, and copies of the relevant config, lib, bin, etc.;
2. Add the new DataNode's host entry to the cluster's NameNode and the other DataNodes;
3. Add the new DataNode's IP to conf/slaves on the master;
4. Restart the cluster and check that the new DataNode appears in it;
5. Run bin/start-balancer.sh; this can take a long time.
Notes:
1. If you do not run the balancer, the cluster will put all new data on the new node, which lowers MapReduce efficiency;
2. You can also invoke bin/start-balancer.sh with the -threshold parameter, e.g. -threshold 5;
   threshold is the balancing threshold, 10% by default; the lower the value, the more evenly balanced the nodes, but the longer it takes.
3. The balancer can also run on a cluster that has MR jobs running; the default dfs.balance.bandwidthPerSec is quite low, 1 MB/s. When no MR jobs are running, you can raise this setting to speed up balancing (see the sketch after these notes).
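A sketch of running the balancer with a tighter threshold; the 5% threshold and the 10 MB/s bandwidth are example values:

# Balance until every node is within 5% of the cluster-average usage
bin/start-balancer.sh -threshold 5
# Optionally raise the balancer bandwidth (bytes/sec) in conf/hadoop-site.xml
# when no MR jobs are running, e.g. 10 MB/s instead of the default 1 MB/s:
#   <property>
#     <name>dfs.balance.bandwidthPerSec</name>
#     <value>10485760</value>
#   </property>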

Other notes:
1. Make sure the firewall on the slave is turned off;
2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely add the IPs of the master and the other slaves to the new slave's /etc/hosts.


Number of mappers and reducers
URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps / 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).

My own understanding:
Setting the number of mappers: it depends on the input files and on the file splits. The upper bound of a file split is dfs.block.size and the lower bound can be set via mapred.min.split.size, but in the end the InputFormat decides.

A good recommendation:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>


Adding a new disk to a single node
1. On the node that gets the new disk, modify dfs.data.dir, separating the new and old directories with a comma (see the sketch below);
2. Restart DFS.
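A sketch of both steps; /data2/hadoop/data is a placeholder for the new disk's mount point, and the existing path is the one used earlier in these notes:

# conf/hadoop-site.xml on that node -- make dfs.data.dir a comma-separated list:
#   <property>
#     <name>dfs.data.dir</name>
#     <value>/home/hadoop/DataNode1,/data2/hadoop/data</value>
#   </property>
# Then restart the DataNode on that machine:
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh start datanode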

Syncing the Hadoop code
hadoop-env.sh
# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop


Merging small HDFS files with a command
hadoop fs -getmerge <src> <dest>
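A usage example with the Hive warehouse path that appears earlier in these notes (the local file name is arbitrary); -getmerge concatenates every file under the HDFS source directory into one local file:

hadoop fs -getmerge /user/hive/warehouse/src_20090724_log ./src_20090724_merged.log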

How to restart reduce jobs
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".

I have not verified this yet.

Problems with IO writes
0-1246359584298, infoPort=50075, ipcPort=50020): Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)


It seems there are many reasons that it can time out; the example given in
HADOOP-3831 is a slow reading client.

Workaround: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.


Decommissioning HDFS nodes
The dfsadmin help text in the current release does not explain this clearly (a bug has been filed); the correct procedure is:
1. Point dfs.hosts at the current slaves file, using the full path. Note that the hostnames in the list must be the full names, i.e. what uname -n returns.
2. Put the full names of the nodes to be decommissioned in another file, e.g. slaves.ex, and point the dfs.hosts.exclude parameter at the full path of that file.
3. Run bin/hadoop dfsadmin -refreshNodes.
4. In the web UI, or via bin/hadoop dfsadmin -report, the node's status shows as "Decommission in progress" until all the data that needs re-replicating has been copied.
5. When it finishes, remove the decommissioned nodes from the slaves file (i.e. the file dfs.hosts points to); an end-to-end sketch follows these steps.
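A rough end-to-end sketch, assuming hadoop-site.xml already points dfs.hosts at /home/hadoop/conf/slaves and dfs.hosts.exclude at /home/hadoop/conf/slaves.ex (paths and the hostname are placeholders):

# Add the node to retire to the exclude file, using its full name (uname -n)
echo "slave3.example.com" >> /home/hadoop/conf/slaves.ex
# Tell the NameNode to re-read dfs.hosts / dfs.hosts.exclude
bin/hadoop dfsadmin -refreshNodes
# Watch progress; the node shows "Decommission in progress" until replication finishes
bin/hadoop dfsadmin -report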

Incidentally, the -refreshNodes command has three other uses:
2. Adding allowed nodes to the list (add the hostname to dfs.hosts);
3. Removing a node outright, without re-replicating its data (remove the hostname from dfs.hosts);
4. The inverse of decommissioning: cancel the decommissioning of a node that appears both in the exclude file and in dfs.hosts, i.e. turn a node that is "Decommission in progress" back to Normal ("in service" in the web UI).

######################################
Hadoop tips borrowed from others
Fixing the Hadoop OutOfMemoryError problem:
<property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx800M -server</value>
</property>

With the right JVM size in your hadoop-site.xml, you will have to copy this
to all mapred nodes and restart the cluster.
Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M

Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
When I use nutch 1.0, I get this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.

This one is also easy to solve:
Delete conf/log4j.properties, and then you can see the detailed error report.
In my case it was out of memory.
The fix was to add the parameters -Xms64m -Xmx512m when running the main class org.apache.nutch.crawl.Crawl.
Yours may not be the same problem, but once you can see the detailed error report the problem becomes easy to solve.

Using the distributed cache
It acts like a global variable, but because the data is large it cannot be put in the config file, so the distributed cache is used instead.
How to use it (see "The Definitive Guide", p. 240):
1. When invoking on the command line: use -files to bring in the files to be looked up (either local files or HDFS files, e.g. hdfs://xxx?), or -archives (JAR, ZIP, tar, etc.)
% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output


2. Calling it from the program:
public void configure(JobConf conf) {
   metadata = new NcdcStationMetadata();
   try {
     metadata.initialize(new File("stations-fixed-width.txt"));
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
}

Another, indirect, way to use it (which does not seem to exist in hadoop-0.19.0):
call addCacheFile() or addCacheArchive() to add files,
and use getLocalCacheFiles() or getLocalCacheArchives() to retrieve them.

Hadoop's job web UI
There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.


Hadoop monitoring
Use Nagios for alerting and Ganglia for monitoring charts.

status of 255 error 
Error type:
java.io.IOException: Task process exit with nonzero status of 255.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)


Cause of the error:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher value. By default, their values are 24 hours. These might be the reason for failure, though I'm not sure.


split size 
FileInputFormat input splits (see "The Definitive Guide", p. 190):
mapred.min.split.size: default=1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.

dfs.block.size: default = 64M; set to 128M in our system.
If minimum split size > block size, each split becomes larger than a block, so a single split spans multiple blocks (my guess: the extra blocks then have to be pulled from other nodes and combined into one split).
If maximum split size < block size, blocks are subdivided into even smaller splits.

split size = max(minimumSize, min(maximumSize, blockSize));
where minimumSize < blockSize < maximumSize.
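A sketch of nudging the split size per job from the command line; the jar, class, and the 256 MB value are placeholders, and it assumes the job is driven through ToolRunner so that -D is honored:

# Ask for splits of at least 256 MB (fewer, larger map tasks)
bin/hadoop jar myjob.jar com.example.MyJob \
  -D mapred.min.split.size=268435456 \
  input output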

sort by value 
Hadoop does not provide a direct sort-by-value mechanism, because that would lower MapReduce performance.
It can be achieved with a composite-key approach; see "The Definitive Guide", p. 250 for the details.
The basic idea:
1. Combine the key and value into a new composite key;
2. Override the partitioner so it partitions on the old key only;
conf.setPartitionerClass(FirstPartitioner.class);
3. Define a custom key comparator that sorts first by the old key and then by the old value;
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. Override the grouping comparator so it also groups on the old key:  conf.setOutputValueGroupingComparator(GroupComparator.class);

Handling small input files
Using a long series of small files as input lowers Hadoop's efficiency.
There are three ways to handle/merge small files:
1. Merge the small files into a single SequenceFile to speed up MapReduce.
   See WholeFileInputFormat and SmallFilesToSequenceFileConverter, "The Definitive Guide", p. 194.
2. Use CombineFileInputFormat, which builds on FileInputFormat, though I have not tried it;
3. Use Hadoop archives (similar to packing files together) to reduce the NameNode memory consumed by small-file metadata. (This method is not always workable, so it is not recommended.)
   How:
   Archive the /my/files directory and its subdirectories into files.har and place it under /my:
   bin/hadoop archive -archiveName files.har /my/files /my

   To list the files in the archive:
   bin/hadoop fs -lsr har:///my/files.har

skip bad records 
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);


For skipping failed tasks, try: mapred.max.map.failures.percent

Restarting a single DataNode
If a DataNode runs into a problem and, after fixing it, you need to rejoin it to the cluster without restarting the whole cluster, do the following:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker



Namenode in safe mode 
Solution:
bin/hadoop dfsadmin -safemode leave

java.net.NoRouteToHostException: No route to host 
Solution:
sudo /etc/init.d/iptables stop


After changing the NameNode, SELECTs run in Hive still point to the old NameNode address
This is because:
When you create a table, hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore. So when I bring up a new cluster the master has a new IP, but hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update with the new IP every time I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master.

So you need to replace every occurrence of the old NameNode address in the metastore with the current NameNode address (a rough sketch follows).
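A heavily hedged sketch of doing that update directly, assuming a MySQL-backed metastore; the database name, the old/new URIs, and even the column names (SDS.LOCATION, DBS.DB_LOCATION_URI) are assumptions that vary by Hive version, so back up and check your own schema first:

# Back up the metastore before touching it
mysqldump hive_metastore > metastore_backup.sql
# Rewrite the old NameNode URI to the new one (hypothetical names and URIs)
mysql hive_metastore -e "
  UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://oldnn:9000', 'hdfs://newnn:9000');
  UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://oldnn:9000', 'hdfs://newnn:9000');
"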


Your DataNodes won't start, and you see something like this in logs/*datanode*: 
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Cause:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is reformat the HDFS.

Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format

12: You can run Hadoop jobs written in Java (like the grep example), but your Hadoop Streaming jobs (such as the Python example that fetches web page titles) won't work.
Cause:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper  $HOME/proj/hadoop/multifetch.py         \
  -reducer $HOME/proj/hadoop/reducer.py            \
  -input   urls/*                                  \
  -output  titles


09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010 
> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001
bad datanode[2] nodes == null
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input"
- Aborting...
> put: Bad connect ack with firstBadLink 192.168.1.16:50010



Solution:
I have resolved the issue:
What I did:

1) '/etc/init.d/iptables stop'  --> stopped the firewall
2) SELINUX=disabled in '/etc/selinux/config' file  --> disabled selinux
It worked for me after these two changes.

Fixing jline.ConsoleReader.readLine not taking effect on Windows
In CliDriver.java's main() there is a statement, reader.readLine, that reads from standard input, but on Windows it always returns null. This reader is a jline.ConsoleReader instance, which makes debugging from Eclipse on Windows inconvenient.
We can replace it with java.util.Scanner, changing the original
while ((line=reader.readLine(curPrompt+"> ")) != null)
to:
Scanner sc = new Scanner(System.in);
while ((line=sc.nextLine()) != null)

Recompile and redeploy, and SQL statements can then be read from standard input normally.

Once, while running a normal MapReduce job, the following errors were thrown:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
Investigation showed the cause was too many open files on the Linux machines. ulimit -n shows the default Linux open-file limit is 1024; edit /etc/security/limits.conf and add a higher limit for the hadoop user, e.g. hadoop soft nofile 65535 (a sketch follows below).

Re-run the program (ideally after making the change on every DataNode) and the problem is resolved.
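A sketch of the change; the hadoop user name and the nofile item follow the usual limits.conf syntax and should be adapted to your setup:

# Check the current limit (often 1024 by default)
ulimit -n
# Raise the open-file limit for the hadoop user (run as root on every DataNode)
cat >> /etc/security/limits.conf <<'EOF'
hadoop soft nofile 65535
hadoop hard nofile 65535
EOF
# Log the hadoop user out and back in (or restart the daemons) for it to take effect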

After running for a while, Hadoop cannot be stopped with stop-all.sh, and it reports
no tasktracker to stop, no datanode to stop
The cause is that when stopping, Hadoop relies on the recorded PIDs of the mapred and dfs processes on the DataNodes. By default the PID files are kept under /tmp, and Linux periodically (roughly every month, or every 7 days or so) deletes files in that directory. Once PID files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid are deleted, the stop scripts can no longer find the corresponding processes on the DataNodes.
Setting export HADOOP_PID_DIR in the configuration file (conf/hadoop-env.sh) solves this problem; a sketch follows.
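A sketch of the setting; /var/hadoop/pids is just an example directory and must exist and be writable by the user running the daemons:

# conf/hadoop-env.sh -- keep the .pid files out of /tmp so they are not cleaned up
export HADOOP_PID_DIR=/var/hadoop/pids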

Problem:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
Cause:
Every time hadoop namenode -format is run, a new namespaceID is generated for the NameNode, but the DataNode data under hadoop.tmp.dir still keeps the previous namespaceID. The mismatch prevents the DataNode from starting, so delete the hadoop.tmp.dir directory before each hadoop namenode -format and it will start successfully. Note that this means deleting the local directory that hadoop.tmp.dir points to, not an HDFS directory.


Problem: NameNode is not formatted 
Solution: HDFS has not been formatted yet. Just run hadoop namenode -format, then start it again.

Running bin/hadoop jps throws the following exception:
Exception in thread "main" java.lang.NullPointerException
        at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
        at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
        at sun.tools.jps.Jps.main(Jps.java:45)

Cause:
The /tmp directory under the system root has been deleted. Recreate /tmp and the problem goes away.
The 'unable to create log directory /tmp/...' error from bin/hive may have the same cause.
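A minimal sketch of recreating it with the usual /tmp permissions (world-writable with the sticky bit):

sudo mkdir /tmp
sudo chmod 1777 /tmp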