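Today I changed the Hadoop cluster's configuration files and needed to restart the cluster, but stopping it failed with the following output: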
    [hadoop@master ~]# stop-dfs.sh
    Stopping namenodes on [master]
    master1: no namenode to stop
    master2: no namenode to stop
    slave2: no datanode to stop
    slave1: no datanode to stop
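The cause is that when Hadoop stops its daemons, it locates them through PID files, here the ones for the journalnode and the DFS daemons on each node. By default these PID files are stored under /tmp, and Linux periodically deletes files in that directory (typically after somewhere between 7 days and a month, depending on the system).
So once the two files hadoop-hadoop-journalnode.pid and hadoop-hadoop-datanode.pid have been deleted, the stop script can no longer find the corresponding processes on the datanode, even though they are still running.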
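To check whether a node is in this state, one can look at the PID file directly. The following is a minimal sketch; the function name `pid_file_status` is made up for illustration, and the logic is only a simplified imitation of a PID-file check, not a copy of Hadoop's own scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): report the state of a PID file.
#   missing  - the PID file does not exist (e.g. it was cleaned out of /tmp)
#   stale    - the file exists but no process with that PID is alive
#   running  - the file exists and the process answers signal 0
pid_file_status() {
    local pid_file="$1"
    if [ ! -f "$pid_file" ]; then
        echo "missing"
        return 0
    fi
    local pid
    pid="$(cat "$pid_file")"
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if kill -0 "$pid" > /dev/null 2>&1; then
        echo "running"
    else
        echo "stale"
    fi
}

# Example (path assumes the default PID directory /tmp):
#   pid_file_status /tmp/hadoop-hadoop-datanode.pid
```

If this prints `missing` while the DataNode process is still alive, the stop script has nothing to go on, which matches the "no datanode to stop" message.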
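Setting export HADOOP_PID_DIR in the configuration file hadoop-env.sh fixes this. It can also be changed in hadoop-daemon.sh, which loads hadoop-env.sh. Change HADOOP_PID_DIR to "/var/hadoop_pid", and remember to create the hadoop_pid directory under "/var" by hand and make the hadoop user its owner.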
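The configuration step can be sketched as a small shell function. This is only an illustration: the name `configure_pid_dir` and its arguments are made up, and the location of hadoop-env.sh in the usage comment is an assumption about the install layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the PID directory and make sure the given
# hadoop-env.sh exports HADOOP_PID_DIR exactly once, pointing at it.
#   $1 - PID directory to create, e.g. /var/hadoop_pid
#   $2 - path to hadoop-env.sh
configure_pid_dir() {
    local pid_dir="$1" env_file="$2" tmp
    mkdir -p "$pid_dir" || return 1
    tmp="$(mktemp)" || return 1
    # Keep every existing line except a previous HADOOP_PID_DIR export.
    # grep exits non-zero when nothing is left over, which is fine here.
    grep -v '^export HADOOP_PID_DIR=' "$env_file" > "$tmp" 2>/dev/null || true
    echo "export HADOOP_PID_DIR=${pid_dir}" >> "$tmp"
    # Write back with cat so the file keeps its owner and permissions.
    cat "$tmp" > "$env_file"
    rm -f "$tmp"
}

# Example, run as root on every node (the hadoop-env.sh path is an assumption):
#   configure_pid_dir /var/hadoop_pid "$HADOOP_CONF_DIR/hadoop-env.sh"
#   chown hadoop /var/hadoop_pid
```

The new directory only takes effect for daemons started after the change, so daemons that are already running still have to be stopped by hand once.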
    [hadoop@slave3 ~]$ ls /var/hadoop_pid/
    hadoop-hadoop-datanode.pid  hadoop-hadoop-journalnode.pid
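Then kill the DataNode process by hand on each slave that reported the error (kill -9 pid) and run start-dfs.sh again. The "no datanode to stop" and "no namenode to stop" messages no longer appear, and the problem is solved.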
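Because the PID file is gone, the PID for the kill command has to be looked up another way. One option is the JDK's `jps` tool, which prints one `<pid> <main class>` line per Java process. The sketch below filters that output; the function name `pids_for_daemon` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps-style lines ("<pid> <MainClass>") on stdin
# and print the PIDs whose class name equals the given daemon name.
pids_for_daemon() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Example, run as the hadoop user on the affected slave:
#   jps | pids_for_daemon DataNode
# and then, for each PID printed:
#   kill <pid>       # try a normal termination first
#   kill -9 <pid>    # fall back to this if the process does not exit
```

`jps` only lists processes of the user running it, so it should be run as the user that owns the Hadoop daemons.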
    [hadoop@master1 ~]$ start-dfs.sh
    16/04/13 17:20:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [master1 master2]
    master1: starting namenode, logging to /data/usr/hadoop/logs/hadoop-hadoop-namenode-master1.out
    master2: starting namenode, logging to /data/usr/hadoop/logs/hadoop-hadoop-namenode-master2.out
    slave4: starting datanode, logging to /data/usr/hadoop/logs/hadoop-hadoop-datanode-slave4.out
    slave3: starting datanode, logging to /data/usr/hadoop/logs/hadoop-hadoop-datanode-slave3.out
    slave2: starting datanode, logging to /data/usr/hadoop/logs/hadoop-hadoop-datanode-slave2.out
    slave1: starting datanode, logging to /data/usr/hadoop/logs/hadoop-hadoop-datanode-slave1.out
    Starting journal nodes [master1 master2 slave1 slave2 slave3]
    slave3: starting journalnode, logging to /data/usr/hadoop/logs/hadoop-hadoop-journalnode-slave3.out
    master1: starting journalnode, logging to /data/usr/hadoop/logs/hadoop-hadoop-journalnode-master1.out
    slave1: starting journalnode, logging to /data/usr/hadoop/logs/hadoop-hadoop-journalnode-slave1.out
    master2: starting journalnode, logging to /data/usr/hadoop/logs/hadoop-hadoop-journalnode-master2.out
    slave2: starting journalnode, logging to /data/usr/hadoop/logs/hadoop-hadoop-journalnode-slave2.out
    16/04/13 17:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting ZK Failover Controllers on NN hosts [master1 master2]
    master1: starting zkfc, logging to /data/usr/hadoop/logs/hadoop-hadoop-zkfc-master1.out
    master2: starting zkfc, logging to /data/usr/hadoop/logs/hadoop-hadoop-zkfc-master2.out