The Hadoop node address "localhost" problem

Problem description

After installing the Hadoop cluster, the YARN console showed both the node ID and the node HTTP address as localhost:

[hadoop@master sbin]$ yarn node -list
20/12/17 12:21:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.8.42:18040
Total Nodes:1
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
 localhost:43141                RUNNING    localhost:8042                                  0

When a job was submitted, the YARN log also printed the node address as 127.0.0.1 and used that IP to reach the node, so the connection was bound to fail:

2020-12-17 00:53:30,721 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1607916354082_0008_01_000001, AllocationRequestId: 0, Version: 0, NodeId: localhost:43141, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:35845 }, ExecutionType: GUARANTEED, ] for AM appattempt_1607916354082_0008_000001

2020-12-17 00:56:30,801 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1607916354082_0008_000001. Got exception: java.net.ConnectException: Call From master/172.16.8.42 to localhost:43141 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
       at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:827)
       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
       at org.apache.hadoop.ipc.Client.call(Client.java:1495)
       at org.apache.hadoop.ipc.Client.call(Client.java:1394)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)

Root cause

In the Hadoop source code, the node information is built as follows:

private NodeId buildNodeId(InetSocketAddress connectAddress, String hostOverride) {
    if (hostOverride != null) {
        connectAddress = NetUtils.getConnectAddress(
            new InetSocketAddress(hostOverride, connectAddress.getPort()));
    }
    return NodeId.newInstance(
        connectAddress.getAddress().getCanonicalHostName(),
        connectAddress.getPort());
}

Here the hostname comes from connectAddress.getAddress().getCanonicalHostName(). A hostname can also be obtained via getHostName(), so what is the difference between the two? getCanonicalHostName() returns the fully qualified domain name, while getHostName() returns just the host name. For example, the host name might be definesys while the domain configured in DNS is definesys.com; getCanonicalHostName() performs a DNS (or /etc/hosts) lookup to obtain the fully qualified name. In our case, getAddress() actually returns 127.0.0.1, and the hosts file contains this entry:

127.0.0.1     localhost       localhost.localdomain

so the reverse lookup resolves 127.0.0.1 to localhost, and that is exactly the name that ends up in the NodeId.
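This lookup behavior is easy to reproduce outside of Hadoop. Below is a minimal standalone sketch; the printed values depend entirely on the local /etc/hosts and DNS configuration, and 172.16.8.42 is simply the master address taken from the log above:

import java.net.InetAddress;

public class HostnameDemo {
    public static void main(String[] args) throws Exception {
        // Reverse-resolve the loopback address, just as buildNodeId does
        // via connectAddress.getAddress().getCanonicalHostName().
        InetAddress loopback = InetAddress.getByName("127.0.0.1");

        // With "127.0.0.1 localhost localhost.localdomain" in /etc/hosts,
        // both calls print "localhost".
        System.out.println("getHostName:          " + loopback.getHostName());
        System.out.println("getCanonicalHostName: " + loopback.getCanonicalHostName());

        // The real NIC address only resolves to a proper hostname if
        // /etc/hosts (or DNS) maps it to one.
        InetAddress nic = InetAddress.getByName("172.16.8.42");
        System.out.println("NIC canonical name:   " + nic.getCanonicalHostName());
    }
}

On a machine with the hosts entry shown above, the first two lines print localhost, which matches the Node-Id reported by yarn node -list.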

Solution

Hadoop's recommended fix (from the ConnectionRefused wiki page linked in the error above) reads:

  • If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
  • Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).

In other words: remove the hostname's mappings to 127.0.0.1 and 127.0.1.1 from /etc/hosts. After deleting those entries, the node registered with its real hostname and the problem was solved. A corrected hosts layout is sketched below.
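For reference, a hosts layout that avoids the problem looks roughly like this. This is a sketch only: 172.16.8.42 is the master address from the log above, while the worker entry is an assumed example for a multi-node cluster:

# /etc/hosts
# The machine's own hostname must map to its real NIC address,
# never to 127.0.0.1 or 127.0.1.1.
172.16.8.42   master
172.16.8.43   worker1   # assumed example entry

After editing /etc/hosts, restart YARN (stop-yarn.sh, then start-yarn.sh) so the NodeManager re-registers, and re-run yarn node -list; the Node-Id column should now show the real hostname instead of localhost.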
