After installing the Hadoop cluster, the YARN console shows both the node ID and the node address as localhost
[hadoop@master sbin]$ yarn node -list
20/12/17 12:21:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.8.42:18040
Total Nodes:1
         Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
 localhost:43141         RUNNING     localhost:8042                             0
When a job is submitted, the YARN log likewise prints the node address as 127.0.0.1 and uses it as the node IP, so the connection is bound to fail:
2020-12-17 00:53:30,721 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1607916354082_0008_01_000001, AllocationRequestId: 0, Version: 0, NodeId: localhost:43141, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:35845 }, ExecutionType: GUARANTEED, ] for AM appattempt_1607916354082_0008_000001
2020-12-17 00:56:30,801 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1607916354082_0008_000001. Got exception: java.net.ConnectException: Call From master/172.16.8.42 to localhost:43141 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:827)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
	at org.apache.hadoop.ipc.Client.call(Client.java:1495)
	at org.apache.hadoop.ipc.Client.call(Client.java:1394)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
In the Hadoop source code, the node information is built by the following method:
private NodeId buildNodeId(InetSocketAddress connectAddress, String hostOverride) {
  if (hostOverride != null) {
    connectAddress = NetUtils.getConnectAddress(
        new InetSocketAddress(hostOverride, connectAddress.getPort()));
  }
  return NodeId.newInstance(
      connectAddress.getAddress().getCanonicalHostName(),
      connectAddress.getPort());
}
The host name here is obtained via connectAddress.getAddress().getCanonicalHostName(). We know a host name can also be obtained via getHostName(), so what is the difference between the two? getCanonicalHostName() returns the fully qualified domain name, while getHostName() returns the plain host name. For example, the host name might be definesys while DNS maps it to definesys.com; getCanonicalHostName() performs a resolver lookup to obtain that fully qualified name. In this case getAddress() actually returns 127.0.0.1, and the hosts file contains the following entry:
127.0.0.1 localhost localhost.localdomain
so the address was resolved to localhost.
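The resolution behavior described above can be reproduced with a small standalone check. This is only a sketch: the exact strings printed depend on the local /etc/hosts and DNS configuration, so no fixed output is shown.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameDemo {
    public static void main(String[] args) throws UnknownHostException {
        // Resolve the loopback address, which is what Hadoop ends up with
        // when connectAddress.getAddress() yields 127.0.0.1.
        InetAddress addr = InetAddress.getByName("127.0.0.1");

        // getHostName(): best-effort reverse lookup; may fall back to the
        // literal IP string if no name can be found.
        System.out.println("getHostName:          " + addr.getHostName());

        // getCanonicalHostName(): asks the resolver for the fully qualified
        // name; with "127.0.0.1 localhost localhost.localdomain" in
        // /etc/hosts this typically yields "localhost" or
        // "localhost.localdomain" -- exactly what YARN then registers.
        System.out.println("getCanonicalHostName: " + addr.getCanonicalHostName());
    }
}
```

Running this on the affected node makes it easy to confirm, before touching any Hadoop configuration, that the loopback entry in /etc/hosts is what drives the localhost registration.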
在hadoop的推荐方案里是这么写的dom
翻译过来是建议删除127.0.0.1 和 127.0.1.1在hosts中的配置,删除后恢复正常,问题解决。ide
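A minimal sketch of an /etc/hosts that avoids the problem, using the master address 172.16.8.42 from the logs above (adjust names and addresses for your own cluster; the key point is that the loopback lines no longer shadow the real host name):

```
172.16.8.42   master
# 127.0.0.1 localhost localhost.localdomain   <- removed per the Hadoop recommendation
# 127.0.1.1  master                           <- removed per the Hadoop recommendation
```

With this mapping, getCanonicalHostName() resolves the node to its real host name instead of localhost, and the ResourceManager can reach the NodeManager at its actual address.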