Cloudera server and agent losing contact

# This host is not in contact with the Cloudera Manager Server

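The monitor services on the server side are running normally, but the agent cannot connect to them.
# This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.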
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[20/Feb/2020 16:51:51 +0000] 22086 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-HOSTMONITOR-d592ed6aea0516a09027c2cf834d8979
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
    self._port)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 833, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
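Errno 111 (Connection refused) means the agent did reach the monitor host, but nothing accepted the connection on that address and port; a timeout would instead point at a firewall or routing problem. The hypothetical helper below (Python 3 standard library only) tells these outcomes apart when run from the agent host.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt made from this machine.

    "open":    something accepted the connection.
    "refused": the peer answered with a reset (Errno 111 on Linux),
               i.e. nothing is listening on that address/port.
    "timeout": no answer in time, often a firewall or security group
               silently dropping packets.
    "error":   anything else (network unreachable, DNS failure, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, call `check_port("192.168.20.170", 9999)` on the agent host (address and port taken from later in this post) and compare it with the same call made on the server itself; different answers point at the network path rather than the service.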
For reference, the server log shows:
2020-02-20 17:25:06,371 WARN New I/O boss #388:com.cloudera.server.cmf.log.AgentResponseAsyncHandler: (2 skipped) Exception thrown while trying to get log search results from agent on host: creative
java.net.ConnectException: Connection timed out: creative/172.19.40.203:9000
...
2020-02-20 17:35:17,209 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (10 skipped) Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.UnknownHostException: archive.cloudera.com: Name or service not known
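These server-side lines are different failures from the agent-side one: "Connection timed out" to creative:9000 means the server's packets to the agent got no answer at all, while the UnknownHostException only means the server cannot resolve archive.cloudera.com (no DNS or internet access), which affects the parcel repository rather than agent contact. A small hypothetical triage helper that buckets log lines by these messages:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical buckets; the patterns are message fragments quoted in the
# agent and server logs in this post.
PATTERNS = {
    "refused": re.compile(r"Connection refused|Errno 111"),
    "timeout": re.compile(r"Connection timed out"),
    "dns": re.compile(r"UnknownHostException|Name or service not known"),
}


def classify(line: str) -> Optional[str]:
    """Return the first bucket whose pattern matches the line, or None."""
    for bucket, pattern in PATTERNS.items():
        if pattern.search(line):
            return bucket
    return None


def triage(lines: Iterable[str]) -> Counter:
    """Count how many lines fall into each bucket; unmatched lines are ignored."""
    return Counter(b for b in map(classify, lines) if b is not None)
```

Assuming the default log location, `triage(open("/var/log/cloudera-scm-server/cloudera-scm-server.log"))` gives a quick overview of which kind of failure dominates.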

cloudera agent monitor firehose error: [Errno 111] Connection refused
# Re-adding the host
2020-02-20 20:19:57,879 ERROR scm-web-4143:com.cloudera.cmf.model.DbCommand: Command null(DeployClusterClientConfig) has completed. finalstate:FINISHED, success:false, msg:Command Deploy Client Configuration is not currently available for execution.
2020-02-20 20:19:57,894 INFO scm-web-4143:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/v7/clusters/LogServerClu/commands/deployClientConfig, Status:200
2020-02-20 20:19:57,978 WARN scm-web-4105:com.cloudera.cmf.command.flow.SeqFlowCmd: Invalid command state json
com.cloudera.enterprise.JsonUtil2$JsonRuntimeException: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
 at [Source: (String)""; line: 1, column: 0]
	at com.cloudera.enterprise.JsonUtil2.valueFromString(JsonUtil2.java:193)
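It is not a JDK problem!
After a whole day on it, the last-resort fix:
Stop the agents on all four hosts (170, 171, 172 and 221) and the server on 170, then start the server again, followed by the four agents.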
# on all four hosts
systemctl stop cloudera-scm-agent
# on 170
systemctl stop cloudera-scm-server
# on 170
systemctl start cloudera-scm-server
# on all four hosts
systemctl start cloudera-scm-agent
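The order matters: the agents go down first and come back only after the server is up again. A hypothetical sketch that just prints the commands in that order (it executes nothing); the `ssh root@` prefix and the host labels are placeholders for however you actually reach the machines.

```python
from typing import List, Tuple

AGENT = "cloudera-scm-agent"
SERVER = "cloudera-scm-server"


def restart_plan(server_host: str, agent_hosts: List[str]) -> List[Tuple[str, str]]:
    """(host, command) pairs: stop agents, stop server, start server, start agents."""
    plan = [(h, f"systemctl stop {AGENT}") for h in agent_hosts]
    plan.append((server_host, f"systemctl stop {SERVER}"))
    plan.append((server_host, f"systemctl start {SERVER}"))
    plan += [(h, f"systemctl start {AGENT}") for h in agent_hosts]
    return plan


# Host labels are the shorthand used in this post; replace them with real hostnames.
for host, cmd in restart_plan("170", ["170", "171", "172", "221"]):
    print(f"ssh root@{host} {cmd}")  # printed only, nothing is executed
```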
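That still did not fix node 221 (which used a private-IP mapping). Next, remove it from the cluster in Cloudera Manager, change the hosts mapping on all four nodes to 221's public IP, and add it back to the cluster.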
#scm-status.log
[20/Feb/2020 21:56:44 +0000] 5440 MainThread _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Started monitor thread 'Autoreloader'.
[20/Feb/2020 21:56:44 +0000] 5440 MainThread _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Started monitor thread '_TimeoutMonitor'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   ERROR    [20/Feb/2020:21:56:44] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1326, in start
    raise socket.error(msg)
error: No socket could be created -- (('47.103.112.221', 9000): [Errno 99] Cannot assign requested address)

[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Bus STOPPING
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('creative', 9000)) already shut down
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Stopped thread '_TimeoutMonitor'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Stopped thread 'Autoreloader'.
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Bus STOPPED
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Bus EXITING
[20/Feb/2020 21:56:44 +0000] 5440 HTTPServer Thread-3 _cplogging   INFO     [20/Feb/2020:21:56:44] ENGINE Bus EXITED
#scm-agent.log
[20/Feb/2020 21:56:35 +0000] 5322 MainThread _cplogging   INFO     [20/Feb/2020:21:56:35] ENGINE Serving on http://127.0.0.1:9001
[20/Feb/2020 21:56:35 +0000] 5322 MainThread _cplogging   INFO     [20/Feb/2020:21:56:35] ENGINE Bus STARTED
[20/Feb/2020 21:56:37 +0000] 5322 MainThread main         ERROR    Top-level exception: <Fault 40: 'ABNORMAL_TERMINATION: status_server'>
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/main.py", line 107, in main_impl
    ag.start(legacy_supervisor)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 839, in start
    self.supervisor_client.start_process(STATUS_SERVER_PROC)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/__init__.py", line 531, in new_fn
    return fn(self, *args, **kwargs)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 406, in start_process
    raise RetryableProcessException(fault)
RetryableProcessException: <Fault 40: 'ABNORMAL_TERMINATION: status_server'>
    
### Check the IP-to-hostname mapping
[root@creative cloudera-scm-agent]# python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'
creative 47.103.112.221
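The one-liner above uses Python 2 syntax (the interpreter the agent runs on here). Below is a hypothetical Python 3 equivalent that also tries to bind the resolved address, which is what the agent's status server does on port 9000. A `False` bind result reproduces the `[Errno 99] Cannot assign requested address` from scm-status.log: the hostname resolves to an IP that is not configured on any local interface.

```python
import errno
import socket
from typing import Tuple


def can_bind(ip: str, port: int = 0) -> bool:
    """Return False if `ip` is not assigned to a local interface.

    EADDRNOTAVAIL is Errno 99 on Linux. Port 0 lets the kernel choose a free
    port, so only the address itself is being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True


def self_mapping() -> Tuple[str, str, bool]:
    """(fqdn, resolved IP, bindable?) for this machine."""
    fqdn = socket.getfqdn()
    ip = socket.gethostbyname(fqdn)
    return fqdn, ip, can_bind(ip)
```

Run `print(*self_mapping())` on the agent host; if the last value is `False`, the agent cannot listen on the address its own hostname points to.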
Finally, remove the agent, reinstall it, and configure the hosts file mapping with the public IP:
creative: IOException thrown while collecting data from host: Connection refused (Connection refused)
#agent.log
[20/Feb/2020 22:48:42 +0000] 11398 MonitorDaemon-Reporter throttling_logger ERROR (10 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-d592ed6aea0516a09027c2cf834d8979
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
    self._port)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 833, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
#/var/log/cloudera-scm-firehose
# Activity Monitor log
2020-02-20 21:01:43,753 WARN com.cloudera.cmf.BasicScmProxy: Exception while getting current fragments hashes
java.net.ConnectException: Connection refused (Connection refused)
...
2020-02-20 21:02:40,203 INFO com.cloudera.cmon.firehose.Main: Starting Firehose. JVM Args: [-XX:+UseConcMarkSweepGC, -XX:+UseParNewGC, -Dmgmt.log.file=mgmt-cmf-mgmt-ACTIVITYMONITOR-hz-seeing-bg-01.log.out, -Djava.awt.headless=true, -Djava.net.preferIPv4Stack=true, -Dfirehose.schema.dir=/opt/cloudera/cm/schema, -Xms1073741824, -Xmx1073741824, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/tmp/mgmt_mgmt-ACTIVITYMONITOR-d592ed6aea0516a09027c2cf834d8979_pid43982.hprof, -XX:OnOutOfMemoryError=/opt/cloudera/cm-agent/service/common/killparent.sh], Args: [--pipeline-type, ACTIVITY_MONITORING_TREE, --mgmt-home, /opt/cloudera/cm], Version: 6.2.0 (#968826 built by jenkins on 20190314-1704 git: 16bbe6211555460a860cf22d811680b35755ea81)
...
# Host Monitor log
2020-02-20 21:02:45,838 WARN com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy23.writeStatusRecords(Unknown Source)
        at com.cloudera.cmon.firehose.BasicFirehoseClient.writeStatusRecords(BasicFirehoseClient.java:75)
        at com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher.processRecords(HMONToSMONHostSubjectRecordPublisher.java:107)
        at com.cloudera.cmon.tstore.leveldb.LDBSubjectRecordStore.write(LDBSubjectRecordStore.java:399)
        at com.cloudera.cmon.kaiser.HMONTestRunner.runHostTestsForSession(HMONTestRunner.java:86)
        at com.cloudera.cmon.kaiser.HMONTestRunner.runTestsForSession(HMONTestRunner.java:66)
        at com.cloudera.cmon.kaiser.BaseTestRunner.runTestsOnAllSubjects(BaseTestRunner.java:143)
        at com.cloudera.cmon.kaiser.KaiserService$KaiserServiceRunner.run(KaiserService.java:138)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
The monitor service port is 9999 and the firehose port is 9998.
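Comparing the hosts: only the server machine opens ports 9999 and 9998, and every agent must be able to reach both ports.
But 221 (an Alibaba Cloud machine) could not reach port 9999 on 170, the server in the IDC.
Only machines on the internal network could reach it; it was not reachable through the server's public IP, even though that is the same machine.
Bind the port-9999 listener to the wildcard address: in Cloudera Management Service > Configuration > Activity Monitor, switch it to the wildcard address.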
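After the change, the listener on the server should show a wildcard local address (`0.0.0.0`, `*` or `[::]`) in `ss -ltn` instead of one specific IP. A hypothetical parser for that output, assuming the usual ss column layout:

```python
from typing import List

# Local addresses that mean "all interfaces" in ss output.
WILDCARDS = {"0.0.0.0", "*", "[::]", "::"}


def listeners(ss_output: str, port: int) -> List[str]:
    """Local addresses listening on `port`, parsed from `ss -ltn` output.

    Assumes the columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
    """
    found = []
    for line in ss_output.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 5:
            continue
        addr, _, p = fields[3].rpartition(":")
        if p == str(port):
            found.append(addr)
    return found


def bound_to_wildcard(ss_output: str, port: int) -> bool:
    """True if at least one listener on `port` accepts on all interfaces."""
    return any(addr in WILDCARDS for addr in listeners(ss_output, port))
```

On the server: `bound_to_wildcard(subprocess.check_output(["ss", "-ltn"], universal_newlines=True), 9999)` (with `import subprocess`).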
cd /var/log/cloudera-scm-firehose
# Now only the Host Monitor reports errors; the Activity Monitor no longer does
2020-02-21 11:18:07,529 INFO com.cloudera.cmon.tstore.leveldb.LDBPartitionManager: Opening partition LDBPartitionMetadataWrapper{tableName=ts_subject, partitionName=ts_subject_2020-02-11T07:41:01.428Z, startTime=2020-02-11T07:41:01.428Z, endTime=null, version=9, state=CLOSED}
2020-02-21 11:18:07,546 WARN com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy23.writeStatusRecords(Unknown Source)
...
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
        at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:104)
        ... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Then apply the same change to the Host Monitor: just tick the wildcard checkbox.
MainThread main ERROR Top-level exception: <Fault 40: 'ABNORMAL_TERMINATION: status_server'>
# Check the cloudera-scm-eventserver log
2020-02-21 11:34:07,569 INFO org.apache.avro.ipc.NettyServer: [id: 0xe2bcd0eb, /192.168.20.170:51594 => /192.168.20.170:7184] OPEN
2020-02-21 11:34:07,570 INFO org.apache.avro.ipc.NettyServer: [id: 0xe2bcd0eb, /192.168.20.170:51594 => /192.168.20.170:7184] BOUND: /192.168.20.170:7184
2020-02-21 11:34:07,570 INFO org.apache.avro.ipc.NettyServer: [id: 0xe2bcd0eb, /192.168.20.170:51594 => /192.168.20.170:7184] CONNECTED: /192.168.20.170:51594
2020-02-21 11:34:07,576 ERROR com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher: Could not publish metrics to HMON:
java.lang.reflect.UndeclaredThrowableException
...
2020-02-21 11:34:07,590 ERROR com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher: Could not publish metrics to SMON:
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy22.writeMetrics(Unknown Source)
        at com.cloudera.cmon.firehose.BasicFirehoseClient.writeMetrics(BasicFirehoseClient.java:87)
        at com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher.publishToSMON(EventMetricsPublisher.java:233)
        at com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher.run(EventMetricsPublisher.java:110)
        at com.cloudera.enterprise.PeriodicEnterpriseService$UnexceptionablePeriodicRunnable.run(PeriodicEnterpriseService.java:67)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
        at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:104)
        ... 6 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
# Finally enabled the wildcard binding for the Service Monitor too; the error above remained, so check the agent's scm-status.log:
[21/Feb/2020 11:57:55 +0000] 16366 MainThread _cplogging   INFO     [21/Feb/2020:11:57:55] ENGINE Started monitor thread '_TimeoutMonitor'.
[21/Feb/2020 11:57:55 +0000] 16366 HTTPServer Thread-3 _cplogging   ERROR    [21/Feb/2020:11:57:55] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
    self.httpserver.start()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cheroot/server.py", line 1326, in start
    raise socket.error(msg)
error: No socket could be created -- (('47.103.112.221', 9000): [Errno 99] Cannot assign requested address)
#supervisord
2020-02-21 11:42:12,122 INFO gave up: status_server entered FATAL state, too many start retries too quickly
2020-02-21 11:57:46,783 INFO spawned: 'status_server' with pid 16328
2020-02-21 11:57:47,355 INFO exited: status_server (exit status 70; not expected)
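Port 9000 (the agent's status server) has to bind an address the machine actually owns, i.e. its private IP. Errno 99 (Cannot assign requested address) means 47.103.112.221 is not configured on any local interface, which is typical for a cloud public IP that is NAT'd to the instance. Is that the cause? => Switch the agent's mapping to the private IP.
Agent host: its own hostname now maps to its private IP.
Server: mapped through its public IP.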
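The rule this boils down to: the agent host's own name must resolve to an address it can bind (here its private IP), while the server only has to be mapped to an address the agent can reach. A hypothetical sketch that audits `/etc/hosts`-style text for the first half of that rule, using the standard `ipaddress` module; like glibc's files lookup, it takes the first matching line for each name.

```python
import ipaddress
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map hostname -> IP from /etc/hosts-style text; the first entry for a name wins."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def own_name_scope(text: str, hostname: str) -> str:
    """'loopback', 'private', 'public' or 'missing' for how `hostname` is mapped."""
    ip = parse_hosts(text).get(hostname)
    if ip is None:
        return "missing"
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return "loopback"
    return "private" if addr.is_private else "public"
```

On the agent host, `own_name_scope(open("/etc/hosts").read(), socket.gethostname())` (with `import socket`) should report `private` after the change.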

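Even so, this host still showed a warning: the entropy check went red, with 27 available against a threshold of 50. Later the cluster adjusted itself to a threshold of 100, and the host turned healthy.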
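Assuming the host entropy health check is based on the kernel's available-entropy counter, you can read the same number yourself on Linux. A minimal hypothetical sketch; the threshold is whatever the cluster is configured with (50 in the warning above).

```python
from pathlib import Path

# Linux exposes the kernel's current entropy estimate (in bits) here.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path: Path = ENTROPY_AVAIL) -> int:
    """Current available entropy as reported by the kernel."""
    return int(path.read_text().strip())


def entropy_below(value: int, threshold: int) -> bool:
    """True when the value is under the threshold, i.e. the check would warn."""
    return value < threshold
```

On the affected host, `entropy_below(read_entropy(), 50)` returning `True` matches the red entropy check.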
Final result.

# Addendum
On 159, Cloudera Manager failed to start: the Event Server failed during startup, and the three monitors failed right after it.
So check the Event Server log:
2020-02-21 23:27:04,647 INFO com.cloudera.enterprise.DebugServer: Running debug HTTP server on 0.0.0.0:8084
2020-02-21 23:27:04,766 ERROR com.cloudera.cmf.eventcatcher.server.EventCatcherService: Error starting EventServer
org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:7184
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:298)
        at org.apache.avro.ipc.CustomNettyServer.<init>(CustomNettyServer.java:76)
        at com.cloudera.cmf.eventcatcher.server.AvroEventStoreServer.<init>(AvroEventStoreServer.java:107)
        at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:179)
Caused by: java.net.BindException: Address already in use
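"Address already in use" means some socket on this host still occupies port 7184 when the Event Server tries to bind it on 0.0.0.0. A quick probe is to attempt the same bind yourself; this is a hypothetical helper, and `netstat`/`ss` then tell you who holds the port.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind host:port the way the Event Server does.

    Returns True on EADDRINUSE ("Address already in use"). The probe does
    not set SO_REUSEADDR, so on Linux a port held only by TIME_WAIT
    connections can also be reported as busy.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False
```

On the server, `port_in_use(7184)` before starting the Event Server shows whether something is already sitting on its port.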

netstat -nltpa
# Connections waiting to close
ss -ano | grep 7184   # add -p to also see the owning process ID
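To see at a glance whether 7184 is still held by a listener or only by connections winding down, you can tally the `ss` states for that port. A hypothetical sketch; it assumes the usual `ss -an` columns, optionally preceded by a Netid column.

```python
from collections import Counter

SS_STATES = {
    "ESTAB", "LISTEN", "TIME-WAIT", "CLOSE-WAIT", "FIN-WAIT-1", "FIN-WAIT-2",
    "SYN-SENT", "SYN-RECV", "LAST-ACK", "CLOSING", "UNCONN",
}


def states_for_port(ss_output: str, port: int) -> Counter:
    """Count socket states whose local or peer address ends in :port."""
    counts: Counter = Counter()
    suffix = f":{port}"
    for line in ss_output.splitlines():
        fields = line.split()
        # With several protocols ss prints a Netid column (tcp, udp, ...) first.
        if fields and fields[0] not in SS_STATES:
            fields = fields[1:]
        if len(fields) < 5 or fields[0] not in SS_STATES:
            continue  # header or unrecognised line
        local, peer = fields[3], fields[4]
        if local.endswith(suffix) or peer.endswith(suffix):
            counts[fields[0]] += 1
    return counts
```

A `LISTEN` count means a live process owns the port (find it with `ss -ltnp`); `TIME-WAIT` entries alone are closed connections that the kernel expires on its own.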