1 Environment
1.1 Operating system
[root@host-xxx soft]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.6 (Final)
Release: 6.6
Codename: Final
[root@host-xxx soft]#
1.2 Zabbix version: the agent, server, and web frontend are all 2.4.6
[wls81@host-xxxx sbin]$ ./zabbix_agent --version
Zabbix agent v2.4.6 (revision 54796) (10 August 2015)
Compilation time: Nov 2 2015 21:29:13
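1.3 I am currently monitoring 791 virtual machines
2 Problem
Note: this is not the case where the Zabbix web page shows the red "zabbix server is not running" banner.
2.1 Web-side monitoring
The page shows that zabbix_server is not running.
The Zabbix server also reports the following error: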
Less than 25% free in the trends cache
2.2 Agent-side log
28079:20161012:121243.196 active check configuration update from [192.168.176.25:10051] started to fail (cannot connect to [[192.168.176.25]:10051]: [4] Interrupted system call)
28079:20161012:122102.894 active check configuration update from [192.168.176.25:10051] is working again
28079:20161012:130105.458 active check configuration update from [192.168.176.25:10051] started to fail (ZBX_TCP_READ() failed: [4] Interrupted system call)
28079:20161012:153008.930 active check configuration update from [192.168.176.25:10051] is working again
28079:20161012:160811.493 active check configuration update from [192.168.176.25:10051] started to fail (ZBX_TCP_READ() failed: [4] Interrupted system call)
28079:20161013:104855.178 active check configuration update from [192.168.176.25:10051] is working again
28079:20161013:112258.667 active check configuration update from [192.168.176.25:10051] started to fail (cannot connect to [[192.168.176.25]:10051]: [4] Interrupted system call)
In addition, telnet from the agent host to port 10051 on the server fails.
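To see how long each interruption lasted, the fail/recover pairs in the agent log can be turned into outage windows. The following is a minimal sketch, not part of the original troubleshooting. It assumes the `PID:YYYYMMDD:HHMMSS.mmm message` layout shown in the log above, and the function names `parse_line` and `outage_windows` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines above: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def parse_line(line):
    """Return (timestamp, message) for one agent log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    _pid, date, clock, millis, message = m.groups()
    ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    return ts.replace(microsecond=int(millis) * 1000), message


def outage_windows(lines):
    """Pair each 'started to fail' entry with the next 'is working again' entry.

    Returns (windows, still_failing_since), where windows is a list of
    (failed_at, recovered_at, duration) tuples.
    """
    windows = []
    failed_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, message = parsed
        if "started to fail" in message and failed_at is None:
            failed_at = ts
        elif "is working again" in message and failed_at is not None:
            windows.append((failed_at, ts, ts - failed_at))
            failed_at = None
    return windows, failed_at
```

Feeding it the contents of the agent log file (for example `outage_windows(open(path))`, where `path` is wherever the agent writes its log) lists every window, and a non-`None` second value means the last failure has not recovered yet.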
2.3 Zabbix server
The zabbix_server process is alive, and port 10051 is listening.
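A listening port on the server does not by itself mean that clients can complete a connection, so it can help to repeat the telnet test from the agent host in a scriptable form. The following is a minimal sketch under that assumption. The helper name `port_is_reachable` and the 3-second timeout are illustrative choices, not anything Zabbix provides.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        # create_connection performs the full TCP handshake, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from an agent host, `port_is_reachable("192.168.176.25", 10051)` would be expected to return `False` while the problem described here is occurring, even though the server process is up.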
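3 Solution approach
As always, start with the logs.
The cause turned out to be the following setting, whose default value is too small.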
### Option: TrendCacheSize
#	Size of trend cache, in bytes.
#	Shared memory size for storing trends data.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# TrendCacheSize=4M

TrendCacheSize=400M
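Before restarting the server, the new value can be checked against the documented range (128K-2G). The following is a minimal sketch. The helper names are hypothetical, the K/M/G suffixes are read as powers of 1024, and the parser simply takes the first uncommented `name=value` line, which is a simplification rather than a description of how Zabbix itself reads the file.

```python
import re

# Suffix multipliers, assuming K/M/G mean powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Range quoted in the configuration comment above.
TREND_CACHE_MIN = 128 * UNITS["K"]
TREND_CACHE_MAX = 2 * UNITS["G"]


def parse_size(value):
    """Convert a size such as '400M' or '131072' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value)
    if not m:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def read_option(conf_text, name):
    """Return the value of the first uncommented 'name=value' line, or None."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, val = stripped.partition("=")
        if sep and key.strip() == name:
            return val.strip()
    return None


def trend_cache_size_is_valid(value):
    """Check that a TrendCacheSize value lies within the documented range."""
    return TREND_CACHE_MIN <= parse_size(value) <= TREND_CACHE_MAX
```

For example, `read_option(open(conf_path).read(), "TrendCacheSize")` followed by `trend_cache_size_is_valid(...)` confirms that `400M` is accepted, where `conf_path` stands for the location of your zabbix_server.conf.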