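The Nagios service built earlier can display status information and raise alerts. In production, however, you also need historical trend graphs that track each service over the long term and present the data graphically. For example, the trend in free disk space tells you whether more disks need to be bought ahead of time.
PNP is a graphing add-on for Nagios. Its official site is http://www.pnp4nagios.org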
[root@Nagios 6]# yum -y install cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel
[root@Nagios 6]# rpm -qa cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel
freetype-2.3.11-17.el6.x86_64
zlib-1.2.3-29.el6.x86_64
zlib-devel-1.2.3-29.el6.x86_64
freetype-devel-2.3.11-17.el6.x86_64
gd-2.0.35-11.el6.x86_64
gd-devel-2.0.35-11.el6.x86_64
cairo-1.8.8-6.el6_6.x86_64
pango-1.28.1-11.el6.x86_64
# Next install the libart_lgpl packages that rrdtool depends on; they must be installed before rrdtool
[root@Nagios 6]# yum -y install libart_lgpl libart_lgpl-devel
[root@Nagios 6]# rpm -qa libart_lgpl libart_lgpl-devel
libart_lgpl-2.3.20-5.1.el6.x86_64
libart_lgpl-devel-2.3.20-5.1.el6.x86_64
# PNP draws its graphs through rrdtool, so rrdtool has to be installed first
[root@Nagios 6]# yum -y install rrdtool rrdtool-devel
[root@Nagios 6]# rpm -qa rrdtool rrdtool-devel
rrdtool-1.3.8-10.el6.x86_64
rrdtool-devel-1.3.8-10.el6.x86_64
[root@Nagios 6]# which rrdtool
/usr/bin/rrdtool
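This guide uses PNP 0.4.14. Newer versions can run into graphing problems, and the 0.4 series is sufficient in most cases. Unless you have special requirements, follow the tested steps exactly and experiment with other versions only once you understand the setup.
PNP cannot be installed with yum; build it from source as follows: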
[root@Nagios ~]# yum -y install perl-Time-HiRes
[root@Nagios ~]# cd nagios/
[root@Nagios nagios]# ll pnp-0.4.14.tar.gz
-rw-r--r--. 1 root root 455593 Aug 12 12:22 pnp-0.4.14.tar.gz
[root@Nagios nagios]# tar xf pnp-0.4.14.tar.gz -C /usr/src/
[root@Nagios nagios]# cd /usr/src/pnp-0.4.14/
[root@Nagios pnp-0.4.14]# ./configure \
> --with-rrdtool \
> --with-perfdata-dir=/usr/local/nagios/share/perfdata/
[root@Nagios pnp-0.4.14]# make all
[root@Nagios pnp-0.4.14]# make install
[root@Nagios pnp-0.4.14]# make install-config
[root@Nagios pnp-0.4.14]# make install-init
[root@Nagios pnp-0.4.14]# ll /usr/local/nagios/libexec/ | grep process
-rwxr-xr-x 1 nagios nagios 31813 Aug 19 23:04 process_perfdata.pl
If configure prints the following warning, ignore it:
#################
# WARNING:The RRDs Perl Modules are not found on your System
# Using RRDs will speedup things in larg
##################
PNP ships a Perl script that collects the data used for graphing. You can locate it with:
[root@Nagios pnp-0.4.14]# ll /usr/local/nagios/libexec/ | grep process
-rwxr-xr-x 1 nagios nagios 31813 Aug 19 23:04 process_perfdata.pl
1) Edit the main configuration file nagios.cfg with vi. On line 833, change the following parameter from 0 to 1 so that performance data is recorded.
[root@Nagios nagios]# sed -n '833p' /usr/local/nagios/etc/nagios.cfg
process_performance_data=0
# The default is 0; change it to 1.
# Further down, around lines 845 and 846, find the following two entries and remove the leading comment character.
[root@Nagios nagios]# sed -n '845,846p' /usr/local/nagios/etc/nagios.cfg
#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata
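The three edits above can also be scripted instead of made by hand in vi. The following is only a sketch: `enable_perfdata` is a hypothetical helper name, it assumes the three lines look exactly like the defaults shown above, and it keeps a `.bak` copy of the file it changes.

```shell
#!/bin/bash
# enable_perfdata FILE: hypothetical helper, not part of Nagios or PNP.
# Assumes FILE contains the default lines shown above; a FILE.bak backup is kept.
enable_perfdata() {
    local cfg="$1"
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    sed -i.bak \
        -e 's/^process_performance_data=0/process_performance_data=1/' \
        -e 's/^#\(host_perfdata_command=process-host-perfdata\)/\1/' \
        -e 's/^#\(service_perfdata_command=process-service-perfdata\)/\1/' \
        "$cfg"
}

# Usage (on the Nagios server):
#   enable_perfdata /usr/local/nagios/etc/nagios.cfg
```

Matching on the parameter names rather than on line numbers means the sketch still works if the file has been edited and the lines have moved.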
2) Edit commands.cfg with vi to define the commands that collect the graphing data.
[root@Nagios nagios]# sed -n '227,238p' /usr/local/nagios/etc/objects/commands.cfg
# 'process-host-perfdata' command definition
define command{
    command_name    process-host-perfdata
    command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >>/usr/local/nagios/var/host-perfdata.out
    }
# 'process-service-perfdata' command definition
define command{
    command_name    process-service-perfdata
    command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >>/usr/local/nagios/var/service-perfdata.out
    }
Delete the default definitions above and replace them with the following:
[root@Nagios nagios]# sed -n '227,238p' /usr/local/nagios/etc/objects/commands.cfg
# 'process-host-perfdata' command definition
define command{
    command_name    process-host-perfdata
    command_line    /usr/local/nagios/libexec/process_perfdata.pl
    }
# 'process-service-perfdata' command definition
define command{
    command_name    process-service-perfdata
    command_line    /usr/local/nagios/libexec/process_perfdata.pl
    }
3) Check the configuration syntax.
[root@Nagios nagios]# /etc/init.d/nagios checkconfig
#... earlier output omitted ...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
OK
4) Reload Nagios so that the configuration takes effect.
[root@Nagios nagios]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
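5) Open http://192.168.0.200/nagios/pnp/ in a browser. The page shown in the figure below should appear, but without any service data yet.
If the page shows the following error instead:
If you see the error in the figure above, do not worry; the page may recover if you reload it a little later.
If the page still fails after a long wait, run the following command and then try again: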
yum -y install php-gd gd gd-devel
The page at the end of section 7.1 shows no trend data, because the Nagios hosts and services had not yet been configured to collect performance data. The following configuration enables collection for each host and service.
To collect data and draw trend graphs for every host, edit the Nagios host file hosts.cfg. You only need to add the same parameter, "process_perf_data 1", to each monitored host definition:
[root@Nagios nagios]# cd /usr/local/nagios/etc/objects/
[root@Nagios objects]# cat hosts.cfg
# Define a host for the local machine
define host{
    use                 linux-server
    host_name           web01
    alias               web01
    address             192.168.0.223
    process_perf_data   1       ; add this line to record host state data for web01
    }
define host{
    use                 linux-server
    host_name           web02
    alias               web02
    address             192.168.0.224
    process_perf_data   1       ; add this line to record host state data for web02
    }
define hostgroup{
    hostgroup_name      linux-servers
    alias               Linux Servers
    members             web01,web02
    }
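With many hosts it is easy to miss one. As a rough check, the sketch below prints the host_name of every `define host` block that does not mention process_perf_data at all. `hosts_without_perfdata` is a hypothetical helper; it assumes one directive per line, as in the listing above, and it does not follow template inheritance through `use`.

```shell
#!/bin/bash
# hosts_without_perfdata FILE: hypothetical helper, not a Nagios tool.
# Prints host_name for each "define host" block lacking process_perf_data.
# Assumption: one directive per line; inherited settings are NOT considered.
hosts_without_perfdata() {
    awk '
        $1 == "define" && $2 ~ /^host[{]?$/  { in_host = 1; name = ""; has = 0; next }
        in_host && $1 == "host_name"         { name = $2 }
        in_host && $1 == "process_perf_data" { has = 1 }
        in_host && $1 ~ /^[}]/               { if (!has) print name; in_host = 0 }
    ' "$1"
}

# Usage:
#   hosts_without_perfdata /usr/local/nagios/etc/objects/hosts.cfg
```

An empty result means every host block in the file sets the directive explicitly.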
To collect data and draw trend graphs for the services on every host, edit the Nagios service file services.cfg. Again, you only need to add the same parameter, "process_perf_data 1", to each service definition:
[root@Nagios objects]# cat /usr/local/nagios/etc/objects/services.cfg
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Disk Partition
    check_command           check_nrpe!check_disk
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Swap Useage
    check_command           check_nrpe!check_swap
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     MEM Useage
    check_command           check_nrpe!check_mem
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Current Load
    check_command           check_nrpe!check_load
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Disk lostat
    check_command           check_nrpe!check_iostat!5!11
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
    process_perf_data       1       ; add this line to every service
}
#url examples http://www.yunjisuan.com
define service {
    use                     generic-service
    host_name               web01
    service_description     www_url
    check_command           check_weburl! -H www.yunjisuan.com
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_url
    check_command           check_http
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_static_url
    check_command           check_weburl! -H www.yunjisuan.com -u /static/test.html
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_yunjisuan_url
    check_command           check_weburl! -H www.yunjisuan.com -u "/article/index.phpm=article&a=list&id=670"
    process_perf_data       1       ; add this line to every service
}
#tcp examples
define service {
    use                     generic-service
    host_name               web01
    service_description     ssh_22
    check_command           check_tcp! 22
    process_perf_data       1       ; add this line to every service
}
define service {
    use                     generic-service
    host_name               web01
    service_description     http_80
    check_command           check_tcp! 80
    process_perf_data       1       ; add this line to every service
}
Because each host has so many services, it is easier to add the parameter once to the template shared by all services, so that it applies to every one of them. The template a service uses is set by its "use generic-service" option. The generic-service template in the template file contains:
[root@Nagios objects]# sed -n '154,177p' /usr/local/nagios/etc/objects/templates.cfg | awk -F ";" '{print $1}'
    name                            generic-service
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           10
    retry_check_interval            2
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           60
    notification_period             24x7
    register                        0
    }
Note:
Comments are stripped here for readability. The service template already sets "process_perf_data 1" by default, so every service that uses the generic-service template from templates.cfg inherits the setting.
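When the configuration is complete, reload the Nagios service:
[root@Nagios objects]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
After a while the PNP URL shows graph data. Some metrics, such as host load, only produce meaningful curves under a stress test or in a real environment. The trend graphs look like the figure below: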
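While waiting, you can confirm from the shell that data files are actually being written. The sketch below lists RRD files modified within the last N minutes; `recent_rrds` is a hypothetical helper, the default path is the perfdata directory passed to configure earlier, and it relies on the `-mmin` test of GNU or BSD find.

```shell
#!/bin/bash
# recent_rrds [DIR] [MINUTES]: hypothetical helper, not part of PNP.
# Lists *.rrd files under DIR modified less than MINUTES minutes ago.
recent_rrds() {
    local dir="${1:-/usr/local/nagios/share/perfdata}"
    local minutes="${2:-30}"
    find "$dir" -type f -name '*.rrd' -mmin "-$minutes" | sort
}

# Usage:
#   recent_rrds                                       # defaults
#   recent_rrds /usr/local/nagios/share/perfdata 10   # last 10 minutes
```

An empty list after several check intervals suggests that performance data is not reaching process_perfdata.pl.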
Once the PNP URL is integrated into the Nagios web interface, a lightning-shaped link icon appears in front of every host and every service. Click it to see the trend graph for that host or service.
By default the PNP URL is http://192.168.0.200/nagios/pnp/index.php, which is separate from the Nagios interface, so looking up the graph for a host or service is inconvenient. How can this be improved?
Add the following line to each host in hosts.cfg that should link to its graphs:
action_url /nagios/pnp/index.php?host=$HOSTNAME$ ; passes the host name to the URL as a parameter
Edit hosts.cfg and add the line. The result looks like this:
[root@Nagios objects]# cat /usr/local/nagios/etc/objects/hosts.cfg
# Define a host for the local machine
define host{
    use                 linux-server
    host_name           web01
    alias               web01
    address             192.168.0.223
    process_perf_data   1
    action_url          /nagios/pnp/index.php?host=$HOSTNAME$   ; adds the link icon
    }
define host{
    use                 linux-server
    host_name           web02
    alias               web02
    address             192.168.0.224
    process_perf_data   1
    action_url          /nagios/pnp/index.php?host=$HOSTNAME$   ; adds the link icon
    }
define hostgroup{
    hostgroup_name      linux-servers
    alias               Linux Servers
    members             web01,web02
    }
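Next, check the syntax and reload Nagios:
[root@Nagios objects]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
If everything is configured correctly, the Nagios interface looks like the figure below. The white square with a wavy line, marked by the box on the right, is the link icon. Click it to see the graphs for all services of a host.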
Adding the link icon to services is almost identical. Run "vi /usr/local/nagios/etc/objects/services.cfg" and add the following:
action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$ ; passes the host name and the service description to the URL as parameters
To add the link to a specific service, put the parameter inside the braces of its define service {} block, as in the action_url lines below:
[root@Nagios objects]# cat /usr/local/nagios/etc/objects/services.cfg
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Disk Partition
    check_command           check_nrpe!check_disk
    process_perf_data       1
    action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$   ; adds the link for this service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Swap Useage
    check_command           check_nrpe!check_swap
    process_perf_data       1
    action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$   ; adds the link for this service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     MEM Useage
    check_command           check_nrpe!check_mem
    process_perf_data       1
    action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$   ; adds the link for this service
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Current Load
    check_command           check_nrpe!check_load
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     Disk lostat
    check_command           check_nrpe!check_iostat!5!11
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01,web02
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
    process_perf_data       1
}
#url examples http://www.yunjisuan.com
define service {
    use                     generic-service
    host_name               web01
    service_description     www_url
    check_command           check_weburl! -H www.yunjisuan.com
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_url
    check_command           check_http
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_static_url
    check_command           check_weburl! -H www.yunjisuan.com -u /static/test.html
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01
    service_description     www_yunjisuan_url
    check_command           check_weburl! -H www.yunjisuan.com -u "/article/index.phpm=article&a=list&id=670"
    process_perf_data       1
}
#tcp examples
define service {
    use                     generic-service
    host_name               web01
    service_description     ssh_22
    check_command           check_tcp! 22
    process_perf_data       1
}
define service {
    use                     generic-service
    host_name               web01
    service_description     http_80
    check_command           check_tcp! 80
    process_perf_data       1
}
The result looks like this:
You can also enable the link for all services at once. In the templates.cfg template file, find the default service template generic-service and add the line "action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$" at the end of its braces.
[root@Nagios objects]# sed -n '153,178p' /usr/local/nagios/etc/objects/templates.cfg | awk -F ";" '{print $1}'
define service{
    name                            generic-service
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           10
    retry_check_interval            2
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           60
    notification_period             24x7
    register                        0
    action_url                      /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$   # add this line at the end
    }
Every service on every host now gets the graph link icon.
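Now check the syntax and reload Nagios:
[root@Nagios objects]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
The final monitoring view for all hosts and services is shown below:
Clicking any link icon now opens the trend graph for that host or service. This completes the graphing configuration for Nagios hosts and services.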
The trend graphs are drawn entirely from the data files listed below, so make sure this historical data is backed up.
[root@Nagios objects]# ll /usr/local/nagios/share/perfdata/
total 8
drwxr-xr-x 2 nagios nagios 4096 Aug 20 02:10 web01
drwxr-xr-x 2 nagios nagios 4096 Aug 20 02:03 web02
[root@Nagios objects]# tree /usr/local/nagios/share/perfdata/
/usr/local/nagios/share/perfdata/
|-- web01
|   |-- Current_Load.rrd
|   |-- Current_Load.xml
|   |-- Disk_Partition.rrd
|   |-- Disk_Partition.xml
|   |-- Disk_lostat.rrd
|   |-- Disk_lostat.xml
|   |-- MEM_Useage.rrd
|   |-- MEM_Useage.xml
|   |-- PING.rrd
|   |-- PING.xml
|   |-- Swap_Useage.rrd
|   |-- Swap_Useage.xml
|   |-- http_80.rrd
|   |-- http_80.xml
|   |-- ssh_22.rrd
|   |-- ssh_22.xml
|   |-- www_static_url.rrd
|   |-- www_static_url.xml
|   |-- www_url.rrd
|   |-- www_url.xml
|   |-- www_yunjisuan_url.rrd
|   `-- www_yunjisuan_url.xml
`-- web02
    |-- PING.rrd
    `-- PING.xml
2 directories, 24 files
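One simple way to back up this directory is a timestamped tar archive. This is only a sketch: `backup_perfdata` and the destination directory are hypothetical, and copying RRD files while Nagios is writing to them may capture a file mid-update, so a quiet moment or a stopped service is the safer assumption.

```shell
#!/bin/bash
# backup_perfdata SRC_DIR DEST_DIR: hypothetical helper, not part of PNP.
# Archives SRC_DIR into DEST_DIR/perfdata-<timestamp>.tar.gz and prints the path.
backup_perfdata() {
    local src="$1" dest="$2"
    local stamp archive
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/perfdata-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps paths inside the archive relative to the parent of SRC_DIR
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Usage (the destination /backup/nagios is an assumed example path):
#   backup_perfdata /usr/local/nagios/share/perfdata /backup/nagios
```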
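Nagios usually notifies administrators of faults by email or by mobile phone. Each is described below.
- Plain email alerts send the alert to the mailbox of the system administrator or the responsible maintainers when a fault occurs or recovers. It is generally best to use a company-internal mailbox as the alert mailbox. If you test at home with QQ, 126, or similar mailboxes, messages may fail to arrive or may be classified as spam.
- During working hours email alerts are reasonably timely, but they are no use when nobody is at the computer. Email is therefore suitable only for less critical services, or as a secondary channel among a large volume of alerts, for example disk, memory, and log-related alerts that do not need an immediate fix. In production, email alerts are usually combined with other alerting methods.
- The basic configuration of email alerts is described below.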
First, set the email address that receives monitoring alerts:
[root@Nagios objects]# sed -n '35p' /usr/local/nagios/etc/objects/contacts.cfg | awk -F ";" '{print $1}'
    email                           215379068@qq.com    # change this to your own QQ mailbox
Start the postfix service:
[root@Nagios objects]# /etc/init.d/postfix start
Starting postfix: [ OK ]
[root@Nagios objects]# echo "/etc/init.d/postfix start" >> /etc/rc.local
[root@Nagios objects]# tail -3 /etc/rc.local
touch /var/lock/subsys/local
/etc/init.d/nagios start
/etc/init.d/postfix start
Send a test message from the command line:
[root@Nagios objects]# echo "this is test email" | mail -s "yunjisuan" 215379068@qq.com
# Release the message from the QQ blocked list, then add the sender to the whitelist
Warning!
When you experiment with Nagios at home, use your own QQ mailbox. Please do not send test mail to the address shown here.
templates.cfg: system-defined templates
# Template: generic-service
[root@Nagios objects]# sed -n '153,178p' /usr/local/nagios/etc/objects/templates.cfg | awk -F ";" '{print $1}'
define service{
    name                            generic-service
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7        # time period during which Nagios checks the service
    max_check_attempts              3           # maximum number of check attempts for the service
    normal_check_interval           10          # interval between two regular checks
    retry_check_interval            2           # interval between retry checks
    contact_groups                  admins      # contact group to notify
    notification_options            w,u,c,r     # states that trigger a notification (email); w is warning, r is recovery
    notification_interval           60          # interval before Nagios notifies the contacts again while the problem persists
    notification_period             24x7        # time period during which email is sent
    register                        0
    action_url                      /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
    }
# Template: generic-contact
[root@Nagios objects]# sed -n '28,37p' /usr/local/nagios/etc/objects/templates.cfg | awk -F ";" '{print $1}'
define contact{
    name                            generic-contact             # contact name
    service_notification_period     24x7                        # period for service notifications
    host_notification_period        24x7                        # period for host notifications
    service_notification_options    w,u,c,r,f,s                 # service states that trigger a notification
    host_notification_options       d,u,r,f,s                   # host states that trigger a notification
    service_notification_commands   notify-service-by-email     # command that sends mail for service problems, defined in commands.cfg
    host_notification_commands      notify-host-by-email        # command that sends mail for host problems, defined in commands.cfg
    register                        0
    }
commands.cfg: command definitions
# Commands that send the notification email
[root@Nagios objects]# sed -n '27,37p' commands.cfg
# 'notify-host-by-email' command definition
define command{
    # command that sends mail for host problems
    command_name    notify-host-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
    }
# 'notify-service-by-email' command definition
define command{
    # command that sends mail for service problems
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }
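To preview what the host notification body looks like without triggering a real alert, the printf "%b" template can be reproduced in the shell with sample values standing in for the Nagios macros. `render_host_alert` is a hypothetical helper and the sample values in the usage line are made up; in real use Nagios substitutes the macros itself.

```shell
#!/bin/bash
# render_host_alert TYPE HOST STATE ADDRESS INFO DATETIME
# Hypothetical helper: mirrors the notify-host-by-email body, with shell
# arguments in place of $NOTIFICATIONTYPE$, $HOSTNAME$ and the other macros.
render_host_alert() {
    local type="$1" host="$2" state="$3" address="$4" info="$5" when="$6"
    # %b expands the \n escapes, exactly as in the command_line above
    printf "%b" "***** Nagios *****\n\nNotification Type: $type\nHost: $host\nState: $state\nAddress: $address\nInfo: $info\n\nDate/Time: $when\n"
}

# Usage (sample values only):
#   render_host_alert PROBLEM web01 DOWN 192.168.0.223 "PING CRITICAL" "$(date)"
```

Piping the output to `mail -s "subject" your@address` reproduces the rest of the command line, assuming the local MTA is running.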
contacts.cfg: contact definitions
[root@Nagios objects]# cat contacts.cfg | egrep -v "#|^$" | awk -F ";" '{print $1}'
define contact{
    contact_name                    nagiosadmin             # contact name
    use                             generic-contact
    alias                           Nagios Admin            # contact alias
    email                           215379068@qq.com        # contact mailbox
    }
define contactgroup{
    contactgroup_name               admins                  # contact group name
    alias                           Nagios Administrators   # alias
    members                         nagiosadmin             # members of the group
    }
The nagios-plugins-1.4.16.tar.gz package installed when deploying the Nagios service is the Nagios plugin bundle. After installation, ls -l /usr/local/nagios/libexec lists the following plugins:
[root@Nagios objects]# ls -l /usr/local/nagios/libexec/ total 5288 lrwxrwxrwx 1 root root 27 Aug 18 08:29 check_111 -> /service/scripts/check_test -rwxr-xr-x. 1 nagios nagios 376524 Aug 14 10:11 check_apt -rwxr-xr-x. 1 nagios nagios 2245 Aug 14 10:11 check_breeze -rwxr-xr-x. 1 nagios nagios 128296 Aug 14 10:11 check_by_ssh lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_clamd -> check_tcp -rwxr-xr-x. 1 nagios nagios 85694 Aug 14 10:11 check_cluster -r-sr-xr-x. 1 root nagios 123603 Aug 14 10:11 check_dhcp -rwxr-xr-x. 1 nagios nagios 417895 Aug 14 10:11 check_disk -rwxr-xr-x. 1 nagios nagios 9148 Aug 14 10:11 check_disk_smb -rwxr-xr-x. 1 nagios nagios 80689 Aug 14 10:11 check_dummy -rwxr-xr-x. 1 nagios nagios 3056 Aug 14 10:11 check_file_age -rwxr-xr-x. 1 nagios nagios 6318 Aug 14 10:11 check_flexlm lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_ftp -> check_tcp -rwxr-xr-x. 1 nagios nagios 520614 Aug 14 10:11 check_http -r-sr-xr-x. 1 root nagios 133689 Aug 14 10:11 check_icmp -rwxr-xr-x. 1 nagios nagios 93416 Aug 14 10:11 check_ide_smart -rwxr-xr-x. 1 nagios nagios 15137 Aug 14 10:11 check_ifoperstatus -rwxr-xr-x. 1 nagios nagios 12601 Aug 14 10:11 check_ifstatus lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_imap -> check_tcp -rwxr-xr-x. 1 nagios nagios 6890 Aug 14 10:11 check_ircd lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_jabber -> check_tcp -rwxr-xr-x. 1 nagios nagios 106573 Aug 14 10:11 check_load -rwxr-xr-x. 1 nagios nagios 6020 Aug 14 10:11 check_log -rwxr-xr-x. 1 nagios nagios 20287 Aug 14 10:11 check_mailq -rwxr-xr-x. 1 nagios nagios 93142 Aug 14 10:11 check_mrtg -rwxr-xr-x. 1 nagios nagios 92487 Aug 14 10:11 check_mrtgtraf -rwxr-xr-x. 1 nagios nagios 105606 Aug 14 10:11 check_nagios lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_nntp -> check_tcp lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_nntps -> check_tcp -rwxrwxr-x. 1 nagios nagios 76744 Aug 14 10:32 check_nrpe -rwxr-xr-x. 1 nagios nagios 127679 Aug 14 10:11 check_nt -rwxr-xr-x. 
1 nagios nagios 130078 Aug 14 10:11 check_ntp -rwxr-xr-x. 1 nagios nagios 119167 Aug 14 10:11 check_ntp_peer -rwxr-xr-x. 1 nagios nagios 117728 Aug 14 10:11 check_ntp_time -rwxr-xr-x. 1 nagios nagios 159372 Aug 14 10:11 check_nwstat -rwxr-xr-x. 1 nagios nagios 8324 Aug 14 10:11 check_oracle -rwxr-xr-x. 1 nagios nagios 108934 Aug 14 10:11 check_overcr -rwxr-xr-x. 1 nagios nagios 132691 Aug 14 10:11 check_ping -rwxr-xr-x 1 nagios nagios 6184 Aug 19 23:04 check_pnp_rrds.pl lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_pop -> check_tcp -rwxr-xr-x. 1 nagios nagios 396833 Aug 14 10:11 check_procs -rwxr-xr-x. 1 nagios nagios 106492 Aug 14 10:11 check_real -rwxr-xr-x. 1 nagios nagios 9584 Aug 14 10:11 check_rpc -rwxr-xr-x. 1 nagios nagios 1412 Aug 14 10:11 check_sensors lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_simap -> check_tcp -rwxr-xr-x. 1 nagios nagios 446511 Aug 14 10:11 check_smtp lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_spop -> check_tcp -rwxr-xr-x. 1 nagios nagios 103000 Aug 14 10:11 check_ssh lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_ssmtp -> check_tcp -rwxr-xr-x. 1 nagios nagios 108233 Aug 14 10:11 check_swap -rwxr-xr-x. 1 nagios nagios 160386 Aug 14 10:11 check_tcp -rwxr-xr-x. 1 nagios nagios 105022 Aug 14 10:11 check_time lrwxrwxrwx. 1 root root 9 Aug 14 10:11 check_udp -> check_tcp -rwxr-xr-x. 1 nagios nagios 117534 Aug 14 10:11 check_ups -rwxr-xr-x. 1 nagios nagios 83434 Aug 14 10:11 check_users -rwxr-xr-x. 1 nagios nagios 2939 Aug 14 10:11 check_wave -rwxr-xr-x. 1 nagios nagios 109723 Aug 14 10:11 negate -rwxr-xr-x 1 nagios nagios 31813 Aug 19 23:04 process_perfdata.pl -rwxr-xr-x. 1 nagios nagios 103242 Aug 14 10:11 urlize -rwxr-xr-x. 1 nagios nagios 1904 Aug 14 10:11 utils.pm -rwxr-xr-x. 1 nagios nagios 2728 Aug 14 10:11 utils.sh
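Note:
A default installation contains about 60 plugins. That is a lot, so only a few common ones are covered here.
All of the entries above are Nagios plugins, so you should now have a basic picture of them. Nagios itself is only a monitoring platform. To monitor the state and data of specific hosts and services it must call plugins or other programs; without plugins, Nagios is an empty shell that can do nothing.
- If the plugin package is already installed, why develop your own Nagios plugins?
- Most services commonly used in production can be monitored without writing any plugin; the bundled check_http, check_tcp, check_nrpe and similar plugins are already powerful. Some things you may want to monitor have no bundled plugin, however, such as the VIP on the lo interface of an LVS real server, NFS status, system metrics from iostat, mem and sar, or applications such as MQ queues. In that case there are two options: search online for a script someone else has written and use or adapt it, or write your own. Learning to write plugins by hand is recommended. If you cannot write one from scratch at first, start by modifying plugins that others have shared, and the skill will follow.
- To develop plugins, it helps to know a programming language such as Shell or Python.
- A Nagios plugin is a program component deployed through Nagios's extension mechanism. Plugins can be written in Shell, Java, C/C++, PHP and other languages. Operations staff or system architects only need to edit the Nagios configuration files and the relevant parameters to integrate a plugin into Nagios and monitor the target system.
- Nagios interacts with a plugin through two return values: the exit status code of the plugin, and the first line the plugin prints to the console while running. Nagios uses the exit status code to determine the state of the monitored service, and shows the first line of output on the web interface as a supplementary description of that state, as in the figure below: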
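To run a plugin, Nagios spawns a child process each time it checks the state of a service, and uses the output and exit code of that command to determine the state. The plugin exit codes that Nagios recognizes are listed below:
Note:
Interviewers have been known to ask what these numeric codes mean. The last state usually means the plugin could not determine the state of the service, for example because of a network or internal error. The states are defined in the following file:
[root@Nagios objects]# head -7 /usr/local/nagios/libexec/utils.sh
#! /bin/sh
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4
# Note: the file defines one more state (STATE_DEPENDENT) than the list above, but it is rarely used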
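The mapping from exit code to state name can be written out as a small lookup, which is handy when testing a plugin by hand. `status_name` is a hypothetical helper; the names for 0 to 4 follow the STATE_* variables in utils.sh shown above, and the label for any other value is this sketch's own choice.

```shell
#!/bin/bash
# status_name CODE: hypothetical helper that prints the state name
# for a plugin exit code, following the STATE_* values in utils.sh.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        4) echo "DEPENDENT" ;;
        *) echo "OUT-OF-RANGE" ;;   # label chosen for this sketch only
    esac
}

# Usage after running a plugin by hand:
#   /usr/local/nagios/libexec/check_load -w 5,4,3 -c 10,8,6; status_name $?
```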
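A plugin runs the sequence of operations needed to check the monitored service, analyzes the result against predefined rules to decide the current state of the service, exits with the corresponding status code, and prints a description of that state to the console on a single line.
Plugin development is not tied to any language. A plugin only has to be callable by Nagios and able to obtain the relevant data; anything that can be run from the command line and print a result will do. Common plugin languages are Shell, Perl, Python, PHP, and C/C++.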
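The pattern just described (measure, compare against rules, print one line, exit with a status code) can be sketched as a reusable function. `check_threshold` is a hypothetical name, and the rule used here, "higher is worse" with integer warning and critical thresholds, is an assumption chosen for illustration.

```shell
#!/bin/bash
# check_threshold LABEL VALUE WARN CRIT: hypothetical plugin skeleton.
# Prints a single status line and returns 0/1/2/3 (OK/WARNING/CRITICAL/UNKNOWN).
# Assumption: all three numbers are non-negative integers, higher is worse.
check_threshold() {
    local label="$1" value="$2" warn="$3" crit="$4" n
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "$label UNKNOWN - expected non-negative integers"
                return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "$label CRITICAL - value=$value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "$label WARNING - value=$value"
        return 1
    fi
    echo "$label OK - value=$value"
    return 0
}

# Usage inside a plugin script (the measured value is a made-up example):
#   check_threshold "USERS" "$(who | wc -l)" 5 10; exit $?
```

Returning UNKNOWN for bad input keeps a misconfigured check from being mistaken for a healthy or failed service.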
The following script only checks the client at 192.168.0.223:
[root@Nagios libexec]# cat check_url.sh
#!/bin/bash
# author:Mr.chen by 2017-8-20
# Use wget to check whether 192.168.0.223 is reachable; -T sets the timeout, --spider skips downloading the page
wget -T 10 --spider 192.168.0.223 >/dev/null 2>&1
# Test the exit status of the wget command above: 0 means success, non-zero means failure
if [ $? -eq 0 ];then
    echo "URL 192.168.0.223 OK"
    exit 0
else
    echo "URL 192.168.0.223 CRITICAL"
    exit 2
fi
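Next, use an argument to turn the script into a general-purpose web URL plugin:
[root@Nagios libexec]# cat check_url.sh
#!/bin/bash
# author:Mr.chen by 2017-8-20
PROGNAME=`basename $0`      # script name
PROGPATH=`dirname $0`       # script directory
usage(){                    # print help
    echo "Usage: /bin/sh ${PROGPATH}/${PROGNAME} url"
    exit 3                  # a usage error is UNKNOWN; exit 1 would mean WARNING to Nagios
}
[ $# -ne 1 ] && usage       # print help unless exactly one argument is given
wget -T 10 --spider "$1" >/dev/null 2>&1    # the URL now comes from the argument
if [ $? -eq 0 ];then
    echo "URL $1 OK"
    exit 0
else
    echo "URL $1 CRITICAL"
    exit 2
fi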
A more professional version of the URL monitoring plugin follows:
[root@Nagios libexec]# cat check_url.sh
#!/bin/bash
# author:Mr.chen by 2017-8-20
PROGNAME=`basename $0`
PROGPATH=`dirname $0`
usage(){
    echo "Usage: /bin/sh ${PROGPATH}/${PROGNAME} url"
    exit 3                  # a usage error is UNKNOWN; exit 1 would mean WARNING to Nagios
}
[ $# -ne 1 ] && usage
. $PROGPATH/utils.sh        # load the STATE_* exit codes
if wget -T 20 --spider "$1" >/dev/null 2>&1;then
    echo "URL $1 OK"
    exit $STATE_OK
else
    echo "URL $1 NO"
    exit $STATE_CRITICAL
fi
Finally, test the improved web URL plugin by hand:
[root@Nagios libexec]# sh /usr/local/nagios/libexec/check_url.sh www.yunjisuan.com
URL www.yunjisuan.com OK
[root@Nagios libexec]# echo $?
0
[root@Nagios libexec]# sh /usr/local/nagios/libexec/check_url.sh bbs.yunjisuan.com
URL bbs.yunjisuan.com OK
[root@Nagios libexec]# echo $?
0
[root@Nagios libexec]# sh /usr/local/nagios/libexec/check_url.sh blog.yunjisuan.com
URL blog.yunjisuan.com NO
[root@Nagios libexec]# echo $?
2
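A slightly more defensive variant of the same check is sketched below as a function, which makes it easy to exercise without editing the plugin file. It is not the script above: it reports UNKNOWN when wget is missing or the argument count is wrong, and it adds `-t 1` so that wget makes a single attempt instead of its default retries. The function name `check_url` and the exact messages are choices made for this sketch.

```shell
#!/bin/bash
# check_url URL: sketch of a more defensive URL check.
# Returns 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN) and prints one status line.
check_url() {
    if [ $# -ne 1 ]; then
        echo "UNKNOWN - usage: check_url URL"
        return 3
    fi
    if ! command -v wget >/dev/null 2>&1; then
        echo "UNKNOWN - wget not found"
        return 3
    fi
    # -T 10: network timeout in seconds, -t 1: one attempt, --spider: no download
    if wget -T 10 -t 1 --spider "$1" >/dev/null 2>&1; then
        echo "URL $1 OK"
        return 0
    fi
    echo "URL $1 CRITICAL"
    return 2
}

# Usage at the end of a plugin script:
#   check_url "$@"; exit $?
```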
Active-mode monitoring does not involve the nrpe process on the Nagios client.
All active-mode work happens on the Nagios server. The deployment steps are:
(1) Write check_url.sh, place it in /usr/local/nagios/libexec, and make it executable.
[root@Nagios libexec]# cd /usr/local/nagios/libexec/
[root@Nagios libexec]# chmod +x check_url.sh
[root@Nagios libexec]# ll check_url.sh
-rwxr-xr-x 1 root root 337 Aug 20 06:38 check_url.sh
[root@Nagios libexec]# cat check_url.sh
#!/bin/bash
# author:Mr.chen by 2017-8-20
PROGNAME=`basename $0`
PROGPATH=`dirname $0`
usage(){
    echo "Usage: /bin/sh ${PROGPATH}/${PROGNAME} url"
    exit 3                  # a usage error is UNKNOWN; exit 1 would mean WARNING to Nagios
}
[ $# -ne 1 ] && usage
. $PROGPATH/utils.sh        # load the STATE_* exit codes
if wget -T 20 --spider "$1" >/dev/null 2>&1;then
    echo "URL $1 OK"
    exit $STATE_OK
else
    echo "URL $1 NO"
    exit $STATE_CRITICAL
fi
(2) Define the check_url command in commands.cfg:
[root@Nagios objects]# cd /usr/local/nagios/etc/objects/
[root@Nagios objects]# tail -7 commands.cfg
# 'check_url' command definition by Mr.chen
define command {
    command_name    check_url
    # run the script and pass it the address to check
    command_line    $USER1$/check_url.sh 192.168.0.223
}
# Note: $USER1$ is a standard Nagios macro; here it points to /usr/local/nagios/libexec
(3) Add a service that monitors the URL above.
You can add the service directly to services.cfg, or write a separate service file in the /usr/local/nagios/etc/objects/services directory:
# Create the configuration file for the new service
[root@Nagios objects]# pwd
/usr/local/nagios/etc/objects
[root@Nagios objects]# cd services
[root@Nagios services]# pwd
/usr/local/nagios/etc/objects/services
[root@Nagios services]# vim check_url.cfg
[root@Nagios services]# cat check_url.cfg
define service {
    use                     generic-service
    host_name               web01
    service_description     http_zhudong_url
    check_command           check_url
}
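The command above hard-codes the address, so it can only ever check one URL. Nagios lets a service pass arguments to a command with `!`, which the command reads as `$ARG1$`. The sketch below prints such a parameterized command and service pair; `print_check_url_objects` is a hypothetical helper, and the generated definitions are an alternative to the ones above, not something to load alongside them.

```shell
#!/bin/bash
# print_check_url_objects HOST URL: hypothetical helper that prints a
# parameterized check_url command plus one service that passes URL as $ARG1$.
print_check_url_objects() {
    local host="$1" url="$2"
    # Single quotes keep $USER1$ and $ARG1$ literal for Nagios to expand
    printf '%s\n' \
        'define command {' \
        '    command_name    check_url' \
        '    command_line    $USER1$/check_url.sh $ARG1$' \
        '}' \
        'define service {' \
        '    use                     generic-service' \
        "    host_name               $host" \
        '    service_description     http_zhudong_url' \
        "    check_command           check_url!$url" \
        '}'
}

# Usage:
#   print_check_url_objects web01 192.168.0.223
```

With this form, monitoring a second URL only needs another service definition; the command definition stays the same.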
Because the /usr/local/nagios/etc/objects/services directory is already referenced by the main nagios.cfg file, the new file does not need to be included from services.cfg.
[root@Nagios etc]# cat nagios.cfg | grep "/usr/local/nagios/etc/objects/services"
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services
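To check whether a particular object file would be picked up, the cfg_file and cfg_dir entries can be compared against its path. The sketch below assumes that cfg_dir loads files ending in .cfg from the directory and its subdirectories; `is_loaded` is a hypothetical helper that only does string matching and does not confirm that the file exists or parses.

```shell
#!/bin/bash
# is_loaded MAIN_CFG TARGET: hypothetical helper.
# Returns 0 if TARGET equals a cfg_file entry, or is a *.cfg path
# below a cfg_dir entry, in MAIN_CFG; returns 1 otherwise.
is_loaded() {
    local main_cfg="$1" target="$2" key value
    while IFS='=' read -r key value; do
        value="${value%/}"              # tolerate a trailing slash
        case "$key" in
            cfg_file)
                [ "$value" = "$target" ] && return 0 ;;
            cfg_dir)
                case "$target" in
                    "$value"/*.cfg) return 0 ;;
                esac ;;
        esac
    done < "$main_cfg"
    return 1
}

# Usage:
#   is_loaded /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/objects/services/check_url.cfg && echo loaded
```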
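The relationship between the individual configuration files and the main nagios.cfg file is shown below:
(4) Reload Nagios and check the result.
[root@Nagios etc]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
(5) Check the monitoring result on the Nagios service page, as shown below:
Note:
The web01 server must be serving web pages over HTTP.
Wait for the page to refresh....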
In passive mode, all plugins must be deployed on the monitored Nagios client. The deployment steps are as follows.
1) On the Nagios client web01, record the fingerprint (MD5 checksum) of /etc/passwd.
[root@web01 ~]# md5sum /etc/passwd >/opt/ps.md5
[root@web01 ~]# cat /opt/ps.md5
3660c548ce618df6c066f0db6bedd2af /etc/passwd
# Make a note of this checksum
2) On the Nagios client web01, write the plugin script and test it.
# Note: these commands are run on the web01 client
[root@web01 ~]# cd /usr/local/nagios/libexec/
[root@web01 libexec]# vim check_passwd
[root@web01 libexec]# cat check_passwd
#!/bin/bash
# author:Mr.chen by 2017-8-20
OriMd5="3660c548ce618df6c066f0db6bedd2af"       # checksum recorded earlier
CurrMd5=`md5sum /etc/passwd | cut -c 1-32`      # recompute the checksum on every run
if [ "$OriMd5" == "$CurrMd5" ];then
    echo "/etc/passwd:OK"
    exit 0
else
    echo "/etc/passwd:FAILED"
    exit 2
fi
[root@web01 libexec]# sh check_passwd
/etc/passwd:OK
[root@web01 libexec]# chmod +x check_passwd
# Tip: the comparison can also be done with md5sum -c /opt/ps.md5
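The `md5sum -c` approach mentioned in the tip avoids hard-coding the checksum in the script. A minimal sketch follows; `check_md5_baseline` is a hypothetical function name, and it assumes the baseline file was produced by `md5sum FILE > BASELINE` as in step 1.

```shell
#!/bin/bash
# check_md5_baseline BASELINE: hypothetical plugin body using md5sum -c.
# Returns 0 (OK) if every file listed in BASELINE still matches,
# 2 (CRITICAL) if not, and 3 (UNKNOWN) if BASELINE cannot be read.
check_md5_baseline() {
    local baseline="$1"
    if [ ! -r "$baseline" ]; then
        echo "UNKNOWN - baseline $baseline not readable"
        return 3
    fi
    if md5sum -c "$baseline" >/dev/null 2>&1; then
        echo "md5 check against $baseline: OK"
        return 0
    fi
    echo "md5 check against $baseline: FAILED"
    return 2
}

# Usage at the end of a plugin script on web01:
#   check_md5_baseline /opt/ps.md5; exit $?
```

After a legitimate change to /etc/passwd, regenerating /opt/ps.md5 is enough; the plugin itself does not need editing.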
3) On the Nagios client web01, edit nrpe.cfg, add the following line, and save.
[root@web01 libexec]# cd /usr/local/nagios/etc/
[root@web01 etc]# vim nrpe.cfg
[root@web01 etc]# tail -1 nrpe.cfg
# Add the following line at the end of the file
command[check_passwd]=/usr/local/nagios/libexec/check_passwd
4) On the Nagios client web01, restart nrpe and confirm that it restarted (verify with check_nrpe).
[root@web01 etc]# ps -ef | grep nrpe | grep -v grep
nagios 1027 1 0 Aug18 ? 00:00:05 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@web01 etc]# pkill nrpe
[root@web01 etc]# ps -ef | grep nrpe | grep -v grep
[root@web01 etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@web01 etc]# ps -ef | grep nrpe | grep -v grep
nagios 4362 1 0 06:33 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
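The check_nrpe verification mentioned in step 4 is run from the Nagios server, not from web01. The sketch below wraps that call; `probe_nrpe` and the `CHECK_NRPE` override variable are hypothetical, while `-H` (host) and `-c` (command name) are the check_nrpe options used for this purpose. The default plugin path assumes the installation layout used in this guide.

```shell
#!/bin/bash
# probe_nrpe HOST COMMAND: hypothetical wrapper around check_nrpe.
# CHECK_NRPE may point to a different plugin path (an assumption of this sketch).
probe_nrpe() {
    local host="$1" cmd="$2"
    local plugin="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
    if [ ! -x "$plugin" ]; then
        echo "UNKNOWN - $plugin not found or not executable"
        return 3
    fi
    # Ask the nrpe daemon on HOST to run the command named in its nrpe.cfg
    "$plugin" -H "$host" -c "$cmd"
}

# Usage on the Nagios server:
#   probe_nrpe 192.168.0.223 check_passwd; echo $?
```

If nrpe restarted correctly and the command line was added in step 3, the call should print the plugin's own status line, for example the /etc/passwd result from step 2.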
5) On the Nagios server (nagios-server), go to the services directory and create the configuration file check_passwd_web01.cfg.
# Note: these commands are run on the Nagios server
[root@Nagios ~]# cd /usr/local/nagios/etc/objects/services
[root@Nagios services]# vim check_passwd_web01.cfg
[root@Nagios services]# cat check_passwd_web01.cfg
define service {
    use                     generic-service
    host_name               web01
    service_description     check_passwd
    # check_passwd after the "!" is the bracketed command name from
    # command[check_passwd]=/usr/local/nagios/libexec/check_passwd in the client's nrpe.cfg
    check_command           check_nrpe!check_passwd
}
6) Check the syntax on the Nagios server.
[root@Nagios services]# /etc/init.d/nagios checkconfig
#... earlier output omitted ...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
OK.
7) Reload the Nagios configuration on the server, then open the Nagios page to check the result.
[root@Nagios services]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done
Wait for the page to refresh....