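Recently we ran into an unfortunate problem in production. One disk on a FastDFS storage server failed and its storage partition became read-only. Our Zabbix monitoring did not check for this, so client uploads failed. We eventually found that the partition had already been read-only for two days. The data is stored redundantly, so the impact was small, but it was still annoying that we had not caught the problem in time. To cover this case, I wrote a small script that raises an alarm when a storage server's sync delay exceeds a given threshold (e.g. 2 minutes).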
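fdfs_monitor shows status information for every FastDFS storage server, including its sync delay. The script reads the last_synced_timestamp value from that output and computes how long ago it was. It then alarms on two conditions: a storage state other than ACTIVE, and a delay above the threshold. The script is as follows: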
#!/bin/bash
# Storage synchronization delay alarm script
# Richard shen 2012/07/11
# BLOG: http://lxsym.blog.51cto.com
Basedir=$(dirname "$0")
Now_time=$(date +%s)
Monitor_out=$Basedir/monitor.txt
Active=$Basedir/active.txt
IP=$Basedir/ip.txt
Syn_time=$Basedir/syn_time.txt
Main_log=$Basedir/main.log
COMMAND="/usr/local/webserver/fdfs/bin/fdfs_monitor /usr/local/webserver/fdfs/etc/client.conf"
# Run fdfs_monitor only once so all three columns come from the same snapshot
$COMMAND > "$Monitor_out"
# ip_addr lines: field 3 is the IP, field 5 is the state (e.g. ACTIVE)
grep "(" "$Monitor_out" | awk '/ip_addr/{print $5}' > "$Active"
grep "(" "$Monitor_out" | awk '/ip_addr/{print $3}' > "$IP"
# last_synced_timestamp lines: fields 3 and 4 are the date and time
grep last_synced_timestamp "$Monitor_out" | awk '{print $3,$4}' > "$Syn_time"
# One line per storage: date time ip state
paste "$Syn_time" "$IP" "$Active" > "$Main_log"
while read -r day time ip active
do
    sys_time=$(date -d "$day $time" +%s)
    num=$((Now_time - sys_time))
    # State alarm
    if [ "$active" != "ACTIVE" ]; then
        # Call your email alarm API here
        echo "$ip State is $active, please check."
    fi
    # Alarm threshold (e.g. 2 minutes = 120 s)
    if [ "$num" -gt 120 ]; then
        # Email alarm API, e.g.: wget -q -O - "http://api.abc.com/sendMail.php?type=abcdG&to=EMAIL_ADDRESS&subject=[Storage sync delay alarm: $ip delayed $num s, please check]&body=RT, please check, thanks" > /dev/null
        echo "$ip Update time delay $num (s)"
    fi
done < "$Main_log"
rm -f "$Monitor_out" "$Active" "$IP" "$Syn_time" "$Main_log"
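The script pairs three separately grepped lists with paste. If some ip_addr lines lack a parenthesized hostname, grep "(" drops them and the columns shift against the last_synced_timestamp list. One alternative is to pair the fields in a single awk pass. The sketch below assumes two things: each storage's ip_addr line appears before its last_synced_timestamp line in the fdfs_monitor output, and the state is the last field of the ip_addr line. The function name check_storages and its arguments are hypothetical. Like the original script, it relies on GNU date -d.

```shell
#!/bin/bash
# Hypothetical helper: read fdfs_monitor-style output on stdin and print one
# line per storage that is not ACTIVE or whose sync delay exceeds a threshold.
# Assumption: each "ip_addr = <ip> ... <STATE>" line is followed (later in the
# same storage block) by "last_synced_timestamp = <date> <time> ...".
# Usage: check_storages NOW_EPOCH THRESHOLD_SECONDS < monitor_output
check_storages() {
    local now=$1 threshold=$2
    local ip state day hms ts delay
    awk '
        # Remember IP (field 3) and state (last field) of the current storage
        /ip_addr =/ { ip = $3; state = $NF }
        # Emit one record once the matching timestamp line is seen
        /last_synced_timestamp =/ && ip != "" {
            print ip, state, $3, $4
            ip = ""
        }
    ' | while read -r ip state day hms; do
        # GNU date parses "YYYY-MM-DD HH:MM:SS" in the local time zone
        ts=$(date -d "$day $hms" +%s) || continue
        delay=$((now - ts))
        if [ "$state" != "ACTIVE" ]; then
            echo "$ip state is $state"
        fi
        if [ "$delay" -gt "$threshold" ]; then
            echo "$ip sync delay ${delay}s"
        fi
    done
}
```

For example, `$COMMAND | check_storages "$(date +%s)" 120` prints nothing when every storage is healthy. Any lines it does print can be passed to the email alarm API.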
QQ discussion group: 24967504. Feel free to contact me with questions or suggestions.