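Extracting more value from logs has always been a focus of our team. Building on the existing real-time log query capability, SLS has rounded out a number of DevOps features this year.
Today we focus on how intelligent log clustering and anomaly alerting can be combined to detect and alert on anomalies more effectively.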
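Take a set of raw Syslog data with the log clustering service enabled. Its state is shown in the screenshot below:
Resizing red box 1 in the screenshot changes the result shown in red box 2, but the finest-grained patterns themselves do not change. In other words, each sub-pattern is stable and unique, so a sub-pattern's signature can be used to find the corresponding raw log entries.
Suppose we want to monitor this sub-pattern: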
msg:vm-111932.tc su: pam_unix(*:session): session closed for user root
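Its signature: __log_signature__: 1814836459146662485
Querying by this signature returns the raw logs for the pattern. Here is a histogram of their count over time:
The chart shows that this pattern's logs are unevenly distributed, and some intervals contain none at all. Counting them directly per time window gives the following time series: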
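The article does not describe how SLS derives a pattern's signature. The sketch below only illustrates why a stable signature lets you go from a pattern back to its raw lines. It assumes a hypothetical masking rule, where any token containing a digit becomes `*`, and a hypothetical SHA-256-based 64-bit signature. `template_of`, `signature_of` and `group_by_signature` are illustrative names, not SLS APIs, and the values they produce will not match real `__log_signature__` values.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical masking rule: a whitespace-delimited token that contains a digit
# is treated as a variable and replaced with "*". SLS's real clustering is more
# sophisticated; this only illustrates the idea of a stable signature.
_VARIABLE = re.compile(r"\S*\d\S*")


def template_of(line: str) -> str:
    """Reduce a raw log line to a coarse template by masking numeric tokens."""
    return _VARIABLE.sub("*", line)


def signature_of(template: str) -> int:
    """Derive a deterministic 64-bit signature from a template (hypothetical scheme)."""
    digest = hashlib.sha256(template.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def group_by_signature(lines):
    """Map each signature to the raw lines that share its template."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature_of(template_of(line))].append(line)
    return dict(groups)


if __name__ == "__main__":
    raw = [
        "vm-111932.tc su: pam_unix(su:session): session closed for user root",
        "vm-111932.tc su: pam_unix(su:session): session closed for user root",
        "vm-200001.tc sshd[4242]: Accepted publickey for admin",
    ]
    for sig, members in group_by_signature(raw).items():
        print(sig, len(members))
```

Because the signature depends only on the template, the same pattern always maps to the same value. This is the property that makes filtering by a signature such as `__log_signature__` reliable.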
__log_signature__: 1814836459146662485 | select date_trunc('minute', __time__) as time, COUNT(*) as num from log GROUP BY time order by time ASC limit 10000
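In the query above, `date_trunc('minute', __time__)` plus `COUNT(*) ... GROUP BY time` amounts to bucketing timestamps by minute and counting each bucket. The Python sketch below shows the same computation on hypothetical Unix timestamps in seconds. It also makes the problem visible: minutes with no logs simply do not appear in the result.

```python
from collections import Counter


def count_per_minute(timestamps):
    """Bucket Unix timestamps (seconds) into minutes and count each bucket.

    Mirrors date_trunc('minute', __time__) + COUNT(*) ... GROUP BY time ORDER BY time:
    only minutes that actually contain logs show up in the result.
    """
    buckets = Counter(ts - ts % 60 for ts in timestamps)
    return sorted(buckets.items())


if __name__ == "__main__":
    base = 1_600_000_020  # a minute boundary (divisible by 60)
    # Hypothetical data: three logs in the first minute, none in the second, one in the third.
    ts = [base, base + 10, base + 30, base + 130]
    print(count_per_minute(ts))  # the minute starting at base + 60 is missing
```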
上述图中咱们发现时间上并非连续的。所以,咱们须要对这条时序进行补点操做。
__log_signature__: 1814836459146662485 | select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc ) GROUP by time order by time ASC limit 10000
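The query above uses `time_series(time, '1m', ..., '0')` to fill missing minutes with 0. The following Python sketch shows the gap-filling idea on the output of a per-minute count. It is only an assumption-laden approximation: it pads between the first and last observed buckets and does not try to reproduce the exact padding behaviour of SLS's `time_series`. `fill_gaps` is a hypothetical helper name.

```python
def fill_gaps(points, step=60, fill=0):
    """Return a contiguous series from the first to the last bucket.

    points: sorted list of (bucket_start, value) with bucket_start aligned to `step`.
    Missing buckets get `fill`, similar in spirit to time_series(time, '1m', ..., '0').
    This sketch only pads between the first and last observed buckets.
    """
    if not points:
        return []
    observed = dict(points)
    start, end = points[0][0], points[-1][0]
    # range() excludes its stop value, so stop at end + step to include `end`.
    return [(t, observed.get(t, fill)) for t in range(start, end + step, step)]


if __name__ == "__main__":
    print(fill_gaps([(0, 3), (120, 1)]))  # the minute at 60 is filled with 0
```

Filling with 0 rather than interpolating suits count data: a minute without matching logs really did have zero occurrences.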
Next, apply the time-series anomaly detection function ts_predicate_arma:
__log_signature__: 1814836459146662485 | select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') from ( select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc ) GROUP by time order by time ASC ) limit 10000
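`ts_predicate_arma` fits an ARMA model and, for each point, returns a predicted value, an upper and lower bound, and an anomaly probability. The Python sketch below is a much simpler stand-in that only illustrates the "prediction plus band" idea. It is a rolling mean with a ±k·σ band, not ARMA, and it does not reproduce SLS's probability calculation. Its `prob` column is a hypothetical 0/1 flag. The row layout `[unixtime, src, pred, upper, lower, prob]` is chosen to match the columns the following queries extract.

```python
import math
import statistics


def predict_with_band(series, window=5, k=3.0):
    """Flag anomalies with a rolling-mean baseline and a +/- k*sigma band.

    Deliberately simple stand-in, NOT the ARMA model behind ts_predicate_arma.
    series: list of (unixtime, value). Each output row is
    [unixtime, src, pred, upper, lower, prob], where prob is a hypothetical
    0/1 flag (1.0 when src falls outside the band).
    """
    rows = []
    for i, (t, value) in enumerate(series):
        history = [v for _, v in series[max(0, i - window):i]]
        if len(history) < 2:
            # Not enough history to estimate a band yet.
            rows.append([t, value, math.nan, math.nan, math.nan, 0.0])
            continue
        pred = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        upper, lower = pred + k * sigma, pred - k * sigma
        prob = 1.0 if not lower <= value <= upper else 0.0
        rows.append([t, value, pred, upper, lower, prob])
    return rows


if __name__ == "__main__":
    # Hypothetical per-minute counts: steady at 2, then a spike to 50.
    series = [(i * 60, 2) for i in range(6)] + [(360, 50)]
    for row in predict_with_band(series):
        print(row)
```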
__log_signature__: 1814836459146662485 | select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob from ( select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res from ( select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc ) GROUP by time order by time ASC )) , unnest(res) as t(t1)
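In the query above, `unnest(res) as t(t1)` turns the array of result arrays into one row per point, and `t1[1]` through `t1[6]` name the columns. SQL arrays are 1-based. A minimal Python equivalent, using a hypothetical `unnest_rows` helper, is simply zipping each row with the column names:

```python
from typing import Dict, List

COLUMNS = ["unixtime", "src", "pred", "up", "lower", "prob"]


def unnest_rows(res: List[List[float]]) -> List[Dict[str, float]]:
    """Turn each 6-element result array into a named record.

    Mirrors `unnest(res) as t(t1)` followed by t1[1] AS unixtime ... t1[6] AS prob;
    SQL arrays are 1-based while Python lists are 0-based, so zip() handles the mapping.
    """
    return [dict(zip(COLUMNS, row)) for row in res]


if __name__ == "__main__":
    print(unnest_rows([[60, 2, 2.0, 3.0, 1.0, 0.0]]))
```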
__log_signature__: 1814836459146662485 | select unixtime, src, pred, up, lower, prob from ( select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob from ( select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res from ( select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc ) GROUP by time order by time ASC )) , unnest(res) as t(t1) ) where is_nan(src) = false order by unixtime desc limit 2
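The query above keeps only rows whose `src` is not NaN. Presumably these are points with an observed value, as opposed to forecast-only points, though that is an assumption here. It then returns the two most recent minutes. The same filter in Python, with a hypothetical `latest_observed` helper:

```python
import math


def latest_observed(records, n=2):
    """Keep rows with an observed value and return the n most recent.

    Mirrors `where is_nan(src) = false order by unixtime desc limit 2`.
    """
    observed = [r for r in records if not math.isnan(r["src"])]
    return sorted(observed, key=lambda r: r["unixtime"], reverse=True)[:n]


if __name__ == "__main__":
    recs = [
        {"unixtime": 60, "src": 1},
        {"unixtime": 120, "src": 3},
        {"unixtime": 180, "src": 2},
        {"unixtime": 240, "src": float("nan")},  # no observed value
    ]
    print([r["unixtime"] for r in latest_observed(recs)])
```

Looking at two points rather than one makes the check slightly less sensitive to which exact minute the alert query happens to run in.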
__log_signature__: 1814836459146662485 | select sum(prob) as sumProb, max(src) as srcMax, max(up) as upMax from ( select unixtime, src, pred, up, lower, prob from ( select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob from ( select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res from ( select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc ) GROUP by time order by time ASC )) , unnest(res) as t(t1) ) where is_nan(src) = false order by unixtime desc limit 2 )
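The final query reduces the two most recent points to three numbers: `sumProb`, `srcMax` and `upMax`. An alert rule can then compare them. The article's actual alert condition appears only in a screenshot, so the trigger below is a hypothetical example. It fires when the summed anomaly probability exceeds a threshold and the observed count is above the predicted upper bound. `summarize`, `should_alert` and `prob_threshold` are illustrative names, not SLS settings.

```python
def summarize(latest):
    """Aggregate like: sum(prob) AS sumProb, max(src) AS srcMax, max(up) AS upMax.

    `latest` must be non-empty (max() raises on an empty sequence).
    """
    return {
        "sumProb": sum(r["prob"] for r in latest),
        "srcMax": max(r["src"] for r in latest),
        "upMax": max(r["up"] for r in latest),
    }


def should_alert(summary, prob_threshold=0.5):
    """Hypothetical trigger: high anomaly probability AND a count above the band."""
    return summary["sumProb"] > prob_threshold and summary["srcMax"] > summary["upMax"]


if __name__ == "__main__":
    latest = [
        {"unixtime": 420, "src": 50, "pred": 2.0, "up": 2.0, "lower": 2.0, "prob": 1.0},
        {"unixtime": 360, "src": 2, "pred": 2.0, "up": 2.0, "lower": 2.0, "prob": 0.0},
    ]
    s = summarize(latest)
    print(s, should_alert(s))
```

Requiring both conditions guards against alerting on a high probability when the count is actually below the band, for example a sudden drop. Whether a drop should also alert is a policy choice for the real alert rule.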
The alert is configured as follows:
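Demos of Log Service features: Log Service overview, assorted demos
This article is original content from the Yunqi Community and may not be reproduced without permission.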