Statistical methods compute various indicators over the time-series data (mean, variance, skewness, kurtosis, etc.) and raise alerts when manually tuned thresholds are crossed. Historical data can also be brought in through period-over-period (环比) and same-period (同比) comparison strategies, again with alert thresholds set from operational experience.
By building different statistical indicators, such as changes in the windowed mean or windowed variance, the anomalies in panels (1, 2, 5) of the figure below can be detected well; local extrema can catch the spike in panel (4); and a time-series forecasting model can capture the trends in panels (3, 6) and flag points that deviate from the expected pattern.
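The windowed-statistics idea above can be sketched in a few lines. This is a minimal illustration, not the production detector: `window_stat_alerts` is a hypothetical helper that compares each window's mean against the preceding window and alerts when the shift exceeds `k` standard deviations.

```python
import numpy as np

def window_stat_alerts(series, window=10, k=3.0):
    """Flag points where the current window's mean shifts more than
    k sigma away from the preceding window (simple threshold alerting)."""
    series = np.asarray(series, dtype=float)
    alerts = []
    for i in range(2 * window, len(series) + 1):
        prev = series[i - 2 * window:i - window]   # reference window
        curr = series[i - window:i]                # current window
        sigma = prev.std() if prev.std() > 0 else 1e-9
        if abs(curr.mean() - prev.mean()) > k * sigma:
            alerts.append(i - 1)  # index of the last point in the window
    return alerts

# steady signal with a level shift starting at index 50
data = [10.0] * 50 + [30.0] * 50
print(window_stat_alerts(data, window=10, k=3.0))
```

The same skeleton works for windowed variance or higher moments by swapping the statistic being compared; the threshold `k` plays the role of the manually tuned alert threshold described above.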
How do we identify anomalies?
PS:
What is an unsupervised method? Whether learning is supervised depends on whether the data being modeled carries labels. If the input data is labeled, the learning is supervised; if not, it is unsupervised.
Why introduce unsupervised methods? In the early days of a monitoring system, user feedback is scarce and precious. To build reliable monitoring strategies quickly in the absence of such feedback, unsupervised methods are introduced.
For single-dimension metrics
iForest (Isolation Forest) is an ensemble-based anomaly detection algorithm.
A few notes
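As a quick illustration of the technique (assuming scikit-learn's `IsolationForest` implementation, which is one common choice, not necessarily the one used in SLS), isolated points far from the main mass of data get isolated in fewer random splits and are scored as anomalies:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# normal points clustered around 0
X_train = rng.normal(loc=0.0, scale=0.5, size=(200, 1))

# ensemble of random isolation trees; shorter average path = more anomalous
clf = IsolationForest(n_estimators=100, random_state=0)
clf.fit(X_train)

# 1 = normal, -1 = anomaly
print(clf.predict([[0.1], [8.0]]))
```

Because each tree isolates points by random axis-aligned splits, no distance metric or density estimate is needed, which is what makes iForest cheap on large single-dimension metric streams.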
Paper: 《Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications》 (WWW 2018)
Labeling anomalies is, in itself, a complex task.
Commonly used supervised machine learning methods
Time-series analysis
Pattern analysis
Intelligent clustering of massive text data
The specific SQL logic is as follows:
* | select time, buffer_cnt, log_cnt, buffer_rate, failed_cnt, first_play_cnt, fail_rate
    from (
        select
            date_trunc('minute', time) as time,
            sum(buffer_cnt) as buffer_cnt,
            sum(log_cnt) as log_cnt,
            case when is_nan(sum(buffer_cnt)*1.0 / sum(log_cnt)) then 0.0
                 else sum(buffer_cnt)*1.0 / sum(log_cnt) end as buffer_rate,
            sum(failed_cnt) as failed_cnt,
            sum(first_play_cnt) as first_play_cnt,
            case when is_nan(sum(failed_cnt)*1.0 / sum(first_play_cnt)) then 0.0
                 else sum(failed_cnt)*1.0 / sum(first_play_cnt) end as fail_rate
        from log
        group by time
        order by time
    )
    limit 100000
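The core of this query is a per-minute aggregation with a NaN guard on the two rates (0/0 when a minute has no traffic). The same logic, sketched in pandas on toy records (the column names mirror the query; the data is invented for illustration):

```python
import pandas as pd

# toy log records; columns mirror the fields used in the SLS query
df = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 00:00:10", "2019-01-01 00:00:40",
                            "2019-01-01 00:01:05", "2019-01-01 00:02:30"]),
    "buffer_cnt": [2, 0, 3, 0],
    "log_cnt": [10, 0, 6, 0],
    "failed_cnt": [1, 0, 0, 0],
    "first_play_cnt": [5, 0, 4, 0],
})

# group by minute, like date_trunc('minute', time)
g = df.groupby(df["time"].dt.floor("min")).sum(numeric_only=True)

# NaN-safe rates, mirroring the CASE WHEN is_nan(...) THEN 0.0 guard
g["buffer_rate"] = (g["buffer_cnt"] / g["log_cnt"]).fillna(0.0)
g["fail_rate"] = (g["failed_cnt"] / g["first_play_cnt"]).fillna(0.0)
print(g[["buffer_rate", "fail_rate"]])
```

The minute with zero records produces a 0/0 division; `fillna(0.0)` plays the role of the `is_nan` guard so downstream alerting never sees NaN.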
The specific SQL logic is as follows:
* | select
        time,
        log_cnt_cmp[1] as log_cnt_now,
        log_cnt_cmp[2] as log_cnt_old,
        case when is_nan(buffer_rate_cmp[1]) then 0.0 else buffer_rate_cmp[1] end as buf_rate_now,
        case when is_nan(buffer_rate_cmp[2]) then 0.0 else buffer_rate_cmp[2] end as buf_rate_old,
        case when is_nan(fail_rate_cmp[1]) then 0.0 else fail_rate_cmp[1] end as fail_rate_now,
        case when is_nan(fail_rate_cmp[2]) then 0.0 else fail_rate_cmp[2] end as fail_rate_old
    from (
        select
            time,
            ts_compare(log_cnt, 86400) as log_cnt_cmp,
            ts_compare(buffer_rate, 86400) as buffer_rate_cmp,
            ts_compare(fail_rate, 86400) as fail_rate_cmp
        from (
            select
                date_trunc('minute', time - time % 120) as time,
                sum(buffer_cnt) as buffer_cnt,
                sum(log_cnt) as log_cnt,
                sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
                sum(failed_cnt) as failed_cnt,
                sum(first_play_cnt) as first_play_cnt,
                sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate
            from log
            group by time
            order by time
        )
        group by time
    )
    where time is not null
    limit 1000000
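`ts_compare(x, 86400)` pairs each point with the value 86400 seconds (one day) earlier, which is how the query gets the `_now` / `_old` column pairs for day-over-day comparison. A minimal pandas stand-in for that behavior (toy values, not the SLS implementation):

```python
import pandas as pd

# minute-level series spanning two days (toy values 0, 1, 2, ...)
idx = pd.date_range("2019-01-01", periods=2880, freq="min")  # 2 x 1440 minutes
s = pd.Series(range(2880), index=idx, dtype=float)

# ts_compare(x, 86400): pair each point with the value one day
# (86400 s = 1440 minutes) earlier
day_ago = s.shift(1440)
cmp_df = pd.DataFrame({"now": s, "old": day_ago, "ratio": s / day_ago})
print(cmp_df.dropna().head(3))
```

The first day has no "old" counterpart (NaN after the shift), which is why the SQL wraps every comparison column in an `is_nan(...)` guard.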
The specific SQL logic is as follows:
* | select
        time,
        case when is_nan(buffer_rate) then 0.0 else buffer_rate end as show_index,
        isp as index
    from (
        select
            date_trunc('minute', time) as time,
            sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
            sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate,
            sum(log_cnt) as log_cnt,
            sum(failed_cnt) as failed_cnt,
            sum(first_play_cnt) as first_play_cnt,
            isp
        from log
        group by time, isp
        order by time
    )
    limit 200000
* | select res.name
    from (
        select ts_anomaly_filter(province, res[1], res[2], res[3], res[6], 100, 0) as res
        from (
            select
                t1.province as province,
                array_transpose(ts_predicate_arma(t1.time, t1.show_index, 5, 1, 1)) as res
            from (
                select
                    province, time,
                    case when is_nan(buffer_rate) then 0.0 else buffer_rate end as show_index
                from (
                    select
                        province, time,
                        sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
                        sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate,
                        sum(log_cnt) as log_cnt,
                        sum(failed_cnt) as failed_cnt,
                        sum(first_play_cnt) as first_play_cnt
                    from log
                    group by province, time
                )
            ) t1
            inner join (
                select DISTINCT province
                from (
                    select province, time, sum(log_cnt) as total
                    from log
                    group by province, time
                )
                where total > 200
            ) t2 on t1.province = t2.province
            group by t1.province
        )
    )
    limit 100000
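`ts_predicate_arma` fits an ARMA model per province and `ts_anomaly_filter` keeps the series whose residuals look anomalous; both are SLS built-ins. As an illustrative stand-in for that prediction-residual idea (a deliberately simplified AR(1) fit with numpy, not the SLS algorithm), `ar1_anomalies` below is a hypothetical helper that flags points whose one-step prediction error exceeds `k` sigma:

```python
import numpy as np

def ar1_anomalies(series, k=3.0):
    """Toy stand-in for ARMA-based detection: fit y[t] ~ phi*y[t-1] + c
    by least squares, then flag points whose one-step prediction
    residual exceeds k standard deviations."""
    y = np.asarray(series, dtype=float)
    X = np.vstack([y[:-1], np.ones(len(y) - 1)]).T
    phi, c = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    pred = phi * y[:-1] + c          # one-step-ahead predictions
    resid = y[1:] - pred
    sigma = resid.std() or 1e-9
    return [i + 1 for i, r in enumerate(resid) if abs(r) > k * sigma]

# smooth series with a single injected spike at index 60
series = list(np.sin(np.linspace(0, 6, 100)))
series[60] += 5.0
print(ar1_anomalies(series))
```

The spike produces a large residual both when it is predicted from the normal point before it and when the next point is predicted from it, so a single spike typically shows up as one or two flagged indices.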
For details of the SQL syntax and its analysis logic, refer to the earlier article: SLS机器学习最佳实战:批量时序异常检测 (SLS Machine Learning Best Practices: Batch Time-Series Anomaly Detection).
This article is original content from the Yunqi Community (云栖社区) and may not be reproduced without permission.