Since my goal is an ELK-stack setup, the content below centers on elasticsearch + redis + logstash + kibana.
Written after studying 《Logstash 最佳实践》 (Logstash Best Practices).
The pipeline configuration file is what drives a Logstash task. It has three sections: input, filter, and output.
The input section specifies which files or sources to listen to. In my ELK stack only two input types are used: file and redis.
```conf
input {
  # redis
  redis {
    host => "127.0.0.1"
    port => 6379
    password => "123456"
    key => "logstash-queue"
    data_type => "list"
    db => 0
  }
  # file
  file {
    type => "nginx-access"
    path => "/usr/local/nginx/logs/access.log"
    start_position => "beginning"
    sincedb_path => "/var/log/logstash/sincedb/nginx"
    codec => multiline {
      pattern => "^\d+"
      negate => true
      what => "previous"
    }
  }
}
```
Note: if a log entry spans multiple lines, input.file.codec can merge them by splitting on ^\d+; the line breaks inside a merged entry are converted to \n.
The most common matching method is grok (regex matching).
The predefined grok patterns live under logstash-7.4.0/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns and are used like %{IPORHOST:client}.
If the predefined patterns don't cover your case, you can write your own regex. You can verify that a pattern is correct with Kibana's Dev Tools > Grok Debugger.
You can also verify it at http://grokdebug.herokuapp.com/.
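To see what grok actually does with an expression like %{IPORHOST:client}, here is a minimal sketch in Python (a hypothetical mini-grok with two patterns hand-copied in spirit from logstash-patterns-core; real grok loads the full pattern files from the directory above):

```python
import re

# Hypothetical mini pattern table; real grok has hundreds of these.
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "HOSTNAME": r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
                r"(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)*",
}
PATTERNS["IPORHOST"] = f"(?:{PATTERNS['IP']}|{PATTERNS['HOSTNAME']})"

def grok_compile(expr):
    """Expand %{NAME:field} references into named regex groups."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", repl, expr))

m = grok_compile(r"client=%{IPORHOST:client}").search("client=192.168.33.1 ok")
print(m.group("client"))  # 192.168.33.1
```

The captured text ends up under the field name after the colon, which is exactly how grok turns an unstructured line into named fields.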
```conf
filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" }
    }
  } else if [type] == "nginx-error" {
    grok {
      match => ["message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<clientip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
    }
  }
}
```
Optimization
If the application emits logs in a structured format directly, you save the resources grok would spend on pattern matching. But not every piece of software lets you configure its log format, so this option is of limited use.
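For example, nginx (1.11.8+) can be configured to write its access log as JSON so Logstash can ingest it with codec => json instead of grok. A sketch (the format name json_log and the chosen fields are my own picks, not from the original post):

```nginx
log_format json_log escape=json '{'
    '"time_local":"$time_local",'
    '"remote_addr":"$remote_addr",'
    '"request":"$request",'
    '"status":"$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"http_referer":"$http_referer"'
  '}';
access_log /usr/local/nginx/logs/access.json.log json_log;
```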
```conf
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    password => "123456"
    key => "logstash-queue"
    data_type => "list"
    db => 4
  }
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```
Elasticsearch supports full-text search, but its default analyzer is built for English. That doesn't meet our needs; we have to rely on the ik analysis plugin for Chinese tokenization.
1. Handling a log entry that spans many lines
Merge the lines with input.codec, using nginx's default-format error log as an example.
2019/09/23 10:39:01 [error] 4130#0: *1 FastCGI sent in stderr: "PHP message: PHP Warning: require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16 PHP message: PHP Stack trace: PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0 PHP message: PHP Fatal error: require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16 PHP message: PHP Stack trace: PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me", referrer: "http://tp5.study.me/"
2019/09/23 10:40:14 [error] 4130#0: *7 FastCGI sent in stderr: "PHP message: PHP Warning: require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16 PHP message: PHP Stack trace: PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0 PHP message: PHP Fatal error: require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16 PHP message: PHP Stack trace: PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me"
As the sample shows, every entry begins with a date, so we can split entries on lines that start with digits.
```conf
input {
  stdin {
    codec => multiline {
      pattern => "^\d+"
      negate => true
      what => "previous"
    }
  }
}
```
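The merge logic of this codec (pattern "^\d+", negate => true, what => "previous": any line that does NOT start with a digit is appended to the previous event) can be sketched in a few lines of Python:

```python
import re

def merge_multiline(lines, pattern=r"^\d+"):
    """Toy model of codec => multiline with negate=true, what=previous."""
    events, current = [], None
    for line in lines:
        if re.match(pattern, line):    # line starts a new event
            if current is not None:
                events.append(current)
            current = line
        elif current is not None:      # continuation: glue onto previous
            current += "\n" + line
    if current is not None:
        events.append(current)
    return events

raw = [
    "2019/09/23 10:39:01 [error] 4130#0: *1 FastCGI sent in stderr:",
    "PHP message: PHP Warning: require(...): failed to open stream",
    "2019/09/23 10:40:14 [error] 4130#0: *7 FastCGI sent in stderr:",
]
print(len(merge_multiline(raw)))  # 2
```

The three raw lines collapse into two events, with the PHP continuation line folded into the first entry as a \n-joined string, matching the note earlier about newlines becoming \n.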
2. By default every event carries a message field holding the raw, unparsed log line. Once the fields have been extracted, there is no need to keep the raw data around.
```conf
filter {
  grok {
    match => ["message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:message}(?:, client: (?<clientip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
    overwrite => ["message"]
  }
}
```
The field is replaced via overwrite; note that the overwrite option must live inside filter.grok.
3. Every captured event has an @timestamp, and I'd like old data's original time to be written into it.
Note: @timestamp is managed by Logstash itself and modifying it is not recommended, so I use a separate timestamp field instead; the difference is that timestamp holds the time parsed out of the log line.
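For reference, if you do decide that @timestamp should carry the parsed time rather than the ingestion time, Logstash's built-in date filter is the standard way to do it. A minimal sketch, assuming a timestamp field captured by the grok above in nginx's error-log format:

```conf
filter {
  date {
    # parse the grok-captured "timestamp" field (nginx error-log format)
    match => ["timestamp", "yyyy/MM/dd HH:mm:ss"]
    # target defaults to @timestamp; shown explicitly here
    target => "@timestamp"
  }
}
```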