【Elasticsearch 2.x】Issues

Issue #1: The same field under different types of the same index causes a mapping conflict

Kibana Sense:
POST /index-1/type-1
{
  "age":25
}

GET /index-1/_mapping
{
  "index-1": {
    "mappings": {
      "type-1": {
        "properties": {
          "age": {
            "type": "long"        => 在index-1/type-1下的age映射为long
          }
        }
      }
    }
  }
}

POST /index-1/type-2
{
  "age":"xx"
}

{
   "error": {
      "root_cause": [
         {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse [age]"
         }
      ],
      "type": "mapper_parsing_exception",
      "reason": "failed to parse [age]",
      "caused_by": {
         "type": "number_format_exception",
         "reason": "For input string: \"xx\""
      }
   },
   "status": 400
}
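The root cause: in Elasticsearch 2.x, fields with the same name in different types of one index are backed by the same Lucene field, so they must share a single mapping. type-2 therefore inherits age as long, and "xx" fails numeric parsing. Even declaring the conflicting mapping explicitly is rejected; a minimal Sense sketch (the exact error wording varies by minor version, but it is a mapper conflict rather than a parse error):

PUT /index-1/_mapping/type-2
{
  "properties": {
    "age": { "type": "string" }
  }
}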

Issue #2: The same field mapped to different types across different indices (under any type) causes a Kibana Mapping Conflict

POST /index-21/type-1
{
  "age":"xx",
  "post_date" : "2016-06-03T14:12:12"
}

{
  "index-21": {
    "mappings": {
      "type-1": {
        "properties": {
          "age": {
            "type": "string"       => index-21/type-1/age => 映射为string
          },
          "post_date": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

POST /index-21/type-2
{
  "age":25,
  "post_date" : "2016-06-03T14:12:12"
}

{
  "index-21": {
    "mappings": {
      "type-2": {
        "properties": {
          "age": {
            "type": "long"       => index-21/type-2/age => 映射为long
          },
          "post_date": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

In Kibana --> Settings, using index-2* as the index pattern now produces a Mapping Conflict.
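One way to prevent this class of conflict up front is an index template that pins the field's type for every matching index before any document arrives. A sketch using the 2.x template API (the template name here is illustrative; _default_ applies the mapping to all types):

PUT /_template/age-as-long
{
  "template": "index-2*",
  "mappings": {
    "_default_": {
      "properties": {
        "age": { "type": "long" }
      }
    }
  }
}

With this in place, age is mapped as long in every new index-2* index; documents whose age cannot be parsed as a number are then rejected instead of silently creating a conflicting string mapping.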

Issue #2.1: A worked example of the same field mapped to different types across indices, and a solution

Here is our solution for a conflict on the bytes field.

input{
	file{
		path => "/opt/logstash-data/input/logstash-tutorial.log"
	}
}

filter{
	grok{
		match => { "message" => "%{COMBINEDAPACHELOG}" }
	}
}

output{
	stdout{
		codec => rubydebug
	}

	elasticsearch{
		hosts => ["xxx.xxx.xxx.xxx"]
		index => "test-%{+YYYY.MM.dd.HH}"
	}
}

The Logstash configuration above parses Apache access logs with grok and writes them to Elasticsearch, creating one index per hour via the %{+YYYY.MM.dd.HH} timestamp pattern.

The event written to Elasticsearch looks like this:
{
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
       "@version" => "1",
     "@timestamp" => "2016-06-06T02:28:27.152Z",
           "path" => "/opt/logstash-data/input/logstash-tutorial.log",
           "host" => "xxx.xxx.xxx.xxx",
       "clientip" => "83.149.9.216",
          "ident" => "-",
           "auth" => "-",
      "timestamp" => "04/Jan/2015:05:13:42 +0000",
           "verb" => "GET",
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
    "httpversion" => "1.1",
       "response" => "200",
          "bytes" => "1200",           ==> 能够bytes字段是string类型的 
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}

Grok captures every field as a string by default, which is why bytes above is quoted. Suppose we want bytes to be a long; add a mutate/convert to the Logstash configuration:
input{
	file{
		path => "/opt/logstash-data/input/logstash-tutorial.log"
	}
}

filter{
	grok{
		match => { "message" => "%{COMBINEDAPACHELOG}" }
	}
        mutate{
		convert => {
			"bytes" => "integer"
		}
	}
}

output{
	stdout{
		codec => rubydebug
	}

	elasticsearch{
		hosts => ["xxx.xxx.xxx.xxx"]
		index => "test-%{+YYYY.MM.dd.HH}"
	}
}

{
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
       "@version" => "1",
     "@timestamp" => "2016-06-06T02:28:27.152Z",
           "path" => "/opt/logstash-data/input/logstash-tutorial.log",
           "host" => "xxx.xxx.xxx.xxx",
       "clientip" => "83.149.9.216",
          "ident" => "-",
           "auth" => "-",
      "timestamp" => "04/Jan/2015:05:13:42 +0000",
           "verb" => "GET",
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
    "httpversion" => "1.1",
       "response" => "200",
          "bytes" => 1200,           ==> 能够bytes字段是long类型的 
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}

Note:
Suppose the first event (bytes still a string) was indexed into test-2016.06.06.01.
If the second event (bytes now a number) also lands in test-2016.06.06.01, the existing string mapping wins: the numeric value is coerced to string, and there is no conflict.

But if the second event lands in test-2016.06.06.02 instead, bytes is dynamically mapped as long in that new index.

In Kibana, an index pattern of [test]-YYYY.MM.DD.HH then spans both indices and reports a mapping conflict on bytes.
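You can confirm the divergence per index with the field-mapping API; a quick sketch (index names as in this example):

GET /test-2016.06.06.01/_mapping/field/bytes
GET /test-2016.06.06.02/_mapping/field/bytes

The first should report string, the second long.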

Solution:
Standardize the bytes mapping across indices: either settle on string and rebuild the indices where it is mapped as long, or conversely settle on long and rebuild the indices where it is mapped as string.

Suppose we settle on long; the Logstash configuration above can then stay unchanged.

We now need to change the bytes field of test-2016.06.06.01 to long. Elasticsearch cannot change the type of an existing field in place; the index has to be rebuilt, and without losing data.

First, create an index test-2016.06.06.01.bak and force a mapping that makes bytes a long:

PUT /test-2016.06.06.01.bak/
{
  "mappings": {
    "logs": {                ==> 默认的type是logs,若是定制,则换成对应的type
      "properties": {
        "bytes": {
          "type": "long"
        }
      }
    }
  }
}
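As a quick sanity check before copying any data, confirm the explicit mapping took effect:

GET /test-2016.06.06.01.bak/_mapping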

Use stream2es to copy the data from test-2016.06.06.01 into test-2016.06.06.01.bak:
stream2es es --source http://localhost:9200/test-2016.06.06.01 --target http://localhost:9200/test-2016.06.06.01.bak

Verify that no documents were lost.
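For example, compare document counts on both indices (a minimal check; spot-checking a few documents is also wise):

GET /test-2016.06.06.01/_count
GET /test-2016.06.06.01.bak/_count

Both responses should report the same count.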

Delete the original test-2016.06.06.01:
DELETE /test-2016.06.06.01

Create an alias so that test-2016.06.06.01 now points to test-2016.06.06.01.bak:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "test-2016.06.06.01.bak",
        "alias": "test-2016.06.06.01"
      }
    }
  ]
}
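To confirm the alias resolves as expected (the response lists the backing index):

GET /_alias/test-2016.06.06.01

Queries and Kibana index patterns that reference test-2016.06.06.01 now transparently hit test-2016.06.06.01.bak, where bytes is long.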

Issue #3: The message body is JSON — how do we parse it into key-value fields?

For example, the message sent is {"k1":1,"k2":"v2"}

[root@hfelkcld0002 conf.d]# cat json.conf
input{
	stdin{
	}
}
output{
	stdout{
		codec => rubydebug
	}
}

Start Logstash and type {"k1":1,"k2":"v2"} into the console:
{"k1":1,"k2":"v2"}
{
       "message" => "{\"k1\":1,\"k2\":\"v2\"}",        => json信息没法被解析出字段
      "@version" => "1",
    "@timestamp" => "2016-06-06T04:59:42.792Z",
          "host" => "xxx.xxx.xxx.xxx"
}

Logstash ships a json filter for exactly this:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html

[root@hfelkcld0002 conf.d]# cat json.conf
input{
	stdin{
	}
}
filter{
	json{
		source => "message"
		target => "metrics"
	}
}
output{
	stdout{
		codec => rubydebug
	}
}

{"k1":1,"k2":"v2"}
{
       "message" => "{\"k1\":1,\"k2\":\"v2\"}",
      "@version" => "1",
    "@timestamp" => "2016-06-06T05:35:58.605Z",
          "host" => "xxx.xxx.xxx.xxx",
       "metrics" => {
        "k1" => 1,
        "k2" => "v2"
    }
}

As you can see, the data is now stored twice: once as the raw string in message and once parsed under metrics.
You can also set target to message itself, so the parsed object overwrites the original raw string:
[root@hfelkcld0002 conf.d]# cat json.conf
input{
	stdin{
	}
}
filter{
	json{
		source => "message"
		target => "message"
	}
}
output{
	stdout{
		codec => rubydebug
	}
}

{"k1":1,"k2":"v2"}
{
       "message" => {
        "k1" => 1,
        "k2" => "v2"
    },
      "@version" => "1",
    "@timestamp" => "2016-06-06T05:40:11.454Z",
          "host" => "xxx.xxx.xxx.xxx"
}
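If you don't need the raw string at all, another option (assuming the json filter's documented default of writing the parsed keys to the event root when target is omitted) is to drop message once parsing succeeds:

filter{
	json{
		source => "message"
		# remove_field runs only if the parse succeeded,
		# so a malformed message is kept for debugging
		remove_field => ["message"]
	}
}

k1 and k2 then appear as top-level fields of the event.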