Handling empty values in Elasticsearch

This article is based on Elasticsearch 7.1.

The tests use the following empty-like values: null, "null", "" and [ ].

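The test code is long, so here are the conclusions first. null, "" and [ ] are all accepted at index time, but the documents cannot be found by searching for those values. For some data types, "null" cannot be converted to the field's type, so indexing fails; for types that accept strings, such as keyword and text, "null" is treated as an ordinary string and can be both indexed and searched. As for why empty values cannot be searched directly, Elasticsearch: The Definitive Guide asks: "If a field has no values, how is it stored in an inverted index?" The answer is that a field with no values has no entry in the inverted index: "it isn't stored at all". One caveat: the empty-value tests below run only against the long field views. On a keyword field, an empty string "" is an ordinary (empty) term, so it is indexed and counts as an existing value.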

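Note that on Elasticsearch 2.x you could use the exists and missing queries to find non-null and null values respectively, the counterparts of IS NOT NULL and IS NULL in a relational database. The missing query, however, was removed in 5.0 and is not available in 7.1; using it fails with: "no [query] registered for [missing]".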
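
On 7.x, the usual substitute for missing is an exists query wrapped in bool/must_not. A minimal sketch against the ac_blog1 index and the views field created later in this article; it should return the documents whose views field has no indexed value:

```console
# Counterpart of "views IS NULL": documents with no indexed value for views
GET ac_blog1/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "views"
        }
      }
    }
  }
}
```

Using exists on its own, without the bool/must_not wrapper, gives the IS NOT NULL counterpart.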

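When designing an application, you can use the null_value mapping parameter to give null values a default. It resembles a DEFAULT value in a relational database, but it is not the same thing; see point 3 below. Keep the following three points in mind: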

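1. null_value only takes effect when the field is explicitly set to null. An empty array such as [ ] or an empty string such as "" does not trigger it.
2. The null_value must match the field's data type. For example, a date field cannot use a string that is not a valid date.
3. null_value only causes the field to be stored in the inverted index under the null_value value, so that the document can be found by searching. It does not replace the actual value in the _source JSON document.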
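
The three points above can be sketched as follows. The index name ac_blog2, the document ids and the placeholder value -1 are assumptions chosen for illustration and are not part of the original tests:

```console
# Map views as a long whose explicit nulls are indexed as -1
PUT ac_blog2
{
  "mappings": {
    "properties": {
      "views": {
        "type": "long",
        "null_value": -1
      }
    }
  }
}

# Explicit null: null_value applies
PUT ac_blog2/_doc/1?refresh
{
  "views": null
}

# Empty array: contains no explicit null, so null_value does not apply
PUT ac_blog2/_doc/2?refresh
{
  "views": []
}

# Search for the placeholder value
GET ac_blog2/_search
{
  "query": {
    "term": {
      "views": -1
    }
  }
}
```

The search is expected to match document 1 only, and the _source of that document should still show "views": null rather than -1.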

Create the test index:

PUT ac_blog1
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text"
      },
      "body":{
        "type": "text"
      },
      "author":{
        "type": "keyword"
      },
      "views":{
        "type": "long"
      }
    }
  }
}

Index the test documents:

POST ac_blog1/_doc
{
  "views":null
}
POST ac_blog1/_doc
{
  "views":[]
}
POST ac_blog1/_doc
{
  "views":""
}

As a check, fetch all documents:

GET ac_blog1/_search
{
  "query": {
    "match_all": {}
  },
  "size":100
}

Response:

{
  "took" : 355,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HFBiSW0Bf1cVbYphJHEo",
        "_score" : 1.0,
        "_source" : {
          "views" : null
        }
      },
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HVBiSW0Bf1cVbYphPHEa",
        "_score" : 1.0,
        "_source" : {
          "views" : [ ]
        }
      },
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HlBiSW0Bf1cVbYphRXGX",
        "_score" : 1.0,
        "_source" : {
          "views" : ""
        }
      }
    ]
  }
}

All three documents were indexed. Now try to search for them.

Searching for null:

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":null
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field name is null or empty"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "field name is null or empty"
  },
  "status": 400
}

Searching for [ ]:

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":[]
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[term] query does not support array of values",
        "line": 4,
        "col": 15
      }
    ],
    "type": "parsing_exception",
    "reason": "[term] query does not support array of values",
    "line": 4,
    "col": 15
  },
  "status": 400
}

Searching for "":

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":""
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"term\" : {\n    \"views\" : {\n      \"value\" : \"\",\n      \"boost\" : 1.0\n    }\n  }\n}",
        "index_uuid": "f_2YYPS6RAaew5bXcQwlzQ",
        "index": "ac_blog1"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "ac_blog1",
        "node": "oJRDxfVrQlGOJ9eqCGozDg",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"term\" : {\n    \"views\" : {\n      \"value\" : \"\",\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "f_2YYPS6RAaew5bXcQwlzQ",
          "index": "ac_blog1",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "empty String"
          }
        }
      }
    ]
  },
  "status": 400
}

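Because views is of type long, the "null" case cannot be tested on it: indexing fails with an error saying that "null" cannot be converted to long. This check is clearly performed by Elasticsearch itself rather than by the underlying Lucene. Use the keyword field author instead: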

POST ac_blog1/_doc
{
  "author":"null"
}
GET ac_blog1/_search
{
  "query": {
    "term": {
      "author":"null"
    }
  }
}

Response:

{
  "took" : 416,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "H1BoSW0Bf1cVbYphtHF9",
        "_score" : 0.2876821,
        "_source" : {
          "author" : "null"
        }
      }
    ]
  }
}
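
For comparison, the same check can be repeated with an empty string on the keyword field. This is a sketch that was not part of the original tests. Since a keyword field indexes "" as an ordinary (empty) term, the term query would be expected to find the document, unlike the long field views above:

```console
# Index a document whose keyword field is the empty string
POST ac_blog1/_doc?refresh
{
  "author": ""
}

# Search for the empty string on the keyword field
GET ac_blog1/_search
{
  "query": {
    "term": {
      "author": ""
    }
  }
}
```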

That's all.