Elasticsearch Core Concepts Explained

I. Documents

In Elasticsearch, documents are stored as JSON and can have a complex structure, for example:

{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_score": 1,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

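Here, card is a complex object: a nested Card object.

1. Metadata

Field	Description
_index	Where the document is stored
_type	The class of object the document represents
_id	The unique identifier of the document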

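_index

An index is similar to a "database" in a relational database: it is where we store and index related data.
Note:
In fact, our data is stored and indexed in shards; an index is just a logical namespace that groups one or more shards together. This is an internal detail, though: our application does not need to care about shards at all. As far as the application is concerned, documents are stored in an index, and Elasticsearch takes care of the rest.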

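_type

In an application, we use objects to represent "things" such as a user, a blog post, a comment, or an email. Every object belongs to a class, which defines the properties or data associated with the object. Objects of the user class might have a name, a gender, an age, and an email address.

In a relational database, we usually store objects of the same class in one table because they share the same structure. Likewise, in Elasticsearch we use documents of the same type to represent the same kind of "thing", because their data structures are also the same.

Every type has its own mapping, or schema definition, much like the columns of a table in a traditional database. Documents of all types are stored in the same index, but a type's mapping tells Elasticsearch how documents of that type should be indexed.

A _type name can be uppercase or lowercase, but it should not begin with an underscore and must not contain commas. The examples in this article use user as the type name.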

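_id

The _id is just a string. Combined with _index and _type, it uniquely identifies a document in Elasticsearch. When creating a document you can supply your own _id or let Elasticsearch generate one for you (auto-generated IDs are 20 characters long).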

II. Query responses

1. pretty

Append the pretty parameter to the query URL to make the returned JSON easier to read.

GET /haoke/user/1007?pretty

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "id": 1007,
        "name": "seven",
        "age": 20,
        "sex": "女",
        "card": {
            "card_number": "123456789"
        }
    }
}

2. Selecting response fields

If we do not need every field in the response, we can name the fields that should be returned.

GET /haoke/user/1007?_source=id,name

# Response
{
    "_index": "haoke",
    "_type": "user",
    "_id": "1007",
    "_version": 1,
    "_seq_no": 8,
    "_primary_term": 2,
    "found": true,
    "_source": {
        "name": "seven",
        "id": 1007
    }
}

To return only the original document, without any metadata:

GET /haoke/user/1007/_source

# Response

{
    "id": 1007,
    "name": "seven",
    "age": 20,
    "sex": "女",
    "card": {
        "card_number": "123456789"
    }
}

Original document plus selected fields:

GET /haoke/user/1007/_source?_source=id,name
# Response
{
    "name": "seven",
    "id": 1007
}

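3. Checking whether a document exists

If we only need to know whether a document exists, without fetching its content, we can use HEAD.
When the document exists:

HEAD /haoke/user/1007

# Document exists; the response body is empty
Status: 200

When the document does not exist:

HEAD /haoke/user/1009

# Document does not exist
Status: 404 Not Found

Of course, this only tells you that the document did not exist at the moment of the request; it says nothing about a few milliseconds later. Another process may create the document in the meantime.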

4. Bulk operations

4.1 Multi-get

POST /haoke/user/_mget
# Request body
{
    "ids":["1001","1002"]
}


# Response

{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "张三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1002",
            "_version": 1,
            "_seq_no": 0,
            "_primary_term": 1,
            "found": true,
            "_source": {
                "id": 1002,
                "name": "张三",
                "age": 20,
                "sex": "男"
            }
        }
    ]
}

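If one of the documents does not exist, the rest of the response is unaffected; check the found value of each entry to see whether that document was retrieved.

found: false means the document does not exist.

POST /haoke/user/_mget
# Request body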

{
    "ids":["1001","1006"]
}
# Response
{
    "docs": [
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1001",
            "_version": 3,
            "_seq_no": 3,
            "_primary_term": 2,
            "found": true,
            "_source": {
                "id": 1001,
                "name": "张三",
                "age": 23,
                "sex": "女"
            }
        },
        {
            "_index": "haoke",
            "_type": "user",
            "_id": "1006",
            "found": false
        }
    ]
}
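
The found flag can also be checked in code. The following is a minimal sketch that assumes the _mget response has already been parsed into a Python dict; the helper name split_mget_docs is hypothetical and not part of any Elasticsearch client.

```python
def split_mget_docs(response):
    """Separate an _mget response into found sources and missing IDs."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


# Trimmed copy of the response shown above.
response = {
    "docs": [
        {"_index": "haoke", "_type": "user", "_id": "1001", "found": True,
         "_source": {"id": 1001, "name": "张三", "age": 23, "sex": "女"}},
        {"_index": "haoke", "_type": "user", "_id": "1006", "found": False},
    ]
}

found, missing = split_mget_docs(response)
print(sorted(found))  # ['1001']
print(missing)        # ['1006']
```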

4.2 The _bulk API

Elasticsearch supports bulk create, update, and delete operations, all through the _bulk API.
Request format:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
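
One way to produce this newline-delimited format is sketched below. It only builds the request body as a string and does not send anything; the function name build_bulk_body and the tuple layout are assumptions made for illustration.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk body.

    `source` is None for actions that have no request body, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("create", {"_index": "haoke", "_type": "user", "_id": 2001},
     {"id": 2001, "name": "name1", "age": 20, "sex": "男"}),
    ("delete", {"_index": "haoke", "_type": "user", "_id": 2002}, None),
])
print(body, end="")
```

Each action takes one line for its metadata and, except for delete, a second line for the document itself, so the example above yields three lines.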

4.2.1 Bulk create

Example

Note: the request body must end with a newline.

POST /haoke/user/_bulk

# Request body:

{"create":{"_index":"haoke","_type":"user","_id":2001}}
{"id":2001,"name":"name1","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2002}}
{"id":2002,"name":"name2","age": 20,"sex": "男"}
{"create":{"_index":"haoke","_type":"user","_id":2003}}
{"id":2003,"name":"name3","age": 20,"sex": "男"}

# Response

{
    "took": 11,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 9,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 2,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 2,
                "status": 201
            }
        }
    ]
}

4.2.2 Bulk delete

POST /haoke/user/_bulk

# Request body
{"delete":{"_index":"haoke","_type":"user","_id":2001}}
{"delete":{"_index":"haoke","_type":"user","_id":2002}}
{"delete":{"_index":"haoke","_type":"user","_id":2003}}

# Response
{
    "took": 11,
    "errors": false,
    "items": [
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2001",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 10,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2002",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 5,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "delete": {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_version": 2,
                "result": "deleted",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 6,
                "_primary_term": 2,
                "status": 200
            }
        }
    ]
}

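Other operations work in a similar way.
How large should a single request be for best performance?

The entire bulk request has to be loaded into memory on the node that receives it, so the larger the request, the less memory is left for other requests. There is an optimal bulk request size; beyond it, performance stops improving and may even drop.
The optimal size is not a fixed number. It depends entirely on your hardware, the size and complexity of your documents, and your indexing and search load.
Fortunately, this sweet spot is easy to find: bulk-index typical documents in batches of increasing size, and when performance starts to drop, your batch is too large. A good starting point is 1,000 to 5,000 documents per batch, or fewer if your documents are very large.
It is often useful to watch the physical size of your bulk requests as well. One thousand 1 kB documents is very different from one thousand 1 MB documents. A good batch is best kept between 5 and 15 MB.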
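
A possible way to respect both limits at once is sketched here: a batch is closed as soon as either the document count or the serialized size would be exceeded. The function name batch_documents and the default limits are assumptions chosen from the ranges mentioned above, not tuned values.

```python
import json


def batch_documents(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents, closing a batch when either limit is hit."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
        # Start a new batch if adding this document would exceed a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"id": i, "name": f"name{i}"} for i in range(2500)]
batches = list(batch_documents(docs, max_docs=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Measuring throughput while varying max_docs and max_bytes is then a matter of timing each batch size against your own cluster.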

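5. Pagination

Just as SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:

size: the number of results to return, default 10
from: the number of initial results to skip, default 0

Example:

GET /_search?size=1&from=1

# Response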
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "haoke",
                "_type": "user",
                "_id": "2003",
                "_score": 1.0,
                "_source": {
                    "id": 2003,
                    "name": "name3",
                    "age": 20,
                    "sex": "男"
                }
            }
        ]
    }
}
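
Applications usually think in page numbers rather than offsets. The sketch below converts a 1-based page number into from and size; the helper name page_params is hypothetical.

```python
from urllib.parse import urlencode


def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


print("/_search?" + urlencode(page_params(page=2, page_size=1)))
# /_search?from=1&size=1
```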

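Deep paging in a distributed system
To understand why deep paging is a problem, suppose we are searching an index with five primary shards. When we request the first page (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results to select the overall top 10.

Now suppose we request page 1,001: results 10,001 to 10,010. Everything works the same way, except that each shard must produce its top 10,010 results. The requesting node then sorts all 50,050 results and discards 50,040 of them!

As you can see, in a distributed system the cost of sorting results keeps growing the deeper you page, multiplied by the number of shards. This is why web search engines do not return more than 1,000 results for any query.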
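
The arithmetic behind these numbers can be reproduced with a few lines. This is only a back-of-the-envelope model of the description above; the function name deep_paging_cost is made up for the example.

```python
def deep_paging_cost(from_, size=10, primary_shards=5):
    """Return (results per shard, results gathered, results discarded)."""
    per_shard = from_ + size              # each shard returns its top from+size
    gathered = per_shard * primary_shards  # sorted by the requesting node
    discarded = gathered - size            # everything except the final page
    return per_shard, gathered, discarded


print(deep_paging_cost(from_=0))      # (10, 50, 40)
print(deep_paging_cost(from_=10000))  # (10010, 50050, 50040)
```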

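6. Mapping

The indices we created and the data we inserted so far relied on Elasticsearch to detect field types automatically. Sometimes we need to declare field types explicitly, because otherwise the automatically detected types may not match what we actually need.
The detection rules depend on the JSON type of each value. String fields deserve particular attention:

The string type was common in older versions of Elasticsearch. Starting with Elasticsearch 5.x, string is no longer supported and is replaced by the text and keyword types.
text: use the text type for fields that need full-text search, such as email bodies or product descriptions. A text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting and are rarely used for aggregations.
keyword: use the keyword type for structured content such as email addresses, hostnames, status codes, and tags, and for any field that needs filtering (for example, finding blog posts whose status is published), sorting, or aggregation. A keyword field can only be found by searching for its exact value.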
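
The difference between an analyzed and an un-analyzed field can be illustrated with a toy inverted index. This is a simplified model, not how Elasticsearch is implemented: toy_analyze is a stand-in that lowercases and splits on non-word characters, and the sample documents are invented for the example.

```python
import re
from collections import defaultdict


def toy_analyze(value):
    """Stand-in analyzer: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def build_index(docs, field, analyzed):
    """Map each indexed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # text-like fields are split into terms; keyword-like fields are
        # indexed as one exact value.
        terms = toy_analyze(doc[field]) if analyzed else [doc[field]]
        for term in terms:
            index[term].add(doc_id)
    return index


docs = {
    1: {"title": "Quick brown fox", "status": "published"},
    2: {"title": "Lazy brown dog", "status": "draft"},
}
text_index = build_index(docs, "title", analyzed=True)
keyword_index = build_index(docs, "status", analyzed=False)

print(sorted(text_index["brown"]))         # [1, 2]
print(sorted(keyword_index["published"]))  # [1]
print(sorted(keyword_index["publish"]))    # []
```

A single word finds both titles in the analyzed index, while the un-analyzed index only answers to the complete value.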

6.1 Creating an index with explicit types

PUT /ela
# Request body
{
    "settings": {
        "index": {
            "number_of_shards": "1"
        }
    },
    "mappings": {
            "properties": {
                "name": { "type": "text"},
                "age": {"type": "integer" },
                "mail": {"type": "keyword" },
                "hobby": {  "type": "text"}
            }
    }
}
# Response
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "ela"
}

6.2 Viewing the created mapping

GET /ela/_mapping

# Response
{
    "ela": {
        "mappings": {
            "properties": {
                "age": {
                    "type": "integer"
                },
                "hobby": {
                    "type": "text"
                },
                "mail": {
                    "type": "keyword"
                },
                "name": {
                    "type": "text"
                }
            }
        }
    }
}

6.3 Inserting data

POST /ela/_bulk

# Request body

{"index":{"_index":"ela"}}
{"name":"张三","age": 20,"mail": "111@qq.com","hobby":"羽毛球、乒乓球、足球"}
{"index":{"_index":"ela"}}
{"name":"李四","age": 21,"mail": "222@qq.com","hobby":"羽毛球、乒乓球、足球、篮球"}
{"index":{"_index":"ela"}}
{"name":"王五","age": 22,"mail": "333@qq.com","hobby":"羽毛球、篮球、游泳、听音乐"}
{"index":{"_index":"ela"}}
{"name":"赵六","age": 23,"mail": "444@qq.com","hobby":"跑步、游泳"}
{"index":{"_index":"ela"}}
{"name":"孙七","age": 24,"mail": "555@qq.com","hobby":"听音乐、看电影"}


# Response
{
    "took": 12,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 2,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2b5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 3,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 4,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}

6.4 Testing a search

POST /ela/_search

# Request body
{
    "query": {
        "match": {
            "hobby": "音乐"
        }
    }
}

# Response
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.9159472,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2r5njnUBAdKB-kbRFcG6",
                "_score": 1.9159472,
                "_source": {
                    "name": "孙七",
                    "age": 24,
                    "mail": "555@qq.com",
                    "hobby": "听音乐、看电影"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "2L5njnUBAdKB-kbRFcG6",
                "_score": 1.5506182,
                "_source": {
                    "name": "王五",
                    "age": 22,
                    "mail": "333@qq.com",
                    "hobby": "羽毛球、篮球、游泳、听音乐"
                }
            }
        ]
    }
}

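7. Structured queries

7.1 term query

The term query is mainly used to match exact values such as numbers, dates, booleans, or not_analyzed strings (text that is not analyzed, which corresponds to keyword fields in current versions):

POST /ela/_search

# Request body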
{
    "query": {
        "term": {
            "age":20
        }
    }
}
# Response
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "张三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]
    }
}

7.2 terms query

terms is similar to term, but lets you specify multiple values to match. A document matches if the field contains any of the specified values.

POST /ela/_search
{
    "query": {
        "terms": {
            "age":[20,21]
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "张三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、篮球"
                }
            }
]

7.3 range query

The range query finds documents whose values fall within a given range:

POST /ela/_search

# Request body
{
    "query": {
        "range": {
            "age":{
                "gte":20,
                "lt":22
            }
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "张三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            },
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "175njnUBAdKB-kbRFcG6",
                "_score": 1.0,
                "_source": {
                    "name": "李四",
                    "age": 21,
                    "mail": "222@qq.com",
                    "hobby": "羽毛球、乒乓球、足球、篮球"
                }
            }
        ]

The range operators are:
gt :: greater than
gte :: greater than or equal to
lt :: less than
lte :: less than or equal to
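
To make the operator semantics concrete, the sketch below applies the same four bounds to plain Python values. It is a local illustration only and does not talk to Elasticsearch; in_range and RANGE_OPERATORS are names invented for the example.

```python
import operator

# Map each range operator name to the comparison it stands for.
RANGE_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True when value satisfies every bound, e.g. {"gte": 20}."""
    return all(RANGE_OPERATORS[name](value, limit)
               for name, limit in bounds.items())


ages = [20, 21, 22, 23, 24]
print([a for a in ages if in_range(a, {"gte": 20, "lt": 22})])  # [20, 21]
```

The result matches the two documents (ages 20 and 21) returned by the range query above.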

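7.4 exists query

The exists query finds documents that contain a value for the given field, similar to IS NOT NULL in SQL. Negating it (for example inside a bool must_not clause) finds documents that lack the field, similar to IS NULL.

POST /ela/_search
# Request body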
{
    "query": {
        "exists": {
            "field": "age"    
        }
    }
}

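Reference: https://blog.csdn.net/qq_29202513/article/details/103710554

7.5 match query

The match query is the standard query: whether you need a full-text search or an exact-value search, it is usually the one to reach for.

POST /ela/_search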
{
    "query": {
        "match": {
            "hobby": "羽毛球"
        }
    }
}

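If you run a match query against a full-text field, it analyzes the query string with the field's analyzer before executing the search. If you run it against a field that holds an exact value, such as a number, a date, a boolean, or an un-analyzed string, it searches for that exact value: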

{ "match": { "age": 26 }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "public": true }}
{ "match": { "tag": "full_text" }}

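7.6 bool query

The bool query combines multiple query clauses with Boolean logic. It accepts the following parameters:

must :: all of these clauses must match; equivalent to AND.
must_not :: none of these clauses may match; equivalent to NOT.
should :: at least one of these clauses should match; equivalent to OR.

Each of these parameters accepts either a single query clause or an array of query clauses: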

{
    "bool": {
        "must": { "term": { "folder": "inbox" }},
        "must_not": { "term": { "tag": "spam" }},
        "should": [
            { "term": { "starred": true }},
            { "term": { "unread": true }}
        ]
    }
}
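
The matching logic of this example can be mimicked on plain Python dicts. This is a toy evaluator that only understands term clauses and ignores scoring; it also assumes the default behaviour in which should clauses are mandatory only when no must clause is present. The function names and sample mails are invented for the illustration.

```python
def term_matches(doc, clause):
    """Evaluate a {"term": {field: value}} clause against a plain dict."""
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value


def as_list(clauses):
    """Each bool parameter accepts one clause or a list of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def bool_matches(doc, bool_query):
    """Toy evaluation of must / must_not / should over term clauses."""
    must = as_list(bool_query.get("must"))
    must_not = as_list(bool_query.get("must_not"))
    should = as_list(bool_query.get("should"))
    if not all(term_matches(doc, c) for c in must):
        return False
    if any(term_matches(doc, c) for c in must_not):
        return False
    # Assumption: should is only mandatory when no must clause is present.
    if should and not must:
        return any(term_matches(doc, c) for c in should)
    return True


query = {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}
mails = [
    {"id": 1, "folder": "inbox", "tag": "work", "starred": True, "unread": False},
    {"id": 2, "folder": "inbox", "tag": "spam", "starred": True, "unread": True},
    {"id": 3, "folder": "sent", "tag": "work", "starred": False, "unread": False},
]
print([m["id"] for m in mails if bool_matches(m, query)])  # [1]
```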

8. Filtering

Elasticsearch also supports filters, which can wrap queries such as term, range, and match.
Example: find users whose age is 20.

POST /ela/_search
# Request body
{
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "age": 20
                }
            }
        }
    }
}
# Response (hits only)
"hits": [
            {
                "_index": "ela",
                "_type": "_doc",
                "_id": "1r5njnUBAdKB-kbRFcG6",
                "_score": 0.0,
                "_source": {
                    "name": "张三",
                    "age": 20,
                    "mail": "111@qq.com",
                    "hobby": "羽毛球、乒乓球、足球"
                }
            }
        ]

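Queries versus filters

A filter asks of every document whether a field contains a specific value: a yes/no question.
A query asks how well each document's field matches a specific value.
A query calculates how relevant each document is to the query and assigns it a relevance score, _score, which is then used to sort the matching documents. This notion of relevance suits full-text search, where there is often no exact match.
The output of a filter is a simple list of matching documents, which is quick to compute and easy to cache in memory, needing only 1 bit per document. Cached filters can be reused very efficiently by subsequent requests.
A query has to find matching documents and also compute each document's relevance, so queries are generally more expensive than filters, and query results are not cacheable.

Recommendation: for exact-match searches, prefer filters, because filter results can be cached.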
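
Why cached filter results are cheap to store and combine can be shown with a toy bitmap, where each document occupies one flag in an integer. This is a conceptual sketch rather than Elasticsearch's actual cache; the helper names and the cache keys are invented, and the documents are the five users inserted earlier.

```python
def filter_bitset(docs, predicate):
    """Return an int whose bit i is set when docs[i] passes the filter."""
    bits = 0
    for i, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << i
    return bits


def matching_docs(docs, bits):
    """Return the documents whose flag is set in the bitmap."""
    return [doc for i, doc in enumerate(docs) if (bits >> i) & 1]


docs = [
    {"name": "张三", "age": 20},
    {"name": "李四", "age": 21},
    {"name": "王五", "age": 22},
    {"name": "赵六", "age": 23},
    {"name": "孙七", "age": 24},
]

# Each filter is evaluated once and its bitmap is kept for later requests.
cache = {
    "age>=21": filter_bitset(docs, lambda d: d["age"] >= 21),
    "age<23": filter_bitset(docs, lambda d: d["age"] < 23),
}

# Combining two cached filters is a single bitwise AND.
combined = cache["age>=21"] & cache["age<23"]
print([d["name"] for d in matching_docs(docs, combined)])  # ['李四', '王五']
```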
