Elasticsearch study notes

Date: 2019-11-19


Elasticsearch is built on Lucene. https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

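Elasticsearch document metadata
A document has three required metadata elements:
_index
Where the document lives. An index should be a collection of documents grouped together because they share common characteristics.
_type
The class of object the document represents; a sub-partition of the index.
_id
The unique identifier of the document. It can be supplied by the user or generated automatically. Auto-generated IDs are URL-safe, Base64-encoded GUID strings 20 characters long. These GUIDs are produced by a modified FlakeID scheme, which lets multiple nodes generate unique IDs in parallel with an essentially zero chance of collision.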
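
The length and alphabet described for auto-generated IDs can be illustrated with a small sketch. This is not Elasticsearch's generator: the function name toy_document_id is hypothetical, and it uses 15 random bytes purely because 15 bytes encode to exactly 20 URL-safe Base64 characters with no padding. The real scheme is Flake-style rather than purely random.

```python
import base64
import secrets


def toy_document_id():
    """Return a 20-character URL-safe Base64 string built from 15 random bytes.

    Illustration only: this mimics the shape of an auto-generated id,
    not the FlakeID-style algorithm that produces it.
    """
    raw = secrets.token_bytes(15)  # 15 bytes = 120 bits = 20 Base64 characters
    return base64.urlsafe_b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(toy_document_id())
```

The result has the same shape as the id AV6ZDic0Gtb1Pf5XS4Nu that appears in the search response later in these notes.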

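Optional element:

_version records the version number of the document. Every document in Elasticsearch has a version number, and the value of _version is incremented each time the document is changed (including when it is deleted).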

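Fields in an Elasticsearch search response

took: the time the query took, in milliseconds.

_shards: the number of shards that took part in the query.

timed_out: whether the query timed out. The timeout can be set on the request: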

GET /_search?timeout=10ms

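hits:

This section contains a total field giving the number of matching documents, and a hits array holding the first ten results.

Each result also has a _score, which measures how well the document matches the query. By default the most relevant documents are returned first, that is, results are sorted by _score in descending order. In this example no query was specified, so all documents are equally relevant and every result gets the neutral _score of 1. max_score is the highest _score among the documents that match the query.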

t@ubuntu:~$ curl -XGET 'localhost:9200/_search?pretty'
{
  "took" : 759,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_score" : 1.0,
        "_source" : {
          "title" : "My first blog entry",
          "text" : "I am starting to get the hang of this...",
          "date" : "2014/01/02"
        }
      },
      {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "AV6ZDic0Gtb1Pf5XS4Nu",
        "_score" : 1.0,
        "_source" : {
          "title" : "My second blog entry",
          "text" : "Still trying this out...",
          "date" : "2014/01/01"
        }
      },
      {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "My first blog entry",
          "text" : "Starting to get the hang of this...",
          "views" : 2,
          "tags" : [
            "testing"
          ]
        }
      }
    ]
  }
}
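
The fields described above can be pulled out of a response with a few lines of Python. This is a minimal sketch: summarize_search is a hypothetical helper, and SAMPLE is a trimmed copy of the response shown above with the _source bodies left out. The isinstance check is there because newer Elasticsearch versions return hits.total as an object with a value field rather than a plain integer.

```python
import json


def summarize_search(response):
    """Collect the headline fields of a search response into a flat dict."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # newer versions: {"value": ..., "relation": ...}
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_total": response["_shards"]["total"],
        "shards_failed": response["_shards"]["failed"],
        "total": total,
        "max_score": hits["max_score"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the response above (no _source bodies).
SAMPLE = json.loads("""
{
  "took": 759,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [
      {"_index": "website", "_type": "blog", "_id": "123", "_score": 1.0},
      {"_index": "website", "_type": "blog", "_id": "AV6ZDic0Gtb1Pf5XS4Nu", "_score": 1.0},
      {"_index": "website", "_type": "blog", "_id": "1", "_score": 1.0}
    ]
  }
}
""")

if __name__ == "__main__":
    print(summarize_search(SAMPLE))
```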

Elasticsearch CRUD operations

1. Index (save) a document

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}

2. Retrieve documents

Get a document by id

GET /website/blog/123?pretty

Retrieve only some fields, here just title and text

GET /website/blog/123?_source=title,text

Return only the source document, without metadata

GET /website/blog/123/_source

Check whether a document exists

curl -i -XHEAD http://localhost:9200/website/blog/123

Exists: 200

Does not exist: 404

Search and return only the specified fields

http://ip:9200/index/type/_search?_source=createTime

Retrieve multiple documents:

GET /_mget
{
   "docs" : [
      {
         "_index" : "website",
         "_type" :  "blog",
         "_id" :    2
      },
      {
         "_index" : "website",
         "_type" :  "pageviews",
         "_id" :    1,
         "_source": "views"
      }
   ]
}

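Searching multiple indices and multiple types:

/_search
Search all types in all indices
/gb/_search
Search all types in the gb index
/gb,us/_search
Search all documents in the gb and us indices
/g*,u*/_search
Search all types in any index whose name begins with g or u
/gb/user/_search
Search the user type in the gb index
/gb,us/user,tweet/_search
Search the user and tweet types in the gb and us indices
/_all/user,tweet/_search
Search the user and tweet types in all indices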
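
The path patterns listed above follow one rule: comma-joined indices, then comma-joined types, then _search, with _all standing in when types are given but no index is. A small sketch of that rule, where search_path is a hypothetical helper and not part of any client library:

```python
def search_path(indices=(), types=()):
    """Build a search path of the form /<indices>/<types>/_search."""
    index_part = ",".join(indices)
    type_part = ",".join(types)
    if type_part and not index_part:
        index_part = "_all"  # types without an index need the _all placeholder
    parts = [part for part in (index_part, type_part) if part]
    return "/" + "/".join(parts + ["_search"])


if __name__ == "__main__":
    print(search_path())
    print(search_path(["gb", "us"]))
    print(search_path(["gb", "us"], ["user", "tweet"]))
    print(search_path(types=["user", "tweet"]))
```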

Pagination:

GET /_search?size=5&from=10

http://localhost:9200/ct_ws/type/_search?sort=createTime:desc&pretty&size=20000&from=0&_source=createTime,url
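
In these requests size is the number of results per page and from is the number of results to skip, so page n (counting from 1) starts at from = (n - 1) * size. A minimal sketch of that arithmetic, with page_query as a hypothetical helper name:

```python
from urllib.parse import urlencode


def page_query(page, size=10):
    """Return the /_search query string for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be at least 1")
    offset = (page - 1) * size  # results to skip before this page
    return "/_search?" + urlencode({"size": size, "from": offset})


if __name__ == "__main__":
    print(page_query(3, size=5))
```

With a page size of 5, the third page produces the same query string as the first example above.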

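3. Update a document

To update a document, PUT the whole document again under the same id; the _version value is incremented automatically.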

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

t@ubuntu:~$ curl -XPUT 'localhost:9200/website/blog/123?pretty' -H 'Content-Type: application/json' -d'
> {
>   "title": "My first blog entry",
>   "text":  "I am starting to get the hang of this...",
>   "date":  "2014/01/02"
> }
> '
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}

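Internally, Elasticsearch marks the old document as deleted and indexes the new document in its place.

Add fields to a document by id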

POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

Modify a field by id, here incrementing views by 1

POST /website/blog/1/_update
{
   "script" : "ctx._source.views+=1"
}

Example:

t@ubuntu:~$ curl -XPOST 'localhost:9200/website/blog/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
   "script" : "ctx._source.views+=1"
}
'
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 5,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}
t@ubuntu:~$ curl -XGET 'localhost:9200/website/blog/1?pretty'                                                 
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 5,
  "found" : true,
  "_source" : {
    "title" : "My first blog entry",
    "text" : "Starting to get the hang of this...",
    "views" : 2,
    "tags" : [
      "testing"
    ]
  }
}
t@ubuntu:~$
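
The effect of the two _update forms on _source can be modelled locally. This is only a sketch of the observable result, not of how Elasticsearch applies updates: apply_partial_update and increment_views are hypothetical names, the first merges a doc body into the existing source (nested objects merged recursively, other values overwritten or added), and the second mirrors the script ctx._source.views+=1.

```python
import copy


def apply_partial_update(source, doc):
    """Return a new source with the fields of doc merged in."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)  # merge objects
        else:
            merged[key] = copy.deepcopy(value)  # overwrite or add the field
    return merged


def increment_views(source, step=1):
    """Return a new source with views increased by step.

    Raises KeyError if views is missing, so the field must exist first.
    """
    updated = copy.deepcopy(source)
    updated["views"] += step
    return updated


if __name__ == "__main__":
    original = {"title": "My first blog entry",
                "text": "Starting to get the hang of this..."}
    with_fields = apply_partial_update(original, {"tags": ["testing"], "views": 0})
    print(increment_views(increment_views(with_fields)))
```

Adding the fields with views set to 0 and then running the increment twice gives views equal to 2, matching the document returned by the GET above.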

4. Delete a document

Deleting a document does not remove it from disk immediately; the document is only marked as deleted.

Delete a document by id

DELETE /website/blog/123

Bulk operations

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
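
The bulk body above is newline-delimited JSON: each action is one line of metadata, followed by one line of source for every action except delete, and the body has to end with a newline. A minimal sketch that assembles such a body from Python data, assuming a hypothetical helper bulk_body and a list OPS that mirrors the request above:

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk request body."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete has no source line
        if source is None:
            raise ValueError(action + " needs a source line")
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the final newline is required


# Mirrors the four actions in the request above.
OPS = [
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
    ("update", {"_index": "website", "_type": "blog", "_id": "123",
                "_retry_on_conflict": 3},
     {"doc": {"title": "My updated blog post"}}),
]

if __name__ == "__main__":
    print(bulk_body(OPS), end="")
```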

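Clear all data from an index. This is closer to MySQL's TRUNCATE TABLE than to DROP TABLE, because the index itself and its mapping remain: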

curl -XPOST 'http://ip:9600/megacorp/employee/_delete_by_query' -H 'Content-Type: application/json' -d'
{
     "query": {
        "match_all": {}
    }
}
'

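Concurrency control

Elasticsearch uses the document version to control concurrent reads and writes:

For a document that already exists, apply the change only if its current version is 1: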

PUT /website/blog/1?version=1 
{
  "title": "My first blog entry",
  "text":  "Starting to get the hang of this..."
}

If the version does not match, the request fails with an error:

t@ubuntu:~$ 
t@ubuntu:~$ curl -XPUT 'localhost:9200/website/blog/1?version=1&pretty' -H 'Content-Type: application/json' -d'
> {
>   "title": "My first blog entry",
>   "text":  "Starting to get the hang of this..."
> }
> '
{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[blog][1]: version conflict, current version [2] is different than the one provided [1]",
        "index_uuid" : "aGPDbTmcTjKyhQ4fEJ6NEw",
        "shard" : "3",
        "index" : "website"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[blog][1]: version conflict, current version [2] is different than the one provided [1]",
    "index_uuid" : "aGPDbTmcTjKyhQ4fEJ6NEw",
    "shard" : "3",
    "index" : "website"
  },
  "status" : 409
}
t@ubuntu:~$

Using an external version number:

PUT /website/blog/2?version=5&version_type=external
{
  "title": "My first external blog entry",
  "text":  "Starting to get the hang of this..."
}
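
The two version checks can be modelled with a toy in-memory store. This is a sketch of the rules only, not of Elasticsearch internals: ToyStore and VersionConflict are hypothetical names. With the default (internal) versioning a write that supplies a version succeeds only if it equals the current version, and the stored version then goes up by one. With version_type=external the supplied version must be greater than the stored one and becomes the new version.

```python
class VersionConflict(Exception):
    """Raised when a write fails its version check (HTTP 409 in the notes above)."""


class ToyStore:
    """In-memory model of versioned writes; illustration only."""

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def put(self, doc_id, source, version=None, external=False):
        current = self._docs.get(doc_id, (0, None))[0]  # 0 means "not stored yet"
        if external:
            if version is None:
                raise ValueError("external versioning needs an explicit version")
            if version <= current:
                raise VersionConflict((current, version))
            new_version = version  # the caller's number is stored as-is
        else:
            if version is not None and version != current:
                raise VersionConflict((current, version))
            new_version = current + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


if __name__ == "__main__":
    store = ToyStore()
    store.put("1", {"title": "My first blog entry"})
    store.put("1", {"title": "My first blog entry"})  # version is now 2
    try:
        store.put("1", {"title": "stale write"}, version=1)
    except VersionConflict as conflict:
        print("conflict (current, provided):", conflict.args[0])
    print(store.put("2", {"title": "external"}, version=5, external=True))
```

The stale write reproduces the situation in the error above, where the current version is 2 and the request supplied 1.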