Index
Node
Documents are serialized as JSON and stored in Elasticsearch.
Every document has a unique ID.
JSON documents are flexible: no format has to be defined in advance.
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "8609",
  "_score" : 1.0,
  "_source" : {
    "year" : 1923,
    "title" : "Our Hospitality",
    "@version" : "1",
    "id" : "8609",
    "genre" : [ "Comedy" ]
  }
}
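The example above splits into metadata and document data. The sketch below is a minimal Python version of that split: every top-level key except `_source` counts as metadata, and `_source` holds the original JSON. The function name `split_hit` is hypothetical. Note that `@version` and `id` inside `_source` are ordinary document fields. They are unrelated to the `_version` and `_id` metadata.

```python
import json

# The example document above, as a JSON string.
RAW = """{ "_index" : "movies", "_type" : "_doc", "_id" : "8609", "_score" : 1.0,
  "_source" : { "year" : 1923, "title" : "Our Hospitality", "@version" : "1",
                "id" : "8609", "genre" : [ "Comedy" ] } }"""


def split_hit(raw: str) -> tuple[dict, dict]:
    """Return (metadata, source): every top-level key except _source is metadata."""
    hit = json.loads(raw)
    source = hit.get("_source", {})
    metadata = {key: value for key, value in hit.items() if key != "_source"}
    return metadata, source


metadata, source = split_hit(RAW)
print(sorted(metadata))  # ['_id', '_index', '_score', '_type']
print(source["title"])   # Our Hospitality
```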
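_index: name of the index the document belongs to
_type: name of the type the document belongs to
_id: unique ID of the document
_score: relevance score of the document
_source: the document's original JSON data
_version: version information of the document
An index is a container for documents: a collection of documents of the same kind.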
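Index: represents the concept of a logical space. Each index has its own Mapping, which defines the names and types of the fields in the documents it contains.
Shard: represents the concept of a physical space; the data in an index is distributed across shards.
An index's Mapping and Setting:
Mapping: defines the types of the document fields.
Setting: defines how the data is distributed.
As a verb: the process of saving a document into Elasticsearch is also called indexing.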
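Before 7.0, an index could have multiple types.
Starting with 7.0, an index can only have one type: _doc.
Starting with 6.0, types are deprecated.
GET /_cat/indices?v: list indices
GET /_cat/indices?v&health=green: list indices whose health status is green
GET /_cat/indices?v&s=docs.count:desc: list indices sorted by document count, descending
High availability
Scalability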
The default cluster name is elasticsearch; it can be set on the command line with -E cluster.name=demo.
A node is an Elasticsearch instance.
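Each node has a name, which can be specified at startup with -E node.name=node1.
Once started, each node is assigned a UID, which is stored under its data directory.
By default, every node is a master-eligible node when it starts.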
Setting node.master: false disables this.
Every node stores the cluster state, but only the master node can modify it.
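The cluster state maintains the information a cluster needs, including the Mapping and Setting information of its indices.
Data Node
Coordinating Node
By default, every node acts as a coordinating node. In production, each node should be given a single role.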
Node type | Setting | Default |
---|---|---|
master eligible | node.master | true |
data | node.data | true |
ingest | node.ingest | true |
coordinating only | / | every node is a coordinating node by default |
machine learning | node.ml | true (requires X-Pack to be enabled) |
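The table's defaults can be modeled as a dictionary. A node with every listed role switched off is left only coordinating requests. This is a sketch over the legacy boolean `node.*` settings shown in the table. `effective_roles` is a hypothetical helper, not an Elasticsearch API.

```python
# Defaults from the table above (legacy boolean node.* settings).
DEFAULT_ROLES = {
    "node.master": True,
    "node.data": True,
    "node.ingest": True,
    "node.ml": True,  # the table notes this role requires X-Pack
}


def effective_roles(overrides: dict) -> list[str]:
    """Hypothetical helper: apply overrides to the defaults and list enabled roles."""
    settings = {**DEFAULT_ROLES, **overrides}
    roles = [name.split(".", 1)[1] for name, enabled in settings.items() if enabled]
    # Every node coordinates requests; with no other role it is coordinating only.
    return roles or ["coordinating_only"]


print(effective_roles({}))  # ['master', 'data', 'ingest', 'ml']
# A dedicated data node, i.e. a single-role node as recommended for production:
print(effective_roles({"node.master": False, "node.ingest": False, "node.ml": False}))  # ['data']
print(effective_roles({name: False for name in DEFAULT_ROLES}))  # ['coordinating_only']
```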
Primary shard:
Replica shard:
In production, shard counts need capacity planning done in advance.
Too few shards
Too many shards
GET /_cluster/health
{
  "cluster_name" : "learn_es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 7,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
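The `status` field in this response follows the green/yellow/red rules given in this section. The Python sketch below is a simplified model of those rules with a hypothetical `health_status` helper. It works from unassigned primary and replica counts. Elasticsearch itself computes status per shard and per index, and the cluster status is the worst index status.

```python
def health_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Simplified model: derive a health colour from unassigned shard counts."""
    if unassigned_primaries > 0:
        return "red"     # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"  # every primary is allocated, some replicas are not
    return "green"       # every primary and replica shard is allocated


# The sample response reports "unassigned_shards" : 0, hence "green".
print(health_status(0, 0))  # green
print(health_status(0, 7))  # yellow
print(health_status(1, 7))  # red
```

In the sample response, 14 active shards over 7 active primaries works out to one replica per primary on average.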
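green: all primary and replica shards are allocated.
yellow: all primary shards are allocated, but some replica shards are not.
red: at least one primary shard is not allocated.
Documents support four basic operations: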
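Index: add a document
POST <index>/_doc: the document ID is generated automatically.
PUT <index>/_doc/<_id>: if no document with this ID exists, it is added; otherwise the existing document is replaced and its version number (the _version field) is incremented.
POST <index>/_create/<_id>: returns an error if a document with this ID already exists.
PUT <index>/_create/<_id>: returns an error if a document with this ID already exists.
Get: read a document
GET <index>/_doc/<_id>: returns the document with this ID together with its metadata.
GET <index>/_source/<_id>: returns only the _source field of the document.
HEAD <index>/_doc/<_id>: checks whether the document exists; returns 200 if it does, 404 if not.
HEAD <index>/_source/<_id>: checks whether the document's _source field exists; returns 200 if it does, 404 if not.
Update: update a document
POST <index>/_update/<_id>: partially updates the document; put the fields to change under doc in the request body.
Delete: delete a document
DELETE /<index>/_doc/<_id>: deletes the document with this ID; if it does not exist, nothing is deleted.
A document can be created with an automatically generated ID or with a specified ID.
Automatically generated document ID: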
POST <index>/_doc
demo:
POST users/_doc
{ "user" : "Mike", "phone" : "15512345678" }
-----------
{ "_index" : "users", "_type" : "_doc", "_id" : "RfXT_28B5V-KMglJX8bm", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 3, "_primary_term" : 1 }
Specified document ID:
PUT <index>/_doc/<_id>
or POST | PUT <index>/_create/<_id>
demo 1: PUT <index>/_doc/<_id>
PUT users/_doc/1
{ "user" : "John", "phone" : "15812345678" }
--------
# If no document with this ID exists, it is created
{ "_index" : "users", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 4, "_primary_term" : 1 }
# If a document with this ID exists, it is replaced (the existing one is deleted, a new one is created, version + 1)
{ "_index" : "users", "_type" : "_doc", "_id" : "1", "_version" : 23, "result" : "updated", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 26, "_primary_term" : 1 }
demo 2: POST | PUT <index>/_create/<_id>
POST users/_create/2
{ "user" : "Dave", "phone" : "15912345678" }
---------
# If no document with this ID exists, it is created
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 27, "_primary_term" : 1 }
# If a document with this ID exists, a version conflict error is returned
{ "error": { "root_cause": [ { "type": "version_conflict_engine_exception", "reason": "[2]: version conflict, document already exists (current version [1])", "index_uuid": "mjgjxIROT72xLMHnYNiUxw", "shard": "0", "index": "users" } ], "type": "version_conflict_engine_exception", "reason": "[2]: version conflict, document already exists (current version [1])", "index_uuid": "mjgjxIROT72xLMHnYNiUxw", "shard": "0", "index": "users" }, "status": 409 }
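The create-versus-replace rules in the two demos above can be modeled with a small in-memory class. `ToyIndex` and `VersionConflict` are hypothetical names. The model ignores shards, `_seq_no` and deletes. The `uuid4` IDs only stand in for Elasticsearch's own auto-generated IDs, which have a different format.

```python
import uuid


class VersionConflict(Exception):
    """Stands in for version_conflict_engine_exception (HTTP 409)."""


class ToyIndex:
    """Hypothetical in-memory model of the _doc/_create semantics described above.
    It tracks only _version and _source; it is not how Elasticsearch stores data."""

    def __init__(self):
        self.docs = {}  # _id -> (_version, _source)

    def index(self, source, doc_id=None):
        # POST <index>/_doc: no ID given, so generate one (format differs from ES).
        # PUT <index>/_doc/<_id>: create, or replace the document and bump _version.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        exists = doc_id in self.docs
        version = self.docs[doc_id][0] + 1 if exists else 1
        self.docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version,
                "result": "updated" if exists else "created"}

    def create(self, source, doc_id):
        # POST | PUT <index>/_create/<_id>: refuse to overwrite an existing ID.
        if doc_id in self.docs:
            current = self.docs[doc_id][0]
            raise VersionConflict(f"[{doc_id}]: version conflict, document already "
                                  f"exists (current version [{current}])")
        return self.index(source, doc_id)


users = ToyIndex()
print(users.index({"user": "John", "phone": "15812345678"}, "1")["result"])  # created
print(users.index({"user": "John", "phone": "15812345678"}, "1"))  # {'_id': '1', '_version': 2, 'result': 'updated'}
users.create({"user": "Dave", "phone": "15912345678"}, "2")
try:
    users.create({"user": "Dave", "phone": "15912345678"}, "2")
except VersionConflict as err:
    print(err)  # [2]: version conflict, document already exists (current version [1])
```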
Get a document by ID
GET <index>/_doc/<_id>
demo:
GET users/_doc/2
--------
# The document exists: its metadata and _source are returned
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 1, "_seq_no" : 27, "_primary_term" : 1, "found" : true, "_source" : { "user" : "Dave", "phone" : "15912345678" } }
# The document does not exist: found is false
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "found" : false }
Update the document with a given ID:
POST <index>/_update/<_id>
demo: partial update
POST users/_update/1
{ "doc": { "age":28 } }
--------
# If the document exists and a field value changes, the document is updated; if it exists but no value changes, result is noop
{ "_index" : "users", "_type" : "_doc", "_id" : "1", "_version" : 27, "result" : "updated", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 34, "_primary_term" : 1 }
demo2: update with a script
# index the doc
PUT users/_doc/2
{ "name" : "John", "counter" : 1 }
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 6, "result" : "updated", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 53, "_primary_term" : 2 }
--------
# update the doc
POST users/_update/2
{ "script": { "source": "ctx._source.counter += params.count", "params": { "count":2 } } }
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 7, "result" : "updated", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 54, "_primary_term" : 2 }
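The two update styles above can be modeled in plain Python. A `doc` body is merged into the existing `_source`: nested objects merge recursively and other values (including arrays) are overwritten. An unchanged result is reported as `noop`. The script adds `params.count` to `counter`. This is a sketch under those assumptions. `merge`, `partial_update` and `add_to_counter` are hypothetical helpers, not Elasticsearch code.

```python
import copy


def merge(target: dict, patch: dict) -> None:
    """Recursively merge patch into target in place: objects are merged,
    any other value (scalars, arrays) is overwritten, new keys are added."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def partial_update(source: dict, doc: dict) -> tuple[dict, str]:
    """Model of POST <index>/_update/<_id> with a "doc" body; returns (new source, result)."""
    updated = copy.deepcopy(source)
    merge(updated, doc)
    return updated, ("noop" if updated == source else "updated")


def add_to_counter(source: dict, count: int) -> dict:
    """Python equivalent of the Painless script ctx._source.counter += params.count."""
    updated = dict(source)
    updated["counter"] += count
    return updated


doc, result = partial_update({"user": "John", "phone": "15812345678"}, {"age": 28})
print(result)  # updated
print(partial_update(doc, {"age": 28})[1])  # noop
print(add_to_counter({"name": "John", "counter": 1}, 2))  # {'name': 'John', 'counter': 3}
```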
Delete a document by ID
DELETE <index>/_doc/<_id>
demo:
DELETE users/_doc/2
--------
# The document exists: it is deleted
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 2, "result" : "deleted", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 31, "_primary_term" : 1 }
# The document does not exist: nothing is deleted and result is not_found
{ "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 3, "result" : "not_found", "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "_seq_no" : 32, "_primary_term" : 1 }
The _bulk API supports four types of operations:
Index
Create
Update
Delete
Syntax: POST _bulk
or POST <index>/_bulk
The request body uses the newline-delimited JSON (NDJSON) format:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
demo:
POST _bulk
# index, create: the next line must contain the document source
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "create" : { "_index" : "test", "_id" : "2" } }
{ "field2" : "value2" }
# update: the next line must contain doc or script
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field3" : "value3"} }
# delete: same syntax as the standard delete API
{ "delete" : { "_index" : "test", "_id" : "2" } }
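Building the NDJSON body by hand is error-prone, so here is a minimal Python sketch that serializes the same four operations. Each action and its metadata go on one line. For index and create, the source follows on the next line, and for update a `doc`/`script` object follows. delete takes no second line. The `bulk_body` name and the `(action, metadata, body)` tuple shape are assumptions made for this example.

```python
import json


def bulk_body(operations) -> str:
    """Serialize (action, metadata, body) triples into an NDJSON _bulk request body.

    body is the document source for index/create, {"doc": ...} or {"script": ...}
    for update, and None for delete, which takes no second line.
    """
    lines = []
    for action, metadata, body in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue
        if body is None:
            raise ValueError(f"'{action}' needs a second line")
        lines.append(json.dumps(body))
    # Every line, including the last one, must end with a newline character.
    return "\n".join(lines) + "\n"


ops = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("create", {"_index": "test", "_id": "2"}, {"field2": "value2"}),
    ("update", {"_id": "1", "_index": "test"}, {"doc": {"field3": "value3"}}),
    ("delete", {"_index": "test", "_id": "2"}, None),
]
print(bulk_body(ops), end="")
```

`json.dumps` without `indent` escapes newlines inside strings, so each object stays on a single line as NDJSON requires. When sending the body over HTTP, the content type is `application/x-ndjson`.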
GET _mget
or GET <index>/_mget
demo:
GET /_mget
{ "docs" : [ { "_index" : "users", "_id" : "1" }, { "_index" : "twitter", "_id" : "2" } ] }
--------
{ "docs" : [ { "_index" : "users", "_type" : "_doc", "_id" : "1", "_version" : 31, "_seq_no" : 38, "_primary_term" : 2, "found" : true, "_source" : { "user" : "abc", "class" : 8, "age" : 28, "gender" : "male", "field1" : "value1" } }, { "_index" : "twitter", "_type" : null, "_id" : "2", "error" : { "root_cause" : [ { "type" : "index_not_found_exception", "reason" : "no such index [twitter]", "resource.type" : "index_expression", "resource.id" : "twitter", "index_uuid" : "_na_", "index" : "twitter" } ], "type" : "index_not_found_exception", "reason" : "no such index [twitter]", "resource.type" : "index_expression", "resource.id" : "twitter", "index_uuid" : "_na_", "index" : "twitter" } } ] }
demo2:
GET users/_mget
{ "docs": [ { "_id" : "2" }, { "_id" : "3" } ] }
GET users/_mget
{ "ids" : ["2", "3"] }
--------
{ "docs" : [ { "_index" : "users", "_type" : "_doc", "_id" : "2", "_version" : 7, "_seq_no" : 54, "_primary_term" : 2, "found" : true, "_source" : { "name" : "John", "counter" : 3 } }, { "_index" : "users", "_type" : "_doc", "_id" : "3", "found" : false } ] }
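As the first demo shows, one failing document does not fail the whole `_mget` request: the error is reported inside `docs`. A client therefore has to check each entry. The sketch below is a minimal Python version of that check. It keys results by `(_index, _id)` because one request can span several indices. `split_mget` is a hypothetical helper, and the responses are trimmed copies of the two demos.

```python
def split_mget(response: dict):
    """Split an _mget response into found sources, missing IDs and per-document errors.
    Keys are (_index, _id) pairs because one request can span several indices."""
    found, missing, errors = {}, [], {}
    for doc in response["docs"]:
        key = (doc["_index"], doc["_id"])
        if "error" in doc:
            errors[key] = doc["error"]["type"]
        elif doc.get("found"):
            found[key] = doc["_source"]
        else:
            missing.append(key)
    return found, missing, errors


# Trimmed copies of the two demo responses above.
demo1 = {"docs": [
    {"_index": "users", "_id": "1", "found": True, "_source": {"user": "abc", "age": 28}},
    {"_index": "twitter", "_id": "2", "error": {"type": "index_not_found_exception"}},
]}
demo2 = {"docs": [
    {"_index": "users", "_id": "2", "found": True, "_source": {"name": "John", "counter": 3}},
    {"_index": "users", "_id": "3", "found": False},
]}
print(split_mget(demo1)[2])  # {('twitter', '2'): 'index_not_found_exception'}
print(split_mget(demo2)[1])  # [('users', '3')]
```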
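An inverted index consists of two parts:
Term Dictionary:
Posting List:
Posting (an entry in a posting list):
Example: searching for Elasticsearch in the following documents.
Documents
Document ID | Content |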
---|---|
1 | Mastering Elasticsearch |
2 | Elasticsearch Server |
3 | Elasticsearch Essentials |
Posting list for the term Elasticsearch
Document ID | Term frequency (TF) | Position | Offset |
---|---|---|---|
1 | 1 | 1 | <10,23> |
2 | 1 | 0 | <0,13> |
3 | 1 | 0 | <0,13> |
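The tables above can be reproduced with a toy inverted index. It maps each term to postings of (document ID, TF, positions, offsets), and its sorted keys play the role of the term dictionary. This is a minimal Python sketch. The tokenizer (runs of word characters, lowercased) is an assumption standing in for a real analyzer. Positions are zero-based token indexes, and offsets are end-exclusive character ranges, matching `<10,23>` in the table.

```python
import re
from collections import defaultdict

DOCS = {1: "Mastering Elasticsearch", 2: "Elasticsearch Server", 3: "Elasticsearch Essentials"}


def build_inverted_index(docs: dict) -> dict:
    """Map each term to its posting list of (doc_id, tf, positions, offsets)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        per_term = defaultdict(lambda: ([], []))  # term -> (positions, offsets)
        # Assumed tokenizer: runs of word characters, lowercased.
        for position, match in enumerate(re.finditer(r"\w+", text)):
            positions, offsets = per_term[match.group().lower()]
            positions.append(position)
            offsets.append((match.start(), match.end()))
        for term, (positions, offsets) in per_term.items():
            index[term].append((doc_id, len(positions), positions, offsets))
    return dict(index)


inverted = build_inverted_index(DOCS)
print(sorted(inverted))  # term dictionary: ['elasticsearch', 'essentials', 'mastering', 'server']
for posting in inverted["elasticsearch"]:
    print(posting)
# (1, 1, [1], [(10, 23)])
# (2, 1, [0], [(0, 13)])
# (3, 1, [0], [(0, 13)])
```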
Indexing can be disabled for specific fields.
Analysis:
Analyzer:
An analyzer consists of three parts: character filters, a tokenizer, and token filters.
The default analyzer (the standard analyzer) splits text into words and lowercases them.
GET /_analyze
POST /_analyze
GET /<index>/_analyze
POST /<index>/_analyze
demo:
POST _analyze
{ "analyzer": "standard", "text": ["share your experience with NoSql & big data technologies"] }
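The three-stage analyzer pipeline can be sketched in Python: character filters run first, then exactly one tokenizer, then token filters. This only approximates the standard analyzer, which has no character filters and lowercases its tokens. The real standard tokenizer follows Unicode text segmentation rules, whereas this sketch uses the regex `\w+` as a stand-in. All function names are hypothetical.

```python
import re


def pass_through(text: str) -> str:
    # Character filter slot: the standard analyzer uses none, so pass text through.
    return text


def word_tokenizer(text: str) -> list[str]:
    # Rough stand-in for the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    # Token filter: lowercase every token.
    return [token.lower() for token in tokens]


def analyze(text, char_filters=(pass_through,), tokenizer=word_tokenizer,
            token_filters=(lowercase,)):
    """Character filters -> exactly one tokenizer -> token filters, in that order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("share your experience with NoSql & big data technologies"))
# ['share', 'your', 'experience', 'with', 'nosql', 'big', 'data', 'technologies']
```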