Elasticsearch, Lesson 3

A detailed look at ES

Search Template

tmdb is the template name, and tmdb1 is the index being searched.

Store the query as a script:

## Create the template
POST _scripts/tmdb
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": ["title", "overview"],
      "size": 20,
      "query": {
        "multi_match": {
          "query": "{{q}}",
          "fields": ["title", "overview"]
        }
      }
    }
  }
}

## Search with the template
POST tmdb1/_search/template
{
  "id": "tmdb",
  "params": {
    "q": "basketball with cartoon aliens"
  }
}
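Conceptually, ES fills the `{{q}}` placeholder in the stored mustache template with the value from `params` and then runs the resulting query. The Python sketch below models only that substitution step. `render_template` is a hypothetical helper that handles only simple `{{name}}` variables (real mustache also supports sections, `{{#toJson}}` and more), and it JSON-escapes values so that a quote in the input cannot break the body.

```python
import json

# The stored template from above, kept as text with a mustache placeholder.
TEMPLATE = """{
  "_source": ["title", "overview"],
  "size": 20,
  "query": {"multi_match": {"query": "{{q}}", "fields": ["title", "overview"]}}
}"""


def render_template(template: str, params: dict) -> dict:
    """Hypothetical helper: substitute {{name}} placeholders, then parse the JSON."""
    rendered = template
    for name, value in params.items():
        # json.dumps escapes quotes and backslashes; [1:-1] drops the outer quotes
        # because the placeholder already sits inside a JSON string.
        escaped = json.dumps(str(value))[1:-1]
        rendered = rendered.replace("{{" + name + "}}", escaped)
    return json.loads(rendered)


body = render_template(TEMPLATE, {"q": "basketball with cartoon aliens"})
print(json.dumps(body, indent=2))
```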
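Using aliases

An alias is an alternative name for one or more indices (not for a single document). It can place several indices behind a single view, and it can also carry a filter so that only matching documents are exposed. Searching the alias returns data from every index behind it.

#### Add an alias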
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "news",
        "alias": "new1"
      }
    },
    {
      "add": {
        "index": "blogs",
        "alias": "new1"
      }
    }
  ]
}

## Searching the alias returns the documents from both news and blogs
POST new1/_search
{
  "query": {
    "match_all": {}
  }
}

### Remove the alias
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "blogs",
        "alias": "new1"
      }
    },
    {
      "remove": {
        "index": "news",
        "alias": "new1"
      }
    }
  ]
}
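You can think of an alias as a name that resolves to a set of indices, so a search on the alias fans out to every index behind it. The Python toy below applies add and remove actions in the same shape as the `_aliases` request body. The names `AliasRegistry`, `search` and the in-memory `indices` data are all hypothetical, and alias filters and routing are not modelled.

```python
from collections import defaultdict


class AliasRegistry:
    """Toy model of aliases: each alias name maps to a set of index names."""

    def __init__(self):
        self.aliases = defaultdict(set)

    def apply(self, actions):
        # Same shape as the "actions" list in POST _aliases; applied in order.
        for action in actions:
            (kind, spec), = action.items()
            if kind == "add":
                self.aliases[spec["alias"]].add(spec["index"])
            elif kind == "remove":
                self.aliases[spec["alias"]].discard(spec["index"])
            else:
                raise ValueError(f"unsupported action: {kind}")

    def resolve(self, name):
        # A name that is not (or is no longer) an alias is treated as an index name.
        return sorted(self.aliases.get(name) or {name})


def search(registry, indices, name):
    """Search every index the name resolves to and concatenate the hits."""
    return [doc for index in registry.resolve(name) for doc in indices.get(index, [])]


indices = {"news": ["news doc"], "blogs": ["blog doc"]}
registry = AliasRegistry()
registry.apply([
    {"add": {"index": "news", "alias": "new1"}},
    {"add": {"index": "blogs", "alias": "new1"}},
])
print(search(registry, indices, "new1"))
```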
function score query

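It recomputes the score of each matching document and then sorts the results by the new score. See the official API documentation for details.

Without a modifier: new score = old score * votes

With the log1p modifier: new score = old score * log10(1 + votes)

With a factor as well: new score = old score * log10(1 + factor * votes)

boost_mode defaults to multiply. It defines how the old score is combined with the function's value, and it can also be set to sum or other modes (see the official documentation).

max_boost caps the value the function can contribute.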

DELETE blogs
PUT blogs/_doc/1
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 0 
}

PUT blogs/_doc/2
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 100 
}

PUT blogs/_doc/3
{
  "title": "About popularity",
  "content": "In this post we wil talk about...",
  "votes": 1000000 
}

POST blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}
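You can check the arithmetic behind this request by hand. The Python sketch below assumes that each document's original query score is 1.0 (the real BM25 score depends on the index). It models only the none and log1p modifiers and the multiply and sum boost modes. ES documents log1p as the common (base-10) logarithm of 1 + value, and max_boost caps the function's value before it is combined with the query score.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Value produced by field_value_factor for one document."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # common (base-10) logarithm of 1 + value
        return math.log10(1 + scaled)
    raise ValueError(f"modifier not modelled here: {modifier}")


def combine(query_score, function_value, boost_mode="multiply", max_boost=math.inf):
    """Combine the original score with the function value, capped by max_boost."""
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    raise ValueError(f"boost_mode not modelled here: {boost_mode}")


ASSUMED_QUERY_SCORE = 1.0  # placeholder; real scores come from the multi_match query
for votes in (0, 100, 1_000_000):
    func = field_value_factor(votes, factor=0.1, modifier="log1p")
    print(votes, round(combine(ASSUMED_QUERY_SCORE, func, "sum", max_boost=3), 4))
```

With max_boost set to 3, the document with a million votes no longer dominates: its function value of about 5 is capped at 3.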
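Using the term suggester

suggest_mode controls when suggestions are returned:

- missing: only suggest for terms that are not in the index. For "lucen solid", "solid" exists, so only "lucen" gets a suggestion.
- popular: suggest terms that occur in more documents than the input term. For example, "rock" yields "rocks", because "rocks" appears in two documents.
- always: suggest whether or not the term exists in the index.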

POST articles/_bulk 
{"index": {}}
{"body": "lucene is very cool"}
{"index": {}}
{"body": "Elasticsearch builds on top of lucene"}
{"index": {}}
{"body": "Elasticsearch rocks"}
{"index": {}}
{"body": "elastic is the company behind ELK stack"}
{"index":{}}
{"body": "Elk stack rocks"}
{"index":{}}
{"body": "elasticsearch is rock solid"}


POST articles/_search
{
  "suggest": {
    "test1": {
      "text": "lucen solid",
      "term": {
        "field": "body",
        "suggest_mode": "missing"
      }
    }
  }
}
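A simplified Python model of the term suggester, run over the six articles above, illustrates the three suggest_mode values. It makes these assumptions:

- Tokens come from lowercasing and splitting on non-word characters, a rough stand-in for the standard analyzer.
- Candidates are index terms within edit distance 2 that share the first letter, roughly mirroring the defaults max_edits=2 and prefix_length=1.
- Ranking is simplified to document frequency. The real suggester scores candidates differently.

```python
import re
from collections import Counter

ARTICLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word chars.
    return re.findall(r"\w+", text.lower())


# Document frequency: how many articles contain each term.
DOC_FREQ = Counter(term for doc in ARTICLES for term in set(tokenize(doc)))


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(term, mode="missing", max_edits=2):
    freq = DOC_FREQ[term]  # 0 if the term is not in the index
    if mode == "missing" and freq > 0:
        return []  # the term exists, so "missing" offers nothing
    candidates = [
        t for t in DOC_FREQ
        if t != term and t[0] == term[0] and edit_distance(t, term) <= max_edits
    ]
    if mode == "popular":
        # only terms that occur in more documents than the input term
        candidates = [t for t in candidates if DOC_FREQ[t] > freq]
    return sorted(candidates, key=lambda t: (-DOC_FREQ[t], t))


for word in tokenize("lucen solid"):
    print(word, suggest(word, "missing"))
print("rock", suggest("rock", "popular"))
```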
completion suggester

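Autocomplete suggestions. The completion suggester keeps its data in an in-memory FST, so lookups are fast. Its limitation is that it only matches from the beginning of the input (prefix matching).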

DELETE articles

GET articles/_mapping

PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}
POST articles/_bulk 
{"index": {}}
{"title_completion": "lucene is very cool"}
{"index": {}}
{"title_completion": "Elasticsearch builds on top of lucene"}
{"index": {}}
{"title_completion": "Elasticsearch rocks"}
{"index": {}}
{"title_completion": "elastic is the company behind ELK stack"}
{"index":{}}
{"title_completion": "Elk stack rocks"}
{"index":{}}
{"title_completion": "elasticsearch is rock solid"}


POST articles/_search?pretty
{
  "suggest": {
    "articles_suggester": {
      "prefix": "e",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}
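The completion suggester answers prefix lookups from an in-memory FST. As a rough Python stand-in, a sorted list plus binary search (`bisect`) reproduces the same "prefix only" behaviour: "e" matches titles that start with e, while a word in the middle of a title, such as "rocks", matches nothing. This sketch does not use an FST. It ranks results alphabetically rather than by weight, and lowercasing only approximates the default simple analyzer. `PrefixIndex` is a hypothetical name.

```python
import bisect

TITLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


class PrefixIndex:
    """Sorted-list stand-in for the FST behind a completion field."""

    def __init__(self, inputs):
        # Lowercase keys approximate the default simple analyzer.
        self.entries = sorted((text.lower(), text) for text in inputs)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, prefix, size=5):
        prefix = prefix.lower()
        # Binary search jumps to the first key >= prefix; all matches are contiguous.
        start = bisect.bisect_left(self.keys, prefix)
        results = []
        for key, original in self.entries[start:]:
            if not key.startswith(prefix) or len(results) == size:
                break
            results.append(original)
        return results


index = PrefixIndex(TITLES)
print(index.suggest("e"))
print(index.suggest("rocks"))
```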

The completion suggester can also return different documents depending on a category (context). A context of type category accepts arbitrary strings.

DELETE comments

PUT comments
{
  "mappings": {
    "properties":{
      "comment_autocomplete": {
        "type": "completion",
        "contexts": [
            {
              "type": "category",
              "name": "comment_category"
            }
          ]
      }
    }
  }
}

POST comments/_doc/1
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": ["star wars"],
    "contexts": {
      "comment_category": "movies"
    }
  }
}

POST comments/_doc/2
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": ["starbucks"],
    "contexts": {
      "comment_category": "coffee"
    }
  }
}


POST comments/_search
{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "star",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": {
          "comment_category": "movies"
        }
      }
    }
  }
}
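You can model context filtering as a prefix match that only considers entries tagged with the requested category. The Python sketch below is small and uses hypothetical names (`Entry`, `suggest`). Passing no category here simply skips the filter, which is a simplification of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str              # completion input, e.g. "star wars"
    categories: frozenset  # values of the comment_category context


ENTRIES = [
    Entry("star wars", frozenset({"movies"})),
    Entry("starbucks", frozenset({"coffee"})),
]


def suggest(prefix, category=None, entries=ENTRIES):
    """Prefix match, optionally restricted to one category context."""
    prefix = prefix.lower()
    return sorted(
        e.text for e in entries
        if e.text.startswith(prefix) and (category is None or category in e.categories)
    )


print(suggest("star", "movies"))
print(suggest("star", "coffee"))
```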
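Configuring cross-cluster search

The commands below start three single-node clusters and register all three as remote clusters on each node. After that, a single request can search data across all of the clusters.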

elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster1_node -E cluster.name=cluster1 -E path.data=cluster1_data -E discovery.type=single-node -E http.port=9201 -E transport.port=9301
elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster2_node -E cluster.name=cluster2 -E path.data=cluster2_data -E discovery.type=single-node -E http.port=9202 -E transport.port=9302
elasticsearch-7.5.0/bin/elasticsearch -E node.name=cluster3_node -E cluster.name=cluster3 -E path.data=cluster3_data -E discovery.type=single-node -E http.port=9203 -E transport.port=9303


curl -XPUT "http://localhost:9201/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'

curl -XPUT "http://localhost:9202/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'

curl -XPUT "http://localhost:9203/_cluster/settings" -H "Content-Type:application/json" -d '{"persistent":{"cluster":{"remote":{"cluster1":{"seeds":["127.0.0.1:9301"], "transport.ping_schedule":"30s"},"cluster2":{"seeds":["127.0.0.1:9302"],"transport.ping_schedule":"30s","transport.compress": true, "skip_unavailable":true},"cluster3":{"seeds":["127.0.0.1:9303"]}}}}}'


curl -XPOST "http://localhost:9201/users/_doc" -H "Content-Type:application/json" -d '{"name":"user1", "age": 10}'

curl -XPOST "http://localhost:9202/users/_doc" -H "Content-Type:application/json" -d '{"name":"user2", "age": 20}'

curl -XPOST "http://localhost:9203/users/_doc" -H "Content-Type:application/json" -d '{"name":"user3", "age": 30}'



Search all three clusters with:
http://localhost:9201/cluster1:users,cluster2:users,cluster3:users/_search
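The three curl commands send the same settings body to each cluster. Building that body once in Python makes the nesting easier to read. This is only a sketch: it prints the body and the search URL but sends nothing, and the helper names `remote_settings` and `ccs_path` are hypothetical.

```python
import json

# Remote clusters exactly as in the curl commands above.
REMOTES = {
    "cluster1": {"seeds": ["127.0.0.1:9301"], "transport.ping_schedule": "30s"},
    "cluster2": {"seeds": ["127.0.0.1:9302"], "transport.ping_schedule": "30s",
                 "transport.compress": True, "skip_unavailable": True},
    "cluster3": {"seeds": ["127.0.0.1:9303"]},
}


def remote_settings(remotes):
    """Body for PUT _cluster/settings that registers every remote cluster."""
    return {"persistent": {"cluster": {"remote": remotes}}}


def ccs_path(index, clusters):
    """Cross-cluster search addresses remote indices as <cluster_alias>:<index>."""
    return ",".join(f"{alias}:{index}" for alias in clusters) + "/_search"


body = json.dumps(remote_settings(REMOTES))
for port in (9201, 9202, 9203):
    # Same body for every node, as in the three curl commands.
    print(f"PUT http://localhost:{port}/_cluster/settings {body}")
print("http://localhost:9201/" + ccs_path("users", REMOTES))
```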

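node.master=false makes a node ineligible to become master.

Suppose a node has been running with that setting and you later change it to node.master=true and restart it. Startup then fails with "master not discovered or elected yet, an election requires ...", because the node cannot find a master. Deleting the node's data directory fixes the error, but it also deletes all the data stored on that node.

Instead, start the original master node first, and then start the node whose data was deleted. The data is then synced back to it.

Take an index with 3 primary shards and 1 replica spread across different machines. If one machine goes down, the shard copies it held are rebuilt on the remaining machines from the surviving copies. If the failed machine held a primary shard, one of that shard's replicas is promoted to primary.

The number of primary shards is fixed when the index is created. It cannot be changed unless you delete the index and index the data again.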

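Routing documents to shards

shard = hash(_routing) % number_of_primary_shards

The hash spreads documents evenly across the shards. By default, _routing is the document's _id.

You can also set _routing explicitly, as in the example below. The target shard depends on number_of_primary_shards, so changing that number would send existing documents to different shards. This is why the number of primary shards cannot be changed.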

PUT posts/_doc/100?routing=bigdata
{
	"title": "Master Elasticsearch",
	"body": "Let's Rock"
}
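You can try out the routing formula in Python. ES hashes `_routing` with Murmur3. This sketch substitutes `zlib.crc32` purely for illustration, so the shard numbers will not match a real cluster. It still shows two things: the same routing value always lands on the same shard, and changing the number of primary shards would move existing documents. `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards (crc32 stands in for Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value (the _id by default, or e.g. routing=bigdata) is stable...
print(shard_for("bigdata", 3), shard_for("bigdata", 3))

# ...but the mapping changes if the number of primary shards changes.
doc_ids = [f"doc{i}" for i in range(100)]
moved = sum(shard_for(d, 3) != shard_for(d, 4) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would move from 3 to 4 shards")
```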
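ES shards and their lifecycle

A single inverted index is called a segment, and segments are immutable. A collection of segments forms a Lucene index, which corresponds to a shard in ES.

When documents are written, new segments are created. A search queries every segment and merges the results. Deleted documents are recorded in a .del file.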

ES refresh

Writing the index buffer into a segment is called a refresh.

A refresh runs once per second by default. Once it completes, the new documents can be searched.

If the system ingests a large amount of data, many segments are created.

A refresh is also triggered when the index buffer fills up. By default, the buffer is 10% of the JVM heap.

transaction log

Writing a segment to disk is slow, so a new segment is first written to the cache, where it can already be searched.

To prevent data loss, every write is also recorded in the transaction log, which is written to disk. Each shard has its own transaction log.

After a power failure, ES loads the data from the transaction log on restart, so no data is lost.

Flush

By default, a flush runs every 30 minutes. It first calls refresh, which empties the index buffer.

It then calls fsync to write the cached segments to disk, so the data no longer relies on the transaction log.

Finally, it clears the transaction log.

A flush is also triggered when the transaction log is full. The default threshold is 512MB.

merge

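Because there are many segments, ES merges them periodically. This reduces both the number of segments and the deleted documents they carry.

You can force a merge with POST my_index/_forcemerge.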
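The refresh, transaction log, flush and merge steps above can be tied together in a toy, in-memory Python model of one shard. Everything here is a simplification with hypothetical names (`ToyShard`, `merge_segments`):

- "fsync" is only a counter of durable segments.
- A crash discards whatever was not fsynced, and then the transaction log is replayed.
- Deletions are modelled as a set of ids that stands in for the .del file.

```python
class ToyShard:
    """In-memory model of one shard: index buffer, segments and a transaction log."""

    def __init__(self):
        self.index_buffer = []  # newly indexed docs, not searchable yet
        self.segments = []      # immutable segments, searchable after a refresh
        self.translog = []      # every operation since the last flush
        self.fsynced = 0        # how many segments are durably on disk

    def index(self, doc):
        self.index_buffer.append(doc)
        self.translog.append(doc)  # also recorded in the transaction log

    def refresh(self):
        # Turn the index buffer into a new immutable segment (in the cache only).
        if self.index_buffer:
            self.segments.append(tuple(self.index_buffer))
            self.index_buffer = []

    def search(self):
        # Query every segment and combine the results.
        return [doc for segment in self.segments for doc in segment]

    def flush(self):
        self.refresh()                     # 1. empty the index buffer
        self.fsynced = len(self.segments)  # 2. fsync the segments to disk
        self.translog.clear()              # 3. the translog is no longer needed

    def crash_and_recover(self):
        # Only fsynced segments survive; the translog is replayed on restart.
        self.segments = self.segments[: self.fsynced]
        self.index_buffer = list(self.translog)
        self.refresh()


def merge_segments(segments, deleted_ids):
    """Merge segments of {"_id": ...} docs into one, purging deleted ids (.del)."""
    merged = tuple(
        doc for segment in segments for doc in segment if doc["_id"] not in deleted_ids
    )
    return [merged] if merged else []


shard = ToyShard()
shard.index("a")
print(shard.search())  # empty: not refreshed yet
shard.refresh()
shard.index("b")
shard.crash_and_recover()
print(shard.search())  # "a" and "b", recovered from the translog

print(merge_segments([({"_id": "1"},), ({"_id": "2"}, {"_id": "3"})], {"2"}))
```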

Sorting on text fields

Sorting normally uses doc_values, which text fields do not have. To sort on a text field, you must set fielddata to true on that field so that it uses fielddata instead.

PUT /kibana_sample_data_ecommerce/_mapping
{
	"properties":{
		"customer_full_name" : {
          "type" : "text",
          "fielddata": true, 
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
	}
}
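Pagination in ES
Deep pagination with from/size

ES data is spread across multiple shards on multiple machines. For a query with from 990 and size 10, every shard returns its top 1000 documents. The coordinating node then merges all of them, sorts them, keeps the top 1000 and returns the last 10. The deeper the page, the more memory this takes. By default, ES limits from + size to 10000 documents. You can change this limit with index.max_result_window.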

POST /kibana_sample_data_ecommerce/_search
{
  "from": 1,
  "size": 2, 
  "query": {
    "match_all": {}
  }
}
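A Python sketch of the coordinating node shows why deep pages are expensive: every shard returns its top `from + size` hits, and the coordinator merges all of them just to keep one page. The sketch assumes that hits are plain integer scores, already sorted in descending order on each shard. The 10000 limit mirrors the default index.max_result_window, and `from_size` is a hypothetical helper.

```python
import heapq


def from_size(shards, from_, size, max_result_window=10_000):
    """Return (page, hits_handled) for a from/size query over sorted shards."""
    window = from_ + size
    if window > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    # Each shard must send back its own top (from + size) hits...
    per_shard = [hits[:window] for hits in shards]
    handled = sum(len(hits) for hits in per_shard)
    # ...which the coordinating node merges (descending score) before slicing one page.
    merged = heapq.merge(*per_shard, key=lambda score: -score)
    top = [score for _, score in zip(range(window), merged)]
    return top[from_:], handled


# 5 shards with 2000 distinct scores each, sorted best-first.
shards = [sorted(range(k, 10_000, 5), reverse=True) for k in range(5)]
page, handled = from_size(shards, from_=990, size=10)
print(page, handled)
```

Returning only 10 hits forced the coordinator to handle 5 × 1000 hits, and that cost grows with both page depth and shard count.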
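search_after (from must be 0)

search_after takes the sort values of the last hit in the previous response and uses them to request the next page. This avoids the deep pagination problem.

However, documents added in the meantime can still appear on later pages.

// First request: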
POST /kibana_sample_data_ecommerce/_search
{
  "size": 2, 
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "order_date": {
        "order": "desc"
      },
      "_id":{
        "order": "desc"
      }
    }
  ]
}

Response (excerpt):
...
	"sort" : [
          1581808954000,
          "gTTfym8BtdKew7ex1Zsk"
        ]
// Second request: pass the sort values of the last hit as search_after (from must be 0)
POST /kibana_sample_data_ecommerce/_search
{
  "from": 0,
  "size": 2, 
  "query": {
    "match_all": {}
  },
  "search_after":[
          1581808954000,
          "gTTfym8BtdKew7ex1Zsk"
        ],
  "sort": [
    {
      "order_date": {
        "order": "desc"
      },
      "_id":{
        "order": "desc"
      }
    }
  ]
}
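search_after is essentially keyset pagination. You sort by a unique combination of fields, then ask for the hits that sort strictly after the last values you saw. The Python sketch below works over (order_date, _id) tuples sorted in descending order, as in the requests above. `page_after` is a hypothetical helper and the data is made up. The sketch also shows that a document added between requests can still appear on a later page.

```python
def page_after(hits, size, after=None):
    """Next page of (order_date, _id) sort keys, ordered descending on both."""
    ordered = sorted(hits, reverse=True)
    if after is not None:
        # Descending order: "after" means strictly smaller sort values.
        ordered = [hit for hit in ordered if hit < tuple(after)]
    return ordered[:size]


hits = [(3, "a"), (3, "b"), (2, "c"), (1, "d"), (1, "e")]
first = page_after(hits, size=2)
hits.append((0, "z"))  # a document indexed between requests...
second = page_after(hits, size=2, after=first[-1])
third = page_after(hits, size=2, after=second[-1])
print(first, second, third)  # ...still shows up on a later page
```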
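Using the scroll API

The scroll API searches a snapshot that is taken when the scroll is created.

Only documents that existed when the snapshot was taken can be returned. Documents added afterwards are not visible.

Each subsequent request must pass the scroll_id from the previous response.

The number of hits per batch is fixed by the size of the first request.

// Keep the scroll context alive for 5 minutes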
POST /kibana_sample_data_ecommerce/_search?scroll=5m
{
	"size": 1,
	"query": {
		"match_all":{}
	}
}
// Take the _scroll_id from the response above and fetch the next batch, keeping the context alive for another minute
POST _search/scroll
{
	"scroll": "1m",
	"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAADZUWSThxQkp4VUZTZy1ZZzE0OGI1OW02Zw=="
}
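The following Python sketch models the snapshot behaviour. `start_scroll` and `next_batch` are hypothetical names, and the keep-alive timeout is not modelled. The sketch also reuses a single scroll_id, whereas ES returns a _scroll_id with every response and you should always pass the most recent one.

```python
import itertools

_ids = itertools.count(1)
_contexts = {}  # scroll_id -> [snapshot, position, size]


def start_scroll(docs, size):
    """Like POST index/_search?scroll=...: freeze a snapshot and return the first batch."""
    scroll_id = f"scroll-{next(_ids)}"
    _contexts[scroll_id] = [list(docs), 0, size]  # a copy: later writes are invisible
    return scroll_id, next_batch(scroll_id)


def next_batch(scroll_id):
    """Like POST _search/scroll: continue from where the previous batch stopped."""
    state = _contexts[scroll_id]
    snapshot, position, size = state
    state[1] = position + size
    return snapshot[position:position + size]


docs = ["d1", "d2", "d3"]
sid, batch1 = start_scroll(docs, size=2)
docs.append("d4")  # indexed after the scroll started
batch2 = next_batch(sid)
batch3 = next_batch(sid)
print(batch1, batch2, batch3)  # d4 never appears
```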
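| | from/size | search_after | scroll |
|---|---|---|---|
| Characteristics | Not suited to deep pagination | Supports deep pagination, but can only page forward from the first page | Efficient; results need not come in a meaningful order |
| Suited to | The first few pages of results | Deep pagination | Fetching or exporting all documents |

Concurrency control in ES

ES uses optimistic concurrency control.

When an update passes if_seq_no and if_primary_term, the document's current values must equal the values passed in. Otherwise, the update is rejected.

ES can also use version together with version_type (for example, external) to provide the same kind of lock.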

DELETE products
GET products/_search

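// The response contains _seq_no and _primary_term; pass them below (the first write to a freshly created index gets _seq_no 0 and _primary_term 1)
PUT products/_doc/1
{
  "title": "iphone",
  "count": 100
}

PUT products/_doc/1?if_seq_no=0&if_primary_term=1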
{
  "title":"iphone1",
  "count": 100
}
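// With version_type=external, version must be greater than the document's current version, otherwise a version conflict is returned; this is ES's optimistic locking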
PUT products/_doc/1?version=6&version_type=external
{
  "title":"iphone2",
  "count": 1
}
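Both styles of optimistic locking come down to a compare-before-write check. The Python sketch below models a single-index document store. The names `ToyIndex` and `VersionConflict` are hypothetical. The primary term stays 1 here, because in ES it only changes when a new primary is elected.

```python
class VersionConflict(Exception):
    """Raised where ES would answer with a 409 version conflict."""


class ToyIndex:
    def __init__(self):
        self.docs = {}         # _id -> metadata and _source
        self.next_seq_no = 0   # sequence numbers grow with every write
        self.primary_term = 1  # unchanged unless a new primary is elected

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None,
            version=None, version_type="internal"):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # The document must still carry exactly the seq_no/primary_term we read.
            if current is None or (current["_seq_no"], current["_primary_term"]) != (
                    if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        if version_type == "external":
            # The supplied version must be strictly greater than the stored one.
            if current is not None and version <= current["_version"]:
                raise VersionConflict(doc_id)
            new_version = version
        else:
            new_version = 1 if current is None else current["_version"] + 1
        self.docs[doc_id] = {"_source": source, "_seq_no": self.next_seq_no,
                             "_primary_term": self.primary_term, "_version": new_version}
        self.next_seq_no += 1
        return self.docs[doc_id]


products = ToyIndex()
first = products.put("1", {"title": "iphone", "count": 100})
second = products.put("1", {"title": "iphone1", "count": 100},
                      if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
try:
    # A second writer still holding the old seq_no loses the race.
    products.put("1", {"title": "stale"}, if_seq_no=first["_seq_no"], if_primary_term=1)
except VersionConflict:
    print("conflict: the document changed since it was read")
third = products.put("1", {"title": "iphone2", "count": 1}, version=6, version_type="external")
print(third["_version"])
```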
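Aggregations in ES: min, max, avg, stats, terms, range, histogram

| SQL | ES |
|---|---|
| select count(brand) from table | metric |
| group by | bucket |

Aggregations cannot operate on text fields by default.

A terms aggregation cannot bucket a text field unless fielddata is enabled on it (see the difference between doc_values and fielddata). Keyword fields support bucketing by default.

The aggregations covered here are min, max, avg, stats, terms, range, and histogram.

To group documents and then get the top documents within each group, use top_hits.

To get the number of distinct groups, use cardinality.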

DELETE employees

GET employees/_mapping

PUT employees
{
  "mappings": {
    "properties": {
      "age":{
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "salary": {
        "type": "integer"
      },
      "job": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above":22
          }
        }
      }
    }
  }
}


POST employees/_bulk
{"index":{"_id": "1"}}
{"name":"Emma","age":"32","job":"Product Manager", "gender": "female","salary": "35000"}
{"index":{"_id": "2"}}
{"name":"Underwood","age":"41","job":"Dev Manager", "gender": "male","salary": "50000"}
{"index":{"_id": "3"}}
{"name":"Tran","age":"25","job":"Web Designer", "gender": "male","salary": "18000"}
{"index":{"_id": "4"}}
{"name":"Rivera","age":"26","job":"Web Designer", "gender": "female","salary": "22000"}
{"index":{"_id": "5"}}
{"name":"Rose","age":"25","job":"QA", "gender": "female","salary": "18000"}
{"index":{"_id": "6"}}
{"name":"Lucy","age":"31","job":"QA", "gender": "female","salary": "25000"}
{"index":{"_id": "7"}}
{"name":"Byrd","age":"27","job":"QA", "gender": "male","salary": "20000"}
{"index":{"_id": "8"}}
{"name":"Foster","age":"27","job":"Java Programmer", "gender": "male","salary": "20000"}
{"index":{"_id": "9"}}
{"name":"Gregory","age":"32","job":"Java Programmer", "gender": "male","salary": "22000"}
{"index":{"_id": "10"}}
{"name":"Bryant","age":"20","job":"Java Programmer", "gender": "male","salary": "9000"}
{"index":{"_id": "11"}}
{"name":"Jenny","age":"36","job":"Java Programmer", "gender": "female","salary": "38000"}
{"index":{"_id": "12"}}
{"name":"Mcdonald","age":"31","job":"Java Programmer", "gender": "male","salary": "32000"}
{"index":{"_id": "13"}}
{"name":"Jonthna","age":"30","job":"Java Programmer", "gender": "female","salary": "30000"}
{"index":{"_id": "14"}}
{"name":"Marsha","age":"32","job":"Javascript Programmer", "gender": "male","salary": "25000"}
{"index":{"_id": "15"}}
{"name":"King","age":"33","job":"Java Programmer", "gender": "male","salary": "28000"}
{"index":{"_id": "16"}}
{"name":"Mccarthy","age":"21","job":"Javascript Programmer", "gender": "male","salary": "16000"}
{"index":{"_id": "17"}}
{"name":"Goodwid","age":"25","job":"Javascript Programmer", "gender": "male","salary": "16000"}
{"index":{"_id": "18"}}
{"name":"Catherine","age":"29","job":"Javascript Programmer", "gender": "female","salary": "20000"}
{"index":{"_id": "19"}}
{"name":"Boone","age":"30","job":"DBA", "gender": "male","salary": "30000"}
{"index":{"_id": "20"}}
{"name":"Kathy","age":"29","job":"DBA", "gender": "female","salary": "20000"}


POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": {
      "min": {
        "field": "salary"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salay": {
      "min": {
        "field": "salary"
      }
    },
    "max_salay": {
      "max": {
        "field": "salary"
      }
    },
    "avg_salay": {
      "avg": {
        "field": "salary"
      }
    }
  }
}



POST employees/_search
{
  "size": 20,
  "aggs": {
    "stats_salay":{
      "stats": {
        "field": "salary"
      }
    }
  }
}


POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

// Total number of distinct buckets (job categories)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": {
        "field": "job.keyword"
      }
    }
  }
}

POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
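To see what these requests compute, here is the same data aggregated in plain Python. The sketch computes:

- min, max, avg and stats on salary.
- A terms aggregation on job, ordered by count descending with ties broken by key ascending.
- The cardinality of job.

ES computes cardinality approximately (HyperLogLog++), whereas counting distinct keys here is exact.

```python
from collections import Counter

# (name, age, job, gender, salary): the 20 documents from the bulk request above
EMPLOYEES = [
    ("Emma", 32, "Product Manager", "female", 35000),
    ("Underwood", 41, "Dev Manager", "male", 50000),
    ("Tran", 25, "Web Designer", "male", 18000),
    ("Rivera", 26, "Web Designer", "female", 22000),
    ("Rose", 25, "QA", "female", 18000),
    ("Lucy", 31, "QA", "female", 25000),
    ("Byrd", 27, "QA", "male", 20000),
    ("Foster", 27, "Java Programmer", "male", 20000),
    ("Gregory", 32, "Java Programmer", "male", 22000),
    ("Bryant", 20, "Java Programmer", "male", 9000),
    ("Jenny", 36, "Java Programmer", "female", 38000),
    ("Mcdonald", 31, "Java Programmer", "male", 32000),
    ("Jonthna", 30, "Java Programmer", "female", 30000),
    ("Marsha", 32, "Javascript Programmer", "male", 25000),
    ("King", 33, "Java Programmer", "male", 28000),
    ("Mccarthy", 21, "Javascript Programmer", "male", 16000),
    ("Goodwid", 25, "Javascript Programmer", "male", 16000),
    ("Catherine", 29, "Javascript Programmer", "female", 20000),
    ("Boone", 30, "DBA", "male", 30000),
    ("Kathy", 29, "DBA", "female", 20000),
]

salaries = [row[4] for row in EMPLOYEES]
# stats = count, min, max, avg and sum in one go
stats = {"count": len(salaries), "min": min(salaries), "max": max(salaries),
         "avg": sum(salaries) / len(salaries), "sum": sum(salaries)}

# terms on job.keyword: one bucket per distinct value, top 10 by document count
job_counts = Counter(row[2] for row in EMPLOYEES)
terms_buckets = sorted(job_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:10]

# cardinality: number of distinct values (exact here, approximate in ES)
cardinality = len(job_counts)

print(stats)
print(terms_buckets)
print(cardinality)
```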
### For each job, the 3 oldest employees
POST employees/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "old_employees": {
          "top_hits": {
            "size": 3,
            "sort": [
              {"age": {"order": "desc"}}
            ]
          }
        }
      }
    }
  }
}
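A terms aggregation with a top_hits sub-aggregation amounts to "group, then sort inside each group and keep the first N". The Python sketch below uses a subset of the employees data above (the Java Programmer and QA rows). `top_hits_per_bucket` is a hypothetical helper.

```python
from collections import defaultdict

# (name, age, job): a subset of the employees data above
ROWS = [
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("King", 33, "Java Programmer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
]


def top_hits_per_bucket(rows, size=3):
    """terms on job, then top_hits sorted by age desc inside each bucket."""
    buckets = defaultdict(list)
    for name, age, job in rows:
        buckets[job].append((name, age))
    return {
        job: sorted(members, key=lambda m: m[1], reverse=True)[:size]
        for job, members in buckets.items()
    }


print(top_hits_per_bucket(ROWS))
```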

#### range buckets with custom keys
POST employees/_search
{
  "size": 0,
  "aggs": {
    "range_result": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "from": 0,
            "to": 10000
          },
          {
            "key": "1w-2w",
            "from": 10000,
            "to": 20000
          },
          {
            "key": ">2w",
            "from": 20000
          }
        ]
      }
    }
  }
}

#### histogram buckets with an interval of 5000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "result1": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
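The Python sketch below reproduces the two bucket aggregations on the 20 salaries above. In a range, "from" is inclusive and "to" is exclusive. A histogram key is floor(value / interval) * interval, and extended_bounds adds empty buckets across the whole requested range. The helper names are hypothetical, and ES formats the keys of unnamed ranges differently from this sketch.

```python
import math
from collections import Counter

# Salary of each of the 20 employees above
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def range_agg(values, ranges):
    """Count values per range; "from" is inclusive and "to" is exclusive."""
    buckets = {}
    for r in ranges:
        low, high = r.get("from", -math.inf), r.get("to", math.inf)
        key = r.get("key", f"{r.get('from', '*')}-{r.get('to', '*')}")
        buckets[key] = sum(low <= v < high for v in values)
    return buckets


def histogram_agg(values, interval, extended_bounds=None):
    """Fixed-width buckets keyed by floor(value / interval) * interval."""
    counts = Counter((v // interval) * interval for v in values)
    low, high = min(counts), max(counts)
    if extended_bounds:
        # extended_bounds adds empty buckets so the whole range is covered
        low = min(low, (extended_bounds["min"] // interval) * interval)
        high = max(high, (extended_bounds["max"] // interval) * interval)
    return {key: counts.get(key, 0) for key in range(low, high + 1, interval)}


print(range_agg(SALARIES, [
    {"from": 0, "to": 10000},
    {"key": "1w-2w", "from": 10000, "to": 20000},
    {"key": ">2w", "from": 20000},
]))
print(histogram_agg(SALARIES, 5000, {"min": 0, "max": 100000}))
```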
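Pipeline aggregations

A pipeline aggregation runs a second aggregation over the output of other aggregations.

#### Find the lowest per-job average salary among the term_job buckets
#### term_job is the outer aggregation; avg_salary is its sub-aggregation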
POST employees/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary":{
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "result":{
      "min_bucket": {
        "buckets_path": "term_job>avg_salary"
      }
    }
  }
}
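A pipeline aggregation reads the output of other aggregations rather than documents. In Python terms, min_bucket is just `min()` over the per-bucket averages. The sketch below uses the salaries above grouped by job. Its output shape ({"value", "keys"}) follows what min_bucket returns in ES, and `min_bucket` is a hypothetical helper name.

```python
# Salaries grouped by job, from the employees data above
SALARIES_BY_JOB = {
    "Product Manager": [35000],
    "Dev Manager": [50000],
    "Web Designer": [18000, 22000],
    "QA": [18000, 25000, 20000],
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "DBA": [30000, 20000],
}

# Level 1: terms on job with an avg sub-aggregation ("term_job>avg_salary").
avg_salary = {job: sum(s) / len(s) for job, s in SALARIES_BY_JOB.items()}


def min_bucket(bucket_values):
    """Pipeline step: the smallest bucket value and the key(s) that have it."""
    lowest = min(bucket_values.values())
    return {"value": lowest, "keys": [k for k, v in bucket_values.items() if v == lowest]}


print(min_bucket(avg_salary))
```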
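Aggregation scope and ordering

1. When a query is used, aggregations run over the documents the query matched. (eg1)

2. A filter aggregation inside aggs narrows the documents first, and its sub-aggregations run only over the filtered documents. An aggregation placed beside the filter, at its parent level, runs over all documents. (eg2)

3. post_filter is applied after the aggregations. It narrows only the returned hits, while the aggregations still cover everything the query matched. (eg3)

4. global combines 1 and 2. An aggregation under global ignores the query, so the query does not affect its statistics. (eg4)

5. Buckets can be ordered by _key and by _count. (eg5)

6. Buckets can also be ordered by the result of a sub-aggregation. (eg6)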

eg1

POST employees/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "result": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

eg2

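#### Two kinds of statistics: over a filtered subset (old_persion) and over all documents (all_jobs); a query alone can only produce the filtered statistics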
POST employees/_search
{
  "size": 0, 
  "aggs": {
    "old_persion": {
      "filter": {
        "range": {
          "age": {
            "gte": 40
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    },
    "all_jobs":{
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

eg3

POST employees/_search
{
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  },
  "post_filter": {
    "match":{
      "job.keyword": "Web Designer"
    }
  }
}

eg4

##### global produces the "all documents" statistics from above; instead of a filter it uses global, which ignores the query's conditions
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 40
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    },
    "all":{
      "global": {},
      "aggs": {
        "all_result": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    }
  }
}
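The four scoping rules in eg1 to eg4 can be summarised in one Python function that returns hits together with each kind of aggregation. Rows are (age, job) pairs from the data above. `search`, `terms` and the predicate helpers are hypothetical names, and predicates stand in for the query DSL.

```python
from collections import Counter

# (age, job) for the 20 employees above
ROWS = [
    (32, "Product Manager"), (41, "Dev Manager"), (25, "Web Designer"),
    (26, "Web Designer"), (25, "QA"), (31, "QA"), (27, "QA"),
    (27, "Java Programmer"), (32, "Java Programmer"), (20, "Java Programmer"),
    (36, "Java Programmer"), (31, "Java Programmer"), (30, "Java Programmer"),
    (32, "Javascript Programmer"), (33, "Java Programmer"),
    (21, "Javascript Programmer"), (25, "Javascript Programmer"),
    (29, "Javascript Programmer"), (30, "DBA"), (29, "DBA"),
]


def terms(rows):
    return dict(Counter(job for _, job in rows))


def match_all(row):
    return True


def is_old(row):
    return row[0] >= 40


def search(rows, query=match_all, agg_filter=match_all, post_filter=match_all):
    matched = [r for r in rows if query(r)]
    return {
        "hits": [r for r in matched if post_filter(r)],             # eg3: trims hits only
        "aggs": terms(matched),                                      # eg1: the query's matches
        "filter_agg": terms([r for r in matched if agg_filter(r)]),  # eg2: filter inside aggs
        "global_agg": terms(rows),                                   # eg4: ignores the query
    }


eg2 = search(ROWS, agg_filter=is_old)
eg3 = search(ROWS, post_filter=lambda r: r[1] == "Web Designer")
eg4 = search(ROWS, query=is_old)
print(eg2["filter_agg"], eg3["aggs"], eg4["aggs"], eg4["global_agg"], sep="\n")
```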

eg5

#### Ordering: _key orders by bucket key and _count by document count. For several criteria use an array; they apply in order, so here buckets are sorted by _count ascending, and ties by _key descending
POST employees/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "job.keyword",
        "order": [
          {"_count": "asc"},
          {"_key": "desc"}
        ]
      }
    }
  }
}

eg6

#### Order buckets by the result of a sub-aggregation (test1 is the average salary)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": {
          "test1": "asc"
        }
      },
      "aggs": {
        "test1": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}
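Ordering by several criteria works like a multi-key sort in which the first criterion has priority. The Python sketch below uses the salaries grouped by job from the data above. `order_buckets` is a hypothetical helper, and `test1` holds the average salary, as in eg6.

```python
# Salaries grouped by job, from the employees data above
SALARIES_BY_JOB = {
    "Product Manager": [35000],
    "Dev Manager": [50000],
    "Web Designer": [18000, 22000],
    "QA": [18000, 25000, 20000],
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "DBA": [30000, 20000],
}

buckets = [
    {"key": job, "doc_count": len(s), "test1": sum(s) / len(s)}
    for job, s in SALARIES_BY_JOB.items()
]


def order_buckets(buckets, criteria):
    """criteria like [("doc_count", "asc"), ("key", "desc")]; earlier entries win."""
    ordered = list(buckets)
    # Python's sort is stable, so sorting by the last criterion first and the
    # first criterion last gives the first criterion the highest priority.
    for field, direction in reversed(criteria):
        ordered.sort(key=lambda b: b[field], reverse=(direction == "desc"))
    return [b["key"] for b in ordered]


print(order_buckets(buckets, [("doc_count", "asc"), ("key", "desc")]))  # like eg5
print(order_buckets(buckets, [("test1", "asc")]))                        # like eg6
```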
Approximate aggregations in distributed systems

TODO: verify further

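Nested objects

Use nested when a document contains an array of objects and the fields of each object must be matched together, object by object.

In eg1, searching for Keanu Hopper does return the document. Without nested, ES flattens the array into actors.first_name = ["Keanu", "Dennis"] and actors.last_name = ["Reeves", "Hopper"]. As covered earlier, a document matches as long as the field contains the queried value, so both clauses hit.

In eg2, the same search returns nothing. With nested, each object is indexed as a separate hidden document (Keanu Reeves and Dennis Hopper), and the query only matches if both clauses hit the same object.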

eg1

DELETE my_movie

PUT my_movie
{
  "mappings" : {
    "properties" : {
        "actors" : {
          "properties" : {
            "first_name" : {
              "type" : "keyword"
            },
            "last_name" : {
              "type" : "text"
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

POST my_movie/_doc/1
{
  "title": "speed",
  "actors": [
      {
        "first_name": "Keanu",
        "last_name": "Reeves"
      },
      {
        "first_name": "Dennis",
        "last_name": "Hopper"
      }
    ]
}


POST my_movie/_search
{
  
  "query": {
    "bool": {
      "must": [
        {"match": {
        "actors.first_name": "Keanu"
        }},
        {"match": {
        "actors.last_name": "Hopper"
        }}
        ]
    }
  }
}

eg2

DELETE my_movie

PUT my_movie
{
  "mappings" : {
    "properties" : {
        "actors" : {
          "type": "nested", 
          "properties" : {
            "first_name" : {
              "type" : "keyword"
            },
            "last_name" : {
              "type" : "text"
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

POST my_movie/_doc/1
{
  "title": "speed",
  "actors": [
      {
        "first_name": "Keanu",
        "last_name": "Reeves"
      },
      {
        "first_name": "Dennis",
        "last_name": "Hopper"
      }
    ]
}

POST my_movie/_search
{
  
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {"match": {
                    "actors.first_name": "Keanu"
                  }},
                  {"match": {
                    "actors.last_name": "Hopper"
                  }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}
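The difference between eg1 and eg2 comes down to whether the array is flattened. The Python sketch below uses exact string comparison, which is a simplification: last_name is a text field, so ES would match on analyzed tokens. The function names are hypothetical.

```python
ACTORS = [
    {"first_name": "Keanu", "last_name": "Reeves"},
    {"first_name": "Dennis", "last_name": "Hopper"},
]


def flatten(objects, prefix="actors"):
    """Object mapping: each key becomes one multi-valued field; pairing is lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def object_query(objects, first, last):
    flat = flatten(objects)
    return first in flat["actors.first_name"] and last in flat["actors.last_name"]


def nested_query(objects, first, last):
    # Each object is its own hidden document; both clauses must hit the same one.
    return any(o["first_name"] == first and o["last_name"] == last for o in objects)


print(flatten(ACTORS))
print(object_query(ACTORS, "Keanu", "Hopper"), nested_query(ACTORS, "Keanu", "Hopper"))
```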
Parent/child documents

Create the parent/child relation

DELETE my_blogs

PUT my_blogs
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "blog_comments_relation":{
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "content":{
        "type": "text"
      },
      "title":{
        "type": "keyword"
      }
    }
  }
}

PUT my_blogs/_doc/blog1
{
  "title": "Learning Elasticsearch",
  "content": "learning ELK @ geektime",
  "blog_comments_relation": {
    "name": "blog"
  }
}

PUT my_blogs/_doc/blog2
{
  "title": "Learning Hadoop",
  "content": "learning Hadoop",
  "blog_comments_relation":{
    "name": "blog"
  }
}

PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment": "I am learning ELK",
  "username": "Jack",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog1"
  }
}

PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment": "I like Hadoop!!!!!",
  "username": "Jack",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog2"
  }
}

PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment": "Hello Hadoop",
  "username": "Bob",
  "blog_comments_relation": {
    "name":"comment",
    "parent": "blog2"
  }
}


POST my_blogs/_search
{
  
}

GET my_blogs/_doc/blog2

POST my_blogs/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": "blog2"
    }
  }
}

#### Returns child documents (comments whose parent blog matches the query)
POST my_blogs/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "content": "Learning hadoop"
        }
      }
    }
  }
}

#### Returns parent documents (blogs that have a comment matching the query)
POST my_blogs/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "username": "Bob"
        }
      }
    }
  }
}
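The three join queries can be sketched in Python over the five documents above. This sketch ignores shards and routing. In ES, each comment must be routed to its parent's shard, which is why routing=blog1 and routing=blog2 are required. With the default OR operator, "Learning hadoop" also matches blog1 through the word "learning", so has_parent returns comments from both blogs. All function names here are hypothetical, and tokenizing is naive lowercasing and splitting on spaces.

```python
# _id -> document, mirroring the my_blogs examples above
DOCS = {
    "blog1": {"relation": {"name": "blog"}, "content": "learning ELK @ geektime"},
    "blog2": {"relation": {"name": "blog"}, "content": "learning Hadoop"},
    "comment1": {"relation": {"name": "comment", "parent": "blog1"}, "username": "Jack"},
    "comment2": {"relation": {"name": "comment", "parent": "blog2"}, "username": "Jack"},
    "comment3": {"relation": {"name": "comment", "parent": "blog2"}, "username": "Bob"},
}


def of_type(name):
    return {i: d for i, d in DOCS.items() if d["relation"]["name"] == name}


def parent_id(child_type, parent):
    """Children of one specific parent."""
    return sorted(i for i, d in of_type(child_type).items() if d["relation"]["parent"] == parent)


def has_parent(parent_type, query):
    """Children whose parent matches the query."""
    parents = {i for i, d in of_type(parent_type).items() if query(d)}
    return sorted(i for i, d in DOCS.items() if d["relation"].get("parent") in parents)


def has_child(child_type, query):
    """Parents with at least one child matching the query."""
    return sorted({d["relation"]["parent"] for d in of_type(child_type).values() if query(d)})


def match_any(field, text):
    # Like a default match query (OR of terms), with naive lowercase tokenizing.
    wanted = set(text.lower().split())
    return lambda doc: bool(wanted & set(doc.get(field, "").lower().split()))


print(parent_id("comment", "blog2"))
print(has_parent("blog", match_any("content", "Learning hadoop")))
print(has_child("comment", match_any("username", "Bob")))
```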
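Reindexing

The index must be rebuilt when a field's mapping type changes,

or when the number of primary shards changes.

update by query: rebuilds in place on the existing index

reindex: rebuilds into another index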

DELETE blogs

PUT blogs/_doc/1
{
  "content":"Hadoop is cool",
  "keyword": "hadoop"
}


GET blogs/_mapping

PUT blogs/_mapping
{
  "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer": "english"
            }
          }
        }
      }
}

PUT blogs/_doc/2
{
  "content": "Elasticsearch rocks",
  "keyword": "elasticsearch"
}

POST blogs/_search
{
  "query": {
    "match": {
      "content.english": "hadoop"
    }
  }
}

##### After adding a new sub-field to the mapping, run _update_by_query so that existing documents are indexed into it
POST blogs/_update_by_query
{}


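// This request is rejected: the existing field "keyword" was dynamically mapped as text, and the type of an existing field cannot be changed, so the index has to be rebuilt
PUT blogs/_mapping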
{
  "properties": {
    "keyword" : {
          "type" : "keyword"
          
        }
  }
}

DELETE blog_fix

#### To change a field's type, create a new index with the new mapping and reindex into it
PUT blog_fix
{
  "mappings": {
    "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer" : "english"
            },
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "keyword" : {
          "type" : "keyword"
        }
      }
  }
}

GET blog_fix/_mapping

###### Reindex: copy documents from the old index into the new one
POST _reindex
{
  "source": {
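    #### source index
    "index": "blogs",
    #### only copy documents that match this query
    "query": {
      "match": {
        "content": "elasticsearch"
      }
    },
    "size": 1
  },
  "dest": {
    #### destination index
    "index": "blog_fix",
    #### op_type create: a document whose _id already exists in the destination causes a version conflict error; missing documents are created
    #### without it, existing documents are overwritten; documents that exist only in the destination are kept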
    "op_type": "create"
  }
}
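The effect of op_type can be sketched in Python, with indices modelled as plain dicts that map _id to _source. All names here are hypothetical:

- With op_type create, an existing _id is a conflict.
- With the default op_type index, an existing _id is overwritten.
- Documents that exist only in the destination are never touched.

By default, ES aborts the reindex on the first conflict. This sketch keeps going, like `"conflicts": "proceed"`.

```python
def reindex(source, dest, query=lambda doc: True, op_type="index"):
    """Copy matching documents (dicts of _id -> _source) from source into dest.

    Returns the ids that conflicted; keeps going after a conflict,
    like "conflicts": "proceed".
    """
    conflicts = []
    for doc_id, doc in source.items():
        if not query(doc):
            continue
        if op_type == "create" and doc_id in dest:
            conflicts.append(doc_id)  # already exists: version conflict
            continue
        dest[doc_id] = dict(doc)  # op_type index: create or overwrite
    return conflicts


def mentions_es(doc):
    return "elasticsearch" in doc["content"].lower()


blogs = {"1": {"content": "Hadoop is cool"}, "2": {"content": "Elasticsearch rocks"}}
blog_fix = {"3": {"content": "Elasticsearch rocks copy1"}}

print(reindex(blogs, blog_fix, mentions_es, op_type="create"))  # first run: no conflicts
print(reindex(blogs, blog_fix, mentions_es, op_type="create"))  # second run: "2" conflicts
print(sorted(blog_fix))  # "3" exists only in the destination and is left alone
```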

GET blog_fix/_doc/1


PUT blog_fix/_doc/3
{
  "content": "Elasticsearch rocks copy1",
  "keyword": "elasticsearch copy1"
}

DELETE blog_fix/_doc/1

POST blog_fix/_search
{
  "size": 0,
  "aggs": {
    "blog_keyword": {
      "terms": {
        "field": "keyword",
        "size": 10
      }
    }
  }
}

POST blog_fix/_search
{}
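Ingest Pipeline

An ingest pipeline processes documents on their way into the index. For example, take a tags field that holds "es,hadoop". If you specify a pipeline with a split processor when indexing, the stored document is converted into an array automatically. You can also apply a pipeline to existing data when rebuilding the index.

### Using a pipeline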
DELETE tech_blogs

PUT tech_blogs/_doc/1
{
  "title": "Introducing big data...",
  "tags": "hadoop,elasticsearch,spark",
  "content": "You know, for big data"
}

GET tech_blogs/_doc/1

#### Simulate the pipeline to test its effect on the fields
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        // split the field into an array
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        // add a field
        "set": {
          "field": "view",
          "value": "0"
        }  
      }
    ]
  },
  "docs": [
      {
        "_source" : {
          "tags" : "hadoop,elasticsearch,spark"
        }
      }
    ]
}

// Define a pipeline: one object per processor, run in order
PUT _ingest/pipeline/blog_pipleline
{
  "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set": {
          "field": "view",
          "value": "0"
        }
      }
    ]
}

GET _ingest/pipeline/blog_pipleline

// Simulate the stored blog_pipleline against a sample document
POST _ingest/pipeline/blog_pipleline/_simulate
{
  "docs": [
      {
        "_source" : {
          "tags" : "hadoop,elasticsearch,spark"
        }
      }
    ]
}

POST tech_blogs/_doc/2?pipeline=blog_pipleline
{
  "title": "Introducing cloud computering",
  "tags": "openstacks, k8s",
  "content": "You know, for cloud"
}

POST tech_blogs/_doc/3
{
  "title": "Introducing cloud computering",
  "tags": "openstacks, k8s",
  "content": "You know, for cloud"
}


POST tech_blogs/_search
{}

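// Running the pipeline over every document reports errors for documents that blog_pipleline has already processed (their tags field is no longer a string), although the other documents may still be updated
POST tech_blogs/_update_by_query?pipeline=blog_pipleline
{}
// Instead, only reprocess the documents that do not yet have the field the pipeline adds
POST tech_blogs/_update_by_query?pipeline=blog_pipleline
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "view"
          }
        }
      ]
    }
  }
}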
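The two processors and the "only unprocessed documents" query can be sketched in Python, with hypothetical function names and documents as plain dicts. The sketch also shows two details of the split processor. First, running it again on a field that is already an array fails. Second, it does not trim whitespace, so "openstacks, k8s" yields " k8s" with a leading space; a trim processor would be needed to remove it.

```python
def split_processor(doc, field, separator):
    value = doc[field]
    if not isinstance(value, str):
        # Already an array (the doc was processed before): fail, as ES does.
        raise TypeError(f"field [{field}] is not a string")
    doc[field] = value.split(separator)  # no trimming of surrounding spaces


def set_processor(doc, field, value):
    doc[field] = value


def blog_pipeline(doc):
    """Split tags on ',' and then set view to "0", in that order."""
    doc = dict(doc)
    split_processor(doc, "tags", ",")
    set_processor(doc, "view", "0")
    return doc


index = {
    "1": {"tags": "hadoop,elasticsearch,spark"},
    "2": blog_pipeline({"tags": "openstacks, k8s"}),  # indexed with ?pipeline=
    "3": {"tags": "openstacks, k8s"},                 # indexed without it
}
# _update_by_query with must_not exists "view": only unprocessed docs are touched.
for doc_id, doc in index.items():
    if "view" not in doc:
        index[doc_id] = blog_pipeline(doc)
print(index)
```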