1. Elasticsearch requests and responses
Request structure
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
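- VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE
- PROTOCOL: http or https (https is only available when there is an https proxy in front of Elasticsearch)
- HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine
- PORT: the port of the Elasticsearch HTTP service, 9200 by default
- PATH: the API endpoint (for example, _count returns the number of documents in the cluster); a PATH may contain multiple components, such as _cluster/stats or _nodes/stats/jvm
- QUERY_STRING: optional query-string parameters (for example, ?pretty makes the returned JSON easier to read)
- BODY: a JSON-encoded request body (if the request needs one)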
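As a rough illustration of how these components fit together, the sketch below assembles a request URL from its parts with Python's standard library. The helper name build_request_url and its defaults are our own, chosen to mirror the list above; they are not part of any Elasticsearch client. The sketch passes pretty=true, which we assume behaves like the bare ?pretty used in the commands in this post.

```python
from urllib.parse import urlencode, urlunsplit


def build_request_url(protocol="http", host="localhost", port=9200,
                      path="", query=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>.

    Hypothetical helper for illustration only; the defaults follow the
    component list above (localhost, port 9200).
    """
    netloc = f"{host}:{port}"
    query_string = urlencode(query or {})
    # urlunsplit omits the "?" when the query string is empty.
    return urlunsplit((protocol, netloc, "/" + path.lstrip("/"), query_string, ""))


count_url = build_request_url(path="_count", query={"pretty": "true"})
stats_url = build_request_url(path="_nodes/stats/jvm")

print(count_url)  # http://localhost:9200/_count?pretty=true
print(stats_url)  # http://localhost:9200/_nodes/stats/jvm
```

The resulting string is what would go between the quotes after curl -X<VERB> in the template above.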
PUT: create (indexing a document)
$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
  "first_name" : "Douglas",
  "last_name" : "Fir",
  "age" : 35,
  "about" : "I like to build cabinets",
  "interests" : [ "forestry" ]
}
'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "3",
  "_version" : 1,
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "created" : true
}
GET requests (search)
Retrieving a document
$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John", "last_name" : "Smith", "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
  }
}
Simple search
We still use the megacorp index and the employee type, but at the end of the path we use the _search keyword in place of the document ID. The hits array in the response contains all three of our documents. By default, a search returns the first 10 results.
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 1.0,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "3", "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas", "last_name" : "Fir", "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [ "forestry" ]
        }
      }
    ]
  }
}
Next, let's search for employees whose last name contains "Smith". We will use a lightweight search method from the command line. This method is often called query-string search, because we pass the query the same way we pass URL parameters:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}
Searching with the Query DSL
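Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see the Simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and more powerful queries.
The DSL (domain-specific language) is expressed as a JSON request body. We can express the earlier "Smith" query like this: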
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}
'
More complicated searches
Let's make the search a little more complicated. We still want to find employees with the last name "Smith", but only those older than 30. Our query adds a filter, which lets us run a structured search efficiently:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 }
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith"
        }
      }
    }
  }
}
'
- The range clause is a range filter. It finds all documents with an age greater than 30 (gt stands for "greater than").
- The match clause is the same match query we used before.
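The request body above can also be built programmatically. The sketch below constructs the same filtered body as a Python dict and, next to it, a bool query with a filter clause. Elasticsearch 5.0 and later no longer accept the filtered query, and a bool query with must and filter clauses is the usual replacement. The helper names filtered_query and bool_query are our own and purely illustrative.

```python
import json


def filtered_query(last_name, min_age):
    """Request body in the `filtered` form used in this walkthrough."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"age": {"gt": min_age}}},
                "query": {"match": {"last_name": last_name}},
            }
        }
    }


def bool_query(last_name, min_age):
    """Same search expressed as a `bool` query with a `filter` clause
    (the form to use on releases where `filtered` is not available)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": last_name}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


old_body = filtered_query("smith", 30)
new_body = bool_query("smith", 30)

# Either string can be passed to curl with -d.
print(json.dumps(old_body, indent=2))
print(json.dumps(new_body, indent=2))
```

In both forms the range clause only includes or excludes documents, while the match clause is the part that contributes to the relevance score.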
The response now contains a single employee, Jane Smith, who is 32:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.30685282,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      }
    ]
  }
}
Full-text search
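So far the searches have been simple: looking up a specific name and filtering by age. Let's try something more advanced: full-text search, a task that traditional databases struggle with.
We will search for all employees who enjoy "rock climbing":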
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}
'
As you can see, we use the same match query as before to search the about field for "rock climbing". We get two matching documents:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 0.16273327,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.16273327,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      },
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.016878016,
        "_source" : {
          "first_name" : "Jane", "last_name" : "Smith", "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [ "music" ]
        }
      }
    ]
  }
}
- The _score values of the two hits (0.16273327 and 0.016878016) are their relevance scores.
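By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. The top result is no surprise: John Smith's about field explicitly mentions "rock climbing".
But why does Jane Smith appear in the results as well? The reason is that "rock" is mentioned in her about field. Because only "rock" is mentioned and "climbing" is not, her _score is lower than John's.
Phrase search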
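So far we can search for individual words in a field, which is fine, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we may want the employee records that contain both "rock" and "climbing", next to each other.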
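To make the difference concrete, here is a small in-memory sketch that contrasts "any term matches" with "the terms appear next to each other, in order". It uses a naive whitespace tokenizer and plain list comparison, which is our simplification and not how Elasticsearch analyzes or scores text. The function names are hypothetical.

```python
def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any_term(text, query):
    """True if at least one query term occurs in the text."""
    terms = set(tokenize(text))
    return any(term in terms for term in tokenize(query))


def matches_phrase(text, query):
    """True if the query terms occur adjacent to each other and in order."""
    words, phrase = tokenize(text), tokenize(query)
    return any(words[i:i + len(phrase)] == phrase
               for i in range(len(words) - len(phrase) + 1))


# The about fields of the three sample employees.
about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

any_hits = [name for name, text in about.items()
            if matches_any_term(text, "rock climbing")]
phrase_hits = [name for name, text in about.items()
               if matches_phrase(text, "rock climbing")]

print(any_hits)     # ['John', 'Jane']
print(phrase_hits)  # ['John']
```

Jane drops out of the second list because her about field contains "rock" but not the two words side by side.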
To match the whole phrase, we only need to change the match query to a match_phrase query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.23013961,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        }
      }
    ]
  }
}
Highlighting our searches
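Many applications like to highlight the matched keywords in each search result, so that users can see why a document matched the query. Highlighting snippets is easy in Elasticsearch.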
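As a rough picture of what highlighting produces, the toy function below wraps whole-word matches in <em></em> tags using a regular expression. It only illustrates the shape of the output. Elasticsearch's own highlighter works on analyzed text and is not implemented this way, and the function name highlight is our own.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word occurrence of any term in pre/post tags."""
    if not terms:
        return text
    pattern = r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    # A function replacement avoids any escaping issues in the tags.
    return re.sub(pattern, lambda m: pre + m.group(1) + post, text,
                  flags=re.IGNORECASE)


snippet = highlight("I love to go rock climbing", ["rock", "climbing"])
print(snippet)  # I love to go <em>rock</em> <em>climbing</em>
```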
Let's add a highlight parameter to the previous query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  },
  "highlight" : {
    "fields" : {
      "about" : {}
    }
  }
}
'
When we run this query, it hits the same result as before, but the response has a new section called highlight. It contains text from the about field, with the matched words wrapped in <em></em> tags.
{
  "took" : 33,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [
      {
        "_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.23013961,
        "_source" : {
          "first_name" : "John", "last_name" : "Smith", "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [ "sports", "music" ]
        },
        "highlight" : {
          "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
        }
      }
    ]
  }
}
Aggregations
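Finally, we have one more requirement: managers should be able to run some analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful. For example, we can count the employees who share each interest:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}
'
Query result: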
{
  ...
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "music", "doc_count" : 2 },
        { "key" : "forestry", "doc_count" : 1 },
        { "key" : "sports", "doc_count" : 1 }
      ]
    }
  }
}
These figures are not precomputed. They are generated on the fly from the documents that match the query.
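The bucket counts above can be reproduced by hand. The sketch below runs the same "group by interest and count documents" logic over the three sample employees in plain Python. It only illustrates what the terms aggregation reports and says nothing about how Elasticsearch computes it. The terms_buckets helper and its tie-breaking order (count descending, then key) are our own assumptions.

```python
from collections import Counter

# The three sample documents indexed earlier (fields trimmed for brevity).
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25,
     "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32,
     "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35,
     "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    # set() ensures a document is counted once per value.
    counts = Counter(value for doc in docs for value in set(doc[field]))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


all_interests = terms_buckets(employees, "interests")
for bucket in all_interests:
    print(bucket)
# {'key': 'music', 'doc_count': 2}
# {'key': 'forestry', 'doc_count': 1}
# {'key': 'sports', 'doc_count': 1}
```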
If we want to know the most popular interests among everyone named "Smith", we just add the appropriate query:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query" : {
    "match" : {
      "last_name" : "smith"
    }
  },
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}
'
The all_interests aggregation now covers only the documents that match the query:
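...
"all_interests" : {
  "buckets" : [
    { "key" : "music", "doc_count" : 2 },
    { "key" : "sports", "doc_count" : 1 }
  ]
}
Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest: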
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}
'
The aggregation results this time are a little more complex, but still easy to understand:
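...
"all_interests" : {
  "buckets" : [
    { "key" : "music", "doc_count" : 2, "avg_age" : { "value" : 28.5 } },
    { "key" : "forestry", "doc_count" : 1, "avg_age" : { "value" : 35 } },
    { "key" : "sports", "doc_count" : 1, "avg_age" : { "value" : 25 } }
  ]
}
This result is richer than the previous one. We still get the list of interests with their counts (the number of employees who have each interest), but each interest now has an additional avg_age field that shows the average age of the employees with that interest.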
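The nested result can be checked by hand as well. The sketch below groups the three sample employees by interest and averages their ages in plain Python, then repeats the calculation for the Smiths only. As before, this is an in-memory illustration under our own assumptions (the avg_age_by_interest helper and the ordering by count, then key), not a description of Elasticsearch internals.

```python
from collections import defaultdict

# The three sample documents indexed earlier (fields trimmed for brevity).
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25,
     "interests": ["sports", "music"]},
    {"first_name": "Jane", "last_name": "Smith", "age": 32,
     "interests": ["music"]},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35,
     "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """One bucket per interest, with document count and average age."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    ordered = sorted(ages.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(values),
         "avg_age": {"value": sum(values) / len(values)}}
        for key, values in ordered
    ]


buckets = avg_age_by_interest(employees)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["avg_age"]["value"])
# music 2 28.5
# forestry 1 35.0
# sports 1 25.0

# Restricting the input to the Smiths mirrors adding a query to the request.
smiths = [doc for doc in employees if doc["last_name"] == "Smith"]
smith_buckets = avg_age_by_interest(smiths)
```

The music bucket averages John (25) and Jane (32), which gives the 28.5 shown in the response.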