Elasticsearch Study Notes (Getting Started)

1. Elasticsearch Requests and Responses

Request structure

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
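  • VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE
  • PROTOCOL: either http or https (https is only available when there is an https proxy in front of Elasticsearch)
  • HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on your local machine
  • PORT: the port on which the Elasticsearch HTTP service listens; the default is 9200
  • PATH: the API path (for example, _count returns the number of documents in the cluster); PATH may contain multiple components, such as _cluster/stats or _nodes/stats/jvm
  • QUERY_STRING: optional query-string parameters; for example, ?pretty makes the request return pretty-printed, more readable JSON
  • BODY: a JSON-encoded request body (if the request needs one)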
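
As a rough illustration of how the components above fit together, the following Python sketch assembles a request URL from its parts using only the standard library. The helper name build_url is hypothetical and not part of any Elasticsearch client; the host, port, and paths are the ones mentioned in the list above.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, params=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> (hypothetical helper)."""
    query = urlencode(params or {})
    return urlunsplit((protocol, f"{host}:{port}", "/" + path.lstrip("/"), query, ""))


# _count with the optional pretty parameter
count_url = build_url("http", "localhost", 9200, "_count", {"pretty": "true"})
# A PATH made of several components, without a query string
stats_url = build_url("http", "localhost", 9200, "_nodes/stats/jvm")

print(count_url)
print(stats_url)
```

Nothing is sent over the network here; the sketch only shows where each placeholder ends up in the final URL.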

Creating with PUT (indexing a document)

$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d ' 
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "3",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
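
The same PUT request could be prepared from Python roughly as follows. This is a minimal sketch using only urllib from the standard library; it builds the request object but does not send it, so it runs without a cluster. The Content-Type header is an assumption added here (newer Elasticsearch versions expect it), not something the curl command above sets.

```python
import json
from urllib.request import Request

employee = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}

# Build (but do not send) the PUT request for document ID 3.
request = Request(
    "http://localhost:9200/megacorp/employee/3?pretty",
    data=json.dumps(employee).encode("utf-8"),
    method="PUT",
    headers={"Content-Type": "application/json"},  # assumption, see lead-in
)

print(request.get_method(), request.full_url)
# To actually send it against a running node:
#   from urllib.request import urlopen
#   print(urlopen(request).read().decode("utf-8"))
```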

GET requests (search)

Retrieving a document

$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests" : [ "sports", "music" ]
  }
}

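Simple search

We still use the megacorp index and the employee type, but instead of a document ID we end the path with the _search keyword. The hits array in the response contains all three of our documents. By default, a search returns the top 10 results.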

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "Douglas",
        "last_name" : "Fir",
        "age" : 35,
        "about" : "I like to build cabinets",
        "interests" : [ "forestry" ]
      }
    } ]
  }
}

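Next, let's search for employees whose last name contains "Smith". We will use a lightweight search method from the command line. This method is often called a query-string search, because we pass the query as a URL parameter: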

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    } ]
  }
}
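
When the query string is built programmatically rather than typed by hand, the parameters are usually URL-encoded. The sketch below shows, with Python's standard library only, what the q=last_name:Smith parameter looks like after encoding and that it decodes back to the same value. The curl example above passes the colon unencoded; encoding it as shown here is simply what urlencode does by default.

```python
from urllib.parse import parse_qs, urlencode

# Encode the query-string search parameters.
query_string = urlencode({"q": "last_name:Smith", "pretty": "true"})
search_url = "http://localhost:9200/megacorp/employee/_search?" + query_string
print(search_url)

# Decoding recovers the original field:value expression.
decoded = parse_qs(query_string)
print(decoded["q"][0])
```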

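Searching with the Query DSL

Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see the Simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and powerful queries.

The DSL (domain-specific language) takes the form of a JSON request body. We can express the earlier "Smith" query like this: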

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d ' 
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
'

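More complicated searches

Let's make the search a little more complicated. We still want to find employees whose last name is "Smith", but only those older than 30. Our query adds a filter, which lets us run a structured search efficiently: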

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } --<1>
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith" --<2>
                }
            }
        }
    }
}
'
  • <1> This part of the query is a range filter; it finds all documents whose age is greater than 30 (gt stands for "greater than").
  • <2> This part of the query is the same match query used earlier.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    } ]
  }
}
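
The filtered query used above belongs to older Elasticsearch releases: it was deprecated in 2.0 and removed in 5.0. On later versions the same intent is usually written as a bool query with a must clause and a filter clause. The sketch below builds that request body in Python and, as a sanity check that needs no cluster, applies the same two conditions to the three sample employees. The local matching is a deliberate simplification (case-insensitive equality on last_name), not Elasticsearch's actual text analysis.

```python
import json

# Assumed equivalent of the filtered query for Elasticsearch 5.0 and later.
bool_query = {
    "query": {
        "bool": {
            "must": {"match": {"last_name": "smith"}},
            "filter": {"range": {"age": {"gt": 30}}},
        }
    }
}

employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]

# Local stand-in for the two conditions: last name is Smith AND age > 30.
matches = [
    e["first_name"]
    for e in employees
    if e["last_name"].lower() == "smith" and e["age"] > 30
]

print(json.dumps(bool_query, indent=2))
print(matches)
```

Only Jane satisfies both conditions, which agrees with the single hit in the response above.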

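Full-text search

So far the searches have been simple: looking up a specific name and filtering by age. Let's try something more advanced, full-text search, a task that traditional databases struggle with.

We will search for all employees who enjoy "rock climbing":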

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
'

As you can see, we use the same match query as before to search the about field for "rock climbing", and we get two matching documents:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.16273327,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.16273327,<1>
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.016878016,<2>
      "_source" : {
        "first_name" : "Jane",
        "last_name" : "Smith",
        "age" : 32,
        "about" : "I like to collect rock albums",
        "interests" : [ "music" ]
      }
    } ]
  }
}
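  • <1><2> The relevance score.

By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. Clearly, the top-ranked John Smith's about field explicitly mentions "rock climbing".

But why does Jane Smith appear in the results as well? The reason is that "rock" is mentioned in her about field. Because only "rock" is mentioned and "climbing" is not, her _score is lower than John's.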
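
To make the reasoning about John and Jane concrete, here is a toy Python sketch that counts how many of the query terms appear in each about field. It is not Elasticsearch's scoring formula (the real _score also depends on term frequencies and field length); it only illustrates why a document containing one of the two terms still matches but ranks lower. The whitespace tokenizer is an assumption made for brevity.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


query_terms = set(tokenize("rock climbing"))

about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

# Which query terms does each document contain?
matched = {
    name: sorted(query_terms & set(tokenize(text))) for name, text in about.items()
}

# Keep documents with at least one matching term, most matches first.
ranking = sorted(
    (name for name, terms in matched.items() if terms),
    key=lambda name: len(matched[name]),
    reverse=True,
)

print(matched)
print(ranking)
```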

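Phrase search

Finding individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, a phrase. For example, we may want employee records that contain both "rock" and "climbing", next to each other.

To do this, we only need to change the match query to a match_phrase query: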

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.23013961,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      }
    } ]
  }
}
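
The difference from the plain match query is that the terms must appear next to each other and in order. A simplified Python sketch of that adjacency check follows; it assumes whitespace tokenization and ignores everything else match_phrase does, so treat it as an illustration of the idea rather than of the implementation.

```python
def contains_phrase(text, phrase):
    """Return True if the phrase's words occur consecutively, in order, in text."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

phrase_hits = [name for name, text in about.items() if contains_phrase(text, "rock climbing")]
print(phrase_hits)
```

Jane's record contains "rock" but not followed by "climbing", so only John remains, as in the response above.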

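Highlighting our searches

Many applications like to highlight the matched keywords in each search result so that users can see why the document matched the query. Highlighting snippets is easy in Elasticsearch.

Let's add a highlight parameter to the previous query: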

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }               
    }                   
}        
'

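When we run this query, the same hit is returned as before, but the response contains a new section called highlight, which holds the text from the about field with the matched words wrapped in <em></em>.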

{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23013961,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 0.23013961,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests" : [ "sports", "music" ]
      },
      "highlight" : {
        "about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
      }
    } ]
  }
}
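
The shape of the highlighted fragment can be imitated with a few lines of Python. The helper below is hypothetical and far simpler than Elasticsearch's highlighters; it wraps every word that matches a query term in <em></em>, assuming whitespace-separated words.

```python
def highlight(text, query):
    """Wrap each word of text that matches a query term in <em></em> (toy version)."""
    terms = set(query.lower().split())
    return " ".join(
        f"<em>{word}</em>" if word.lower() in terms else word for word in text.split()
    )


fragment = highlight("I love to go rock climbing", "rock climbing")
print(fragment)
```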

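Aggregations

Analytics

Finally, we have one more requirement to meet: letting managers run analytics over the employee directory. Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful. For example, the following request counts the employees per interest:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '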
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
'

Query result:

{...
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "music",
        "doc_count" : 2
      }, {
        "key" : "forestry",
        "doc_count" : 1
      }, {
        "key" : "sports",
        "doc_count" : 1
      } ]
    }
  }
}

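These figures are not precomputed; they are calculated on the fly from the documents that match the current query.

If we want to know the most common interests among everyone named "Smith", we just add the appropriate query: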
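
Conceptually, the terms aggregation behaves like a group-and-count over the values of the interests field. The Python sketch below reproduces the bucket counts for the three sample employees with collections.Counter; it mirrors the result only, not how Elasticsearch computes it across shards.

```python
from collections import Counter

interests = {
    "John": ["sports", "music"],
    "Jane": ["music"],
    "Douglas": ["forestry"],
}

# One count per (employee, interest) pair, like one bucket per distinct term.
counts = Counter(
    interest for employee_interests in interests.values() for interest in employee_interests
)

print(counts)
```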

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
'

The all_interests aggregation now covers only the documents that match the query:

...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2
        },
        {
           "key": "sports",
           "doc_count": 1
        }
     ]
  }

Aggregations also allow hierarchical rollups. For example, let's compute the average age of the employees who share each interest:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
'

The aggregation result returned this time is somewhat more complex, but still easy to understand:

...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2,
           "avg_age": {
              "value": 28.5
           }
        },
        {
           "key": "forestry",
           "doc_count": 1,
           "avg_age": {
              "value": 35
           }
        },
        {
           "key": "sports",
           "doc_count": 1,
           "avg_age": {
              "value": 25
           }
        }
     ]
  }

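This result is richer than the previous one. We still get the list of interests with their counts (the number of employees who have that interest), but each interest now also has an avg_age field showing the average age of the employees who share it.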
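
The nested aggregation amounts to grouping employees by interest and then averaging age within each group. A minimal Python sketch over the three sample employees, using only the standard library, gives the same averages as the response above.

```python
from collections import defaultdict
from statistics import mean

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]

# Outer "terms" step: collect the ages of everyone sharing an interest.
ages_by_interest = defaultdict(list)
for employee in employees:
    for interest in employee["interests"]:
        ages_by_interest[interest].append(employee["age"])

# Inner "avg" step: average the ages inside each bucket.
avg_age = {interest: mean(ages) for interest, ages in ages_by_interest.items()}

print(avg_age)
```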
