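Baidu: when we want to find any kind of information, we go to Baidu and search for it, for example a movie we like, a book we like, or a news story we are interested in. That is most people's first impression of search, but Baidu != search.
Vertical search (site search)
Internet search: e-commerce sites, recruitment sites, news sites, and all kinds of apps
Search in IT systems: OA (office automation) software, meeting management, schedule management, project management, and employee management, where you might search for "Zhang San", "Zhang San'er", or "Zhang Xiaosan"; or the back-office system that sellers use on an e-commerce site, where searching the orders for "toothpaste" returns the toothpaste-related orders
Search means finding the information you want in any scenario: you type in some keywords and expect to get back information related to those keywords.
Anyone who does software development, or knows a little about IT and computers, knows that data is stored in databases: product information on an e-commerce site, job postings on a recruitment site, news articles on a news site, and so on. So from a technical point of view, the natural first idea for implementing, say, site search on an e-commerce site is to run the search against the database.
Using a database to implement search is not very reliable, and performance is usually poor: a condition such as like "%keyword%" typically cannot use an ordinary index, so the database has to scan every row.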
Full-text search: the inverted index
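To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not how Lucene implements it; it only shows the core mapping from each term to the documents that contain it. The sample product names (including document 4) are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain at least one term of the query."""
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)


# Hypothetical product names, used only for illustration.
docs = {
    1: "gaolujie yagao",
    2: "jiajieshi yagao",
    3: "zhonghua yagao",
    4: "zhonghua maojin",
}
index = build_inverted_index(docs)
print(index["yagao"])             # [1, 2, 3]
print(search(index, "zhonghua"))  # [3, 4]
```

A lookup is a dictionary access on the term rather than a scan over every document, which is why this structure suits keyword search.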
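Lucene: it is simply a jar that packages the code for building inverted indexes and running searches, including the various algorithms. When developing in Java, we add the Lucene jar to the project and develop against the Lucene API. With Lucene we can build an index over existing data, and Lucene organizes the index data structures on the local disk for us. We can also use the features and APIs Lucene provides to search the index data on disk.
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.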
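Search: Baidu, site search on websites, retrieval in IT systems
Data analytics: on an e-commerce site, which 10 merchants sold the most toothpaste in the last 7 days; on a news site, which 3 news sections had the most visits in the last month
Distributed, search, data analytics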
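Full-text search: I want to find products whose name contains 牙膏 (toothpaste): select * from products where product_name like "%牙膏%"
Structured search: I want to find all products in the 日化用品 (daily chemicals) category: select * from products where category_id='日化用品'
Partial matching, autocomplete, search correction, search suggestions
Data analytics: we count how many products there are in each category: select category_id,count(*) from products group by category_id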
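Distributed: ES can automatically spread massive amounts of data across multiple servers for storage and retrieval
Massive data processing: once the system is distributed, a large number of servers can store and search the data, which naturally makes it possible to process massive data
Near real-time: if retrieving some data takes an hour, that is not near real-time but offline batch processing; near real-time means searching and analyzing data within seconds
The opposite of distributed/massive data: Lucene is a single-machine library; it can only be used on one server and can handle at most the data volume that a single server can handle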
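As a rough illustration of what "spreading data across servers" means, the sketch below assigns each document to one of several primary shards by hashing its id and taking the remainder. This is only the general idea: `pick_shard` and `distribute` are hypothetical helpers, and `zlib.crc32` is a stand-in hash chosen for illustration, not the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Choose a shard number from the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they are routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# 100 hypothetical document ids spread over 5 primary shards.
shards = distribute([str(i) for i in range(1, 101)], 5)
for shard, ids in shards.items():
    print(shard, len(ids))
```

Because the shard is derived from the id alone, any node can work out where a document lives without consulting a central lookup table.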
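(1) It can run as a large distributed cluster (hundreds of servers) that handles PB-scale data for large companies; it can also run on a single machine for small companies
(2) Elasticsearch is not a brand-new technology. It mainly combines full-text search, data analytics, and distributed technology into one product, and that combination is what makes ES unique. Each piece existed separately before: Lucene (full-text search), commercial data analytics software, and distributed databases (mycat)
(3) For users it works out of the box and is very simple. A small or medium application whose data volume is modest and whose operations are not too complex can deploy ES in about 3 minutes and use it as a production system
(4) Database features fall short in many areas. Databases are built for transactions and other online transaction processing operations; for special capabilities such as full-text search, synonym handling, relevance ranking, complex data analytics, and near real-time processing of massive data, Elasticsearch complements a traditional database by providing many features the database cannot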
(1) A distributed document store
(2) A distributed search and analytics engine
(3) Distributed, supporting PB-scale data
// name: node name
// cluster_name: cluster name (the default cluster name is elasticsearch)
// version.number: 5.2.0, the ES version number
{
  "name": "1LdqLFq",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "5pqT0Q_XQky6GKjSiFgilA",
  "version": {
    "number": "5.2.0",
    "build_hash": "24e05b9",
    "build_date": "2017-01-24T19:52:35.800Z",
    "build_snapshot": false,
    "lucene_version": "6.4.0"
  },
  "tagline": "You Know, for Search"
}
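The data structures of application systems are object-oriented and complex
Suppose there is an e-commerce site, and we need to build a back-office system for it on top of ES that provides the following features: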
GET /_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566094709 10:18:29  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
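How do we quickly check the health of the cluster? Is it green, yellow, or red?
green: the primary shards and replica shards of every index are all active
yellow: the primary shards of every index are active, but some replica shards are not active and are therefore unavailable
red: not all primary shards of every index are active; part of the data of some indexes is missing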
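The three rules above can be written down as a small function. This is only a sketch of the rules as stated here, not Elasticsearch's own code; `cluster_status` and the `(is_primary, is_active)` tuple representation are assumptions made for illustration.

```python
def cluster_status(shards):
    """Classify cluster health from (is_primary, is_active) pairs, one per shard."""
    if any(is_primary and not is_active for is_primary, is_active in shards):
        return "red"      # at least one primary shard is not active
    if any(not is_active for _, is_active in shards):
        return "yellow"   # all primaries are active, some replica is not
    return "green"        # every primary and every replica is active


# One active primary and one inactive replica, as on a single-node setup.
print(cluster_status([(True, True), (False, False)]))  # yellow
```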
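Why is the cluster in the yellow state right now?
We only have one laptop and started a single es process, so there is only one node.
es currently holds one index, the built-in index that kibana creates for itself.
By default (in this 5.x version) each index gets 5 primary shards with 1 replica each, that is, 5 primary shards and 5 replica shards, and a primary shard and its replica shard cannot be on the same machine (for fault tolerance).
The index that kibana created has 1 primary shard and 1 replica shard.
There is only one node, so only the 1 primary shard has been allocated and started; the replica shard has no second machine to start on.
A small experiment: start a second es process so that the cluster has 2 nodes. The replica shard is then allocated to the new node automatically, and the cluster status turns green.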
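The reasoning above (a replica may not share a node with its primary) can be simulated in a few lines. This is a simplified sketch under that one rule; the `allocate` function and the node names are hypothetical, and the real allocator in Elasticsearch takes many more factors into account.

```python
def allocate(num_primaries, replicas_per_primary, nodes):
    """Place shard copies so that no node holds two copies of the same shard."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        used = set()  # nodes that already hold a copy of this shard
        for kind in ["primary"] + ["replica"] * replicas_per_primary:
            free = [node for node in nodes if node not in used]
            if free:
                # Pick the node that currently holds the fewest shard copies.
                target = min(free, key=lambda node: len(placement[node]))
                placement[target].append((shard, kind))
                used.add(target)
            else:
                unassigned.append((shard, kind))
    return placement, unassigned


# The kibana index: 1 primary shard and 1 replica shard.
_, unassigned_one_node = allocate(1, 1, ["node-1"])
_, unassigned_two_nodes = allocate(1, 1, ["node-1", "node-2"])
print(unassigned_one_node)   # [(0, 'replica')]  -> yellow
print(unassigned_two_nodes)  # []                -> green
```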
GET /_cat/indices?v
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana xpiNHK4UQb2569AzgiveSw   1   1          1            0      3.1kb          3.1kb
PUT /test_index?pretty
DELETE /test_index?pretty
Syntax (add a document):
PUT /index/type/id
{
  "JSON data"
}
Example:
PUT /ecommerce/product/1
{
  "name": "gaolujie yagao",
  "desc": "gaoxiao meibai",
  "price": 30,
  "producer": "gaolujie producer",
  "tags": [ "meibai", "fangzhu" ]
}

{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}

PUT /ecommerce/product/2
{
  "name": "jiajieshi yagao",
  "desc": "youxiao fangzhu",
  "price": 25,
  "producer": "jiajieshi producer",
  "tags": [ "fangzhu" ]
}

PUT /ecommerce/product/3
{
  "name": "zhonghua yagao",
  "desc": "caoben zhiwu",
  "price": 40,
  "producer": "zhonghua producer",
  "tags": [ "qingxin" ]
}
es creates the index and type automatically, so they do not need to be created in advance. By default es also builds an inverted index on every field of the document so that each field can be searched.
Syntax (retrieve a document):
GET /index/type/id
Example:
GET /ecommerce/product/1

{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "gaolujie yagao",
    "desc": "gaoxiao meibai",
    "price": 30,
    "producer": "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
  }
}
Syntax (replace a document):
PUT /index/type/id
{
  "JSON data"
}
Example:
PUT /ecommerce/product/1
{
  "name": "jiaqiangban gaolujie yagao",
  "desc": "gaoxiao meibai",
  "price": 30,
  "producer": "gaolujie producer",
  "tags": [ "meibai", "fangzhu" ]
}

{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": false
}
The replace approach has one drawback: you must send every field in order to modify the document, because the whole document is overwritten.
Syntax (update part of a document):
POST /index/type/id/_update
{
  "doc": {
    "fields to change"
  }
}
Example:
POST /ecommerce/product/1/_update
{
  "doc": {
    "name": "jiaqiangban gaolujie yagao"
  }
}

{
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 8,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
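The difference between replacing a document and partially updating it can be shown with a small in-memory model. This is a simplified sketch, not Elasticsearch's implementation: `put_document` and `partial_update` are hypothetical helpers, the merge is shallow, and the version simply increases by one on every write.

```python
def put_document(store, doc_id, body):
    """Full replacement: the stored document becomes exactly `body`."""
    version = store[doc_id]["_version"] + 1 if doc_id in store else 1
    store[doc_id] = {"_version": version, "_source": dict(body)}
    return "created" if version == 1 else "updated"


def partial_update(store, doc_id, doc):
    """Partial update: merge the given fields into the existing document."""
    current = store[doc_id]
    merged = {**current["_source"], **doc}
    store[doc_id] = {"_version": current["_version"] + 1, "_source": merged}
    return "updated"


store = {}
put_document(store, "1", {"name": "gaolujie yagao", "price": 30})

# Replacing with only one field drops every field that was not sent.
put_document(store, "1", {"name": "jiaqiangban gaolujie yagao"})
print(store["1"]["_source"])  # {'name': 'jiaqiangban gaolujie yagao'}

# A partial update keeps the fields that were not mentioned.
put_document(store, "2", {"name": "jiajieshi yagao", "price": 25})
partial_update(store, "2", {"name": "jiaqiangban jiajieshi yagao"})
print(store["2"]["_source"])  # {'name': 'jiaqiangban jiajieshi yagao', 'price': 25}
```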
Syntax (delete a document):
DELETE /index/type/id
Example:
DELETE /ecommerce/product/1

{
  "found": true,
  "_index": "ecommerce",
  "_type": "product",
  "_id": "1",
  "_version": 9,
  "result": "deleted",
  "_shards": { "total": 2, "successful": 1, "failed": 0 }
}
Search all products:
GET /ecommerce/product/_search

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      }
    ]
  }
}
took: how many milliseconds the request took
timed_out: whether the request timed out; here it did not
_shards: the data is split into 5 shards, so the search request is sent to all of the primary shards (for each shard, one of its replica shards can serve the request instead)
hits.total: the number of results, here 3 documents
hits.max_score: the highest score among the results. The score is the relevance score of a document for a search: the more relevant the document, the better the match and the higher the score
hits.hits: the detailed data of the documents that matched the search
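For readers who consume such a response from code, the sketch below pulls out the fields just described. It assumes the response shape shown in this article (where hits.total is a plain number); the `summarize` helper and the shortened sample response are made up for illustration.

```python
import json

# A shortened copy of the response above, kept only to the fields we read.
response_text = """
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {"_id": "2", "_score": 1, "_source": {"name": "jiajieshi yagao", "price": 25}},
      {"_id": "1", "_score": 1, "_source": {"name": "gaolujie yagao", "price": 30}},
      {"_id": "3", "_score": 1, "_source": {"name": "zhonghua yagao", "price": 40}}
    ]
  }
}
"""


def summarize(response):
    """Collect the most commonly inspected fields of a search response."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "all_shards_ok": shards["successful"] == shards["total"],
        "total": hits["total"],
        "names": [hit["_source"]["name"] for hit in hits["hits"]],
    }


summary = summarize(json.loads(response_text))
print(summary["total"], summary["names"])
```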
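Search for products whose name contains yagao, sorted by price in descending order
GET /ecommerce/product/_search?q=name:yagao&sort=price:desc
The name query string search comes from the fact that the search parameters are all carried in the query string of the HTTP request
It suits ad hoc use on the command line with tools such as curl, where you want to fire off a request quickly to find some information; but when the query is complex, it is hard to build this way
In production, query string search is rarely used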
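As a small illustration of where the query string comes from, the sketch below assembles the same URL with Python's standard library. The base URL http://localhost:9200 is an assumption (a local node on the default port), and `query_string_search_url` is a hypothetical helper; the code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode


def query_string_search_url(base_url, index, doc_type, field, term,
                            sort_field=None, order="asc"):
    """Build a query string search URL of the form ?q=field:term&sort=field:order."""
    params = {"q": f"{field}:{term}"}
    if sort_field:
        params["sort"] = f"{sort_field}:{order}"
    # safe=":" keeps the colons readable instead of percent-encoding them.
    query = urlencode(params, safe=":")
    return f"{base_url}/{index}/{doc_type}/_search?{query}"


url = query_string_search_url(
    "http://localhost:9200", "ecommerce", "product",
    "name", "yagao", sort_field="price", order="desc",
)
print(url)  # http://localhost:9200/ecommerce/product/_search?q=name:yagao&sort=price:desc
```

Every extra condition has to be squeezed into that one string, which is why complex queries quickly become hard to write in this style.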
DSL: Domain Specific Language
Advantage: better suited to production use, and able to express complex queries
HTTP request body: the query is written as JSON in the request body, which is convenient, can express all kinds of complex syntax, and is far more powerful than query string search
GET /ecommerce/product/_search
{
  "query": { "match_all": {} }
}
GET /ecommerce/product/_search
{
  "query": {
    "match": { "name": "yagao" }
  },
  "sort": [
    { "price": { "order": "desc" } }
  ]
}
Paginate the products: there are 3 products in total, and if each page shows 1 product and we want page 2, the query returns the 2nd product
GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}
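The from value in the request above follows from the page number and page size: skip (page - 1) * size documents. A minimal sketch, where `page_to_from_size` and `paged_query` are hypothetical helper names:

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_query(page, page_size):
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, page_size))
    return body


print(paged_query(2, 1))  # {'query': {'match_all': {}}, 'from': 1, 'size': 1}
```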
GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price"]
}
Search for products whose name contains yagao and whose price is greater than 25
GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "name": "yagao" }
      },
      "filter": {
        "range": {
          "price": { "gt": 25 }
        }
      }
    }
  }
}
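What this query selects can be mimicked in plain Python over the three sample products. This is only a sketch of the selection logic (a name match combined with a price filter); it ignores relevance scoring, and `bool_search` and its helpers are hypothetical names.

```python
products = [
    {"id": 1, "name": "gaolujie yagao", "price": 30},
    {"id": 2, "name": "jiajieshi yagao", "price": 25},
    {"id": 3, "name": "zhonghua yagao", "price": 40},
]


def matches_name(product, term):
    """The 'must' clause: the name has to contain the term as a whole word."""
    return term in product["name"].split()


def price_gt(product, threshold):
    """The 'filter' clause: the price has to be strictly greater than the threshold."""
    return product["price"] > threshold


def bool_search(items, term, threshold):
    """Keep the products that satisfy both the must clause and the filter clause."""
    return [p["id"] for p in items if matches_name(p, term) and price_gt(p, threshold)]


print(bool_search(products, "yagao", 25))  # [1, 3]
```

Product 2 is excluded because gt means strictly greater than, and its price is exactly 25.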
Add test data
PUT /ecommerce/product/4
{
  "name": "special yagao",
  "desc": "special meibai",
  "price": 50,
  "producer": "special yagao producer",
  "tags": [ "meibai" ]
}
Full-text search (loose matching)
GET /ecommerce/product/_search
{
  "query": {
    "match": { "producer": "yagao producer" }
  }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      }
    ]
  }
}
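Phrase search is the counterpart of full-text search. Full-text search breaks the search string into terms and looks each term up in the inverted index; a document is returned as a result as long as it matches any one of the terms
Phrase search requires the text of the specified field to contain the search string exactly as typed (the same terms, adjacent and in the same order) before the document counts as a match and is returned
GET /ecommerce/product/_search
{
  "query": {
    "match_phrase": { "producer": "yagao producer" }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      }
    ]
  }
}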
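The contrast between the two kinds of search can be reproduced with a toy matcher over the four producer values. This is a sketch of the matching rule only (any term versus all terms adjacent and in order); it does not compute scores, and the `match` and `match_phrase` functions here are plain Python stand-ins named after the queries.

```python
producers = {
    "1": "gaolujie producer",
    "2": "jiajieshi producer",
    "3": "zhonghua producer",
    "4": "special yagao producer",
}


def match(text, query):
    """Full-text match: true if the text contains any one of the query terms."""
    tokens = set(text.split())
    return any(term in tokens for term in query.split())


def match_phrase(text, query):
    """Phrase match: true if the query terms appear adjacent and in order."""
    tokens, terms = text.split(), query.split()
    return any(
        tokens[i:i + len(terms)] == terms
        for i in range(len(tokens) - len(terms) + 1)
    )


query = "yagao producer"
print([i for i, text in producers.items() if match(text, query)])         # ['1', '2', '3', '4']
print([i for i, text in producers.items() if match_phrase(text, query)])  # ['4']
```

All four documents contain producer, so the full-text match returns them all, while only document 4 contains the two words next to each other.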
GET /ecommerce/product/_search
{
  "query": {
    "match": { "producer": "producer" }
  },
  "highlight": {
    "fields": {
      "producer": {}
    }
  }
}

{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] },
        "highlight": { "producer": [ "gaolujie <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] },
        "highlight": { "producer": [ "zhonghua <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] },
        "highlight": { "producer": [ "jiajieshi <em>producer</em>" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.14638957,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] },
        "highlight": { "producer": [ "special yagao <em>producer</em>" ] }
      }
    ]
  }
}
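The highlight fragments in the response wrap each matched term in em tags. A toy version of that wrapping, assuming whitespace tokenization and exact term equality (the `highlight` function is a hypothetical stand-in, not the highlighter Elasticsearch uses):

```python
def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every token of the text that equals a query term in pre/post tags."""
    terms = set(query.split())
    return " ".join(
        f"{pre}{token}{post}" if token in terms else token
        for token in text.split()
    )


print(highlight("special yagao producer", "producer"))  # special yagao <em>producer</em>
```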
// Set the fielddata property of the text field to true
PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
// Run the aggregation
GET /ecommerce/product/_search
{
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 1,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      }
    ]
  },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
Do not return the hits
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
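The buckets in this response are simply a count of documents per tag. The sketch below reproduces those counts over the four sample products with `collections.Counter`; `terms_aggregation` is a hypothetical helper, and ordering ties by key is an assumption of this sketch.

```python
from collections import Counter

products = [
    {"id": 1, "price": 30, "tags": ["meibai", "fangzhu"]},
    {"id": 2, "price": 25, "tags": ["fangzhu"]},
    {"id": 3, "price": 40, "tags": ["qingxin"]},
    {"id": 4, "price": 50, "tags": ["meibai"]},
]


def terms_aggregation(docs, field):
    """Count documents per distinct value of a multi-valued field."""
    counts = Counter(value for doc in docs for value in doc[field])
    # Largest buckets first; ties are ordered by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


for bucket in terms_aggregation(products, "tags"):
    print(bucket)
```

A document with two tags, such as product 1, is counted once in each of its buckets.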
GET /ecommerce/product/_search
{
  "query": {
    "match": { "name": "yagao" }
  },
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } },
        { "key": "meibai", "doc_count": 2, "avg_price": { "value": 40 } },
        { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } }
      ]
    }
  }
}
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags",
        "order": { "avg_price": "desc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "meibai", "doc_count": 2, "avg_price": { "value": 40 } },
        { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } },
        { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } }
      ]
    }
  }
}
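The nested aggregation above groups by tag, averages the price inside each group, and then orders the groups by that average. A plain Python sketch of the same computation over the four sample products follows; `avg_price_by_tag` is a hypothetical helper, and breaking ties by key is an assumption of this sketch.

```python
from collections import defaultdict

products = [
    {"id": 1, "price": 30, "tags": ["meibai", "fangzhu"]},
    {"id": 2, "price": 25, "tags": ["fangzhu"]},
    {"id": 3, "price": 40, "tags": ["qingxin"]},
    {"id": 4, "price": 50, "tags": ["meibai"]},
]


def avg_price_by_tag(docs):
    """Group prices by tag, average each group, and sort by average descending."""
    prices = defaultdict(list)
    for doc in docs:
        for tag in doc["tags"]:
            prices[tag].append(doc["price"])
    buckets = [
        {"key": tag, "doc_count": len(values), "avg_price": sum(values) / len(values)}
        for tag, values in prices.items()
    ]
    # Highest average first; equal averages are ordered by key.
    return sorted(buckets, key=lambda b: (-b["avg_price"], b["key"]))


for bucket in avg_price_by_tag(products):
    print(bucket["key"], bucket["doc_count"], bucket["avg_price"])
# meibai 2 40.0
# qingxin 1 40.0
# fangzhu 2 27.5
```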
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 20 },
          { "from": 20, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags" },
          "aggs": {
            "average_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_price": {
      "buckets": [
        {
          "key": "0.0-20.0", "from": 0, "to": 20, "doc_count": 0,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          }
        },
        {
          "key": "20.0-40.0", "from": 20, "to": 40, "doc_count": 2,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "fangzhu", "doc_count": 2, "average_price": { "value": 27.5 } },
              { "key": "meibai", "doc_count": 1, "average_price": { "value": 30 } }
            ]
          }
        },
        {
          "key": "40.0-50.0", "from": 40, "to": 50, "doc_count": 1,
          "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "qingxin", "doc_count": 1, "average_price": { "value": 40 } }
            ]
          }
        }
      ]
    }
  }
}
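One detail of the response is easy to miss: each range includes its from value and excludes its to value, so the product priced at 40 lands in 40.0-50.0 and the product priced at 50 lands in no bucket at all. The sketch below reproduces the bucket membership in plain Python; `range_buckets` is a hypothetical helper, and the "ids" field is added only to make the membership visible.

```python
products = [
    {"id": 1, "price": 30},
    {"id": 2, "price": 25},
    {"id": 3, "price": 40},
    {"id": 4, "price": 50},
]


def range_buckets(docs, field, ranges):
    """Assign documents to half-open ranges: low <= value < high."""
    buckets = []
    for low, high in ranges:
        members = [doc for doc in docs if low <= doc[field] < high]
        buckets.append({
            "key": f"{low:.1f}-{high:.1f}",
            "doc_count": len(members),
            "ids": [doc["id"] for doc in members],
        })
    return buckets


for bucket in range_buckets(products, "price", [(0, 20), (20, 40), (40, 50)]):
    print(bucket)
# {'key': '0.0-20.0', 'doc_count': 0, 'ids': []}
# {'key': '20.0-40.0', 'doc_count': 2, 'ids': [1, 2]}
# {'key': '40.0-50.0', 'doc_count': 1, 'ids': [3]}
```

To include the product priced at 50, the last range would need a larger upper bound.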