Elasticsearch, Lecture 2

Date: 2020-04-04

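Some detailed Elasticsearch usage

Using term

A term query does not analyze (tokenize) its input.
A term query does not lowercase its input either, while text indexed with the default analyzer is stored in lowercase.
A term query still computes a relevance score by default; wrap it in constant_score, as shown further down, to skip scoring.

Initialize the data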

POST products/_bulk
{"index":{"_id": 1}}
{"productId":"XHDK-A-1293-#fj3", "desc": "iPhone"}
{"index":{"_id":2}}
{"productId":"KDKE-8-9947-kL5", "desc": "iPad"}
{"index":{"_id":"3"}}
{"productId":"JODL-X-1937-#pV7", "desc": "MBP"}

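Query the data

POST products/_search
{
    "query":{
        "term":{
            // Enable exactly one of the alternatives below.
            // No hits: the indexed tokens are lowercase and term does not lowercase its input
            //"productId": "XHDK"
            // Hits: "xhdk" is one of the indexed tokens
            //"productId": "xhdk"
            // No hits: the text field was tokenized at index time, so the full string is not an indexed token
            //"productId": "XHDK-A-1293-#fj3"
            // Hits: the keyword sub-field stores the whole value without analysis
            "productId.keyword": "XHDK-A-1293-#fj3"
        }
    }
}

Checking how XHDK-A-1293-#fj3 is tokenized also explains why some of the queries above return nothing.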

POST _analyze
{
  "analyzer": "standard"
  , "text": ["XHDK-A-1293-#fj3"]
}
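
To make the tokenization argument concrete without a running cluster, here is a minimal Python sketch. The function approx_standard_analyze is a hypothetical stand-in that only approximates the standard analyzer (split on non-alphanumeric characters, then lowercase); it is not Elasticsearch code, and term_matches simply models the verbatim comparison a term query performs.

```python
import re


def approx_standard_analyze(text):
    """Rough approximation of the standard analyzer: split on
    non-alphanumeric characters and lowercase every token."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_tokens, query_value):
    """A term query compares the query value with indexed tokens verbatim,
    without analyzing or lowercasing it."""
    return query_value in indexed_tokens


tokens = approx_standard_analyze("XHDK-A-1293-#fj3")
print(tokens)
print(term_matches(tokens, "XHDK"))              # uppercase value vs lowercase tokens
print(term_matches(tokens, "xhdk"))              # matches an indexed token
print(term_matches(tokens, "XHDK-A-1293-#fj3"))  # full string is not a token
```

Only the lowercase token matches, which is the behaviour described for the text field; the keyword sub-field avoids the problem because it keeps the whole string as a single value.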

Skipping score computation with constant_score

POST products/_search
{
    "query":{
        "constant_score":{
            "filter":{
                "term": {
                  "productId.keyword": {
                    "value": "XHDK-A-1293-#fj3"
                  }
                }
            }
        }
    }
}

Query & Filtering: queries on structured fields

POST products/_bulk
{"index": {"_id": "1"}}
{"price": 10, "avaliable": true, "date": "2018-01-01", "productID": "XHDK-A-1293-#fJ3"}
{"index": {"_id":"2"}}
{"price": 20, "avaliable": true, "date": "2019-01-01", "productID": "KDKE-B-9947-#kL5"}
{"index": {"_id": "3"}}
{"price": 30, "avaliable": false, "productID": "JODL-X-1937-#pV7"}
{"index": {"_id": "4"}}
{"price": 30, "avaliable": false, "productID": "QQPX-R-3956-#aD8"}
{"index": {"_id": "5"}}
{"price": 40, "avaliable": false, "genre": "Comedy"}
{"index": {"_id": 6}}
{"price": 50, "genre": ["Comedy", "Romance"]}

Range query: find documents whose price is greater than 20 and less than or equal to 30. As above, the query can be wrapped in constant_score to skip scoring.

POST products/_search
{
	"query": {
		"range":{
			"price":{
				"gt": 20,
				"lte": 30
			}
		}
	}
}
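
As a rough illustration of what gt and lte select, the following Python sketch applies the same bounds to the prices from the sample data, treated as numbers. The helper in_range is a hypothetical name for this sketch and only mimics the comparison semantics, not how Elasticsearch evaluates the query internally.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True when value satisfies every bound that is given."""
    if value is None:
        return False
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# Document id -> price, copied from the sample bulk request.
prices = {1: 10, 2: 20, 3: 30, 4: 30, 5: 40, 6: 50}

hits = [doc_id for doc_id, price in prices.items() if in_range(price, gt=20, lte=30)]
print(hits)
```

Document 2 (price 20) is excluded because gt is a strict bound, while documents 3 and 4 (price 30) are included because lte is inclusive.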

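Date range queries work the same way as the price query above.

Date math units: y (year), M (month), w (week), d (day), h (hour), m (minute), s (second). For example, now-1y is this date and time one year ago.

To skip scoring and improve efficiency, wrap the query in constant_score as shown earlier.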

POST products/_search
{
	"query":{
		"range":{
			"date": {
				"gt": "now-1y"
			}
		}
	}
}
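
The effect of now-1y can be sketched in Python with a fixed "now" so the result is reproducible. The function one_year_before is a hypothetical helper; its handling of 29 February (falling back to 28 February) is an assumption of this sketch rather than a statement about Elasticsearch's date math.

```python
from datetime import datetime


def one_year_before(moment):
    """Approximate now-1y: same month, day and time, one year earlier."""
    try:
        return moment.replace(year=moment.year - 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; this sketch
        # falls back to 28 February (an assumption).
        return moment.replace(year=moment.year - 1, day=28)


# Document id -> date, copied from the sample data (only ids 1 and 2 have one).
dates = {1: "2018-01-01", 2: "2019-01-01"}

now = datetime(2019, 6, 1)          # fixed "now" for a reproducible example
cutoff = one_year_before(now)       # plays the role of now-1y

hits = [
    doc_id
    for doc_id, text in dates.items()
    if datetime.strptime(text, "%Y-%m-%d") > cutoff   # "gt": "now-1y"
]
print(cutoff.date(), hits)
```

With this fixed "now", only document 2 is newer than the cutoff; against a live cluster the result depends on the day the query runs.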

Finding documents that contain a given field: exists

To skip scoring and improve efficiency, wrap the query in constant_score as shown earlier.

POST products/_search
{
	"query":{
		"exists":{
			"field": "date"
		}
	}
}

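For a multi-valued field such as genre, a term query means "contains this value", not "equals the entire value".

For example, the query below returns two documents.

To return only the document whose genre is exactly Comedy, add a redundant genre_count field that stores the number of values.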

POST products/_search
{
	"query":{
		"term":{
			"genre.keyword": "Comedy"
		}
	}
}

The bulk request that adds the genre_count field:

POST products/_bulk
{"index": {"_id": "5"}}
{"price": 40, "avaliable": false, "genre": "Comedy", "genre_count": 1}
{"index": {"_id": 6}}
{"price": 50, "genre": ["Comedy", "Romance"], "genre_count": 2}

Query the documents that satisfy both conditions

POST products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "genre.keyword": {
              "value": "Comedy"
            }
          }
        },
        {
          "term": {
            "genre_count": {
              "value": 1
            }
          }
        }
      ]
    }
  }
}
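
The "contains" behaviour and the genre_count workaround can be modelled in a few lines of Python. This is only a sketch over the two sample documents; term_on_multi_value is a hypothetical helper, not an Elasticsearch API.

```python
# The two sample documents that carry a genre, with the redundant count field.
docs = {
    5: {"genre": ["Comedy"], "genre_count": 1},
    6: {"genre": ["Comedy", "Romance"], "genre_count": 2},
}


def term_on_multi_value(values, wanted):
    """A term query on a multi-valued field matches if ANY value equals wanted."""
    return wanted in values


# term on genre.keyword alone: both documents contain "Comedy".
contains = [doc_id for doc_id, doc in docs.items()
            if term_on_multi_value(doc["genre"], "Comedy")]

# Adding the genre_count == 1 condition keeps only the single-genre document.
exact = [doc_id for doc_id in contains if docs[doc_id]["genre_count"] == 1]

print(contains, exact)
```

The redundant counter costs a little storage and must be kept in sync at index time, but it turns an "equals exactly" requirement into two cheap term conditions.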

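Elasticsearch's classic scoring algorithm is TF-IDF. TF (term frequency) is how often the search term occurs in a document; IDF (inverse document frequency) is derived from the total number of documents divided by the number of documents that contain the term, so rarer terms weigh more.

The newer default scoring algorithm is BM25.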
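
For intuition, here is a small Python sketch of the textbook BM25 score for a single term. The parameter defaults k1 = 1.2 and b = 0.75 are the commonly cited ones; the exact constants and formula variant used by a given Elasticsearch or Lucene version may differ, so treat the numbers as illustrative only.

```python
import math


def idf(total_docs, docs_with_term):
    """Inverse document frequency: rarer terms get a larger weight."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, total_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document."""
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf(total_docs, docs_with_term) * freq * (k1 + 1) / (freq + norm)


# Same term, same frequency, different document lengths (illustrative values).
short_doc = bm25_term_score(freq=1, doc_len=3, avg_doc_len=6.5,
                            total_docs=2, docs_with_term=2)
long_doc = bm25_term_score(freq=1, doc_len=10, avg_doc_len=6.5,
                           total_docs=2, docs_with_term=2)
print(short_doc > long_doc)
```

Two properties are worth noticing: a match in a shorter field scores higher than the same match in a longer one, and repeating a term gives diminishing returns instead of the linear growth of plain TF.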

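When should and must are used at the same level of a bool query, the should clauses are no longer required to match; they only add to the score of documents that do match them. To make an "or" condition mandatory, nest the should clauses in a bool query inside must.

should means "or"; must means the clause has to match; filter means the clause has to match but does not contribute to the score; must_not means the clause must not match, and it does not contribute to the score either.

Find documents whose price is 30, that are not available, and whose price is not less than or equal to 10 (that is, greater than 10)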

POST products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "price": {
              "value": 30
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "avaliable": false
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "lte": 10
            }
          }
        }
      ]
    }
  }
}
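
The matching rules of bool (scoring aside) can be sketched in Python over the sample documents. The function bool_match is a hypothetical model written for this note; it encodes the default rule that should clauses are mandatory only when the bool query has no must or filter clause.

```python
# Sample documents (id -> fields), prices as numbers.
docs = {
    1: {"price": 10, "avaliable": True},
    2: {"price": 20, "avaliable": True},
    3: {"price": 30, "avaliable": False},
    4: {"price": 30, "avaliable": False},
    5: {"price": 40, "avaliable": False},
    6: {"price": 50},
}


def bool_match(doc, must=(), filter_=(), must_not=(), should=()):
    """Model of bool matching: must/filter all match, must_not none match."""
    required = all(c(doc) for c in must) and all(c(doc) for c in filter_)
    excluded = any(c(doc) for c in must_not)
    # should is only mandatory when there is no must or filter clause.
    if should and not (must or filter_):
        optional_ok = any(c(doc) for c in should)
    else:
        optional_ok = True
    return required and not excluded and optional_ok


hits = [
    doc_id for doc_id, doc in docs.items()
    if bool_match(
        doc,
        must=[lambda d: d.get("price") == 30],
        filter_=[lambda d: d.get("avaliable") is False],
        must_not=[lambda d: d.get("price") is not None and d["price"] <= 10],
    )
]
print(hits)
```

The field name avaliable is kept exactly as it is spelled in the sample data.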

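Changing how the score is computed

boost is a weight. boost > 1 raises a clause's relative contribution to the score; 0 < boost < 1 lowers it. A negative boost would contribute a negative score, but Elasticsearch 7.0 and later reject negative boost values.

Setting different weights for different fields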

POST blogs/_bulk
{"index": {"_id":"1"}}
{"title": "Apple iPad", "content": "Apple iPad,Apple iPad"}
{"index": {"_id":"2"}}
{"title": "Apple iPad,Apple iPad", "content": "Apple iPad"}

// match accepts both query and boost; boost is the weight of that clause
POST blogs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "apple,ipad",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "apple,ipad",
              "boost": 4
            }
          }
        }
      ]
    }
  }
}
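
How per-clause boosts can flip a ranking is easy to see with made-up numbers. In the Python sketch below the raw scores are hypothetical values chosen for illustration, not scores returned by Elasticsearch; boosted_sum models a bool query adding up clause scores after each is multiplied by its boost.

```python
def boosted_sum(clause_scores):
    """Sum of (raw score * boost) over the clauses of a bool query."""
    return sum(raw * boost for raw, boost in clause_scores)


# Hypothetical raw scores per field: (title, content).
raw = {1: (0.5, 1.0), 2: (1.0, 0.5)}


def rank(title_boost, content_boost):
    """Return document ids ordered by boosted score, best first."""
    scores = {
        doc_id: boosted_sum([(title, title_boost), (content, content_boost)])
        for doc_id, (title, content) in raw.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank(title_boost=1, content_boost=4))  # content dominates
print(rank(title_boost=4, content_boost=1))  # title dominates
```

Swapping the two boost values swaps the order, which is the effect the query above is meant to demonstrate.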

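Setting weights within the same field

The value set here is negative_boost. A value between 0 and 1 lowers the score of documents that match the negative query, so they rank later; a value above 1 would rank them earlier, although the documented range is 0 to 1; a negative value causes an error.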

POST news/_bulk
{"index":{"_id": "1"}}
{"content": "Apple mac"}
{"index":{"_id": "2"}}
{"content": "Apple iPad"}
{"index":{"_id": "3"}}
{"content": "Apple employee like Apple Pipe and Apple juice"}

POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "bool": {
          "must": [
            {
              "match": {
                "content": "apple"
              }
            }
          ]
        }
      },
      "negative": {
        "match": {
          "content": "pipe"
        }
      },
      "negative_boost": 0.1
    }
  }
}
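
The scoring rule of the boosting query can be summarized in a short Python sketch: documents must match the positive query, and those that also match the negative query have their score multiplied by the negative boost. The function name and the sample scores below are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a document that already matches the positive query."""
    if negative_boost < 0:
        # Mirrors the behaviour described above: negative values are an error.
        raise ValueError("negative_boost must not be negative")
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical positive scores; only document 3 matches the negative query.
scored = {
    1: boosting_score(1.0, matches_negative=False, negative_boost=0.1),
    2: boosting_score(1.0, matches_negative=False, negative_boost=0.1),
    3: boosting_score(2.0, matches_negative=True, negative_boost=0.1),
}
ranking = sorted(scored, key=scored.get, reverse=True)
print(ranking)
```

Document 3 would rank first on the positive query alone because it mentions the term most often, yet it drops to the bottom once it is demoted; unlike must_not, it is still returned.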

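Using dis_max, another way to influence ranking

1. With the two documents below, a plain bool query that searches title and body for brown fox ranks document 1 ahead of document 2, because bool adds up the scores of all matching clauses (see eg1). Document 1 contains brown in both title and body, while document 2 contains brown fox only in body. To rank document 2 first, use dis_max.

2. dis_max scores each document by its single best-matching clause, so the field with the highest score decides the ranking (see eg2).

3. Because dis_max uses only the highest-scoring clause, a query such as Quick pets matches both documents (see eg3), and the comparison is made on the best field, which is title here (this can be checked with explain: true). The two documents therefore get the same score: the first matches quick and the second matches pets. Adding tie_breaker separates them.

4. With tie_breaker, the final score is the _score of the best-matching clause plus the scores of the other matching clauses multiplied by tie_breaker (see eg4).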

PUT blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}
PUT blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis"
}

eg1

POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": "brown fox"}},
        {"match": {"body": "brown fox"}}
      ]
    }
  }
}

eg2

POST blogs/_search
{
	"query": {
		"dis_max": {
			"queries": [
				{"match": {"title":" quick fox"}},
				{"match": {"body":" quick fox"}}
			]
		}
	}
}

eg3

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": { "title": "Quick pets"}},
        {"match": { "body": "Quick pets" }}
        ]
    }
  }
}

eg4

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Quick pets"}},
        {"match": {"body": "Quick pets"}}
        ]
        , "tie_breaker": 0.7
    }
  }
}
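
The difference between summing clause scores, taking the best clause, and adding a tie_breaker can be shown with a few lines of Python. The per-clause scores are hypothetical numbers picked to mirror the brown fox situation described above; they are not scores returned by Elasticsearch.

```python
def bool_should_score(clause_scores):
    """bool/should: scores of all matching clauses are added up."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score plus tie_breaker times the other scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical scores of the clauses that matched each document.
doc1 = [0.5, 0.5]   # a weak match in title and a weak match in body
doc2 = [0.9]        # a strong match in body only

print(bool_should_score(doc1), bool_should_score(doc2))   # doc1 wins
print(dis_max_score(doc1), dis_max_score(doc2))           # doc2 wins
print(dis_max_score(doc1, tie_breaker=0.7),
      dis_max_score(doc2, tie_breaker=0.7))               # gap narrows
```

With tie_breaker at 0 the query behaves as pure "best field wins"; with tie_breaker at 1 it degenerates into the plain sum, so values in between let secondary fields act only as a tie-breaking signal.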

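Three scenarios: best fields, most fields, and cross fields

First define the mapping for title so that the field uses the english analyzer, and give it a title.std sub-field that uses the standard analyzer.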

PUT title
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

POST title/_bulk
{"index": {"_id": 1}}
{"title": "My dog barks"}
{"index": {"_id": 2}}
{"title": "I see a lot of barking dogs on the road"}

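best_fields scores a document by its best-matching field; it is the default type.

The result order here is 1, 2. Because the field uses the english analyzer, words are reduced to their root forms (the _analyze API shows the resulting tokens), so both documents match dog and bark; document 1 has the shorter title, so it scores higher.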

POST title/_search
{
	"query": {
		"multi_match": {
			"query": "barking dogs",
			"fields": ["title"],
			"type": "best_fields"
		}
	}
}

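most_fields adds up the scores of all matching fields. When operator is set to and, it applies per field: all query terms must appear within a single field.

This query searches both title.std and title. Because title.std uses the standard analyzer, which keeps the original word forms, document 2 gets an additional match on the exact words, so the result order is 2, 1.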

POST title/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title.std", "title"],
      "type": "most_fields"
    }
  }
}

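cross_fields looks for the query terms across several fields taken together, rather than requiring a single field to contain them, and the scores of the fields are combined. With operator set to and, every query term must appear in at least one of the fields. This is comparable to copy_to, which copies the values of all fields into one field and searches that, but cross_fields saves the extra storage.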

PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3DG"
}


POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "operator": "and", 
      "fields": ["street", "city", "country", "postcode"]
      , "type": "cross_fields"
    }
  }
}
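
The difference between requiring all terms in one field and requiring them across the fields as a whole can be sketched in Python with the address document above. Tokenization here is simplified to lowercasing and splitting on whitespace, and both helper functions are hypothetical models, not Elasticsearch internals.

```python
def tokens(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return set(text.lower().split())


address = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}

query_terms = tokens("Poland Street W1V")


def per_field_and(doc, terms):
    """operator and applied per field: one field must hold every term."""
    return any(terms <= tokens(value) for value in doc.values())


def cross_fields_and(doc, terms):
    """cross_fields with operator and: every term must appear in some field."""
    combined = set().union(*(tokens(value) for value in doc.values()))
    return terms <= combined


print(per_field_and(address, query_terms))
print(cross_fields_and(address, query_terms))
```

No single field contains all three terms, so the per-field check fails, while the combined view finds poland and street in street and w1v in postcode.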
