How Elasticsearch (ES) Works

Date: 2019-11-12


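I. When a request reaches the ES cluster, a coordinating node is selected and the request is then routed to the target primary shard. For read requests, if the distribution strategy is round-robin and four requests arrive (with one replica per primary), two hit the primary shard and two hit the replica shard.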
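The 2/2 split can be illustrated with a minimal Python sketch. It only models taking turns between copies of one shard; the copy names "primary" and "replica" and the function `route_round_robin` are made up for illustration and are not part of any ES API.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(shard_copies, num_requests):
    """Assign each request to the next shard copy in turn."""
    picker = cycle(shard_copies)
    return [next(picker) for _ in range(num_requests)]


# Hypothetical setup: one primary and one replica of the same shard.
assignments = route_round_robin(["primary", "replica"], 4)
counts = Counter(assignments)
print(assignments)
print(counts)
```

With more replicas the same loop simply spreads requests over more copies.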

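II. When an index is split across several shards and the data is heavily skewed, search scores can become inaccurate, because the IDF part of the score is computed locally on each shard. For example, if shard1 holds a lot of data and a term occurs in many of its documents, scores for that term drop across shard1, while on shard2, where the term is rare, scores come out higher overall. Content that is not very relevant can then end up with a high score, which distorts the ranking. The fix is to keep data as evenly distributed across shards as possible in production, so that shard-local scoring does not skew the overall result.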
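A rough sketch of the effect, assuming a classic TF/IDF style formula idf = 1 + ln(numDocs / (docFreq + 1)). The exact formula depends on the similarity and version in use, and the document counts below are invented for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Classic-style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical, heavily skewed shards for the same search term.
shard1 = {"num_docs": 100_000, "doc_freq": 50_000}  # large shard, term is common
shard2 = {"num_docs": 1_000, "doc_freq": 5}         # small shard, term is rare

idf_shard1 = idf(**shard1)
idf_shard2 = idf(**shard2)

# What the IDF would be if statistics were pooled across both shards.
idf_global = idf(
    shard1["num_docs"] + shard2["num_docs"],
    shard1["doc_freq"] + shard2["doc_freq"],
)

print(f"shard1 local idf: {idf_shard1:.3f}")
print(f"shard2 local idf: {idf_shard2:.3f}")
print(f"pooled idf:       {idf_global:.3f}")
```

Documents on the small shard get a much larger IDF than the pooled statistics would justify, which is why they can outrank more relevant documents from the large shard.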

III. ES has two basic strategies for computing scores across multiple fields:

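1) most_fields -> the default strategy: the more fields of a document the full-text keywords match, the higher the score. Rule: (sum of the scores of the match clauses that matched) * (number of match clauses the document actually matched) / (number of match clauses in the query). This coordination factor belongs to the classic TF/IDF similarity of older versions; Lucene 7 removed it. For example: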

GET /index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "spark" // matches in the title field
          }
        },
        {
          "match": {
            "content": "java" // also matches in the content field
          }
        }
      ]
    }
  }
}
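The rule above can be worked through in a short Python sketch. The per-clause scores (1.2 for title, 0.8 for content) are invented numbers, and `should_score` is a hypothetical helper that mirrors the stated rule rather than the ES implementation.

```python
def should_score(clause_scores):
    """Combine per-clause scores using the rule described above.

    clause_scores has one entry per match clause in the query;
    None means the clause did not match the document.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


# Hypothetical clause scores: title match = 1.2, content match = 0.8.
both_fields = should_score([1.2, 0.8])    # document matches title and content
title_only = should_score([1.2, None])    # document matches title only

print(both_fields, title_only)
```

A document matching both clauses keeps the full sum, while a document matching only one of the two clauses has its score halved.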


2) best_fields -> with dis_max, a document's score is determined by the highest-scoring match clause; in other words, the other fields do not affect it.

GET /index3/type3/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "spark"
          }
        },
        {
          "match": {
            "content": "java"
          }
        }
      ]
    }
  }
}
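As a minimal sketch of how dis_max picks the score: the best clause wins, and the optional tie_breaker parameter (0.0 by default, and not set in the query above) adds a fraction of the remaining clause scores. The clause scores used here are invented, and `dis_max_score` is a hypothetical helper.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Score of the best clause plus tie_breaker times the other clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical clause scores: title match = 1.2, content match = 0.8.
pure_max = dis_max_score([1.2, 0.8])
with_tie_breaker = dis_max_score([1.2, 0.8], tie_breaker=0.5)

print(pure_max, with_tie_breaker)
```

With the default tie_breaker the content clause contributes nothing, which is the "unrelated to the other fields" behavior described above.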


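3) Besides most_fields and best_fields, cross_fields is also used quite often. The cross_fields type built into ES in effect treats the fields listed in "fields": ["name","content"] as if their contents had been combined into one field and indexed together, so the full-text search runs against a single combined field and the results are more accurate.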

Search request:
GET /index2/type2/_search
{
  "query": {
    "multi_match": {
      "query": "happening like",
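      // the terms in query are matched against both the content and name fields; because the two fields' mappings differ, their scores differ and the ordering of results may vary
      "fields": ["name","content"],
      // best_fields: each document's score equals the score of its best-matching field, so once the best match is picked the scores of other documents may not be accurate; most_fields: computes a combined document score from the score of every field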
      "type":"cross_fields",
      "operator":"and"
    }
  }
}
Result:
{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.84968257,
    "hits": [
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "2",
        "_score": 0.84968257,
        "_source": {
          "num": 10,
          "title": "他的名字",
          "name": "yes happening like write",
          "content": "happening like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "4",
        "_score": 0.8164005,
        "_source": {
          "num": 1000,
          "title": "个人名字",
          "name": "happening like write",
          "content": "happening hello like yeas and he happening like had read a lot about happening hello like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "3",
        "_score": 0.5063205,
        "_source": {
          "num": 105,
          "title": "这是谁的名字",
          "name": "happening like write",
          "content": " national  treasure because  of its rare number and cute appearance. Many foreign people are so crazy about  pandas and they can’t watching these  lovely creatures all the time. Though some action"
        }
      }
    ]
  }
}
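The matching side of cross_fields with "operator": "and" can be sketched as follows: every query term has to appear in at least one of the listed fields, as if the fields were one combined field. This sketch assumes plain lowercase whitespace tokenization and ignores scoring entirely; the helper names and the second sample document are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches_cross_fields_and(doc, fields, query):
    """Term-centric: each query term must occur in at least one field."""
    combined = set().union(*(tokens(doc.get(f, "")) for f in fields))
    return tokens(query) <= combined


def matches_per_field_and(doc, fields, query):
    """Field-centric: all query terms must occur together in a single field."""
    wanted = tokens(query)
    return any(wanted <= tokens(doc.get(f, "")) for f in fields)


fields = ["name", "content"]
query = "happening like"

# Document 2 from the result above: both terms occur in both fields.
doc2 = {"name": "yes happening like write", "content": "happening like"}
# Hypothetical document whose terms are split across the two fields.
split_doc = {"name": "happening now", "content": "things I like"}

print(matches_cross_fields_and(doc2, fields, query))
print(matches_cross_fields_and(split_doc, fields, query))
print(matches_per_field_and(split_doc, fields, query))
```

The split document is found only by the term-centric check, which is the case where treating the fields as one combined field pays off.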


IV. Two ways to improve full-text search results

1) Use boost to raise a clause's relevance score

GET index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "from",
              "boost": 5 // boost multiplies this clause's score by 5
            }
          }
        },{
          "match": {
            "content": {
              "query": "foot" // without the boost above, the document matching foot would score higher
            }
          }
        }
      ]
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.3150566,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 1.3150566,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 1.3114156,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "3",
        "_score": 0.28582606,
        "_source": {
          "date": "2019-07-01",
          "name": "very tag",
          "content": "Some of our hello  comrades love book to write long articles with no substance, very much like the foot bindings of a slattern, long as well as smelly",
          "no": "123"
        }
      }
    ]
  }
}
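The effect of the boost can be checked against the scores in the result above. The sketch below assumes that boost scales a clause's score linearly and that each document matches only one of the two clauses (documents 1 and 5 contain "from", document 3 contains "foot"); dividing by 5 then approximates the ranking without the boost.

```python
BOOST = 5

# Scores copied from the result above.
boosted_from_scores = {"1": 1.3150566, "5": 1.3114156}  # matched "from" (boost 5)
foot_scores = {"3": 0.28582606}                          # matched "foot" (no boost)

# Undo the boost under the linear-scaling assumption.
without_boost = {doc_id: score / BOOST for doc_id, score in boosted_from_scores.items()}
without_boost.update(foot_scores)

ranking_with_boost = sorted(
    {**boosted_from_scores, **foot_scores},
    key={**boosted_from_scores, **foot_scores}.get,
    reverse=True,
)
ranking_without_boost = sorted(without_boost, key=without_boost.get, reverse=True)

print(ranking_with_boost)
print(ranking_without_boost)
```

Under these assumptions document 3 would rank first without the boost, which matches the comment on the "foot" clause in the query.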


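2) Use a boosting query with positive and negative clauses to demote unwanted matches: documents that also match the negative clause have their score lowered by negative_boost (for example negative_boost: 0.5; the query below uses 0.1)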

GET index3/type3/_search
{
  "query": {
    "boosting": {
      // clause matched and scored normally
      "positive": {
        "match": {
          "content": "from"
        }
      },
      // clause used for demotion: documents that also match it have their score multiplied by negative_boost
      "negative": {
        "match": {
            "content": {
              "query": "Half"
            }
          }
      },
      "negative_boost": 0.1
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.26228312,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 0.26228312,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 0.026301134,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      }
    ]
  }
}
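The demotion can be sketched in a few lines. The helper `boosting_score` is hypothetical and only mirrors the rule stated in the query comments. Document 5's score is taken from the result above; document 1's positive-only score is not shown in the source and is back-computed here as 0.026301134 / 0.1.

```python
NEGATIVE_BOOST = 0.1


def boosting_score(positive_score, matches_negative, negative_boost=NEGATIVE_BOOST):
    """Keep the positive score, multiplied by negative_boost on a negative match."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Document 5 matches "from" but not "Half"; its score is left unchanged.
doc5 = boosting_score(0.26228312, matches_negative=False)
# Document 1 matches both "from" and "Half"; 0.26301134 is back-computed (assumption).
doc1 = boosting_score(0.26301134, matches_negative=True)

print(doc5, doc1)
```

Without the negative clause document 1 would have ranked slightly above document 5; the 0.1 multiplier pushes it to the bottom without removing it from the results.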
