Understanding and Using MongoDB Indexes Correctly

In typical MongoDB query scenarios, indexes play a very important role. Without an index, MongoDB has to scan the entire collection to find the matching documents, which is very expensive.

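MongoDB indexes use the B-tree data structure. With an index, MongoDB can efficiently locate the data a query asks for. The official documentation illustrates this with a figure of an index on a score field.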

An index on score not only supports range queries efficiently, it also lets MongoDB return sorted results efficiently.

MongoDB indexes are very similar to those of other database systems. They are defined at the collection level and can be created on any single field or sub-field of a document.

The default `_id` index

When a collection is created, MongoDB automatically builds a unique index on _id, which serves as the primary key of each document. This index cannot be dropped.

MongoDB supports several ways of creating indexes. For details, see the official documentation: https://docs.mongodb.com/manual/indexes/#create-an-index

Single field index

The single field index is MongoDB's simplest index type. Unlike MySQL before version 8.0, MongoDB indexes have a direction, either ascending or descending.

For a single field index, however, the direction does not matter, because MongoDB can traverse a single field index in either direction.

Consider a records collection with documents like this:

{
  "_id": ObjectId("570c04a4ad233577f97dc459"),
  "score": 1034,
  "location": { state: "NY", city: "New York" }
}

Then create a single field index:

db.records.createIndex( { score: 1 } )

The statement above creates an ascending index on the score field of the collection. This index supports the following queries:

db.records.find( { score: 2 } )
db.records.find( { score: { $gt: 10 } } )

You can use MongoDB's explain to analyze the queries above, for example:

db.records.find({score:2}).explain('executionStats')

Single field index on an embedded field

MongoDB also supports creating an index on an embedded field:

db.records.createIndex( { "location.state": 1 } )

The embedded field index above supports the following queries:

db.records.find( { "location.state": "CA" } )
db.records.find( { "location.city": "Albany", "location.state": "NY" } )

Sort on a single field index

Because a MongoDB index is itself ordered and can be walked sequentially, a single field index on score can serve sorts in either direction:

db.records.find().sort( { score: 1 } )
db.records.find().sort( { score: -1 } )
db.records.find({score:{$lte:100}}).sort( { score: -1 } )

All of these queries can use the index.
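Why direction does not matter for a single field index can be seen in a small model. The sketch below is plain JavaScript (runnable in Node.js), not MongoDB code: `scoreIndex` and `scanIndex` are illustrative names, and the index is simplified to a sorted list of keys.

```javascript
// Toy model: an ascending index on `score` is a sorted list of keys.
const scoreIndex = [5, 12, 40, 100, 1034];

// Walk the index forward (direction 1) or backward (direction -1),
// optionally bounded above, without sorting anything at query time.
function scanIndex(index, direction, maxScore = Infinity) {
  const keys = index.filter((k) => k <= maxScore); // index bounds
  return direction === 1 ? keys : keys.slice().reverse();
}

console.log(scanIndex(scoreIndex, 1));       // ascending traversal
console.log(scanIndex(scoreIndex, -1, 100)); // score <= 100, descending
```

The descending result comes from reading the same ordered keys backward, which is the reason one index can serve both sort directions.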

Compound index

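MongoDB supports indexes on multiple fields, called compound indexes. The order of the fields in a compound index has a critical effect on its performance. For example, the index {userid:1, score:-1} sorts first by userid and then, within each userid, by score.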
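The ordering that such an index imposes can be sketched in plain JavaScript. This is only an illustration of the key order, not how MongoDB stores an index; the sample documents and the names `sampleDocs` and `compareKeys` are made up for the example.

```javascript
// Hypothetical documents; only userid and score matter here.
const sampleDocs = [
  { userid: "bob", score: 10 },
  { userid: "alice", score: 70 },
  { userid: "bob", score: 90 },
  { userid: "alice", score: 30 },
];

// Order entries the way a { userid: 1, score: -1 } key pattern would:
// userid ascending, then score descending within each userid.
function compareKeys(a, b) {
  if (a.userid !== b.userid) return a.userid < b.userid ? -1 : 1;
  return b.score - a.score;
}

const indexOrder = sampleDocs.slice().sort(compareKeys);
console.log(indexOrder.map((d) => `${d.userid}:${d.score}`).join(", "));
// prints: alice:70, alice:30, bob:90, bob:10
```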

Creating a compound index

Consider a products collection with documents like this:

{
 "_id": ObjectId(...),
 "item": "Banana",
 "category": ["food", "produce", "grocery"],
 "location": "4th Street Store",
 "stock": 4,
 "type": "cases"
}

Then create a compound index:

db.products.createIndex( { "item": 1, "stock": 1 } )

The documents referenced by this index are sorted first by item and then, within each item value, by stock. Both of the following queries can use the index:

db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana", stock: { $gt: 5 } } )

The condition {item: "Banana"} qualifies because the query satisfies the prefix rule.

Using a compound index requires satisfying the prefix rule

An index prefix is a leading (leftmost) subset of the indexed fields. Consider the following index:

{ "item": 1, "location": 1, "stock": 1 }

This index has the following index prefixes:

{ item: 1 }
{ item: 1, location: 1 }

So any query that satisfies the index prefix rule can use the compound index:

db.products.find( { item: "Banana" } )
db.products.find( { item: "Banana",location:"4th Street Store"} )
db.products.find( { item: "Banana",location:"4th Street Store",stock:4})

Conversely, a query that does not match an index prefix cannot use the index, for example queries on the following fields:

the location field
the stock field
the location and stock fields

Because of index prefixes, if a collection has both an {a:1, b:1} index and an {a:1} index, and neither has a sparse or unique constraint, the single field index can be removed.
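The prefix rule, and the redundancy it implies, can be sketched as two small helpers. This is a simplified model in plain JavaScript, not a MongoDB API: key patterns are reduced to ordered arrays of field names, index directions and options such as sparse or unique are ignored, and the names `usablePrefixLength` and `isRedundantPrefix` are invented for the example.

```javascript
// Count how many leading index fields the query constrains.
// 0 means the query matches no index prefix at all.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  while (n < indexFields.length && wanted.has(indexFields[n])) n++;
  return n;
}

// True when shortIndex is a strict leading part of longIndex,
// so the longer index can serve the same queries.
function isRedundantPrefix(shortIndex, longIndex) {
  return (
    shortIndex.length < longIndex.length &&
    shortIndex.every((field, i) => longIndex[i] === field)
  );
}

const productIndex = ["item", "location", "stock"];
console.log(usablePrefixLength(productIndex, ["item", "location"])); // 2
console.log(usablePrefixLength(productIndex, ["location", "stock"])); // 0
console.log(isRedundantPrefix(["a"], ["a", "b"])); // true
```

Note that the order of fields inside the query document does not matter; only the order of fields in the index does.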

Sort on Compound index

As noted earlier, the sort direction of a single field index does not matter. For a compound index the situation is completely different.

Consider the following scenario:

db.events.find().sort( { username: 1, date: -1 } )

The events collection receives the query above, which sorts the results first by username ascending and then by date descending. It may also receive the following query:

db.events.find().sort( { username: -1, date: 1 } )

This one sorts by username descending and then by date ascending. The index:

db.events.createIndex( { "username" : 1, "date" : -1 } )

supports both of these queries, but it does not support the following one:

db.events.find().sort( { username: 1, date: 1 })

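In other words, the sort order must be consistent with the order in which the index was created. Consistent does not have to mean identical. The cases are summarized below:

sort	index { "username" : 1, "date" : -1 }	index { "username" : 1, "date" : 1 }
sort( { username: 1, date: -1 } )	supported	not supported
sort( { username: -1, date: 1 } )	supported	not supported
sort( { username: 1, date: 1 } )	not supported	supported
sort( { username: -1, date: -1 } )	not supported	supported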

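That is, the sort order must match the index either exactly or after every direction is reversed. The table below lists queries together with the index (or index prefix) that can satisfy them: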

query	index
db.data.find().sort( { a: 1 } )	{ a: 1 }
db.data.find().sort( { a: -1 } )	{ a: 1 }
db.data.find().sort( { a: 1, b: 1 } )	{ a: 1, b: 1 }
db.data.find().sort( { a: -1, b: -1 } )	{ a: 1, b: 1 }
db.data.find().sort( { a: 1, b: 1, c: 1 } )	{ a: 1, b: 1, c: 1 }
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )	{ a: 1, b: 1 }
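The rule behind both tables can be written down as a small check. The sketch below is plain JavaScript and only models the rule as stated in this article; `sortMatchesIndex` is a made-up helper, not something MongoDB provides. It relies on JavaScript preserving the insertion order of string keys in object literals.

```javascript
// True when `sort` names a prefix of `index` and its directions are
// either all equal to the index directions or all inverted.
function sortMatchesIndex(index, sort) {
  const idx = Object.entries(index);
  const srt = Object.entries(sort);
  if (srt.length === 0 || srt.length > idx.length) return false;
  const sameFields = srt.every(([field], i) => idx[i][0] === field);
  if (!sameFields) return false;
  const forward = srt.every(([, dir], i) => dir === idx[i][1]);
  const backward = srt.every(([, dir], i) => dir === -idx[i][1]);
  return forward || backward;
}

const eventsIndex = { username: 1, date: -1 };
console.log(sortMatchesIndex(eventsIndex, { username: 1, date: -1 })); // true
console.log(sortMatchesIndex(eventsIndex, { username: -1, date: 1 })); // true
console.log(sortMatchesIndex(eventsIndex, { username: 1, date: 1 }));  // false
```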

Sorting on a non-prefix subset of an index

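Consider the index { a: 1, b: 1, c: 1, d: 1 }. A sort can still use the index when the sort fields do not form an index prefix, provided that the query has equality conditions on all index fields that precede the sort fields: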

	Example	Index Prefix
r1	db.data.find( { a: 5 } ).sort( { b: 1, c: 1 } )	{ a: 1 , b: 1, c: 1 }
r2	db.data.find( { b: 3, a: 4 } ).sort( { c: 1 } )	{ a: 1, b: 1, c: 1 }
r3	db.data.find( { a: 5, b: { $lt: 3} } ).sort( { b: 1 } )	{ a: 1, b: 1 }

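In row r1 of the table above, the sort fields are b and c; a is an index field that precedes them and is matched by equality, so the index can be used. In r3, b has a range condition, but a, which precedes b, is matched by equality. In short, it is enough for the index fields that precede the sort fields to have equality conditions; the other fields may have any condition.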
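The equality-then-sort rule can also be modeled in a few lines. As before, this is a simplified sketch in plain JavaScript rather than MongoDB's actual planner logic; `matchesAt` and `sortUsesIndex` are hypothetical helpers, and the caller is assumed to pass only the fields that the query matches by equality.

```javascript
// Does `srt` match the index entries starting at position `start`,
// with directions either all equal or all inverted?
function matchesAt(idx, srt, start) {
  if (srt.length === 0 || start + srt.length > idx.length) return false;
  const sameFields = srt.every(([field], i) => idx[start + i][0] === field);
  const forward = srt.every(([, dir], i) => dir === idx[start + i][1]);
  const backward = srt.every(([, dir], i) => dir === -idx[start + i][1]);
  return sameFields && (forward || backward);
}

// The sort may begin right after any run of leading index fields
// that the query pins with equality conditions.
function sortUsesIndex(index, equalityFields, sort) {
  const idx = Object.entries(index);
  const srt = Object.entries(sort);
  const pinnedFields = new Set(equalityFields);
  let pinned = 0;
  while (pinned < idx.length && pinnedFields.has(idx[pinned][0])) pinned++;
  for (let start = 0; start <= pinned; start++) {
    if (matchesAt(idx, srt, start)) return true;
  }
  return false;
}

const abcd = { a: 1, b: 1, c: 1, d: 1 };
console.log(sortUsesIndex(abcd, ["a"], { b: 1, c: 1 })); // r1: true
console.log(sortUsesIndex(abcd, ["b", "a"], { c: 1 }));  // r2: true
console.log(sortUsesIndex(abcd, ["a"], { b: 1 }));       // r3: true
console.log(sortUsesIndex(abcd, [], { b: 1 }));          // false
```

In r3 the range condition on b is deliberately left out of `equalityFields`, since only a is matched by equality.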

How to build the right indexes

The sections above cover most of the index knowledge needed for everyday MongoDB use. But how do you actually build the right indexes?

Use explain to analyze queries

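MongoDB provides an explain facility, similar to MySQL's EXPLAIN statement, for analyzing queries, and it helps us build the right indexes. When creating an index, we should check the explain output for each of the query conditions we need to support.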
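As a rough illustration of what to look for in that output, the sketch below summarizes an explain-shaped object in plain JavaScript. The sample object is hand-written for this example with made-up numbers; it follows the classic `queryPlanner.winningPlan` / `executionStats` layout, and plans with several child stages (`inputStages`) or newer plan formats are not handled. `planStages` and `summarize` are hypothetical helpers.

```javascript
// Hand-written sample shaped like explain("executionStats") output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "score_1" },
    },
  },
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 },
};

// Collect stage names from the winning plan, outermost first.
function planStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) stages.push(node.stage);
  return stages;
}

// Reduce the output to the few facts that matter for index tuning.
function summarize(explain) {
  const stages = planStages(explain.queryPlanner.winningPlan);
  const stats = explain.executionStats;
  return {
    usesIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    keysExamined: stats.totalKeysExamined,
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

console.log(summarize(sampleExplain));
```

A COLLSCAN stage, or a number of examined documents far larger than the number returned, is usually the signal that an index is missing or poorly ordered.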

Understand how field order affects an index

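The real job of an index is to narrow down the range of data that has to be examined. When deciding the field order of a compound index, put first the field that narrows the search range the most: if the first field quickly shrinks the search range, far fewer documents are left for the remaining fields to match. Consider the query: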

{'online_time': {'$lte': present}, 'offline_time': {'$gt': present}, 'online': 1, 'orientation': 'quality', 'id': {'$gt': max_id}}

Consider the following indexes:

	Index	nscanned
r1	{start_time:1, end_time: 1, origin: 1, id: 1, orientation: 1}	12959
r2	{start_time:1, end_time: 1, origin: 1, orientation: 1, id: 1}	2700

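Merely swapping the order of the id and orientation fields makes a huge difference in the number of documents that have to be scanned, which shows that the two fields restrict the data to very different degrees. Prefer the field order that restricts the data range the most.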
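The effect can be reproduced with a toy model. The sketch below is plain JavaScript over synthetic data, not the collection measured above, and it simplifies a compound index to a sorted list of key tuples scanned over one contiguous range; MongoDB's real planner can compute tighter bounds. All names (`rows`, `buildIndex`, `keysScanned`) are invented for the example.

```javascript
// Synthetic data: 100 documents with x = 0..99 and y = x % 10.
const rows = Array.from({ length: 100 }, (_, x) => ({ x, y: x % 10 }));

// Lexicographic comparison of two index keys (arrays of numbers).
function compareTuples(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Build a compound index as a sorted list of key tuples.
function buildIndex(fields) {
  return rows.map((r) => fields.map((f) => r[f])).sort(compareTuples);
}

// Count the keys inside one contiguous index range [lower, upper].
function keysScanned(index, lower, upper) {
  return index.filter(
    (k) => compareTuples(k, lower) >= 0 && compareTuples(k, upper) <= 0
  ).length;
}

// Query: x <= 49 and y == 3.
const rangeFirst = keysScanned(buildIndex(["x", "y"]), [0, -Infinity], [49, Infinity]);
const equalityFirst = keysScanned(buildIndex(["y", "x"]), [3, 0], [3, 49]);
console.log({ rangeFirst, equalityFirst }); // { rangeFirst: 50, equalityFirst: 5 }
```

With the range field first, every key with x <= 49 falls inside the scanned range; with the equality field first, only the five matching keys do.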

Monitor slow queries

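Always analyze slow queries from the production environment as soon as they appear, so that problems are discovered and fixed early.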