摘要: 对于MongoDB的多键查询,建立复合索引能够有效提升性能。javascript
复合索引,即Compound Index,指的是将多个键组合到一块儿建立索引,这样能够加速匹配多个键的查询。不妨经过一个简单的示例理解复合索引。java
students集合以下:mongodb
db.students.find().pretty()
{
"_id" : ObjectId("5aa7390ca5be7272a99b042a"),
"name" : "zhang",
"age" : "15"
}
{
"_id" : ObjectId("5aa7393ba5be7272a99b042b"),
"name" : "wang",
"age" : "15"
}
{
"_id" : ObjectId("5aa7393ba5be7272a99b042c"),
"name" : "zhang",
"age" : "14"
}
复制代码
在name和age两个键分别建立了索引(_id自带索引):数据库
db.students.getIndexes()
[
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.students"
},
{
"v" : 1,
"key" : {
"age" : 1
},
"name" : "age_1",
"ns" : "test.students"
}
]
复制代码
当进行多键查询时,能够经过explian()分析执行状况(结果仅保留winningPlan):json
db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
"stage": "FETCH",
"filter":
{
"name":
{
"$eq": "zhang"
}
},
"inputStage":
{
"stage": "IXSCAN",
"keyPattern":
{
"age": 1
},
"indexName": "age_1",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 1,
"direction": "forward",
"indexBounds":
{
"age": [
"[\"14\", \"14\"]"
]
}
}
}
复制代码
由winningPlan可知,这个查询依次分为IXSCAN和FETCH两个阶段。IXSCAN即索引扫描,使用的是age索引;FETCH即根据索引去查询文档,查询的时候须要使用name进行过滤。bash
为name和age建立复合索引:性能
db.students.createIndex({name:1,age:1})
db.students.getIndexes()
[
{
"v" : 1,
"key" : {
"name" : 1,
"age" : 1
},
"name" : "name_1_age_1",
"ns" : "test.students"
}
]
复制代码
有了复合索引以后,同一个查询的执行方式就不一样了:ui
db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
"stage": "FETCH",
"inputStage":
{
"stage": "IXSCAN",
"keyPattern":
{
"name": 1,
"age": 1
},
"indexName": "name_1_age_1",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 1,
"direction": "forward",
"indexBounds":
{
"name": [
"[\"zhang\", \"zhang\"]"
],
"age": [
"[\"14\", \"14\"]"
]
}
}
}
复制代码
由winningPlan可知,这个查询的顺序没有变化,依次分为IXSCAN和FETCH两个阶段。可是,IXSCAN使用的是name与age的复合索引;FETCH即根据索引去查询文档,不须要过滤。spa
这个示例的数据量过小,并不能看出什么问题。可是实际上,当数据量很大,IXSCAN返回的索引比较多时,FETCH时进行过滤将很是耗时。接下来将介绍一个真实的案例。.net
随着接收的错误数据不断增长,咱们Fundebug已经累计处理3.5亿错误事件,这给咱们的服务不断带来性能方面的挑战,尤为对于MongoDB集群来讲。
对于生产数据库,配置profile,能够记录MongoDB的性能数据。执行如下命令,则全部超过1s的数据库读写操做都会被记录下来。
db.setProfilingLevel(1,1000)
复制代码
查询profile所记录的数据,会发现events集合的某个查询很是慢:
db.system.profile.find().pretty()
{
"op" : "command",
"ns" : "fundebug.events",
"command" : {
"count" : "events",
"query" : {
"createAt" : {
"$lt" : ISODate("2018-02-05T20:30:00.073Z")
},
"projectId" : ObjectId("58211791ea2640000c7a3fe6")
}
},
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 1414,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(2830)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(1415)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(1415)
}
}
},
"responseLength" : 62,
"protocol" : "op_query",
"millis" : 28521,
"execStats" : {
},
"ts" : ISODate("2018-03-07T20:30:59.440Z"),
"client" : "192.168.59.226",
"allUsers" : [ ],
"user" : ""
}
复制代码
events集合中有数亿个文档,所以count操做比较慢也不算太意外。根据profile数据,这个查询耗时28.5s,时间长得有点离谱。另外,numYield高达1414,这应该就是操做如此之慢的直接缘由。根据MongoDB文档,numYield的含义是这样的:
The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation.
这就意味着大量时间消耗在读取硬盘上,且读了很是屡次。能够推测,应该是索引的问题致使的。
不妨使用explian()来分析一下这个查询(仅保留executionStats):
db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
"executionSuccess": true,
"nReturned": 20853,
"executionTimeMillis": 28055,
"totalKeysExamined": 28338,
"totalDocsExamined": 28338,
"executionStages":
{
"stage": "FETCH",
"filter":
{
"createAt":
{
"$lt": ISODate("2018-02-05T20:30:00.073Z")
}
},
"nReturned": 20853,
"executionTimeMillisEstimate": 27815,
"works": 28339,
"advanced": 20853,
"needTime": 7485,
"needYield": 0,
"saveState": 1387,
"restoreState": 1387,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 28338,
"alreadyHasObj": 0,
"inputStage":
{
"stage": "IXSCAN",
"nReturned": 28338,
"executionTimeMillisEstimate": 30,
"works": 28339,
"advanced": 28338,
"needTime": 0,
"needYield": 0,
"saveState": 1387,
"restoreState": 1387,
"isEOF": 1,
"invalidates": 0,
"keyPattern":
{
"projectId": 1
},
"indexName": "projectId_1",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 1,
"direction": "forward",
"indexBounds":
{
"projectId": [
"[ObjectId('58211791ea2640000c7a3fe6'), ObjectId('58211791ea2640000c7a3fe6')]"
]
},
"keysExamined": 28338,
"dupsTested": 0,
"dupsDropped": 0,
"seenInvalidated": 0
}
}
}
复制代码
可知,events集合并无为projectId与createAt创建复合索引,所以IXSCAN阶段采用的是projectId索引,其nReturned为28338; FETCH阶段须要根据createAt进行过滤,其nReturned为20853,过滤掉了7485个文档;另外,IXSCAN与FETCH阶段的executionTimeMillisEstimate分别为30ms和27815ms,所以基本上全部时间都消耗在了FETCH阶段,这应该是读取硬盘致使的。
没有为projectId和createAt建立复合索引是个尴尬的错误,赶忙补救一下:
db.events.createIndex({projectId:1,createTime:-1},{background: true})
复制代码
在生产环境构建索引这种事最好是晚上作,这个命令一共花了大概7个小时吧!background设为true,指的是不要阻塞数据库的其余操做,保证数据库的可用性。可是,这个命令会一直占用着终端,这时不能使用CTRL + C,不然会终止索引构建过程。
复合索引建立成果以后,前文的查询就快了不少(仅保留executionStats):
db.javascriptevents.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
"executionSuccess": true,
"nReturned": 0,
"executionTimeMillis": 47,
"totalKeysExamined": 20854,
"totalDocsExamined": 0,
"executionStages":
{
"stage": "COUNT",
"nReturned": 0,
"executionTimeMillisEstimate": 50,
"works": 20854,
"advanced": 0,
"needTime": 20853,
"needYield": 0,
"saveState": 162,
"restoreState": 162,
"isEOF": 1,
"invalidates": 0,
"nCounted": 20853,
"nSkipped": 0,
"inputStage":
{
"stage": "COUNT_SCAN",
"nReturned": 20853,
"executionTimeMillisEstimate": 50,
"works": 20854,
"advanced": 20853,
"needTime": 0,
"needYield": 0,
"saveState": 162,
"restoreState": 162,
"isEOF": 1,
"invalidates": 0,
"keysExamined": 20854,
"keyPattern":
{
"projectId": 1,
"createAt": -1
},
"indexName": "projectId_1_createTime_-1",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 1
}
}
}
复制代码
可知,count操做使用了projectId和createAt的复合索引,所以很是快,只花了46ms,性能提高了将近**600倍!!!**对比使用复合索引先后的结果,发现totalDocsExamined从28338降到了0,表示使用复合索引以后再也不须要去查询文档,只须要扫描索引就行了,这样就不须要去访问磁盘了,天然快了不少。