分布式文档存储数据库之MongoDB索引管理

时间 2020-11-10

标签 html node git web mongodb shell 数据库 bash 函数性能栏目系统架构繁體版

原文原文链接

　　前文咱们聊到了MongoDB的简介、安装和对collection的CRUD操做，回顾请参考http://www.javashuo.com/article/p-ergzixcf-nu.html；今天咱们来聊下mongodb的索引；html

　　一、为何要有索引？索引的做用是干吗的？node

　　咱们知道mongodb一般应用在一些web站点，数据量很是大的场景中；在大数据的场景中，对于咱们要查询一个数据，mongodb是否可以快速的响应结果就变得尤其的重要；这也是索引存在的意义；索引就是用来帮助咱们在很大的数据集中快速查询咱们想要的数据；一般咱们在mongodb中插入一条数据时，mongodb会自动给咱们添加一个_id的字段，这个字段是mongodb内部本身维护，一般状况咱们都不会去管它；在关系型数据库中，咱们能够在单个字段上构建索引，也能够在多个字段上构建索引，之因此要在多个字段构建索引是由于咱们的查询条件极可能用到的字段不仅一个；因此咱们构建索引的原则是根据查询条件来构建；好比咱们要查询年龄大于30的用户有哪些，咱们就能够把索引构建在年龄这个字段上，构建在其余字段上，对于咱们要查询年龄大于30这个条件来说是没有意义的；因此构建索引一般咱们会去了解用户最常的查询，在用户最常查询的字段上构建索引，这样能够有效提升用户的查询；对于mongodb也是同样的，索引的存在就是为了提升咱们的查询；git

　　二、为何索引可以帮助咱们快速查找呢？web

　　首先索引是按照咱们指定的字段来构建，构建索引就是把咱们指定的字段抽取出来，而后提早排好序（或者按照必定规律的方式排列好），而后保存为另一个collection；用户在查找数据时，mongodb首先会去找索引，看看用户的条件是否和索引匹配，可以匹配，索引就能告诉用户要查询的数据在那个地方，这样就很快的找到用户查询的数据；假如咱们构建的索引没有匹配用户的查询，那么用户的查询会以遍历的方式去查找，这样一来无形之中速度就变慢了（本来不加索引，直接遍历，如今有索引，要先查索引，没有命中，还要遍历）；因此构建索引，若是必定是数据量很大的状况才构建，数据量小，构建索引不但不会帮助咱们快速的查找内容，反而会拖慢咱们的查询速度；其次在很大的数据量上，若是索引构建的字段没有被查询命中，那么我构建的索引就无心义；mongodb

　　三、索引在必定程度上是要影响用户写的性能shell

　　咱们在某个字段构建好索引之后，用户在写数据时，一般会额外多一次写io；对于写请求，在没有索引的状况，用户只须要写一次io，有了索引用户每写一条数据，都会对应有一次写索引的io；这样一来在必定程度上对用户的写性能会有影响；但一般咱们构建索引都是在读多写少的场景中使用；在写请求不是特别多的场景其实多一次写io，比起读请求的压力咱们是能够接受的；更况且有些数据库支持延迟写索引，所谓延迟写索引是指用户在插入数据时，它不当即写索引，而是等一段时间再写，这样一来就有效的下降写索引对用户的写请求性能的影响；数据库

　　上图主要描述了索引和文档的关系，在索引里的数据一般是咱们指定的字段，用特定的排列方式组织在一块儿，在用户查询某个数据时，就可以很快的从索引中拿到对应文档的位置，从而不用每一个文档挨着遍历；这也是索引可以帮助咱们快速查找的缘由吧；bash

　　四、索引类型函数

　　索引是有类型的，不一样类型的索引在内部组织索引的方式各不相同，不一样类型的索引给咱们查询带来的效果也不一样；常见的索引类型有b+ tree（平衡树索引），hash索引、空间索引、全文索引等等；在mongodb中索引也有不少类型，不一样的是咱们上面说的索引类型，b+ tree，hash索引这些都是从索引内部组织结构来描述；在mongodb中的索引咱们是从索引构建的位置来描述；mongodb中的索引有单键索引、组合索引、多键索引、空间索引、文本索引和hash索引；所谓单键索引是指构建在一个字段上的索引；组合索引指构建在多个字段上的索引；多键索引指将索引构建在一个键的值是一个子文档的字段上；咱们知道文档和文档是能够嵌套的，这也意味着一个文档内部能够引用另外一个文档，一个文档中的某个键对应的值也能够是另一个子文档；咱们把这种索引构建在一个文档中的某个键是一个子文档的某个字段上的索引叫作多键索引，它和单键索引不是对应的；空间索引指基于位置查询的索引，但一般这种索引只有用到特定的方法来查询时，它才会生效，好比使用基于空间位置的函数；文本索引指支持搜索整个文档中的文本信息，一般这种索引咱们也叫全文索引；hash索引指把某个字段的值作hash计算后组织的索引；这种索引有个特色就是时间复杂度是o(1)；无论数据有多少，在查找数据时所用到的时间都是同样的；之因此时间复杂度是o(1)，缘由是hash计算每个值都是惟一的；这种索引的查找方式有点相似键值查找，不一样的是hash背后对应的是一个hash桶，先找到hash桶，而后查找到对应的hash值；hash索引和b+树索引最大的区别是，b+树索引能够查询一个范围，由于树状索引一般是把数据组织成一个有序的结构，而hash索引不能，hash索引只能查找一个精确的值，不能查找一个范围；由于hash索引背后对应的是一个hash值，每一个hash值可能都不在一个hash桶，因此咱们假如要查询年龄大于30岁的用户，用hash索引就不适合，由于30和31的hash值可能就不在一个hash桶上；性能

　　五、在mongodb数据库上建立索引

　　准备数据

> use testdb
switched to db testdb
> for (i=1;i<=1000000;i++) db.peoples.insert({name:"people"+i,age:(i%120),classes:(i%20)})
WriteResult({ "nInserted" : 1 })
> db.peoples.find().count()
1000000
> db.peoples.find()
{ "_id" : ObjectId("5fa943987a7deafb9e543326"), "name" : "people1", "age" : 1, "classes" : 1 }
{ "_id" : ObjectId("5fa943987a7deafb9e543327"), "name" : "people2", "age" : 2, "classes" : 2 }
{ "_id" : ObjectId("5fa943987a7deafb9e543328"), "name" : "people3", "age" : 3, "classes" : 3 }
{ "_id" : ObjectId("5fa943987a7deafb9e543329"), "name" : "people4", "age" : 4, "classes" : 4 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332a"), "name" : "people5", "age" : 5, "classes" : 5 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332b"), "name" : "people6", "age" : 6, "classes" : 6 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332c"), "name" : "people7", "age" : 7, "classes" : 7 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332d"), "name" : "people8", "age" : 8, "classes" : 8 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332e"), "name" : "people9", "age" : 9, "classes" : 9 }
{ "_id" : ObjectId("5fa943987a7deafb9e54332f"), "name" : "people10", "age" : 10, "classes" : 10 }
{ "_id" : ObjectId("5fa943987a7deafb9e543330"), "name" : "people11", "age" : 11, "classes" : 11 }
{ "_id" : ObjectId("5fa943987a7deafb9e543331"), "name" : "people12", "age" : 12, "classes" : 12 }
{ "_id" : ObjectId("5fa943987a7deafb9e543332"), "name" : "people13", "age" : 13, "classes" : 13 }
{ "_id" : ObjectId("5fa943987a7deafb9e543333"), "name" : "people14", "age" : 14, "classes" : 14 }
{ "_id" : ObjectId("5fa943987a7deafb9e543334"), "name" : "people15", "age" : 15, "classes" : 15 }
{ "_id" : ObjectId("5fa943987a7deafb9e543335"), "name" : "people16", "age" : 16, "classes" : 16 }
{ "_id" : ObjectId("5fa943987a7deafb9e543336"), "name" : "people17", "age" : 17, "classes" : 17 }
{ "_id" : ObjectId("5fa943987a7deafb9e543337"), "name" : "people18", "age" : 18, "classes" : 18 }
{ "_id" : ObjectId("5fa943987a7deafb9e543338"), "name" : "people19", "age" : 19, "classes" : 19 }
{ "_id" : ObjectId("5fa943987a7deafb9e543339"), "name" : "people20", "age" : 20, "classes" : 0 }
Type "it" for more
> it
{ "_id" : ObjectId("5fa943987a7deafb9e54333a"), "name" : "people21", "age" : 21, "classes" : 1 }
{ "_id" : ObjectId("5fa943987a7deafb9e54333b"), "name" : "people22", "age" : 22, "classes" : 2 }
{ "_id" : ObjectId("5fa943987a7deafb9e54333c"), "name" : "people23", "age" : 23, "classes" : 3 }
{ "_id" : ObjectId("5fa943987a7deafb9e54333d"), "name" : "people24", "age" : 24, "classes" : 4 }
{ "_id" : ObjectId("5fa943987a7deafb9e54333e"), "name" : "people25", "age" : 25, "classes" : 5 }
{ "_id" : ObjectId("5fa943987a7deafb9e54333f"), "name" : "people26", "age" : 26, "classes" : 6 }
{ "_id" : ObjectId("5fa943987a7deafb9e543340"), "name" : "people27", "age" : 27, "classes" : 7 }
{ "_id" : ObjectId("5fa943987a7deafb9e543341"), "name" : "people28", "age" : 28, "classes" : 8 }
{ "_id" : ObjectId("5fa943987a7deafb9e543342"), "name" : "people29", "age" : 29, "classes" : 9 }
{ "_id" : ObjectId("5fa943987a7deafb9e543343"), "name" : "people30", "age" : 30, "classes" : 10 }
{ "_id" : ObjectId("5fa943987a7deafb9e543344"), "name" : "people31", "age" : 31, "classes" : 11 }
{ "_id" : ObjectId("5fa943987a7deafb9e543345"), "name" : "people32", "age" : 32, "classes" : 12 }
{ "_id" : ObjectId("5fa943987a7deafb9e543346"), "name" : "people33", "age" : 33, "classes" : 13 }
{ "_id" : ObjectId("5fa943987a7deafb9e543347"), "name" : "people34", "age" : 34, "classes" : 14 }
{ "_id" : ObjectId("5fa943987a7deafb9e543348"), "name" : "people35", "age" : 35, "classes" : 15 }
{ "_id" : ObjectId("5fa943987a7deafb9e543349"), "name" : "people36", "age" : 36, "classes" : 16 }
{ "_id" : ObjectId("5fa943987a7deafb9e54334a"), "name" : "people37", "age" : 37, "classes" : 17 }
{ "_id" : ObjectId("5fa943987a7deafb9e54334b"), "name" : "people38", "age" : 38, "classes" : 18 }
{ "_id" : ObjectId("5fa943987a7deafb9e54334c"), "name" : "people39", "age" : 39, "classes" : 19 }
{ "_id" : ObjectId("5fa943987a7deafb9e54334d"), "name" : "people40", "age" : 40, "classes" : 0 }
Type "it" for more
>

　　提示：建立测试数据可使用循环的方式，它这里的循环和c语言中的循环是同样的；在mongodb中查看数据，当数据量过多时，它不会一次性所有显示，而是分页显示，每次默认显示20条；键入it命令能够显示下一页；

　　在mongodb上建立索引，语法格式：db.mycoll.ensureIndex(keypattern[,options])或者db.mycoll.createIndex(keypattern[,options])

　　在name字段上建立索引

> db.peoples.ensureIndex({name:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>

　　提示：这里的name是指字段名称，而非索引名称；后面的1表示升序，-1表示降序；

　　查看索引

> db.peoples.getIndices()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "name" : 1
                },
                "name" : "name_1"
        }
]
>

　　提示：从上面返回的结果能够看到，peoples这个集合上有两个索引，一个名为_id_，其对应的字段为_id，以升序的方式排列；一个名为name_1，其对应字段为name，以升序的方式排列；默认不给索引取名，它就是字段名后面加下划线，再加表示升序或降序的数字；以下所示

　　删除索引

> db.peoples.dropIndex("name_1")
{ "nIndexesWas" : 3, "ok" : 1 }
> db.peoples.dropIndex("age_-1")
{ "nIndexesWas" : 2, "ok" : 1 }
> db.peoples.getIndices()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
>

　　提示：删除索引须要指定索引名称，而且须要用引号引发来；

　　在name字段上构建惟一键索引

　　提示：建立惟一键索引，咱们只须要在建立索引时加上unique：true这个选项便可；所谓惟一键是指咱们指定的字段上的值必须是惟一的；若是在咱们在插入对应字段时和以前有的数据重复，此时会插入失败；

　　验证：插入一条name字段值为peoples23的数据，看看是否可以插入成功呢？

　　提示：能够看到当咱们在name字段上构建惟一键索引后，在插入name字段有相同值的数据时，它告诉咱们说插入的数据重复；不容许咱们插入；说明咱们建立的惟一键索引生效了；

　　重建索引

> db.peoples.reIndex()
{
        "nIndexesWas" : 2,
        "nIndexes" : 2,
        "indexes" : [
                {
                        "v" : 2,
                        "key" : {
                                "_id" : 1
                        },
                        "name" : "_id_"
                },
                {
                        "v" : 2,
                        "unique" : true,
                        "key" : {
                                "name" : 1
                        },
                        "name" : "name_1"
                }
        ],
        "ok" : 1
}
>

　　提示：若是咱们要修改索引，能够删除从新键，上面的reIndex不能实现修改原有索引的属性信息；

　　构建索引并指定为后台构建，释放当前shell

> db.peoples.createIndex({age:-1},{background:true})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}
> db.peoples.getIndices()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "unique" : true,
                "key" : {
                        "name" : 1
                },
                "name" : "name_1"
        },
        {
                "v" : 2,
                "key" : {
                        "age" : -1
                },
                "name" : "age_-1",
                "background" : true
        }
]
>

　　删除全部手动构建的索引

> db.peoples.dropIndexes()
{
        "nIndexesWas" : 3,
        "msg" : "non-_id indexes dropped for collection",
        "ok" : 1
}
> db.peoples.getIndices()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]
>

　　建立组合索引

> db.peoples.createIndex({name:1,age:1},{background:true})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
> db.peoples.getIndices()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "name" : 1,
                        "age" : 1
                },
                "name" : "name_1_age_1",
                "background" : true
        }
]
>

　　以name字段为条件查询数据，看看mongodb查询过程

> db.peoples.find({name:"people1221"}).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "testdb.peoples",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "name" : {
                                "$eq" : "people1221"
                        }
                },
                "queryHash" : "01AEE5EC",
                "planCacheKey" : "4C5AEA2C",
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "name" : 1,
                                        "age" : 1
                                },
                                "indexName" : "name_1_age_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "name" : [ ],
                                        "age" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "name" : [
                                                "[\"people1221\", \"people1221\"]"
                                        ],
                                        "age" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "node01.test.org",
                "port" : 27017,
                "version" : "4.4.1",
                "gitVersion" : "ad91a93a5a31e175f5cbf8c69561e788bbc55ce1"
        },
        "ok" : 1
}
>

　　提示：从上面返回的结果能够看到在本次查询是IXSCAN（索引扫描），因此查找很快就返回了；同时也显示了索引相关的信息；

　　组合name和age字段条件查询，看看是否命中索引？

> db.peoples.find({$and:[{age:{$lt:80}},{name:{$gt:"people200"}}]}).explain()
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "testdb.peoples",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "age" : {
                                                "$lt" : 80
                                        }
                                },
                                {
                                        "name" : {
                                                "$gt" : "people200"
                                        }
                                }
                        ]
                },
                "queryHash" : "96038BC4",
                "planCacheKey" : "E71214BA",
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "name" : 1,
                                        "age" : 1
                                },
                                "indexName" : "name_1_age_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "name" : [ ],
                                        "age" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "name" : [
                                                "(\"people200\", {})"
                                        ],
                                        "age" : [
                                                "[-inf.0, 80.0)"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "node01.test.org",
                "port" : 27017,
                "version" : "4.4.1",
                "gitVersion" : "ad91a93a5a31e175f5cbf8c69561e788bbc55ce1"
        },
        "ok" : 1
}
>

　　提示：能够看到咱们组合两个字段作条件范围查询也是能够正常索引扫描；