MongoDB Data Models

Date: 2019-11-10


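The data model of a document describes how its data is organized, and a well-designed data model serves the application better. MongoDB offers two data models for documents: embedding (embed) and references.

Embedding

MongoDB documents are schemaless, so they can hold all kinds of data structures. The embedded model is also called the denormalized model. In MongoDB, a set of related data can form a document of its own or be part of a larger document. The MongoDB documentation illustrates this with a diagram of a document that embeds related data.

Embedding stores a group of related data in a single document. The benefit is that the application can handle common reads and updates with fewer query and update operations.

According to the MongoDB documentation, consider embedding in the following situations:

If the relationship between the data is a one-to-one "contains" relationship. For example, in the document below each person has a contact field that describes their contact details.

For a one-to-one relationship like this, embedding makes the data easy to query and update.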

{
    "_id": <ObjectId0>,
    "name": "Wilber",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}
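The convenience can be sketched in plain JavaScript. This is a minimal in-memory model, not a MongoDB client. The collection is assumed to be a plain array named people (a hypothetical name), and a numeric _id stands in for <ObjectId0>. The comments give the corresponding mongo shell calls. Reading the contact details and changing one nested field each touch a single document.

```javascript
// In-memory stand-in for a collection (hypothetical name "people"); no MongoDB server involved.
const people = [
  {
    _id: 0, // stands in for <ObjectId0>
    name: "Wilber",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
  },
];

// Shell equivalent: db.people.findOne({ name: "Wilber" }).contact
function findContact(coll, name) {
  const doc = coll.find((d) => d.name === name);
  return doc ? doc.contact : null;
}

// Shell equivalent: db.people.updateOne({ name: "Wilber" }, { $set: { "contact.email": newEmail } })
// Returns the number of matched documents (0 or 1).
function setEmail(coll, name, newEmail) {
  const doc = coll.find((d) => d.name === name);
  if (!doc) return 0;
  doc.contact.email = newEmail;
  return 1;
}

console.log(findContact(people, "Wilber").phone); // 12345678
setEmail(people, "Wilber", "wilber@example.com"); // hypothetical new address
console.log(findContact(people, "Wilber").email); // wilber@example.com
```

Because the contact data lives inside the person document, neither operation needs a second lookup.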

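If the relationship is one-to-many, embedding is also worth considering. For example, in the document below the posts field records all of the blog posts the user has published.

In this case, if the application often looks up a user's posts by the user name field, embedding posts is a good choice, because it saves many queries.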

{
    "_id": <ObjectId1>,
    "name": "Wilber",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    },
    "posts": [
        {
            "title": "Indexes in MongoDB",
            "created": "12/01/2014",
            "link": "www.blog.com"
        },
        {
            "title": "Replication in MongoDB",
            "created": "12/02/2014",
            "link": "www.blog.com"
        },
        {
            "title": "Sharding in MongoDB",
            "created": "12/03/2014",
            "link": "www.blog.com"
        }
    ]
}
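Here is a minimal in-memory sketch of this one-to-many layout in plain JavaScript (not a MongoDB client). The collection name users and the helper names are hypothetical, and the comments show the corresponding shell calls. One lookup returns every post. Adding a post with $push makes that same document larger.

```javascript
// One user document that embeds all of the user's posts (contact omitted for brevity).
const userWithPosts = {
  _id: 1, // stands in for <ObjectId1>
  name: "Wilber",
  posts: [
    { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com" },
    { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com" },
    { title: "Sharding in MongoDB", created: "12/03/2014", link: "www.blog.com" },
  ],
};

// Shell equivalent: db.users.findOne({ name: "Wilber" }).posts  (a single read)
function postTitles(userDoc) {
  return userDoc.posts.map((p) => p.title);
}

// Shell equivalent: db.users.updateOne({ name: "Wilber" }, { $push: { posts: post } })
// The array, and therefore the whole document, grows with every push.
function pushPost(userDoc, post) {
  userDoc.posts.push(post);
  return userDoc.posts.length;
}

console.log(postTitles(userWithPosts).length); // 3
pushPost(userWithPosts, { title: "Notepad++ configuration", created: "12/05/2014", link: "www.blog.com" });
console.log(postTitles(userWithPosts).length); // 4
```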

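As the description above shows, embedding gives the application good read performance, because all related data can be retrieved in a single database operation. It also turns an update of related data into a single atomic write, since MongoDB writes to a single document atomically.

However, embedding can also cause problems. A document may keep growing, which can hurt write performance and fragment the data on disk. In other words, embedding means you have to plan for document growth. The MongoDB documentation's description of Document Growth is quoted below; it was written for the MMAPv1 storage engine of that era. MongoDB also limits the maximum size of a document (16 MB for a BSON document), so this limit has to be considered when embedding as well.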

Document Growth

Some updates to documents can increase the size of documents. These updates include pushing elements to an array (i.e. $push) and adding new fields to a document. If the document size exceeds the allocated space for that document, MongoDB will relocate the document on disk. Relocating documents takes longer than in place updates and can lead to fragmented storage. Although MongoDB automatically adds padding to document allocations to minimize the likelihood of relocation, data models should avoid document growth when possible.

For instance, if your applications require updates that will cause document growth, you may want to refactor your data model to use references between data in distinct documents rather than a denormalized data model.
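This rough JavaScript sketch gives a feel for how much room an embedded array has. It assumes the 16 MB BSON document limit. It uses the UTF-8 length of the JSON text as a stand-in for BSON size, which is only an approximation, because BSON adds type bytes and length prefixes and stores array indexes as keys. The helper names are hypothetical.

```javascript
// MongoDB's maximum BSON document size: 16 MB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Approximate size in bytes: UTF-8 length of the JSON text (not the real BSON size).
function approxSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Estimate how many more elements like `sample` fit before `doc` exceeds `limit`.
// Each extra array element costs its own JSON size plus one separating comma.
function remainingPushes(doc, sample, limit = MAX_BSON_BYTES) {
  const base = approxSize(doc);
  if (base >= limit) return 0;
  const perElement = approxSize(sample) + 1;
  return Math.floor((limit - base) / perElement);
}

const samplePost = { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com" };
const sampleUser = { name: "Wilber", posts: [samplePost] };
console.log(remainingPushes(sampleUser, samplePost)); // an estimate, not an exact count
```

The exact number matters less than the trend. The estimate shrinks with every push, which is why an array that can grow without bound is a warning sign for the embedded model.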

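References

In contrast to the embedded model, the reference model, also called the normalized data model, expresses the relationships between data through references.

Again using a diagram from the MongoDB documentation: in this model, contact and access are moved out of the user document, and each is linked back to the user through a user_id field.

Consider the reference model in the following situations:

Embedding often introduces redundant data in exchange for faster queries. When the application hardly ever queries through the embedded data, or the faster queries do not make up for the problems caused by redundancy, references are the better choice.
When you need complex many-to-many relationships. In the familiar student-course-teacher example, modeling the relationships among the three with references can be clearer and more intuitive than embedding, and it avoids much redundant data.
When you need complex tree-shaped relationships.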
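Here is a hypothetical sketch of the student-course-teacher case in plain JavaScript. Each entity lives in its own array, standing in for a collection. Each course holds the _id values of its teacher and students instead of copies of their data. All names, ids and field names (teacherId, studentIds) are invented for illustration. Answering "which teachers does a student have?" means collecting ids first and resolving them second, which is the usual trade-off of references.

```javascript
// Hypothetical collections; courses reference teachers and students by _id.
const teacherDocs = [
  { _id: "t1", name: "Teacher A" },
  { _id: "t2", name: "Teacher B" },
];
const studentDocs = [
  { _id: "s1", name: "Student 1" },
  { _id: "s2", name: "Student 2" },
];
const courseDocs = [
  { _id: "c1", title: "Databases", teacherId: "t1", studentIds: ["s1", "s2"] },
  { _id: "c2", title: "Networks", teacherId: "t2", studentIds: ["s2"] },
];

// Shell equivalent (assumed collection name): db.courses.find({ studentIds: "s2" })
function coursesOfStudent(studentId) {
  return courseDocs.filter((c) => c.studentIds.includes(studentId)).map((c) => c.title);
}

// Two steps: collect teacher ids from the courses, then resolve them to teacher documents.
function teachersOfStudent(studentId) {
  const ids = new Set(
    courseDocs.filter((c) => c.studentIds.includes(studentId)).map((c) => c.teacherId)
  );
  return teacherDocs.filter((t) => ids.has(t._id)).map((t) => t.name);
}

// Resolves a course's studentIds to student names.
function studentsOfCourse(courseId) {
  const course = courseDocs.find((c) => c._id === courseId);
  if (!course) return [];
  return studentDocs.filter((s) => course.studentIds.includes(s._id)).map((s) => s.name);
}

console.log(coursesOfStudent("s2")); // [ 'Databases', 'Networks' ]
console.log(teachersOfStudent("s2")); // [ 'Teacher A', 'Teacher B' ]
```

Each teacher's and student's details live in exactly one document. Renaming a teacher is therefore a single update, not an edit to every course that mentions them.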

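Here is an interesting example, also from the MongoDB documentation.

Intuitively, we represent a tree structure like this with parent-child relationships. In this tree, Books is the root, Programming sits under Books, Databases and Languages sit under Programming, and MongoDB and dbm sit under Databases.

With parent references, the tree can be stored as follows: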

db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
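Parent references make two queries easy: finding a node's parent and finding its children. Both can be sketched in plain JavaScript over an in-memory copy of the documents above. The helper names are hypothetical, no MongoDB server is involved, and the comments give the matching shell queries. Walking all the way up to the root takes one lookup per level.

```javascript
// In-memory copy of the parent-reference documents above.
const parentRefCategories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];

// Shell equivalent: db.categories.findOne({ _id: "MongoDB" }).parent
function parentOf(coll, id) {
  const node = coll.find((c) => c._id === id);
  return node ? node.parent : undefined;
}

// Shell equivalent: db.categories.find({ parent: "Databases" })
function childrenByParent(coll, id) {
  return coll.filter((c) => c.parent === id).map((c) => c._id);
}

// Follows parent links up to the root (assumes the data contains no cycles).
function pathToRoot(coll, id) {
  const path = [];
  let current = id;
  while (current !== null && current !== undefined) {
    path.push(current);
    current = parentOf(coll, current);
  }
  return path;
}

console.log(childrenByParent(parentRefCategories, "Databases")); // [ 'MongoDB', 'dbm' ]
console.log(pathToRoot(parentRefCategories, "MongoDB").join(" -> ")); // MongoDB -> Databases -> Programming -> Books
```

In a real collection, an index on the parent field (db.categories.createIndex({ parent: 1 })) keeps the children query fast.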

It can also be stored with child references:

db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "dbm", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
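With child references the easy direction flips. The same plain-JavaScript, in-memory approach shows this; helper names are hypothetical and the comments give the shell queries. Finding a node's parent means searching the children arrays. Collecting a whole subtree means repeatedly following those arrays.

```javascript
// In-memory copy of the child-reference documents above.
const childRefCategories = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] },
];

// Shell equivalent: db.categories.findOne({ _id: "Databases" }).children
function childrenOf(coll, id) {
  const node = coll.find((c) => c._id === id);
  return node ? node.children : [];
}

// Shell equivalent: db.categories.findOne({ children: "MongoDB" })
// (a query on an array field matches documents whose array contains the value)
function parentByChildren(coll, id) {
  const node = coll.find((c) => c.children.includes(id));
  return node ? node._id : null;
}

// Collects every descendant by following children arrays (iterative depth-first walk).
function descendantsOf(coll, id) {
  const result = [];
  const stack = [...childrenOf(coll, id)];
  while (stack.length > 0) {
    const next = stack.pop();
    result.push(next);
    stack.push(...childrenOf(coll, next));
  }
  return result;
}

console.log(parentByChildren(childRefCategories, "MongoDB")); // Databases
console.log(descendantsOf(childRefCategories, "Programming"));
```

An index on children (db.categories.createIndex({ children: 1 })) is a multikey index, which is what makes the parent lookup efficient on a real collection.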

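In MongoDB, references can be implemented in two ways: manual references and DBRefs.

Manual references

As in the earlier one-to-many example, we can store the name field of the user document in each post document to link the two, and then retrieve the data we need with several queries. This kind of reference is simple and covers most needs.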

user document:

{
    "name": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents:

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}
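The two-step lookup that manual references require can be sketched in plain JavaScript. The collection names user and post and the helper names are hypothetical, and the arrays stand in for collections. The first lookup finds the post, and the second resolves its author field against the user name. A comment also notes the $lookup aggregation stage (MongoDB 3.2 and later), which performs a similar join on the server.

```javascript
// In-memory stand-ins for the "user" and "post" collections (hypothetical names).
const userDocs = [
  {
    name: "Wilber",
    gender: "Male",
    birthday: "1987-09",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
  },
];
const postDocs = [
  { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com", author: "Wilber" },
  { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com", author: "Wilber" },
  { title: "Sharding in MongoDB", created: "12/03/2014", link: "www.blog.com", author: "Wilber" },
];

// Query 1: db.post.findOne({ title }); query 2: db.user.findOne({ name: post.author })
function authorOfPost(title) {
  const post = postDocs.find((p) => p.title === title);
  if (!post) return null;
  return userDocs.find((u) => u.name === post.author) || null;
}

// Reverse direction, one query: db.post.find({ author: "Wilber" })
function postsByAuthor(name) {
  return postDocs.filter((p) => p.author === name).map((p) => p.title);
}

// Server-side alternative (MongoDB 3.2+), shown for reference only:
// db.post.aggregate([{ $lookup: { from: "user", localField: "author", foreignField: "name", as: "authorInfo" } }])

console.log(authorOfPost("Replication in MongoDB").contact.email); // wilber@shanghai.com
console.log(postsByAuthor("Wilber").length); // 3
```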

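Note that the one shortcoming of manual references is that they do not record which database or which collection is referenced. If documents in one collection reference documents in several other collections, DBRefs may be worth considering.

For example, suppose a user can publish posts on several blogging platforms, and each platform's posts are stored in a separate collection. DBRefs are convenient in this case.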

user document:

{
    "name": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents (spread across the Post4CNblog, Post4CSDN and Post4ITeye collections):

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

{
    "title": "Notepad++ configuration",
    "created": "12/05/2014",
    "link": "www.blog.com",
    "author": "Wilber"
}

To find the details of the user who published "Replication in MongoDB" on CNblog, we can use the following statements, which take two queries:

> db.Post4CNblog.find({"title": "Replication in MongoDB"})
{ "_id" : ObjectId("548fe8100c3e84a00806a48f"), "title" : "Replication in MongoDB", "created" : "12/02/2014", "link" : "www.blog.com", "author" : "Wilber" }
> db.user.find({"name":"Wilber"}).toArray()
[
        {
                "_id" : ObjectId("548fe8100c3e84a00806a48d"),
                "name" : "Wilber",
                "gender" : "Male",
                "birthday" : "1987-09",
                "contact" : {
                        "phone" : "12345678",
                        "email" : "wilber@shanghai.com"
                }
        }
]

DBRefs

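A DBRef links documents through the _id field, the collection name and, optionally, the database name. This way, documents can be linked easily even when they are spread across several collections.

A DBRef has a fixed format consisting of the following fields, which must appear in this order:

$ref: the name of the collection that holds the referenced document
$id: the value of the _id field of the referenced document
$db (optional): the name of the database that holds the referenced document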
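This small JavaScript sketch shows the DBRef shape. It builds plain objects with the fields in the order listed above and runs a loose shape check. The helper names and the database name blog are hypothetical. The shell and the drivers provide their own DBRef types, and this sketch is not a substitute for them.

```javascript
// Builds a plain object in DBRef form: $ref first, then $id, then the optional $db.
function makeDBRef(collection, id, database) {
  const ref = { $ref: collection, $id: id };
  if (database !== undefined) ref.$db = database;
  return ref;
}

// Loose shape check: $ref must be a string, $id must be present,
// and $db, if present, must be a string.
function isDBRef(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.$ref === "string" &&
    Object.prototype.hasOwnProperty.call(value, "$id") &&
    (value.$db === undefined || typeof value.$db === "string")
  );
}

const authorRef = makeDBRef("user", "Wilber");
console.log(JSON.stringify(authorRef)); // {"$ref":"user","$id":"Wilber"}
console.log(Object.keys(makeDBRef("user", "Wilber", "blog"))); // [ '$ref', '$id', '$db' ]
```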

As an example, here is the previous example implemented with DBRefs. Note that the user name now has to be stored as the user document's _id field.

user document:

{
    "_id": "Wilber",
    "gender": "Male",
    "birthday": "1987-09",
    "contact": {
        "phone": "12345678",
        "email": "wilber@shanghai.com"
    }
}

post documents (spread across the Post4CNblog, Post4CSDN and Post4ITeye collections):

{
    "title": "Indexes in MongoDB",
    "created": "12/01/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Replication in MongoDB",
    "created": "12/02/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Sharding in MongoDB",
    "created": "12/03/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

{
    "title": "Notepad++ configuration",
    "created": "12/05/2014",
    "link": "www.blog.com",
    "author": {"$ref": "user", "$id": "Wilber"}
}

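Again, to find the details of the user who published "Replication in MongoDB" on CNblog, this time a single shell statement is enough (the fetch() helper still issues a second query for the referenced document behind the scenes):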

> db.Post4CNblog.findOne({"title":"Replication in MongoDB"}).author.fetch()
{
        "_id" : "Wilber",
        "gender" : "Male",
        "birthday" : "1987-09",
        "contact" : {
                "phone" : "12345678",
                "email" : "wilber@shanghai.com"
        }
}
>
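What fetch() does can be sketched in plain JavaScript. It reads $ref (and $db, if present) and looks up the _id in that collection. In the sketch a nested object stands in for the server, mapping database name to collection name to documents. The database name test and the helper names are assumptions. The sketch makes the cost visible: resolving a DBRef is always a second lookup.

```javascript
// In-memory stand-in for a server: database -> collection -> documents.
const server = {
  test: {
    user: [
      {
        _id: "Wilber",
        gender: "Male",
        birthday: "1987-09",
        contact: { phone: "12345678", email: "wilber@shanghai.com" },
      },
    ],
    Post4CNblog: [
      {
        title: "Replication in MongoDB",
        created: "12/02/2014",
        link: "www.blog.com",
        author: { $ref: "user", $id: "Wilber" },
      },
    ],
  },
};

// Resolves a DBRef-shaped object: $db (if present) overrides the current database,
// $ref picks the collection, and $id is matched against _id.
function resolveDBRef(ref, currentDb) {
  const dbName = ref.$db !== undefined ? ref.$db : currentDb;
  const database = server[dbName] || {};
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

// Lookup 1: the post; lookup 2: the referenced user.
const foundPost = server.test.Post4CNblog.find((p) => p.title === "Replication in MongoDB");
const foundAuthor = resolveDBRef(foundPost.author, "test");
console.log(foundAuthor.contact.email); // wilber@shanghai.com
```

A DBRef is only data. Something still has to issue the second query, whether that is the shell's fetch(), a driver helper, or code like the above.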

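Summary

This article gave an overview of MongoDB's data models. Neither embedding nor references is better in general; the right choice depends on the application scenario.

Also, when using embedding, always keep document growth and the maximum document size in mind.

P.S.: All commands used in the examples are available at the following link: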

http://files.cnblogs.com/wilber2013/data_modeling.js