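A data model describes how data is organized, and a good data model serves the application better. MongoDB offers two ways to model documents: embedding (embedded data) and references.
MongoDB documents are schemaless, so they can hold all kinds of data structures. The embedded model is also called the denormalized model. In MongoDB, a group of related data can be a document in its own right, or it can be part of another document. The MongoDB documentation illustrates this with a diagram.
The embedded model stores a group of related data in a single document. The benefit is that the application can complete routine reads and updates with fewer query and update operations.
According to the MongoDB documentation, we should consider the embedded model in the following cases:
For a one-to-one relationship like this one, the embedded model makes querying and updating the data very convenient.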
{ "_id": <ObjectId0>, "name": "Wilber", "contact": { "phone": "12345678", "email": "wilber@shanghai.com" } }
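In this case, if the application frequently looks up a user's blog posts by user name, embedding posts as a field of the user document is a good choice, because it eliminates many queries.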
{
  "_id": <ObjectId1>,
  "name": "Wilber",
  "contact": { "phone": "12345678", "email": "wilber@shanghai.com" },
  "posts": [
    { "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com" },
    { "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com" },
    { "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com" }
  ]
}
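As the description above shows, the embedded model gives the application good read performance, because all related data can be retrieved in a single database operation. The embedded model also turns an update of the related data into a single atomic write operation.
However, the embedded model can introduce problems of its own. Documents keep growing, which can hurt write performance and cause data fragmentation. In other words, embedding requires thinking about document growth; the MongoDB documentation's description of Document Growth is quoted below. MongoDB also enforces a maximum document size, which must be kept in mind when embedding.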
Some updates to documents can increase the size of documents. These updates include pushing elements to an array (i.e. $push) and adding new fields to a document. If the document size exceeds the allocated space for that document, MongoDB will relocate the document on disk. Relocating documents takes longer than in place updates and can lead to fragmented storage. Although MongoDB automatically adds padding to document allocations to minimize the likelihood of relocation, data models should avoid document growth when possible.
For instance, if your applications require updates that will cause document growth, you may want to refactor your data model to use references between data in distinct documents rather than a denormalized data model.
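To get a feel for document growth, here is a minimal sketch in plain JavaScript (Node.js), not mongo shell code. It approximates a document's size by the UTF-8 length of its JSON text, which is only a stand-in for the real BSON size, and compares it with MongoDB's 16 MB maximum document size. The helper names approxSizeBytes and remainingPosts are made up for this illustration.

```javascript
// MongoDB's maximum BSON document size is 16 MB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Assumption: JSON text length is used as a rough proxy for BSON size.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: estimate how many more posts of a typical size
// could be embedded before the document reaches the size limit.
function remainingPosts(userDoc, samplePost) {
  const used = approxSizeBytes(userDoc);
  const perPost = approxSizeBytes(samplePost) + 1; // +1 for the separating comma
  return Math.max(0, Math.floor((MAX_DOC_BYTES - used) / perPost));
}

const user = { name: "Wilber", posts: [] };
const post = { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com" };

const before = approxSizeBytes(user);
user.posts.push(post); // the in-memory equivalent of a $push
const after = approxSizeBytes(user);

console.log(`size grew from ${before} to ${after} bytes`);
console.log(`roughly ${remainingPosts(user, post)} more posts would fit`);
```

Every embedded post makes the user document larger, so an array that can grow without bound is a hint that references may be the better fit.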
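In contrast to the embedded model, the reference model, also called the normalized data model, expresses relationships between data through references.
Again the MongoDB documentation provides a diagram: in this model, contact and access are moved out of the user document, and user_id serves as the key that links them back to the user.
Consider the reference model in the following cases:
Here is an interesting example from the MongoDB documentation.
Intuitively, we would use parent references to represent this tree structure: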
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
The same tree can also be represented with child references, where each node lists its children:
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "dbm", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
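To see what each representation makes easy, the following sketch walks the same category tree in plain JavaScript (Node.js), using in-memory arrays instead of a real collection. The functions ancestors and descendants are illustrative names, not MongoDB APIs; against a live database each step of the walk would be a separate query.

```javascript
// In-memory copies of the category documents, one array per representation.
const byParent = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];
const byChildren = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] },
];

// Parent references: follow "parent" upward until the root is reached.
function ancestors(nodes, id) {
  const index = new Map(nodes.map((n) => [n._id, n]));
  const path = [];
  let current = index.get(id);
  while (current && current.parent !== null) {
    path.push(current.parent);
    current = index.get(current.parent);
  }
  return path;
}

// Child references: expand "children" downward, depth-first.
function descendants(nodes, id) {
  const index = new Map(nodes.map((n) => [n._id, n]));
  const found = [];
  const pending = [...(index.get(id)?.children ?? [])];
  while (pending.length > 0) {
    const next = pending.shift();
    found.push(next);
    pending.unshift(...(index.get(next)?.children ?? []));
  }
  return found;
}

console.log(ancestors(byParent, "MongoDB"));
// [ 'Databases', 'Programming', 'Books' ]
console.log(descendants(byChildren, "Programming"));
// [ 'Databases', 'MongoDB', 'dbm', 'Languages' ]
```

Parent references make it cheap to find a node's parent, while child references make it cheap to find a node's immediate children; which one to store depends on the direction the application queries most.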
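MongoDB offers two ways to implement references: manual references and DBRefs.
As in the earlier one-to-many example, we can store the name field of the user document in each post document to link the two, and then retrieve the data we want with multiple queries. This approach is simple and meets most needs.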
user document:
{ "name": "Wilber", "gender": "Male", "birthday": "1987-09", "contact": { "phone": "12345678", "email": "wilber@shanghai.com" } }

post documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": "Wilber" }
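Note that the one shortcoming of manual references is that they do not say which database or which collection the referenced document lives in. If documents in one collection are related to documents in several other collections, DBRefs may be worth considering.
For example, suppose a user can publish posts on several blogging platforms, and each platform's data is stored in its own collection. DBRefs are convenient in this situation.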
user document:
{ "name": "Wilber", "gender": "Male", "birthday": "1987-09", "contact": { "phone": "12345678", "email": "wilber@shanghai.com" } }

Post4CNblog documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": "Wilber" }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": "Wilber" }

Post4CSDN document:
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": "Wilber" }

Post4ITeye document:
{ "title": "Notepad++ configuration", "created": "12/05/2014", "link": "www.blog.com", "author": "Wilber" }
To find the details of the user who published "Replication in MongoDB" on CNblog, we can use the statements below, which take two queries:
> db.Post4CNblog.find({"title": "Replication in MongoDB"})
{ "_id" : ObjectId("548fe8100c3e84a00806a48f"), "title" : "Replication in MongoDB", "created" : "12/02/2014", "link" : "www.blog.com", "author" : "Wilber" }
> db.user.find({"name":"Wilber"}).toArray()
[ { "_id" : ObjectId("548fe8100c3e84a00806a48d"), "name" : "Wilber", "gender" : "Male", "birthday" : "1987-09", "contact" : { "phone" : "12345678", "email" : "wilber@shanghai.com" } } ]
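The two-step pattern behind a manual reference can be sketched in plain JavaScript (Node.js) with in-memory arrays standing in for the collections. The function findAuthorOfPost is a hypothetical name for this illustration; note that the caller has to know which collection to look in for the second step.

```javascript
// In-memory stand-ins for the user and Post4CNblog collections.
const users = [
  {
    name: "Wilber",
    gender: "Male",
    birthday: "1987-09",
    contact: { phone: "12345678", email: "wilber@shanghai.com" },
  },
];
const post4CNblog = [
  { title: "Indexes in MongoDB", created: "12/01/2014", link: "www.blog.com", author: "Wilber" },
  { title: "Replication in MongoDB", created: "12/02/2014", link: "www.blog.com", author: "Wilber" },
];

// Manual reference: the post only stores the author's name, so resolving it
// takes two lookups, and the target collection is hard-coded by the caller.
function findAuthorOfPost(posts, userCollection, title) {
  const post = posts.find((p) => p.title === title); // lookup 1: the post
  if (!post) return null;
  return userCollection.find((u) => u.name === post.author) ?? null; // lookup 2: the user
}

console.log(findAuthorOfPost(post4CNblog, users, "Replication in MongoDB"));
```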
A DBRef links documents through the _id, the collection name, and (optionally) the database name. This way, documents can be linked easily even when they are spread across several collections.
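DBRefs have a fixed format with the following fields: $ref (the name of the collection that holds the referenced document), $id (the value of the referenced document's _id field), and optionally $db (the name of the database that holds the referenced document).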
As an example, let's implement the scenario above with DBRefs. Note that this requires storing the user name as the _id field of the user document.
user document:
{ "_id": "Wilber", "gender": "Male", "birthday": "1987-09", "contact": { "phone": "12345678", "email": "wilber@shanghai.com" } }

Post4CNblog documents:
{ "title": "Indexes in MongoDB", "created": "12/01/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }
{ "title": "Replication in MongoDB", "created": "12/02/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }

Post4CSDN document:
{ "title": "Sharding in MongoDB", "created": "12/03/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }

Post4ITeye document:
{ "title": "Notepad++ configuration", "created": "12/05/2014", "link": "www.blog.com", "author": {"$ref": "user", "$id": "Wilber"} }
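The same lookup, the details of the user who published "Replication in MongoDB" on CNblog, now fits in a single statement, because the shell's fetch() helper on the DBRef performs the second lookup for us: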
> db.Post4CNblog.findOne({"title":"Replication in MongoDB"}).author.fetch()
{ "_id" : "Wilber", "gender" : "Male", "birthday" : "1987-09", "contact" : { "phone" : "12345678", "email" : "wilber@shanghai.com" } }
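What resolving a DBRef amounts to can be sketched in plain JavaScript (Node.js). This is an illustration under assumptions, not the shell's actual implementation: the databases object, the database name "blog", and the function fetchRef are all made up here. The point is that the reference itself carries the collection name (and optionally the database name), so one generic resolver works for every post collection.

```javascript
// Hypothetical in-memory layout: database name -> collection name -> documents.
const databases = {
  blog: {
    user: [
      {
        _id: "Wilber",
        gender: "Male",
        birthday: "1987-09",
        contact: { phone: "12345678", email: "wilber@shanghai.com" },
      },
    ],
    Post4CNblog: [
      {
        title: "Replication in MongoDB",
        created: "12/02/2014",
        link: "www.blog.com",
        author: { $ref: "user", $id: "Wilber" },
      },
    ],
  },
};

// Resolve a DBRef-shaped object: $db is optional and falls back to the
// database the caller is currently using.
function fetchRef(dbs, currentDb, ref) {
  const dbName = ref.$db ?? currentDb;
  const collection = dbs[dbName]?.[ref.$ref];
  if (!collection) return null;
  return collection.find((doc) => doc._id === ref.$id) ?? null;
}

const cnPost = databases.blog.Post4CNblog.find(
  (p) => p.title === "Replication in MongoDB"
);
console.log(fetchRef(databases, "blog", cnPost.author));
```

Compared with a manual reference, the resolver no longer needs to be told which collection to search, at the cost of a slightly larger reference stored in every post.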
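This article gave an overview of data models in MongoDB. Neither the embedded model nor the reference model is better in general; the right choice depends on the use case.
Also, when using the embedded model, always keep document growth and the maximum document size in mind.
PS: all the commands used in the examples can be found at the following link: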
http://files.cnblogs.com/wilber2013/data_modeling.js