Operating MongoDB from Node.js

1. Introduction
Official documentation: https://docs.mongodb.com/manual/ — nearly everything is covered there.
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
2. Installation
For detailed installation notes (supported systems, 32-bit vs 64-bit differences, and so on), see the official site.

sudo apt-get install -y mongodb-org

Installing mongodb-org pulls in the following packages:
mongodb-org-server   contains the mongod daemon, its configuration file, and init scripts
mongodb-org-mongos   contains the mongos daemon
mongodb-org-shell    contains the mongo shell
mongodb-org-tools    contains mongoimport, mongodump, mongorestore, mongofiles, and other tools
What each command does is covered in the next chapter.

Starting MongoDB
sudo mongod
Startup options can be listed with mongod --help.
The default port is 27017; you can change it with mongod --port=xxx.
A common way to start is mongod -f /etc/mongodb.config (see the documentation for how to write mongodb.config).
Stopping MongoDB
The official recommendation is to enter the mongo shell and run:
use admin             // switch to the admin database
db.shutdownServer()   // stop mongod
You can also kill the mongod process directly:
ps -ef | grep mongod   # find the mongod pid
kill -9 <pid>

3. Commands
Starting the mongo shell:
cd <mongodbHome>
./bin/mongo       (run ./bin/mongo --help for the full list of options)
Inside the shell:
help              show help
show dbs          list databases
show collections  list the collections in the current database (similar to tables in MySQL)
use <db>          switch to a database
show users        list the users of the current database
……
There are many more commands; they are not all listed here.
4. CRUD
1. Insert
db.collection.insert({x:"a", y:"b", z:"c"})
This inserts one document into the collection named collection. The document {x:"a", y:"b", z:"c"} is JSON-like and its shape is up to you. For example, to insert {name:"Lao Wang", info:"Lao Wang is a mysterious creature who often lives right next door"} into the user collection:
db.user.insert({name:"Lao Wang", info:"Lao Wang is a mysterious creature who often lives right next door"})
MongoDB 3.2 also added two methods, db.collection.insertOne() and db.collection.insertMany():
db.collection.insertMany([
   {x:"a", y:"b", z:"c"},
   {x:"1", y:"2", z:"3"},
   ...
])
insertMany is efficient for bulk loads. In the shell you can first build an array A = [{...}, {...}, {...}]
and then call db.collection.insertMany(A).
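Building such a batch array can be done programmatically before handing it to insertMany. A minimal plain-JavaScript sketch (the field names and values here are made up for illustration; in the shell you would pass the resulting array to db.collection.insertMany):

```javascript
// Build a batch of documents in a loop, then pass it to insertMany in one call.
const batch = [];
for (let i = 0; i < 3; i++) {
  batch.push({ x: String(i), y: String(i * 2) }); // illustrative documents
}
// In the mongo shell: db.collection.insertMany(batch)
console.log(batch.length); // 3
console.log(batch[2]);     // { x: '2', y: '4' }
```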
2. Delete
db.collection.remove({x:"a"})
This deletes every document in collection where x equals "a".
3. Update
db.collection.update(
    {x:"a"},
    {$set:{y:"1"}},
    {multi:true}     // true: update all matches; false: update one (the default)
)
Equivalent to:  update collection set y="1" where x="a"
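The filter / $set / multi semantics can be sketched over a plain in-memory array (the update function here is a made-up simulation, not the driver API):

```javascript
// A minimal simulation of update(filter, {$set: ...}, {multi}) semantics.
function update(docs, filter, change, { multi = false } = {}) {
  let modified = 0;
  for (const doc of docs) {
    const matches = Object.entries(filter).every(([k, v]) => doc[k] === v);
    if (!matches) continue;
    Object.assign(doc, change.$set); // apply the $set fields in place
    modified++;
    if (!multi) break;               // default: stop after the first match
  }
  return modified;
}

const coll = [{ x: "a", y: "0" }, { x: "a", y: "0" }, { x: "b", y: "0" }];
console.log(update(coll, { x: "a" }, { $set: { y: "1" } }, { multi: true })); // 2
console.log(coll[2].y); // "0" — the non-matching document is untouched
```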
4. Query     MongoDB statement                                  SQL equivalent
   db.collection.find()                                 select * from collection
   db.collection.find({x:"a"})                          select * from collection where x="a"
   db.collection.find({x:"a", z:{$lt:3}})               select * from collection where x="a" and z<3
   db.collection.find({$or:[{x:"a"}, {z:{$lt:3}}]})     select * from collection where x="a" or z<3
   db.collection.find({x:"a"}, {x:1, z:1, _id:0})       select x,z from collection where x="a"
 
   Cursor
   var myCursor = db.collection.find();
   while (myCursor.hasNext()) {
      print(myCursor.next())
   }
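The hasNext()/next() pattern can be imitated over a plain array to show what the loop does (this only mimics the shell cursor API; it does not talk to a server):

```javascript
// A minimal cursor that walks an array the way the shell cursor walks results.
function makeCursor(docs) {
  let i = 0;
  return {
    hasNext: () => i < docs.length,
    next: () => docs[i++],
  };
}

const myCursor = makeCursor([{ x: "a" }, { x: "b" }]);
const seen = [];
while (myCursor.hasNext()) {
  seen.push(myCursor.next().x);
}
console.log(seen); // [ 'a', 'b' ]
```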
More advanced bulk operations are covered under db.collection.bulkWrite().
For the mapping between MongoDB statements and SQL, see https://docs.mongodb.com/manual/reference/sql-comparison/
5. Data Processing
1. Aggregation
Aggregation operations process data records and return computed results.
Aggregation is roughly analogous to functions and stored procedures in a relational database. One limitation: each returned document must be under 16 MB.
Assume the documents in collection contain fields x, y, z:
db.collection.aggregate([
  {$match: {x:"a"}},
  {$group: {_id:"$y", total:{$sum:"$z"}}}
])
Note: $group must have an _id, typically a field reference such as "$y". The pipeline above is equivalent to:
select y as _id, sum(z) as total from collection where x="a" group by y
For $match, $group, $limit, $sort, and the other stages, see https://docs.mongodb.com/manual/meta/aggregation-quick-reference/
For the correspondence between aggregation and SQL, see
https://docs.mongodb.com/manual/reference/sql-aggregation-comparison/
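What the $match + $group/$sum pipeline computes can be sketched in plain JavaScript over an in-memory array (the aggregate function and sample data are illustrative, not the real server implementation):

```javascript
// A minimal reimplementation of: $match {x:"a"} then $group by "$y" with $sum "$z".
function aggregate(docs) {
  const matched = docs.filter((d) => d.x === "a");   // $match: {x:"a"}
  const groups = new Map();                          // $group: {_id:"$y"}
  for (const d of matched) {
    groups.set(d.y, (groups.get(d.y) || 0) + d.z);   // total: {$sum:"$z"}
  }
  return [...groups].map(([id, total]) => ({ _id: id, total }));
}

const docs = [
  { x: "a", y: "b", z: 1 },
  { x: "a", y: "b", z: 2 },
  { x: "a", y: "c", z: 5 },
  { x: "q", y: "b", z: 9 }, // filtered out by $match
];
console.log(aggregate(docs));
// [ { _id: 'b', total: 3 }, { _id: 'c', total: 5 } ]
```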

2. Map-Reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
db.collection.mapReduce(
    function() {
        emit(this.x, this.z);          // map
    },
    function(key, values) {
        return Array.sum(values);      // reduce
    },
    {
        query: {y:"b"},                // query
        out: "test"                    // output to the collection named test
    }
)
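The call above can be emulated in plain JavaScript to show the three phases: filter by the query, emit (key, value) pairs in map, then reduce each key's values. This is a made-up simulation (in the real shell, emit is a global inside the map function; here it is passed as a parameter):

```javascript
// A minimal mapReduce emulation: query filter -> map/emit -> reduce per key.
function mapReduce(docs, mapFn, reduceFn, { query } = {}) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs
    .filter((d) => !query || Object.entries(query).every(([k, v]) => d[k] === v))
    .forEach((d) => mapFn.call(d, emit)); // `this` is the document, as in the shell
  return [...emitted].map(([key, values]) => ({ _id: key, value: reduceFn(key, values) }));
}

const docs = [
  { x: "a", y: "b", z: 1 },
  { x: "a", y: "b", z: 4 },
  { x: "c", y: "q", z: 7 }, // excluded by query {y:"b"}
];
const out = mapReduce(
  docs,
  function (emit) { emit(this.x, this.z); },            // map
  (key, values) => values.reduce((s, v) => s + v, 0),   // reduce
  { query: { y: "b" } }
);
console.log(out); // [ { _id: 'a', value: 5 } ]
```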
6. Data Model
Operations on a single document are atomic.
Operations across multiple documents are not atomic (no multi-document transactions).
Setting up document validation:
db.createCollection("contacts", {
  validator: { $and: [
    { phone: { $type: "string" } },
    { email: { $regex: /@qq\.com$/ } }
  ]}
})
db.runCommand({
  collMod: "contacts",
  validator: { $and: [
    { phone: { $type: "string" } },
    { email: { $regex: /@qq\.com$/ } }
  ]},
  validationLevel: "moderate"
})
You cannot specify a validator for collections in the admin, local, and config databases.
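What the validator above checks can be sketched in plain JavaScript (the validateContact function is a made-up stand-in mirroring the $type and $regex conditions; it is not how the server evaluates validators):

```javascript
// A minimal check mirroring: phone must be a string, email must end in @qq.com.
function validateContact(doc) {
  const phoneOk = typeof doc.phone === "string"; // { phone: { $type: "string" } }
  const emailOk = /@qq\.com$/.test(doc.email);   // { email: { $regex: /@qq\.com$/ } }
  return phoneOk && emailOk;
}

console.log(validateContact({ phone: "123", email: "me@qq.com" }));    // true
console.log(validateContact({ phone: 123,   email: "me@qq.com" }));    // false
console.log(validateContact({ phone: "123", email: "me@gmail.com" })); // false
```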
7. Administration
1. Concurrency
mmapv1 provides collection-level locking.
That is, at any moment only a single reader or writer can operate on the documents of a given collection.
WiredTiger supports concurrent access by readers and writers to the documents in a collection. Clients can read documents while write operations are in progress, and multiple threads can modify different documents in a collection at the same time.
In other words, WiredTiger uses document-level locking: reads and writes on a document can coexist, and different documents in the same collection can be modified concurrently.
2. Durability
MongoDB uses write-ahead logging to an on-disk journal: writes are recorded to the journal before the data files. Journaling guarantees that MongoDB can quickly recover write operations that were written to the journal but not yet to the data files when mongod terminates due to a crash or other serious failure.
3. Hardware Requirements
WiredTiger benefits from more CPU cores; use at least 2.
By default MongoDB 3.2 uses 60% of RAM (with a minimum of 1 GB) for the cache; 10 GB of RAM per machine is recommended.
You can set the cache size at startup with mongod --wiredTigerCacheSizeGB=xxx
wiredTigerCacheSizeGB: defines the maximum size of the internal cache that WiredTiger will use for all data.

With the WiredTiger storage engine, use of XFS is strongly recommended, to avoid performance issues that may occur when using EXT4 with WiredTiger.

In a cluster, keep clocks synchronized (NTP).
4. Performance Tuning
For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members.

For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
In short: scale reads by adding replica set members, scale writes by adding shards.

// enable the slow-operation profiler for one database
db.setProfilingLevel(1, 100)   // after this, slow operations are recorded
Levels:
0: off (the default)
1: on, record slow operations only
2: on, record all operations
Here a "slow" operation is one that takes longer than 100 ms.
List slow operations:         db.system.profile.find( { millis: { $gt: 100 } } )
Check the profiler settings:  db.getProfilingStatus()
// enable the profiler for all databases at startup
mongod --profile 1 --slowms 15
5. Configuration
Limit query time:    db.collection.find().maxTimeMS(30)
Limit command time:  db.runCommand(
                {   distinct: "collection",
                    key: "a",
                    maxTimeMS: 40
                }
)
List current operations:  db.currentOp()
Kill an operation:        db.killOp(<opid>)
6. Backup
The simplest backups use MongoDB Cloud Manager or Ops Manager.

Back Up with Filesystem Snapshots
You can create a backup of a MongoDB deployment by making a copy of MongoDB's underlying data files.
Requirement: to get a correct snapshot of a running mongod process, you must have journaling enabled.
Steps: 1. db.fsyncLock()     // flush and lock against writes
       2. copy the data files
       3. db.fsyncUnlock()   // unlock

Back Up with mongodump — suitable for single instances; for cluster backups use dedicated tooling.
mongodump and mongorestore are simple and efficient tools for backing up and restoring small MongoDB deployments, but are not ideal for capturing backups of larger systems.
Note: mongodump only captures the documents in the database; indexes and other metadata are not backed up.
Example:  mongodump --collection myCollection --db test --out /backup/dir
For the full option list, run mongodump --help
Restore a backup:  mongorestore <path to backup>
7. Monitoring
mongostat captures and returns the counts of database operations by type
(how many inserts, queries, updates, and deletes are happening).

mongotop tracks and reports the current read and write activity of a MongoDB instance, on a per-collection basis.

MongoDB web console: http://localhost:<port>, where <port> is the mongod port + 1000 (28017 by default).

db.serverStatus()       returns a general overview of the status of the database instance.
db.stats()              returns a document describing storage use and data volumes for the current database.
db.collection.stats()   provides similar statistics at the collection level.
rs.status()             returns an overview of the replica set's status.

8. Indexes
Syntax:  db.collection.createIndex( { <field>: <1|-1>, ... }, { <options> } )
The first argument lists the fields to index and their direction (1 ascending, -1 descending);
the second argument holds index options as name/value pairs.
Building an index over a large collection hurts performance, so indexes are usually built in the background:
db.collection.createIndex({ <field>: 1 }, { background: true })
Naming an index:
db.collection.createIndex({ <field>: 1 }, { name: yourName })

Compound indexes:  db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
There can be no more than 31 fields in a compound index,
and two indexed fields (such as item and stock below) cannot both be arrays.
Example:  db.products.createIndex( { "item": 1, "stock": 1 } )
The index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field.
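The ordering a { item: 1, stock: 1 } compound index maintains can be sketched with a plain sort comparator (the data here is illustrative):

```javascript
// Sort by item first; within equal items, sort by stock — the compound-index order.
const byItemThenStock = (a, b) =>
  a.item === b.item ? a.stock - b.stock : a.item.localeCompare(b.item);

const products = [
  { item: "b", stock: 5 },
  { item: "a", stock: 9 },
  { item: "a", stock: 2 },
];
products.sort(byItemThenStock);
console.log(products.map((d) => `${d.item}:${d.stock}`)); // [ 'a:2', 'a:9', 'b:5' ]
```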

Text indexes — use sparingly; they are expensive in both time and memory.
db.reviews.createIndex( { comments: "text" } )
db.reviews.createIndex( { title: "text", comments: "text" } )    // compound text index
db.reviews.createIndex( { title: "text", comments: "text" }, { weights: { title: 2, comments: 5 } } )    // with weights
db.collection.dropIndex()    // drop an index
db.collection.getIndexes()   // list indexes

Hashed indexes — restriction: they cannot be compound.
db.collection.createIndex( { _id: "hashed" } )
db.collection.createIndex( { _id: "hashed", name: "hashed" } )   // note: this is invalid

TTL (Time To Live) indexes
Restrictions: cannot be on _id, cannot be compound, and eventlog must not be a capped collection.
db.eventlog.createIndex( { name: 1 }, { expireAfterSeconds: 3600 } )
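The TTL rule amounts to: a document expires once the indexed date plus expireAfterSeconds is in the past. A minimal sketch (isExpired and the timestamps are made up for illustration; the real expiry is done by a server background thread):

```javascript
// A document is expired when createdAt + expireAfterSeconds is at or before `now`.
function isExpired(doc, expireAfterSeconds, now = Date.now()) {
  return doc.createdAt.getTime() + expireAfterSeconds * 1000 <= now;
}

const now = Date.UTC(2016, 0, 1, 12, 0, 0);
const fresh = { createdAt: new Date(Date.UTC(2016, 0, 1, 11, 30, 0)) }; // 30 min old
const stale = { createdAt: new Date(Date.UTC(2016, 0, 1, 10, 0, 0)) };  // 2 h old
console.log(isExpired(fresh, 3600, now)); // false
console.log(isExpired(stale, 3600, now)); // true
```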

Unique indexes
db.members.createIndex( { "user_id": 1 }, { unique: true } )

Sparse indexes
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value.
Remember that document shapes are not fixed, so not every document needs to carry the field.
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )

Building indexes on a replica set
1. Stop one secondary, then restart it as a standalone on another port:   mongod --port 47017
2. Build the index on it:                                                 db.records.createIndex( { username: 1 } )
3. Stop the standalone and restart it with the original replica set configuration:   mongod --port 27017 --replSet rs0

Note what the official docs say:
When building an index on a collection, the database that holds the collection is unavailable for read or write operations until the index build completes.
That is, reads and writes are blocked while the index builds, unless you pass background: true.
For replica sets, secondaries will begin building indexes after the primary finishes building the index. In sharded clusters, the mongos will send createIndex() to the primary members of the replica set for each shard, which then replicate to the secondaries after the primary finishes building the index.
Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up.
When building indexes across a cluster, configure a sufficiently large oplog, or the build can fail.
Check how a query performs with an index (here, the zipcode index):
db.people.find(
 { name: "John Doe", zipcode: { $gt: "63000" } }
).hint( { zipcode: 1 } ).explain("executionStats")

db.collection.totalIndexSize()   // total index size; ideally it fits in physical RAM

9. Storage Engines
WiredTiger is the default storage engine starting in MongoDB 3.2. It is well-suited for most workloads and is recommended for new deployments. WiredTiger provides a document-level concurrency model, checkpointing, and compression, among other features. In MongoDB Enterprise, WiredTiger also supports Encryption at Rest.

MMAPv1 is the original MongoDB storage engine and the default before 3.2. It performs well on workloads with high volumes of reads and writes, as well as in-place updates. It is nevertheless not recommended: its crash recovery is weaker (journal and backup only), and it makes poor use of multiple CPU cores.

The In-Memory Storage Engine is available in MongoDB Enterprise. Rather than storing documents on disk, it retains them in memory for more predictable data latencies. It is fast, but demands a lot of RAM.

WiredTiger engine
Document-level concurrency               (mmapv1 only offers collection-level concurrency)
Snapshots and checkpoints                allow fast recovery after a crash
Journal (write-ahead transaction log)    also aids fast recovery
Compression                              for collection data and indexes
Memory use                               60% of RAM or a minimum of 1 GB; makes efficient use of multiple CPU cores

In-Memory engine
The in-memory storage engine does not maintain any on-disk data, including configuration data, indexes, user credentials, etc.
Everything lives in memory, which is risky. It suits data that demands top performance and can be rebuilt quickly after an outage, such as rarely written reference data (e.g., province/city/district lists).
mongod --storageEngine inMemory --dbpath <path>    // enable the In-Memory engine
Warning: the in-memory storage engine does not persist data after process shutdown;
recovery of in-memory data is impossible. Once the process dies, the data is gone.
Memory use: 50% of RAM or a minimum of 1 GB.
   
GridFS — for files larger than 16 MB; for smaller files the docs recommend storing them in a single document.
When to use:
If your filesystem limits the number of files in a directory.
When you want to access information from portions of large files without having to load whole files into memory.
When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities.

The chunks collection stores the binary chunks.
The files collection stores the file's metadata.

10. Security
Creating a user:
use reporting
db.createUser(
  {
    user: "reportsUser",
    pwd: "12345678",
    roles: [
       { role: "read", db: "reporting" },     // read access to reporting
       { role: "read", db: "products" },      // read access to products
       { role: "read", db: "sales" },         // read access to sales
       { role: "readWrite", db: "accounts" }  // read/write access to accounts
    ]
  }
)
Creating a role:
use admin
db.createRole(
   {
     role: "mongostatRole",    // role name
     privileges: [             // privileges
       { resource: { cluster: true }, actions: [ "serverStatus" ] }
     ],
     roles: []    // roles to inherit from
   }
)
Granting roles:
use reporting
db.grantRolesToUser(
    "reportsUser",
    [
      { role: "read", db: "accounts" }
    ]
)

db.getRole( "read", { showPrivileges: true } )   // inspect a role's privileges
MongoDB built-in roles
read        provides the ability to read data on all non-system collections
readWrite   provides all the privileges of the read role plus the ability to modify data on all non-system collections
For the full list, see https://docs.mongodb.com/manual/core/security-built-in-roles/
One point worth noting:
a role can only include privileges that apply to its database and can only inherit from other roles in its database, except for roles created in the admin database.
That is, a role is scoped to the database it was created in; only roles created in admin can apply to other databases.
Ensure that the HTTP status interface, the REST API, and the JSON API are all disabled in production environments to prevent potential data exposure and vulnerability to attackers.

Use bind_ip to restrict which addresses mongod listens on.
For the security reference, see https://docs.mongodb.com/manual/reference/security/
11. Replication
A replica set in MongoDB is a group of mongod processes that maintain the same data set.
The primary node receives all write operations; only the primary can accept writes.
The secondaries replicate the primary's oplog and apply the operations to their own data sets, so that their data sets mirror the primary's.
Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
The purpose of an arbiter is to maintain a quorum in a replica set by responding to heartbeat and election requests from other members; it holds no data.
If your replica set has an even number of members, add an arbiter to obtain a majority of votes in an election for primary. With larger clusters, a common setup is a dedicated machine running only the arbiter.
When a primary does not communicate with the other members of the set for more than 10 seconds, an eligible secondary will hold an election to elect itself the new primary.
Primary:     receives all write operations.
Secondaries: replicate operations from the primary to maintain an identical data set.
A replica set can have up to 50 members but only 7 voting members.

You can configure a secondary member for a specific purpose:
Prevent it from becoming primary in an election, which allows it to reside in a secondary data center or to serve as a cold standby (set priority: 0). See Priority 0 Replica Set Members.
Prevent applications from reading from it, which allows it to run workloads that require separation from normal traffic. See Hidden Replica Set Members.
Keep a running "historical" snapshot for use in recovery from certain errors, such as an unintentionally deleted database; the delay is configurable. See Delayed Replica Set Members.

If your deployment requires more than 50 members, you will need master-slave replication instead. However, master-slave replication lacks automatic failover.

Priority 0 Replica Set Members — why set priority: 0
  A priority 0 member is a secondary that cannot become primary.
  A priority 0 member can function as a standby.
  In sets with varied hardware or geographic distribution, a priority 0 standby ensures that only qualified members become primary.
Hidden Replica Set Members — why hide a member
  A hidden member maintains a copy of the primary's data set but is invisible to client applications.
  Use hidden members for dedicated tasks such as reporting and backups.
Delayed Replica Set Members — why delay a member
  A delayed member's data set reflects an earlier, or delayed, state of the set.
  Delayed members must be priority 0, to prevent them from becoming primary.
  Delayed members should be hidden, to prevent applications from seeing and querying them.
  
   Oplog
   The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. It records every write, and data can be recovered from it.
   The default oplog size depends on the storage engine:
   Engine                      Default Size            Lower Bound   Upper Bound
   In-Memory Storage Engine    5% of physical memory   50 MB         50 GB
   WiredTiger Storage Engine   5% of free disk space   990 MB        50 GB
   MMAPv1 Storage Engine       5% of free disk space   990 MB        50 GB
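The default-size rule in the table above is "5% of the relevant space, clamped between the lower and upper bounds", which can be sketched directly (the function name is made up for illustration):

```javascript
// Default oplog size: 5% of the given space, clamped to [lowerBytes, upperBytes].
function defaultOplogBytes(spaceBytes, lowerBytes, upperBytes) {
  const fivePercent = (spaceBytes * 5) / 100;
  return Math.min(Math.max(fivePercent, lowerBytes), upperBytes);
}

const GB = 1024 ** 3, MB = 1024 ** 2;
// WiredTiger on a disk with 100 GB free: 5% = 5 GB, already within [990 MB, 50 GB]
console.log(defaultOplogBytes(100 * GB, 990 * MB, 50 * GB) / GB); // 5
// Small disk (10 GB free): 5% = 512 MB, clamped up to the 990 MB lower bound
console.log(defaultOplogBytes(10 * GB, 990 * MB, 50 * GB) / MB);  // 990
```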

   Master-Slave — the simplest replication mode. It is easy to operate but has no automatic failover, so it is not covered in depth.
mongod --master --dbpath /data/masterdb/ --oplogSize 1024                    # start the master
mongod --slave --source <masterhostname><:<port>> --dbpath /data/slavedb/    # start a slave
Note: the oplog must be large enough. The master records each write to its oplog and ships the oplog to the slaves, which replay it to stay in sync. Under heavy writes, an undersized oplog can cause the slaves to fall behind and the data to diverge.
rs.printReplicationInfo(), rs.printSlaveReplicationInfo()   // inspect master and slave replication state
For more methods, see https://docs.mongodb.com/manual/reference/replication/

mongod --replSet "rs0"
rs.initiate()                     // initialize the set
rs.conf()                         // view the configuration
rs.add("mongodb1.example.net")    // add a member
rs.status()                       // view status
All of the above must be run on the primary.

Changing a member's priority:
cfg = rs.conf()
cfg.members[2].priority = 0.5   // change the member's priority
rs.reconfig(cfg)

Adding an arbiter:
rs.addArb("m1.example.net:30000")

Useful member options:
rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true})
rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600})
{
  _id: <string>,
  version: <int>,
  protocolVersion: <number>,
  members: [
    {
      _id: <int>,
      host: <string>,
      arbiterOnly: <boolean>,
      buildIndexes: <boolean>,
      hidden: <boolean>,
      priority: <number>,
      tags: <document>,
      slaveDelay: <int>,
      votes: <number>
    },
    ...
  ],
  settings: {
    chainingAllowed : <boolean>,
    heartbeatIntervalMillis : <int>,
    heartbeatTimeoutSecs: <int>,
    electionTimeoutMillis : <int>,
    getLastErrorModes : <document>,
    getLastErrorDefaults : <document>
  }
}

Removing a member:
rs.remove("mongod3.example.net:27017")
Then shut down mongod on that node with db.shutdownServer().

12. Sharding
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Each shard holds a portion of the data and runs as its own mongod process.

A sharded cluster contains the following components:
Shard:          each shard contains a subset of the sharded data
mongos:         acts as a query router
Config Server:  config servers store metadata and configuration settings for the cluster

Benefits of sharding: distributed reads and writes, large-scale storage, high availability.

For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards, without scanning the whole data set.
This is a key optimization point: in a sharded cluster, include the shard key in reads and writes whenever possible.
For queries that do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the cluster and merging the results, which is slow.

In production environments, individual shards should be deployed as replica sets.

You cannot change the shard key after sharding, nor can you unshard a sharded collection.
Once a collection is sharded on key x, you cannot reshard it on key y, and you cannot return it to its unsharded state.

Strategy    Pros                              Cons                                 Suited for
Hash        even data distribution            slow ordered/range queries           most workloads
Range       can target a contiguous range     data can skew onto a few shards      range-based queries


[Figure: hash shard key distribution]

[Figure: range shard key distribution]

Shard key restrictions
You cannot select a different shard key for the collection once it is chosen.
You cannot update the values of the shard key fields.
You cannot shard a collection that has unique indexes on other fields.
You cannot create unique indexes on other fields for a sharded collection.

sh.shardCollection( "database.collection", { <field>: "hashed" } )   // shard on a hashed key

Creating a hashed sharded cluster
1. Create the Config Server Replica Set
   1) Start each member of the config server replica set:
      mongod --configsvr --replSet <setname> --dbpath <path>
   2) Connect to one of the config servers:
      mongo --host <hostname> --port <port>
   3) Initiate the replica set:
      rs.initiate(
         {
            _id: "<replSetName>",
            configsvr: true,           // note: this flag is required here
            members: [
              { _id: 0, host: "cfg1.example.net:27017" },
              { _id: 1, host: "cfg2.example.net:27017" },
              { _id: 2, host: "cfg3.example.net:27017" }
            ]
          }
      )
2. Create the Shard Replica Sets
   1) Start each member of the shard replica set:
      mongod --shardsvr --replSet <replSetname>
   2) Connect to a member of the shard replica set:
      mongo --host <hostname> --port <port>
   3) Initiate the replica set (note: no configsvr flag this time):
      rs.initiate(
         {
            _id: "<replSetName>",
            members: [
              { _id: 0, host: "shard1.example.net:27017" },
              { _id: 1, host: "shard2.example.net:27017" },
              { _id: 2, host: "shard3.example.net:27017" }
            ]
          }
      )
3. Connect a mongos to the Sharded Cluster
   1) Start a mongos for the cluster:
      mongos --configdb <configReplSetName>/host1:port,host2:port,……
   2) Connect to the mongos:
      mongo --host <hostname> --port <port>
4. Add Shards to the Cluster
   sh.addShard( "<replSetName>/host:port" )
5. Enable Sharding for a Database
   sh.enableSharding("<database>")
6. Shard a Collection using Hashed Sharding
   If the collection already contains data, you must create a hashed index on the shard key using db.collection.createIndex() before calling shardCollection().

   sh.shardCollection("<database>.<collection>", { <key>: "hashed" } )

Creating a range sharded cluster
The steps are the same as for the hashed cluster above, except step 6: range sharding builds an ordinary (ascending/descending) index on the shard key, while hashed sharding builds a hashed index.

For the sharding command reference, see https://docs.mongodb.com/manual/reference/sharding/
13. Performance Testing
Preparation for the tests is complete, but the test machines are currently unreachable due to network problems; this section will be completed once the network is restored.
