A comparison of the map, hash_map, and unordered_map implementations

Date: 2019-11-09


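Introduction to map

map is an associative container in the STL[1]. It handles one-to-one data: the first item of each pair is the key, and each key can appear only once in the map; the second item is the value associated with that key. This makes map a convenient shortcut whenever a program works with one-to-one data. As for how the data is organized internally, typical implementations of map build a red-black tree (a binary tree that is balanced only in a loose, non-strict sense). The tree keeps its data sorted automatically, so all elements in a map are always in order; the benefit of that ordering will show up later.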

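Introduction to hash_map

hash_map is based on a hash table. The biggest advantage of a hash table is that it greatly reduces the time needed to store and look up data, to what is nearly constant time on average; the price is higher memory consumption. With more and more memory available today, trading space for time is worthwhile. Being relatively easy to code is another of its characteristics.

The basic principle: use an array with a fairly large index range to store the elements. A function (the hash function) is designed so that the key of every element corresponds to a function value (an array index, the hash value), and the element is stored in that array slot. Put more simply, each element is "classified" by its key and then stored in the place that belongs to its "class", which is called a bucket.

However, there is no guarantee that keys and function values correspond one to one, so different elements may well produce the same function value. This is a "collision": different elements end up in the same "class". In short, "direct addressing" and "collision resolution" are the two defining features of a hash table.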

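hash_map first allocates a large block of memory and divides it into many buckets. It then uses the hash function to map each key to a bucket, where the entry is stored. Insertion works as follows:

1. Obtain the key.
2. Compute the hash value with the hash function.
3. Derive the bucket number (usually the hash value modulo the number of buckets).
4. Store the key and value in that bucket.

Lookup works as follows:

1. Obtain the key.
2. Compute the hash value with the hash function.
3. Derive the bucket number (usually the hash value modulo the number of buckets).
4. Compare each element in the bucket with the key; if none is equal, the key is not found.
5. Return the value of the matching record.

In hash_map the direct address is produced by the hash function, and collisions are resolved by the comparison function. It follows that if every bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, many lookups are faster still (specifically, lookups that miss).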

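So, to implement a hash table, the parts that concern the user are the hash function and the comparison function. These are exactly the two parameters we need to specify when using hash_map.

Introduction to unordered_map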

Unordered maps are associative containers that store elements formed by the combination of a key value and a mapped value, and which allows for fast retrieval of individual elements based on their keys.

In an unordered_map, the key value is generally used to uniquely identify the element, while the mapped value is an object with the content associated to this key. Types of key and mapped value may differ.

Internally, the elements in the unordered_map are not sorted in any particular order with respect to either their key or mapped values, but organized into buckets depending on their hash values to allow for fast access to individual elements directly by their key values (with constant time complexity on average).

unordered_map containers are faster than map containers to access individual elements by their key, although they are generally less efficient for range iteration through a subset of their elements.

Unordered maps implement the direct access operator (operator[]) which allows for direct access of the mapped value using its key value as argument.


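Differences between unordered_map and map

The difference between boost::unordered_map and std::map is this: std::map uses operator< both to decide whether two elements are equivalent and to order them, and then inserts each element at the appropriate position in the tree. So traversing a map (an in-order traversal) yields sorted output, in the order defined by operator<.
boost::unordered_map instead computes each element's hash value to choose a bucket, and then uses operator== to decide whether two elements are the same. Traversing an unordered_map therefore yields the elements in no particular order.
In terms of usage, the key type of std::map must define operator<, while the key type of boost::unordered_map must provide a hash_value function and overload operator==. For types that already come with these, such as int or string, there is nothing to do. For a user-defined key type you have to write operator<, or hash_value() and operator==, yourself.
Finally, when the results do not need to be sorted, prefer unordered_map.
In effect, std::map corresponds to TreeMap in Java, and boost::unordered_map corresponds to HashMap.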

/** 
Compare the execution speed and memory usage of map, hash_map, and unordered_map 
**/  
  
#include <sys/types.h>  
#include <unistd.h>  
#include <sys/time.h>   
#include <cstdio>    // snprintf, sscanf  
#include <cstring>   // strncmp  
#include <iostream>  
#include <fstream>  
#include <string>  
#include <map>  
#include <ext/hash_map>  
#include <tr1/unordered_map>  
  
using namespace std;  
  
using namespace __gnu_cxx;  
  
using namespace std::tr1;  
  
#define N 100000000  // tested with N=100,000, N=1,000,000, N=10,000,000 and N=100,000,000  
  
// Define MapKey as map<int,int>, hash_map<int,int>, or unordered_map<int,int> in turn  
//typedef map<int,int> MapKey;          // use map  
//typedef hash_map<int,int> MapKey;     // use hash_map  
typedef unordered_map<int,int> MapKey;  // use unordered_map  
  
  
  
int GetPidMem(pid_t pid,string& memsize)  
{  
    char filename[1024];  
      
    snprintf(filename,sizeof(filename),"/proc/%d/status",pid);  
      
    ifstream fin;  
      
    fin.open(filename,ios::in);  
    if (! fin.is_open())  
    {  
        cout<<"open "<<filename<<" error!"<<endl;  
        return (-1);  
    }  
      
    char buf[1024];  
    char size[100];  
    char unit[100];  
      
    while(fin.getline(buf,sizeof(buf)-1))  
    {  
        if (0 != strncmp(buf,"VmRSS:",6))  
            continue;  
          
        sscanf(buf+6,"%99s%99s",size,unit);  // bound each field to the 100-byte buffers  
          
        memsize = string(size)+string(unit);  
    }  
      
    fin.close();  
      
    return 0;  
}  
  
int main(int argc, char *argv[])  
{  
    struct timeval begin;  
      
    struct timeval end;  
          
    MapKey MyMap;  
      
    gettimeofday(&begin,NULL);  
      
    for(int i=0;i<N;++i)  
        MyMap.insert(make_pair(i,i));  
      
    gettimeofday(&end,NULL);  
      
    cout<<"insert N="<<N<<",cost="<<end.tv_sec-begin.tv_sec + float(end.tv_usec-begin.tv_usec)/1000000<<" sec"<<endl;  
      
    for(int i=0;i<N;++i)  
        MyMap.find(i);  
  
    gettimeofday(&end,NULL);  
      
    cout<<"insert and getall N="<<N<<",cost="<<end.tv_sec-begin.tv_sec + float(end.tv_usec-begin.tv_usec)/1000000<<" sec"<<endl;  
      
    string memsize;  
      
    GetPidMem(getpid(),memsize);  
      
    cout<<memsize<<endl;  
      
    return 0;  
}

Results

With N = 100,000 records, the results are:

Map type	Insert time (s)	Insert + lookup time (s)	Memory usage (VmRSS)
map	0.110705	0.171859	5,780kB
hash_map	0.079074	0.091751	5,760kB
unordered_map	0.041311	0.050298	5,216kB

With N = 1,000,000 records, the results are:

Map type	Insert time (s)	Insert + lookup time (s)	Memory usage (VmRSS)
map	1.22678	1.95435	47,960kB
hash_map	0.684772	0.814646	44,632kB
unordered_map	0.311155	0.386898	40,604kB

With N = 10,000,000 records, the results are:

Map type	Insert time (s)	Insert + lookup time (s)	Memory usage (VmRSS)
map	14.9517	23.9928	469,844kB
hash_map	5.93318	7.18117	411,904kB
unordered_map	3.64201	4.43355	453,920kB

With N = 100,000,000 records, the results are:

Map type	Insert time (s)	Insert + lookup time (s)	Memory usage (VmRSS)
map	167.941	251.591	4,688,692kB
hash_map	46.3518	57.6972	3,912,632kB
unordered_map	28.359	35.122	4,301,256kB

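Analysis

Speed: unordered_map is the fastest at every size, hash_map is second, and map is the slowest.

Memory usage: map uses the most at every size. hash_map uses the least at N = 10,000,000 and N = 100,000,000, while unordered_map uses the least at the two smaller sizes.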

std::map

#include <iostream>
#include <map>
#include <string>
using namespace std;

struct person
{
    string name;
    int age;

    person(string name, int age)
    {
        this->name = name;
        this->age = age;
    }

    // Ordering used by map; note that it compares age only.
    bool operator < (const person& p) const
    {
        return this->age < p.age;
    }
};

map<person,int> m;

int main()
{
    person p1("Tom1",20);
    person p2("Tom2",22);
    person p3("Tom3",22);
    person p4("Tom4",23);
    person p5("Tom5",24);

    // Insert out of order; p3 goes in before p2.
    m.insert(make_pair(p3, 100));
    m.insert(make_pair(p4, 100));
    m.insert(make_pair(p5, 100));
    m.insert(make_pair(p1, 100));
    m.insert(make_pair(p2, 100));

    for(map<person, int>::iterator iter = m.begin(); iter != m.end(); iter++)
    {
        cout<<iter->first.name<<"\t"<<iter->first.age<<endl;
    }
    return 0;
}

output:

Tom1 20
Tom3 22
Tom4 23
Tom5 24

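The operator< overload must be declared const, because map's default comparison, std::less, calls it on const references to the keys.

Because operator< compares only age, and Tom2 and Tom3 have the same age, map treats the two as the same key (neither is less than the other). Tom3 was inserted first, so the later insertion of Tom2 is rejected and only Tom3 appears in the result.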

boost::unordered_map

#include <iostream>
#include <string>
#include <boost/functional/hash.hpp>   // boost::hash_value, boost::hash_combine
#include <boost/unordered_map.hpp>
using namespace std;

struct person
{
    string name;
    int age;

    person(string name, int age)
    {
        this->name = name;
        this->age = age;
    }

    // Equality used by unordered_map to compare keys within a bucket.
    bool operator== (const person& p) const
    {
        return name==p.name && age==p.age;
    }
};

// Hash function found by boost::hash<person>; combines both fields.
size_t hash_value(const person& p)
{
    size_t seed = 0;
    boost::hash_combine(seed, boost::hash_value(p.name));
    boost::hash_combine(seed, boost::hash_value(p.age));
    return seed;
}

int main()
{
    typedef boost::unordered_map<person,int> umap;
    umap m;
    person p1("Tom1",20);
    person p2("Tom2",22);
    person p3("Tom3",22);
    person p4("Tom4",23);
    person p5("Tom5",24);

    m.insert(umap::value_type(p3, 100));
    m.insert(umap::value_type(p4, 100));
    m.insert(umap::value_type(p5, 100));
    m.insert(umap::value_type(p1, 100));
    m.insert(umap::value_type(p2, 100));

    for(umap::iterator iter = m.begin(); iter != m.end(); iter++)
    {
        cout<<iter->first.name<<"\t"<<iter->first.age<<endl;
    }
    return 0;
}

output:

Tom1 20
Tom5 24
Tom4 23
Tom2 22
Tom3 22

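Both operator== and hash_value must be defined. operator== is needed because two elements with the same hash_value are not necessarily the same element, so operator== has to be called to decide. Of course, if the hash_value results differ, there is no need to call operator==.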