High-Performance Array Deduplication in JavaScript

Date: 2019-11-16


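To most people, array deduplication is a simple topic, and many have mastered several ways of doing it. Most of the time, however, we overlook how much time deduplication consumes. When we optimize front-end performance, for example, how many of us consider the runtime performance of JavaScript itself? Today I will use a set of test data to show why high-performance array deduplication matters. Admittedly, all of this is aimed at obsessive types like me 😄.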

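Here is the conclusion first, so that readers who prefer to skip the process can take it and use it directly. You can also use my high-performance JS toolkit: npm i efficient-js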

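// Fastest array deduplication method in these tests: 100,000 items: 3 ms; 1,000,000 items: 6 ms; 10,000,000 items: 36 ms
// Caveats: the loop stops at the first null/undefined element, and values are
// compared by their string form because they are used as object keys.
Array.prototype.distinct = function () {
  var hash = [];  // deduplicated result
  var obj = {};   // values already seen, used as keys
  for (var i = 0; this[i] != null; i++) {
    if (!obj[this[i]]) {
      hash.push(this[i]);
      obj[this[i]] = true;  // truthy marker, so falsy values such as 0 are recorded too
    }
  }
  return hash;
}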

1. Collecting array deduplication methods

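(1) Array traversal: create a new array and traverse the array to be deduplicated; when a value is not yet in the new array (indexOf returns -1), append it to the new array.

(2) Index check: if the first position at which item i appears in the array is not i, then item i is a duplicate and is skipped; otherwise it is stored in the result array.

(3) Sort, then drop adjacent duplicates: sort the input array so that equal values become adjacent, then traverse the sorted array and add to the new array only the values that differ from the previous one.

(4) Optimized traversal (recommended): a double loop, with the outer loop running from 0 to arr.length and the inner loop from i+1 to arr.length, which puts into the new array each value that has no duplicate to its right. (When a duplicate is detected, the inner loop ends and the outer loop moves on to its next iteration.)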

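(5) ES6 implementations:

a. ES6 provides a new data structure, Set. It is similar to an array, but its members are all unique, with no duplicate values. The Set constructor accepts an array (or any other iterable object) as an argument for initialization.

b. Array.filter() combined with indexOf.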

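(6) Double for loop (easiest to understand): the outer loop traverses the elements and the inner loop checks for duplicates. Duplicates can be handled either by calling push() to collect the non-duplicates in a new array or by removing the duplicates with splice().

(7) for...of + includes() (an upgrade of the double for loop): replace the outer for loop with for...of and the inner loop with includes(). Create an empty array first; when includes() returns false, push the element into it. Similarly, indexOf() can be used in place of includes().

(8) Array.sort(): first sort the array with sort(), then compare adjacent elements for equality to exclude duplicates.

(9) for...of + Object: use the uniqueness of Object keys to deduplicate.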
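Methods (4) and (7) have no code of their own later in this article, so here is a minimal sketch of both. The function names distinctKeepLast and distinctIncludes are made up for illustration; they come neither from the original tests nor from efficient-js.

```javascript
// Method (4): double loop that keeps a value only when no equal value
// appears to its right, so the right-most occurrence survives.
function distinctKeepLast(arr) {
  var result = [];
  for (var i = 0; i < arr.length; i++) {
    var duplicatedLater = false;
    for (var j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) {
        duplicatedLater = true;
        break; // end the inner loop; the outer loop moves on to the next item
      }
    }
    if (!duplicatedLater) {
      result.push(arr[i]);
    }
  }
  return result;
}

// Method (7): for...of + includes(), which keeps the first occurrence.
function distinctIncludes(arr) {
  const result = [];
  for (const item of arr) {
    if (!result.includes(item)) {
      result.push(item);
    }
  }
  return result;
}

console.log(distinctKeepLast([1, 2, 1, 3, 2])); // [ 1, 3, 2 ]
console.log(distinctIncludes([1, 2, 1, 3, 2])); // [ 1, 2, 3 ]
```

Note that the two sketches return the same set of values in a different order, because one keeps the last occurrence and the other keeps the first.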

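All of the methods above work and achieve array deduplication, but their performance differs greatly. The difference may not be obvious below 10,000 items, but every program is made up of a great many instructions; if you ignore the optimization of each instruction and simply assume you can optimize the program as a whole, you are bound to fail. "Do not neglect a good deed because it is small" is not a perfect fit here, but that is roughly the idea: everything starts from the details. Let us first look at the methods above. There are many of them, so we need to sort them out. Although their ideas look different, they fall into two categories: traversing the array, and using built-in methods directly.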

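There are many ways to traverse an array, more than the ones listed above. We first compare the efficiency of the traversal methods, and then compare the rest.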

2. Building a multi-dimensional test template and verifying it

The test environment for the results is as follows:

First we list all the array traversal methods:

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  for (var i = 0; i < this.length; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// Optimized for loop (cached length)
Array.prototype.distinct2 = function () {
  var hash=[];
  for (var i = 0, len = this.length; i < len; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element)
Array.prototype.distinct3 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// foreach
Array.prototype.distinct4 = function () {
  var hash=[];
  this.forEach(item => {
    if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  })
  return hash;
}
// forEach variant
Array.prototype.distinct5 = function () {
  var hash=[];
  Array.prototype.forEach.call(this, item => {
    if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  })
  return hash;
}
// forin
Array.prototype.distinct6 = function () {
  var hash=[];
  for (var key in this) {
     var item = this[key];
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  }
  return hash;
}
// map
Array.prototype.distinct7 = function () {
  var hash=[];
  this.map(item => {
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  });
  return hash;
}
// forof
Array.prototype.distinct8 = function () {
  var hash=[];
  for (let item of this) {
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  }
  return hash;
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Size [${orgArray.length}]: deduplicated length ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

Output:

Raising the data size to 1 million and 10 million gives the following results:

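Based on these results we can rule out for...in, while the other traversal methods differ very little. We take the current best performer, the weakened for loop (which in our test environment is actually the strengthened version, haha).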

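(distinct6 returns a deduplicated length of 1008. This comes from how for...in works: it also enumerates inherited enumerable properties, so the distinct1 to distinct8 methods added to Array.prototype are visited along with the 1000 values. For this reason for...in is not suitable for traversing arrays; see the difference between for...in and for...of for details.)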
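The for...in behavior can be reproduced in isolation. The sketch below uses a hypothetical method name, demoEnumerable, and shows that a method assigned to Array.prototype shows up in for...in, while the same method defined as non-enumerable with Object.defineProperty does not.

```javascript
// A hypothetical method added by plain assignment, as the tests in this article do.
Array.prototype.demoEnumerable = function () {};

var keys = [];
for (var key in [10, 20]) {
  keys.push(key);
}
// keys holds the indexes '0' and '1' plus the inherited 'demoEnumerable'
console.log(keys);

// Redefine the same method as non-enumerable; for...in no longer visits it.
Object.defineProperty(Array.prototype, 'demoEnumerable', {
  value: function () {},
  enumerable: false,
  configurable: true,
  writable: true
});

var keysAfter = [];
for (var key2 in [10, 20]) {
  keysAfter.push(key2);
}
console.log(keysAfter);

// Clean up so the demo leaves Array.prototype untouched.
delete Array.prototype.demoEnumerable;
```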

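Next we keep the weakened for loop and compare different ways of checking for duplicates. Here is the code: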

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})


// indexOf
Array.prototype.distinct1 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}

// Index check
Array.prototype.distinct2 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(this.indexOf(this[i])==i){
      hash.push(this[i]);
     }
  }
  return hash;
}

// includes
Array.prototype.distinct3 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(!hash.includes(this[i])){
      hash.push(this[i]);
     }
  }
  return hash;
}

// Object
Array.prototype.distinct4 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
     // key on the value, not on the index, otherwise nothing is deduplicated
     if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
     }
  }
  return hash;
}

var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Size [${orgArray.length}]: deduplicated length ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 4; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

The test results are as follows:

This result opens up a real gap. Now look at the results for 1 million and 10 million:

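The conclusion is clear: as the data size grows linearly, the Object-based method is far more efficient than the other ordinary traversal approaches, since a property lookup on an object does not scan the result array the way indexOf and includes do. Let us revisit the idea behind the Object method: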

Use the uniqueness of Object keys to deduplicate.
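One consequence of this idea is worth illustrating: object keys are strings, so values are compared by their string form. The sketch below uses illustrative function names (distinctByObject and distinctByMap, neither from the original tests) and contrasts a plain object with a Map, which keeps the number 1 and the string '1' apart. It makes no claim about their relative speed.

```javascript
// Object keys are strings, so the number 1 and the string '1' share one key.
function distinctByObject(arr) {
  var seen = Object.create(null); // no inherited keys such as 'constructor'
  var result = [];
  for (var i = 0; i < arr.length; i++) {
    if (!(arr[i] in seen)) {
      seen[arr[i]] = true;
      result.push(arr[i]);
    }
  }
  return result;
}

// A Map compares keys without converting them to strings,
// so 1 and '1' stay distinct.
function distinctByMap(arr) {
  var seen = new Map();
  var result = [];
  for (var i = 0; i < arr.length; i++) {
    if (!seen.has(arr[i])) {
      seen.set(arr[i], true);
      result.push(arr[i]);
    }
  }
  return result;
}

console.log(distinctByObject([1, '1', 2, 2])); // [ 1, 2 ]
console.log(distinctByMap([1, '1', 2, 2]));    // [ 1, '1', 2 ]
```

For arrays of same-typed primitives, such as the random integers used in these tests, both give the same result.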

To be thorough, we run the Object-based deduplication with every traversal method:

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; i < this.length; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Optimized for loop (cached length)
Array.prototype.distinct2 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0, len = this.length; i < len; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element)
Array.prototype.distinct3 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// foreach
Array.prototype.distinct4 = function () {
  var hash=[];
  var obj = {};
  this.forEach(item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  })
  return hash;
}
// forEach variant
Array.prototype.distinct5 = function () {
  var hash=[];
  var obj = {};
  Array.prototype.forEach.call(this, item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  })
  return hash;
}
// forin
Array.prototype.distinct6 = function () {
  var hash=[];
  var obj = {};
  for (var key in this) {
    var item = this[key];
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  }
  return hash;
}
// map
Array.prototype.distinct7 = function () {
  var hash=[];
  var obj = {};
  this.map(item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  });
  return hash;
}
// forof
Array.prototype.distinct8 = function () {
  var hash=[];
  var obj = {};
  for (let item of this) {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  }
  return hash;
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Size [${orgArray.length}]: deduplicated length ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

The results are as follows:

Testing again with 1 million and 10 million gives the following results:

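We can conclude that the plain for loop (including the optimized and weakened versions) combined with Object leads the other traversal methods by a wide margin. We can now compare the Object approach against the remaining methods.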

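(Here too distinct6 returns a deduplicated length of 1008, because for...in also enumerates inherited enumerable properties, that is, the distinct methods added to Array.prototype. for...in is therefore not suitable for traversing arrays; see the difference between for...in and for...of for details.)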

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; i < this.length; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Optimized for loop (cached length)
Array.prototype.distinct2 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0, len = this.length; i < len; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element)
Array.prototype.distinct3 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// new Set() + [...]
Array.prototype.distinct4 = function () {
  return [...new Set(this)];
}
// new Set() + Array.from
Array.prototype.distinct5 = function () {
  return Array.from(new Set(this));
}
// Array.filter() + Object
Array.prototype.distinct6 = function () {
  var obj = {};
  return this.filter(item => {
    if (!obj[item]) {
      obj[item] = item;
      return true;
    }
  })
}
// Double for loop
Array.prototype.distinct7 = function () {
  let arr = [...this];
  for (let i=0, len=arr.length; i<len; i++) {
    for (let j=i+1; j<len; j++) {
      if (arr[i] === arr[j]) {
        arr.splice(j, 1);
        // splice changes the array length, so decrement both the length len and the index j
        len--;
        j--;
      }
    }
  }
  return arr
}
// Array.sort()
Array.prototype.distinct8 = function () {
  var arr = this.slice().sort()  // sort a copy so the source array is not mutated
  let result = [arr[0]]

  for (let i=1, len=arr.length; i<len; i++) {
    arr[i] !== arr[i-1] && result.push(arr[i])
  }
  return result
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Size [${orgArray.length}]: deduplicated length ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

The results are as follows:

We can essentially rule out the double for loop and Array.sort(). Let us test again at the 1 million level:

The double for loop run crashed... Let us look at 300,000 instead:

Dropping the double for loop, we run 1 million and 10 million:

Clearly the plain for loop (including the optimized and weakened versions) + Object is far ahead.

3. Test results report

After running the plain, optimized, and weakened for loops many times, the final conclusion on efficiency is: weakened > plain > optimized.

The weakened version could be renamed the special version, haha.
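Repeated runs like the ones mentioned above could be organized as in the following sketch. The benchmark helper and its parameters are hypothetical, and it assumes an environment where performance.now() is available globally (modern browsers and recent Node.js versions). It reports the median of several runs rather than a single Date difference.

```javascript
// Hypothetical helper: run fn several times and return the median time in ms.
function benchmark(fn, runs) {
  var times = [];
  for (var r = 0; r < runs; r++) {
    var start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  times.sort(function (a, b) { return a - b; }); // numeric sort
  return times[Math.floor(times.length / 2)];
}

// Sample data: 100,000 items with 1,000 distinct values.
var sample = Array.from({ length: 100000 }, function (_, idx) {
  return idx % 1000;
});

var medianMs = benchmark(function () {
  return Array.from(new Set(sample));
}, 5);

console.log('median: ' + medianMs.toFixed(3) + ' ms');
```

The timings depend on the engine and the machine, so the figures quoted in this article should be read as relative comparisons from one environment.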

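// Fastest array deduplication method in these tests: 100,000 items: 3 ms; 1,000,000 items: 6 ms; 10,000,000 items: 36 ms
// Caveats: the loop stops at the first null/undefined element, and values are
// compared by their string form because they are used as object keys.
Array.prototype.distinct = function () {
  var hash = [];  // deduplicated result
  var obj = {};   // values already seen, used as keys
  for (var i = 0; this[i] != null; i++) {
    if (!obj[this[i]]) {
      hash.push(this[i]);
      obj[this[i]] = true;  // truthy marker, so falsy values such as 0 are recorded too
    }
  }
  return hash;
}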
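A note on the loop condition this[i] != null used by the weakened for loop: it ends the loop at the first element that is null or undefined, which is fine for the random integers tested here but would cut a traversal short on other data. The small sketch below, with a hypothetical helper named countVisited, shows how many elements such a loop actually visits.

```javascript
// Counts how many elements a loop with the condition arr[i] != null visits.
function countVisited(arr) {
  var visited = 0;
  for (var i = 0; arr[i] != null; i++) {
    visited++;
  }
  return visited;
}

console.log(countVisited([1, 2, 3]));       // 3
console.log(countVisited([1, null, 2, 3])); // 1, the loop stops at null
console.log(countVisited([0, '', false]));  // 3, falsy values other than null/undefined pass
```

If the array may contain null or undefined, the plain condition i < arr.length visits every element.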

You are welcome to visit my personal website, 枫林晚, and my GitHub.

Shortcut to the high-performance JavaScript toolkit: JavaScript High-Performance Toolkit