co-parallel & co-gather: A Source Code Walkthrough

Date: 2019-12-06

Original link. Please credit the source when reposting.

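I recently read Ma63d's article about crawlers. I happen to be writing a crawler myself, and when I saw him mention co-parallel and co-gather in it, I decided to rework my own code (it was originally written just to crawl a few things I find interesting, and I am still revising it; the address is here). Yesterday I read through the co-parallel source carefully, and today I want to write up my own analysis of it.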

co-parallel

The source is as follows:

var thread = require('co-thread');

module.exports = function *parallel(thunks, n){
  var n = Math.min(n || 5, thunks.length);
  var ret = [];
  var index = 0;

  function *next() {
    var i = index++;
    ret[i] = yield thunks[i];
    if (index < thunks.length) yield next;
  }

  yield thread(next, n);

  return ret;
};

The code is very short, but the approach is clever. Because both modules depend on co-thread, here is the co-thread source as well:

function thread(fn, n) {
  var gens = [];
  while (n--) gens.push(fn);
  return gens;
}

Run fn n times in parallel.

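As its own description says, co-thread exists to run things in parallel. The index variable is declared outside next, so every invocation of next shares it: each call claims the current index and increments it, which means each invocation of the same next function works on a different thunk. When next finishes its thunk it yields itself again, which starts another invocation that claims the next unclaimed thunk, and this continues until every entry in thunks has been yielded. A neat trick.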
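The shared-counter idea can be sketched without co at all. The following is a minimal analogue that uses async functions instead of generators; `parallelLimit`, `worker`, and `makeTask` are hypothetical names made up for illustration and are not part of co-parallel. Each of the n workers claims the next unclaimed index, awaits that task, and loops (where the original recurses with yield next).

```javascript
// Promise-based analogue of the shared-index pattern (not co-parallel's actual code).
// `tasks` is an array of functions that each return a promise.
async function parallelLimit(tasks, n) {
  n = Math.min(n || 5, tasks.length);
  const ret = [];
  let index = 0; // shared by every worker, like `index` in co-parallel

  async function worker() {
    while (index < tasks.length) {
      const i = index++;         // claim the next unclaimed task
      ret[i] = await tasks[i](); // store the result at its original position
    }
  }

  // Start n workers at once; each keeps pulling tasks until none are left.
  await Promise.all(Array.from({ length: n }, worker));
  return ret;
}

// Hypothetical tasks that resolve after a short delay and track concurrency.
let active = 0;
let maxActive = 0;
const makeTask = (value, ms) => () =>
  new Promise((resolve) => {
    active++;
    maxActive = Math.max(maxActive, active);
    setTimeout(() => { active--; resolve(value); }, ms);
  });

const demo = parallelLimit(
  [makeTask('a', 30), makeTask('b', 10), makeTask('c', 20), makeTask('d', 5)],
  2
);
demo.then((res) => console.log(res, 'max in flight:', maxActive));
```

The results come back in input order even though the tasks finish out of order, because each worker writes to ret[i] using the index it claimed rather than pushing onto the array.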

If my explanation sounds confusing, let's walk through one of the examples from co-parallel:

var request = require('co-request');
var co = require('co');
var parallel = require('co-parallel');

var urls = [
  'http://google.com',
  'http://yahoo.com',
  'http://ign.com',
  'http://cloudup.com',
  'http://myspace.com',
  'http://facebook.com',
  'http://cuteoverload.com',
  'http://uglyoverload.com',
  'http://segment.io'
];

function *status(url) {
  console.log('GET %s', url);
  var s = (yield request(url)).statusCode;
  return s;
}

co(function *(){
  var start = Date.now();
  var reqs = urls.map(status);
  var res = yield parallel(reqs, 3);
  console.log(res);
  console.log('duration: %dms', Date.now() - start);
});

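Start with the line var reqs = urls.map(status);. The callback passed to map is a generator function, so calling it does not run its body; each call only returns a generator object (an iterator). reqs is therefore an array of generator objects, one per URL, each being the result of calling status() once.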
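This is easy to check directly. The snippet below is a small sketch with a stand-in `status` that records its calls in a hypothetical `log` array instead of sending requests; it shows that map only creates generator objects and that the body starts running on the first next() call.

```javascript
const log = [];

// Stand-in for the article's status(); it records instead of requesting.
function* status(url) {
  log.push('GET ' + url);
  const s = yield 200; // a runner such as co would resolve the yielded value
  return s;
}

const reqs = ['http://google.com', 'http://yahoo.com'].map(status);

console.log(log.length);          // 0: no generator body has run yet
console.log(typeof reqs[0].next); // 'function': each entry is a generator object

const first = reqs[0].next();     // runs the body up to the first yield
console.log(log, first);
```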

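Next comes var res = yield parallel(reqs, 3);. Because parallel is a generator function, co steps into the generator it returns. With n = 3, thread returns [next, next, next], three references to the same generator function, which in effect behaves like [function*(){ yield thunks[0] }, function*(){ yield thunks[1] }, function*(){ yield thunks[2] }]. Since yield thread(next, n) yields an array, co converts it with Promise.all(obj.map(toPromise, this));. Every element of obj is a generator function, so toPromise calls co on each of them, which amounts to co(function*(){ yield thunks[i] }). Because next yields itself again at the end, each of these runs moves on to the next unclaimed thunk as soon as its current thunk finishes.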
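To make the array handling concrete, here is a deliberately tiny co-like runner. It is a sketch under assumptions, not co's implementation: `run`, `toPromise`, and `isGeneratorFunction` below are simplified stand-ins that only understand promises, generator objects, generator functions, and arrays, and `sleep` and `job` are hypothetical helpers that replace real HTTP requests.

```javascript
// Minimal co-like runner (simplified stand-in, not co's real source).
function isGeneratorFunction(v) {
  return typeof v === 'function' && v.constructor &&
    v.constructor.name === 'GeneratorFunction';
}

function toPromise(v) {
  if (v && typeof v.then === 'function') return v;            // already a promise
  if (isGeneratorFunction(v) || (v && typeof v.next === 'function')) return run(v);
  if (Array.isArray(v)) return Promise.all(v.map(toPromise)); // array: start all at once
  return Promise.resolve(v);
}

function run(gen) {
  return new Promise((resolve, reject) => {
    if (typeof gen === 'function') gen = gen();
    function step(method, arg) {
      let r;
      try { r = gen[method](arg); } catch (err) { return reject(err); }
      if (r.done) return resolve(r.value);
      // Resume the generator with the result, or throw the error into it.
      toPromise(r.value).then((val) => step('next', val), (err) => step('throw', err));
    }
    step('next');
  });
}

// Same shape as co-parallel, with co-thread inlined.
function* parallel(thunks, n) {
  n = Math.min(n || 5, thunks.length);
  const ret = [];
  let index = 0;
  function* next() {
    const i = index++;
    ret[i] = yield thunks[i];
    if (index < thunks.length) yield next;
  }
  yield Array(n).fill(next); // what thread(next, n) returns: [next, next, ...]
  return ret;
}

const order = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
function* job(name, ms) {
  order.push('start ' + name);
  yield sleep(ms);
  return name;
}

const done = run(parallel([job('a', 30), job('b', 10), job('c', 10)], 2));
done.then((res) => console.log(res, order));
```

With a limit of 2, job c only starts after job b has finished and its next has yielded itself again, which is the behaviour described above.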

co-gather

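co-gather provides much the same functionality as co-parallel and adds error handling for the parallel tasks. Promise.all rejects as soon as any one of its promises fails; co-gather handles errors around each task individually, so that Promise.all never rejects. The source is as follows: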

var thread = require('co-thread');
module.exports = function *gather(thunks, n){
  n = Math.min(n || 5, thunks.length);
  var ret = [];
  var index = 0;
  function *next() {
    var i = index++;
    ret[i] = {isError: false};
    try {
      ret[i].value = yield thunks[i];
    } catch (err) {
      ret[i].error = err;
      ret[i].isError = true;
    }
    if (index < thunks.length) yield next;
  }
  yield thread(next, n);
  return ret;
};

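The difference from co-parallel is that yield thunks[i] is wrapped in a try/catch, and each entry of the returned array is an object describing the outcome: value on success, or error with isError set to true on failure.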
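The try/catch works because a runner such as co resumes the generator with gen.throw(err) when the yielded value fails, so the error surfaces at the yield expression. The sketch below drives a hypothetical `slot` generator by hand to show both outcomes; it imitates a single pass through next and is not co-gather's code.

```javascript
// Hypothetical stand-in for one pass through co-gather's next(), driven by hand.
function* slot(ret, i) {
  ret[i] = { isError: false };
  try {
    ret[i].value = yield 'task ' + i; // a runner would resolve the yielded task
  } catch (err) {
    ret[i].error = err;
    ret[i].isError = true;
  }
}

const ret = [];

const ok = slot(ret, 0);
ok.next();    // run up to the yield
ok.next(200); // runner reports success: the yield evaluates to 200

const bad = slot(ret, 1);
bad.next();
bad.throw(new Error('boom')); // runner reports failure: the yield throws inside the generator

console.log(ret[0]);                               // { isError: false, value: 200 }
console.log(ret[1].isError, ret[1].error.message); // true boom
```

Because the error is caught inside the generator, nothing propagates out to the surrounding Promise.all, and the caller inspects isError on each entry instead.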

Summary

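Taken on its own, the parallel code is easy to follow. My understanding of the co source was shaky, though, so I got confused while tracing through the example. Only after carefully rereading the co source and Ruan Yifeng's explanation of generators did it finally make sense. Thanks again to Ma63d for the idea.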