Pitfalls for Puppeteer beginners

Installing Puppeteer and the pitfalls along the way

1. Environment and installation

Puppeteer requires at least Node v6.4.0. To use async / await you need Node v7.6.0 or later. Node download page: https://nodejs.org/zh-cn/node
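Before installing, it can help to confirm that the local Node version is new enough for async / await. The following is a minimal sketch: `meetsMinimum` is a hypothetical helper name introduced here for illustration, and the 7.6.0 threshold is the one stated above.

```javascript
// Compare two "major.minor.patch" version strings.
// Returns true when `current` is equal to or newer than `minimum`.
function meetsMinimum(current, minimum) {
  const a = current.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0; // missing or non-numeric parts count as 0
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all three parts are equal
}

// process.versions.node holds the running version without the leading "v".
const current = process.versions.node;
if (meetsMinimum(current, '7.6.0')) {
  console.log(`Node ${current} supports async/await.`);
} else {
  console.error(`Node ${current} is too old for async/await; v7.6.0 or later is needed.`);
}
```

Running `node -v` in a terminal gives the same information; the script form is only useful if you want the check inside a project script.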

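2. Create the project

2.1 Create a test directory, enter it, and run npm init to generate the project's package.json file
2.2 Install puppeteer
yarn add puppeteer or npm i puppeteer

The following error occurred during installation: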

weifandeMacBook-Pro:example weifan$ npm i puppeteer --save

> puppeteer@1.6.0 install /Users/weifan/Desktop/example/node_modules/puppeteer
> node install.js

ERROR: Failed to download Chromium r571375! Set "PUPPETEER_SKIP_CHROMIUM_DOWNLOAD" env variable to skip download.
{ Error: connect ETIMEDOUT 172.217.25.16:443
    at Object._errnoException (util.js:999:13)
    at _exceptionWithHostPort (util.js:1020:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1207:14)
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '172.217.25.16',
  port: 443 }
npm WARN example@1.0.0 No description
npm WARN example@1.0.0 No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! puppeteer@1.6.0 install: `node install.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the puppeteer@1.6.0 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/weifan/.npm/_logs/2018-07-16T09_49_23_441Z-debug.log

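The cause of the error: the installation runs install.js, which downloads Chromium, and that download timed out (connect ETIMEDOUT). For now we skip the download.

So we need to set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable. There are several ways to set it; here is one: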

env PUPPETEER_SKIP_CHROMIUM_DOWNLOAD="true" npm i --save puppeteer

The installation now succeeds.

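2.3 Download Chromium manually

Download page: https://download-chromium.appspot.com/

Unzip the file you just downloaded into the project's chromium folder. Inside it you will see a chrome-mac folder, which you can open to inspect its contents.

2.4 Create index.js (a screenshot example) in the src folder under the project root, with the following code: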
const puppeteer = require('puppeteer');

async function getPic() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});

  await browser.close();
}

getPic();

Run the code with node index.js. The following error appears:

(node:38213) UnhandledPromiseRejectionWarning: Error: Chromium revision is not downloaded. Run "npm install" or "yarn install"
    at assert (/Users/weifan/Desktop/example/node_modules/puppeteer/lib/helper.js:282:11)
    at Function.launch (/Users/weifan/Desktop/example/node_modules/puppeteer/lib/Launcher.js:106:7)
    at <anonymous>
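A side note on the UnhandledPromiseRejectionWarning prefix in this log: getPic() returns a promise, and nothing handles its rejection. The sketch below shows one way to report such a failure explicitly. Both `failingTask` and `runAndReport` are hypothetical names introduced for illustration; `failingTask` only stands in for getPic() so the example runs without Puppeteer.

```javascript
// Stand-in for getPic(): any async function that may reject.
// Hypothetical placeholder, not part of Puppeteer.
async function failingTask() {
  throw new Error('Chromium revision is not downloaded.');
}

// Run an async task and report a rejection instead of leaving it unhandled.
// Resolves to null on success, or to the Error object on failure.
function runAndReport(task) {
  return task().then(
    () => null,
    (err) => {
      console.error('Task failed:', err.message);
      return err;
    }
  );
}

runAndReport(failingTask);
```

In index.js the equivalent would be calling runAndReport(getPic) instead of a bare getPic(). This does not fix the underlying error; it only makes the failure message cleaner.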

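The error says Chromium has not been downloaded. By default Puppeteer looks for Chromium under node_modules/puppeteer/.local-chromium/, but our copy sits in the chromium folder at the project root, so the path has to be specified explicitly. Modify index.js: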

const puppeteer = require('puppeteer');

async function getPic() {
  const browser = await puppeteer.launch({
    executablePath: '../chromium/chrome-mac/Chromium.app',
    headless: false
  });
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});

  await browser.close();
}

getPic();

Running index.js again produces another error:

(node:38246) UnhandledPromiseRejectionWarning: Error: spawn EACCES

The Puppeteer GitHub issues contain a fix (https://github.com/GoogleChrome/puppeteer/issues/1649): change executablePath to the following:

executablePath: '../chromium/chrome-mac/Chromium.app/Contents/MacOS/Chromium',

Run node index.js again, and this time it works.

References:

1. https://www.jianshu.com/p/a89d8d6c007b

2. https://blog.fundebug.com/2017/11/01/guide-to-automating-scraping-the-web-with-js/

3. https://github.com/GoogleChrome/puppeteer/issues/1649
