Scraping Taobao with Selenium + Chrome + PhantomJS

Date: 2019-11-13

Tags: selenium+chrome+phantomjs, selenium, chrome, phantomjs, Taobao. Category: Chrome

Original link:

https://github.com/factsbenchmarks/taobao-jingdonghtml

1. Brief background

　　Selenium drives the browser and connects it to Python.

　　PhantomJS renders the page and executes its JavaScript.

2. Functions

　　Write one standalone function that takes a page number as its parameter and jumps to that page.
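
　　One possible shape for such a page-jump function is to load the search URL of the target page directly rather than clicking through the pager. This is only a sketch under assumptions that do not come from the original: the `s.taobao.com/search` endpoint, the `q` and `s` query parameters, the offset of 44 items per page, the default keyword, and the names `build_search_url` and `go_to_page` are all illustrative and should be checked against the live site. `driver` is expected to be a Selenium WebDriver; only its `get(url)` method is used.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: endpoint, parameter names and page size
# are not taken from the original article and must be verified on the site.
SEARCH_URL = "https://s.taobao.com/search"
ITEMS_PER_PAGE = 44


def build_search_url(keyword, page):
    """Build the search URL for a 1-based result page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # Page 1 starts at offset 0, page 2 at offset ITEMS_PER_PAGE, and so on.
    offset = (page - 1) * ITEMS_PER_PAGE
    return SEARCH_URL + "?" + urlencode({"q": keyword, "s": offset})


def go_to_page(driver, page, keyword="ipad"):
    """Navigate the given WebDriver to the requested result page."""
    driver.get(build_search_url(keyword, page))
```

　　Keeping the URL construction in its own helper means the page arithmetic can be checked without starting a browser. If the site only allows paging through its pager widget, the body of `go_to_page` would have to fill in the page input and click the confirm button instead.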

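　　Two further tasks, fetching the information on a given page and returning it as a dictionary, and saving that dictionary data to a database, can each be written as a separate function. Kept separate, they are plug-and-play.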
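
　　The split into two independent functions might look like the sketch below. Everything specific in it is an assumption made for illustration: the CSS selectors are hypothetical placeholders (the real Taobao markup has to be inspected), the field names `title`, `price` and `shop` are invented, and SQLite stands in for whatever database the original project used. The extraction function returns one dictionary per product, collected in a list.

```python
import sqlite3

# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR, inlined so the
# sketch does not need Selenium installed just to be imported.
CSS = "css selector"

# Hypothetical selectors: replace them after inspecting the real page.
ITEM_SELECTOR = ".item"
FIELD_SELECTORS = {"title": ".title", "price": ".price", "shop": ".shop"}


def get_products(driver):
    """Return the products on the currently loaded page as a list of dicts."""
    products = []
    for item in driver.find_elements(CSS, ITEM_SELECTOR):
        products.append({
            field: item.find_element(CSS, selector).text.strip()
            for field, selector in FIELD_SELECTORS.items()
        })
    return products


def save_products(conn, products):
    """Insert product dicts into a SQLite table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (title TEXT, price TEXT, shop TEXT)"
    )
    conn.executemany(
        "INSERT INTO products (title, price, shop) "
        "VALUES (:title, :price, :shop)",
        products,
    )
    conn.commit()
```

　　Because `get_products` only reads from the driver and `save_products` only writes to the connection, either one can be swapped out (for example, a different storage backend) without touching the other.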

3. Recent versions of Selenium no longer support PhantomJS; headless Chrome is recommended instead.

　　References:

　　https://developers.google.cn/web/updates/2017/04/headless-chrome

　　http://www.javashuo.com/article/p-tyzdzogi-bb.html

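from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome to run without a visible window.
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')  # historically needed on Windows
# The chrome_options= keyword is deprecated and removed in later
# Selenium 4 releases; pass the options object via options= instead.
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://cnblogs.com/")
driver.quit()  # close the browser when finished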