An Introduction to Scrapy, a Popular Open-Source Web Crawling Framework

Date: 2021-03-05

Installing Scrapy

Official site: https://scrapy.org/

Installation

On any operating system, Scrapy can be installed with pip, for example:

$ pip install scrapy

To confirm that Scrapy was installed successfully, first check in Python that the scrapy module can be imported:

>>> import scrapy  
>>> scrapy.version_info
(1, 8, 0)
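
The same import check can also be scripted rather than typed into the interpreter. The following is a minimal sketch, assuming only the standard library plus an optional Scrapy installation; `scrapy_status` is a hypothetical helper name, not part of Scrapy.

```python
import importlib.util


def scrapy_status():
    """Describe whether the scrapy module can be imported, and its version."""
    # find_spec returns None when no module named "scrapy" can be located.
    if importlib.util.find_spec("scrapy") is None:
        return "scrapy is not installed"
    import scrapy

    # version_info is a tuple such as (1, 8, 0); join it into "1.8.0".
    return "scrapy " + ".".join(str(part) for part in scrapy.version_info)


print(scrapy_status())
```

If the message reports that Scrapy is not installed, the pip command above was probably run for a different Python interpreter or environment than the one running this script.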

Then check in a shell that the scrapy command can be executed:

(base) λ scrapy 
Scrapy 1.8.0 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test
  fetch Fetch a URL using the Scrapy downloader 
  genspider Generate new spider using pre-defined templates 
  runspider Run a self-contained spider (without creating a project) 
  settings Get settings values 
  shell Interactive scraping console 
  startproject Create new project
  version Print Scrapy version
  view Open URL in browser, as seen by Scrapy 

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command

If both checks pass, Scrapy has been installed successfully. As shown above, the installed version is 1.8.0, the latest release at the time of writing.

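Note:

While installing Scrapy you may hit errors such as a missing VC++ component; in that case, install an offline package of the missing module.
Seeing the output above when running scrapy in CMD does not by itself prove that the installation works. To check properly, run scrapy bench; if it reports no errors, the installation succeeded.

For the full installation procedure, see http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes, which covers the installation method for each platform.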
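
The scrapy bench check described in the note can be automated. The sketch below is one possible way to do it, assuming the scrapy executable is on PATH; `run_cli` and `scrapy_bench_ok` are hypothetical helper names introduced here for illustration.

```python
import shutil
import subprocess


def run_cli(args, timeout=300):
    """Run a command line and return (exit_code, output).

    exit_code is None when the executable cannot be found on PATH.
    """
    if shutil.which(args[0]) is None:
        return None, f"{args[0]}: not found on PATH"
    proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    # Scrapy writes its log to stderr, so return both streams together.
    return proc.returncode, proc.stdout + proc.stderr


def scrapy_bench_ok():
    """Return True if `scrapy bench` ran and exited with status 0."""
    code, _output = run_cli(["scrapy", "bench"])
    return code == 0
```

Calling `scrapy_bench_ok()` runs the benchmark, so it may take a little while. An exit status of 0 is only a rough signal; it is still worth reading the returned log text for error messages.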

Global commands

$ scrapy 
Scrapy 1.7.3 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test 
        ## Benchmark how fast Scrapy can crawl on this machine.
  fetch Fetch a URL using the Scrapy downloader
        ## Download the page source and print it.
  genspider Generate new spider using pre-defined templates
        ## Create a new spider file.
  runspider Run a self-contained spider (without creating a project)
        ## Unlike starting a spider with crawl, this takes a file: scrapy runspider <spider file name>
  settings Get settings values
        ## Show the current configuration values.
  shell Interactive scraping console
        ## Enter Scrapy's interactive mode.
  startproject Create new project
        ## Create a crawler project.
  version Print Scrapy version
  view Open URL in browser, as seen by Scrapy
        ## Download the page's document content and display it in the browser.

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command

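Project commands

scrapy startproject projectname
Creates a project.
scrapy genspider spidername domain
Creates a spider. After the project has been created, a spider still has to be created inside it.
scrapy crawl spidername
Runs the spider. Pay attention to the directory this command is run from: it must be inside the project directory (the one containing scrapy.cfg).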
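
Because the crawl command depends on the working directory, it can help to check beforehand whether the current location belongs to a project. Scrapy identifies a project by a scrapy.cfg file in the current directory or one of its parents. The sketch below mimics that lookup with the standard library only; `find_project_root` is a hypothetical helper, not a Scrapy API.

```python
from pathlib import Path


def find_project_root(start="."):
    """Return the nearest directory at or above `start` that contains
    scrapy.cfg, or None if no such directory exists."""
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate in [current, *current.parents]:
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None


root = find_project_root()
if root is None:
    print("No scrapy.cfg found: project commands such as crawl are unavailable here.")
else:
    print(f"Scrapy project root: {root}")
```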