Asynchronous Crawling with gevent

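This article was first published on Zhihu.
We previously covered an asynchronous crawler built on asyncio, but that code was rather complex. In this article we implement an asynchronous crawler with the gevent module instead.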

This article has the following parts:

  • Implementing an asynchronous crawler with gevent
  • The grequests module

Implementing an asynchronous crawler with gevent

Since gevent is very simple to use, let's go straight to the code:

import gevent
from gevent import monkey
monkey.patch_all()  # patch all blocking I/O; always include this line, and run it before importing requests
import requests
from bs4 import BeautifulSoup

def get_title(i):
    # Each page of the Top 250 list holds 25 movies
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    text = requests.get(url).content
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

# Spawn one greenlet per page and wait for all of them to finish
gevent.joinall([gevent.spawn(get_title, i) for i in range(10)])
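To see the effect of cooperative switching without sending requests to a real site, here is a minimal sketch in which `gevent.sleep` stands in for the network wait. The names `fake_fetch` and `run_concurrently` and the 0.1-second delay are illustrative assumptions, not part of the original code.

```python
import time
import gevent

def fake_fetch(i, delay=0.1):
    # gevent.sleep yields control to other greenlets; it is an assumed
    # stand-in for the time requests.get would spend waiting on the network.
    gevent.sleep(delay)
    return i * 25  # the page offset this task would have requested

def run_concurrently(n=10, delay=0.1):
    start = time.perf_counter()
    jobs = [gevent.spawn(fake_fetch, i, delay) for i in range(n)]
    gevent.joinall(jobs)
    elapsed = time.perf_counter() - start
    return [job.value for job in jobs], elapsed

if __name__ == "__main__":
    offsets, elapsed = run_concurrently()
    print(offsets)
    print("elapsed: {:.2f}s".format(elapsed))
```

Because the waits overlap, the total time should be close to a single delay rather than the sum of all ten.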

gevent essentially starts multiple micro-threads (greenlets). Let's verify this with the threading module:

import gevent
from gevent import monkey
monkey.patch_all()  # run before importing requests
import requests
from bs4 import BeautifulSoup
import threading

def get_title(i):
    print(threading.current_thread().name)  # print the name of the current thread
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    text = requests.get(url).content
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

gevent.joinall([gevent.spawn(get_title, i) for i in range(10)])

The output begins with the following lines:

DummyThread-1
DummyThread-2
DummyThread-3
DummyThread-4
DummyThread-5
DummyThread-6
DummyThread-7
DummyThread-8
DummyThread-9
DummyThread-10

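This shows that 10 micro-threads were started and ran concurrently. They appear as DummyThread objects because monkey patching makes the threading module treat each greenlet as a separate thread, although all of them share one OS thread.

We can also leave the threading module unpatched, so that everything runs under the single main thread. Only this change is needed: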

monkey.patch_all()
to
monkey.patch_all(thread=False)
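As a rough illustration of the single-thread view, the sketch below spawns 10 greenlets without patching the threading module at all and records what each one sees. The names `report` and `thread_names_and_greenlets` are illustrative assumptions.

```python
import threading
import gevent

def report(i):
    gevent.sleep(0)  # yield once so the greenlets interleave
    # The threading module is unpatched here, so it reports the real thread.
    return threading.current_thread().name, id(gevent.getcurrent())

def thread_names_and_greenlets(n=10):
    jobs = [gevent.spawn(report, i) for i in range(n)]
    gevent.joinall(jobs)
    names = {job.value[0] for job in jobs}
    greenlet_ids = {job.value[1] for job in jobs}
    return names, greenlet_ids

if __name__ == "__main__":
    names, greenlet_ids = thread_names_and_greenlets()
    print(names)              # the thread name(s) seen by the greenlets
    print(len(greenlet_ids))  # the number of distinct greenlets
```

All greenlets should report the same thread name as the caller while still being distinct greenlet objects, which is the sense in which the work is done by one thread.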

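The grequests module

The author of the requests library combined requests and gevent to create the grequests module, which is dedicated to asynchronous network requests. It is used as follows: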

import grequests
from bs4 import BeautifulSoup

def get_title(rep):
    soup = BeautifulSoup(rep.text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

# Build the requests lazily; nothing is sent until grequests.map() is called
reps = (grequests.get('https://movie.douban.com/top250?start={}&filter='.format(i*25)) for i in range(10))
for rep in grequests.map(reps):
    if rep is not None:  # map() returns None for requests that failed
        get_title(rep)
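The parsing step shared by both versions can be tried offline. Below is a minimal sketch that runs the same selectors on a tiny hand-written HTML fragment. `SAMPLE_HTML` and `extract_titles` are hypothetical names, and the fragment only imitates the structure the code above assumes (an `ol` with class `grid_view` containing `li` items with a `span` of class `title`); it is not real page content. Unlike the original, it returns an empty list when the list element is missing, which may happen if the site responds with an error page.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the assumed page structure
SAMPLE_HTML = """
<ol class="grid_view">
  <li><span class="title">Movie A</span></li>
  <li><span class="title">Movie B</span></li>
</ol>
"""

def extract_titles(html):
    soup = BeautifulSoup(html, 'html.parser')
    ol = soup.find('ol', class_='grid_view')
    if ol is None:  # page did not contain the expected list
        return []
    titles = []
    for li in ol.find_all('li'):
        span = li.find('span', class_='title')
        if span is not None:
            titles.append(span.text)
    return titles

if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

Returning a list instead of printing keeps the parsing separate from the fetching, so the same function could be called from either the gevent or the grequests version.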

Feel free to follow my Zhihu column.

Column homepage: Python Programming

Column index: Table of Contents

Version notes: Software and package versions
