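Web Scraping: Using Proxies

Using a Proxy IP

1. Using a proxy with requests

  To use a proxy with requests, build a dictionary that maps each URL scheme to a proxy URL and pass it through the proxies parameter.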

import requests

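# Proxy address; replace it with a working proxy (most proxies also need a port, written as host:port)
proxy = '60.186.9.233'
# Map each URL scheme to the proxy URL that should handle it
proxies = {
    'http': 'http://' + proxy,
    'https': 'https://' + proxy
}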
try:
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    print('error', e.args)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}
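
The origin field in a response like the one above can also be checked in code rather than by eye. The following is a minimal sketch, not part of the original example: the helper name origin_matches_proxy is hypothetical, and it assumes that origin may contain several comma-separated addresses, so it splits the value before comparing.

```python
import json


def origin_matches_proxy(response_text, proxy_ip):
    """Return True if the "origin" field of an httpbin-style reply contains proxy_ip."""
    data = json.loads(response_text)
    # Assumption: "origin" may list several comma-separated addresses.
    origins = [part.strip() for part in data.get("origin", "").split(",")]
    return proxy_ip in origins


# Trimmed copy of the response shown above, used as sample input.
sample = '{"args": {}, "origin": "60.186.9.233", "url": "https://httpbin.org/get"}'
print(origin_matches_proxy(sample, "60.186.9.233"))
```

With a live request, res.text from the requests example could be passed in place of sample.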

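  The origin field in the output is the proxy's IP address, which shows that the proxy was applied. If the proxy requires authentication, put the username and password in front of the proxy address: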

proxy = 'username:password@60.186.9.233'
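
Building the proxy string by hand gets error-prone once a port and credentials are involved. The sketch below wraps that step in a helper. The function name build_proxies, the port 8080 and the sample credentials are all hypothetical. Unlike the snippet above, it points both the http and https keys at the same proxy URL, which assumes the proxy itself is reached over plain HTTP; adjust scheme if that does not hold for your proxy.

```python
from urllib.parse import quote


def build_proxies(host, port=None, username=None, password=None, scheme="http"):
    """Build a dictionary suitable for the proxies parameter of requests."""
    address = host if port is None else f"{host}:{port}"
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        credentials = ""
    proxy_url = f"{scheme}://{credentials}{address}"
    # Assumption: the same proxy handles both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical port and credentials, for illustration only.
print(build_proxies("60.186.9.233", 8080, "user", "p@ss"))
```

The returned dictionary can be passed directly as requests.get(url, proxies=build_proxies(...)).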

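2. Using a proxy with Selenium

  Selenium can be configured to use a proxy as well. There are two cases: a browser with a visible window, with Chrome as the example, and a headless browser, with PhantomJS as the example.

Chrome settings

  Set the proxy through chrome_options, then pass it in the chrome_options parameter when creating the Chrome object. Running the code opens a Chrome window; once the URL has loaded, the page shows the result that follows the code.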

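# Chrome proxy settings
from selenium import webdriver

proxy = '60.186.9.233'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + proxy)
# Older Selenium releases take chrome_options=; newer ones expect options= instead
browser = webdriver.Chrome(chrome_options=chrome_options)
# get() returns None, so there is nothing to assign; the result appears in the browser window
browser.get('http://httpbin.org/get')

Result shown in the browser:
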
{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}
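
For reference, the same Chrome setup can be split into a small pure helper that builds the command-line switch and a factory that starts the browser. This is a sketch under assumptions: both function names are hypothetical, the port 8080 is made up for the example, and make_chrome uses the options= keyword, which presumes a recent Selenium release (with an older one, the chrome_options= keyword shown above applies instead).

```python
def chrome_proxy_argument(host, port=None, scheme="http"):
    """Return the --proxy-server switch for the given proxy address."""
    address = host if port is None else f"{host}:{port}"
    return f"--proxy-server={scheme}://{address}"


def make_chrome(host, port=None):
    """Start Chrome with the proxy applied (needs Selenium and a Chrome driver)."""
    # Imported here so chrome_proxy_argument stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(host, port))
    # Assumption: a recent Selenium release, where the keyword is options=.
    return webdriver.Chrome(options=options)


# Hypothetical port, for illustration only.
print(chrome_proxy_argument("60.186.9.233", 8080))
```

Keeping the string construction separate makes it possible to check the switch without launching a browser.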

PhantomJS settings

  Put the relevant command-line options into a list and pass it to PhantomJS through the service_args parameter at initialization.

# PhantomJS proxy settings
from selenium import webdriver

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http'
]
browser = webdriver.PhantomJS(service_args=service_args)
browser.get('http://httpbin.org/get')
print(browser.page_source)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

If the proxy requires authentication, add the --proxy-auth option to service_args:

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http',
    '--proxy-auth=username:password'
]
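
The two service_args lists above differ only in the optional --proxy-auth entry, so they can be produced by one helper. This is a minimal sketch: the function name phantomjs_service_args is hypothetical, and it only assembles the option strings already shown in this section, without starting PhantomJS.

```python
def phantomjs_service_args(host, port=None, proxy_type="http",
                           username=None, password=None):
    """Assemble the service_args list for a PhantomJS proxy setup."""
    address = host if port is None else f"{host}:{port}"
    args = [f"--proxy={address}", f"--proxy-type={proxy_type}"]
    # Add the authentication option only when both credentials are supplied.
    if username is not None and password is not None:
        args.append(f"--proxy-auth={username}:{password}")
    return args


print(phantomjs_service_args("60.186.9.233"))
print(phantomjs_service_args("60.186.9.233", username="username", password="password"))
```

The result can be passed as webdriver.PhantomJS(service_args=...), as in the examples above.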