Python Web Scraping: Using the requests Library to Log In with POST

Date: 2019-11-20

Tags: python, web scraping, requests, POST, login; Category: Python


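The requests library is powerful: it supports HTTP keep-alive and connection pooling, session persistence through cookies, file uploads, automatic detection of the response content's encoding, and internationalized URLs with automatic encoding of POST data.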

It can send GET requests with or without parameters, set custom headers, and so on.
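A minimal sketch of a GET request with query parameters and a custom header. It builds the request with requests.Request(...).prepare() so nothing is actually sent, which makes it easy to see how requests assembles the URL, including percent-encoding a non-ASCII value as UTF-8. The URL http://example.com/search and the parameter names kw and page are placeholders, not a real API.

```python
import requests

# Placeholder URL and parameters, for illustration only; nothing is sent.
req = requests.Request(
    "GET",
    "http://example.com/search",
    params={"kw": "爬虫", "page": 2},
    headers={"User-Agent": "Mozilla/5.0"},
)
prepared = req.prepare()

# The params dict is appended as a query string; non-ASCII text is UTF-8 percent-encoded.
print(prepared.url)  # http://example.com/search?kw=%E7%88%AC%E8%99%AB&page=2
print(prepared.headers["User-Agent"])  # Mozilla/5.0
```

To actually send it, the same arguments go to requests.get(url, params=..., headers=...).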

This post focuses on sending POST requests, with the form data passed through the data parameter.
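To see what the data parameter puts into the request, the sketch below prepares (without sending) a POST request from a dict. The URL and credentials are placeholders. requests form-encodes the dict into the body and sets the Content-Type header to match.

```python
import requests

# Placeholder URL and made-up credentials; the request is only prepared, not sent.
req = requests.Request(
    "POST",
    "http://example.com/login",
    data={"username": "demo_user", "password": "demo_pass"},
)
prepared = req.prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # username=demo_user&password=demo_pass
```

This is the same encoding a browser uses when submitting an HTML form with the default enctype, which is why data= is the right choice for a login form.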

For example, logging in to the ChinaUnix forum with a username and password.

By viewing the page source of the ChinaUnix site, you can see that the login URL is:

http://bbs.chinaunix.net/member.php?mod=logging&action=login&loginsubmit=yes&loginhash=LIcAc

The login URL may differ from one computer to another, so check the actual page source yourself.
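Because the login URL can vary, one option is to read it from the login form in the page source instead of hard-coding it. Below is a minimal sketch using only the standard library, assuming the URL sits in the action attribute of a form tag. The HTML snippet is hypothetical, written to match the URL above, and is not copied from the real ChinaUnix page; the class name LoginFormFinder is likewise made up.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class LoginFormFinder(HTMLParser):
    """Collects the action attribute of every <form> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                # HTMLParser has already unescaped entities such as &amp;
                self.actions.append(action)


# Hypothetical excerpt of a login page; in practice, fetch the real page first.
page_html = """
<form method="post" name="login" action="member.php?mod=logging&amp;action=login&amp;loginsubmit=yes&amp;loginhash=LIcAc">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

finder = LoginFormFinder()
finder.feed(page_html)

# Resolve the relative action against the site root and pull out loginhash.
login_url = urljoin("http://bbs.chinaunix.net/", finder.actions[0])
loginhash = parse_qs(urlparse(login_url).query)["loginhash"][0]
print(login_url)
print(loginhash)  # LIcAc
```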

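To get around the site's anti-scraping checks, you can modify the headers so the request looks like a login from a web browser, as follows: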

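import requests

# A Session keeps the connection pool and cookies across requests
conn = requests.Session()
# Login URL taken from the page source; the loginhash value may differ for you
url = 'http://bbs.chinaunix.net/member.php?mod=logging&action=login&loginsubmit=yes&loginhash=LIcAc'
# Form fields sent in the POST body; replace *** with your own account
postdata = {
    'username': '***',
    'password': '***',
}
# Browser-like User-Agent to get past basic anti-scraping checks
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
}
rep = conn.post(url, data=postdata, headers=headers)
# Save the raw response bytes so the login result can be inspected
with open('1.html', 'wb') as f:
    f.write(rep.content)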

Replace the username and password in the code with an account you have registered in advance; otherwise the login will fail.
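The login code passes headers= on every call. As an alternative sketch (not the approach used in this post), a Session can carry default headers, so the browser-like User-Agent only needs to be set once. prepare_request() shows the headers that would be sent without contacting the site.

```python
import requests

UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36")

session = requests.Session()
# Without changes, requests identifies itself as python-requests/<version>
default_ua = session.headers["User-Agent"]
print(default_ua)

# Set the browser-like header once; every request made through this session reuses it.
session.headers.update({"User-Agent": UA})

# Build (but do not send) a request to check which headers it would carry.
prepared = session.prepare_request(requests.Request("GET", "http://bbs.chinaunix.net/"))
print(prepared.headers["User-Agent"] == UA)  # True
```

Headers passed to an individual call with headers= are merged on top of these session defaults and take precedence.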

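requests saves cookies automatically, so there is no need to set them yourself: when requests go through the same Session, the cookies the server returns at login are sent with every later request.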
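A minimal offline sketch of that mechanism: a cookie stored in session.cookies (after a real login it would come from the server's Set-Cookie response header) is attached to later requests for the same domain and withheld from other domains. The cookie name auth_token and its value are made up.

```python
import requests

session = requests.Session()
# Stand-in for what a successful login leaves behind; this cookie is hypothetical.
session.cookies.set("auth_token", "abc123", domain="bbs.chinaunix.net", path="/")

# A later request to the same site picks the cookie up automatically...
same_site = session.prepare_request(
    requests.Request("GET", "http://bbs.chinaunix.net/thread-4246512-1-1.html"))
print(same_site.headers.get("Cookie"))   # auth_token=abc123

# ...while a request to a different domain does not receive it.
other_site = session.prepare_request(requests.Request("GET", "http://example.com/"))
print(other_site.headers.get("Cookie"))  # None
```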

import requests

conn = requests.Session()
url = 'http://bbs.chinaunix.net/member.php?mod=logging&action=login&loginsubmit=yes&loginhash=LIcAc'
# Replace *** with your own registered account
postdata = {
    'username': '***',
    'password': '***',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
}
# Step 1: log in; the session stores the cookies returned by the server
rep = conn.post(url, data=postdata, headers=headers)
with open('1.html', 'wb') as f:
    f.write(rep.content)

# Step 2: fetch a forum thread with the same session; the login cookies are sent automatically
url1 = 'http://bbs.chinaunix.net/thread-4246512-1-1.html'
rep1 = conn.get(url1, headers=headers)
with open('2.html', 'wb') as f:
    f.write(rep1.content)
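Since the real login flow depends on the live site and a registered account, here is a self-contained sketch that reproduces the same pattern against a hypothetical local server built with http.server: a POST to /login sets a session cookie, and /thread only returns content when that cookie comes back. The paths, the cookie name sid and the credentials are all invented for this example and do not reflect how ChinaUnix works internally.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests

# Made-up credentials and session id for the fake forum below.
USERNAME, PASSWORD, SESSION_ID = "demo_user", "demo_pass", "s3cr3t-session"


class FakeForum(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the application/x-www-form-urlencoded body that data= produces.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        ok = form.get("username") == [USERNAME] and form.get("password") == [PASSWORD]
        self.send_response(200 if ok else 401)
        if ok:
            # Hand out a session cookie on successful login.
            self.send_header("Set-Cookie", f"sid={SESSION_ID}; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Protected page: served only when the session cookie is sent back.
        logged_in = f"sid={SESSION_ID}" in self.headers.get("Cookie", "")
        body = b"thread content" if logged_in else b"please log in"
        self.send_response(200 if logged_in else 403)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), FakeForum)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        with requests.Session() as conn, requests.Session() as anon:
            conn.trust_env = anon.trust_env = False  # ignore proxy env vars for localhost
            # Log in once; the Set-Cookie header is stored in conn.cookies
            conn.post(base + "/login", data={"username": USERNAME, "password": PASSWORD})
            logged_in = conn.get(base + "/thread")   # cookie sent automatically
            anonymous = anon.get(base + "/thread")   # separate session, no cookie
            return logged_in.status_code, anonymous.status_code
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())  # (200, 403)
```

The session that logged in gets the protected page (200), while a separate session without the login cookie is refused (403).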