Install or upgrade requests (2.20.0):
pip install --upgrade requests
pip show requests
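Inspecting the login page shows that submitting the form redirects to another page. The data is actually received by github.com/session, which completes the login.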
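The next step is to construct the request, which has two parts: the headers and the form data.
Building the headers is routine; the main thing is not to leave out fields such as Referer and Origin. Strictly speaking, the POST to the session page must also carry the cookie returned by the login page, but Session handles that automatically, so it does not need to be set by hand.
Sending a fuller set of header fields lowers the chance of being flagged by anti-scraping measures.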
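The header handling described above can be sketched roughly as follows. The helper name build_post_headers and the trimmed-down BASE_HEADERS dict are assumptions made for illustration; the only point is to copy the shared headers and add Referer and Origin for the POST.

```python
import copy

# Trimmed-down base headers (assumption); a real run would carry the
# full browser-like set.
BASE_HEADERS = {
    'Host': 'github.com',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0',
}


def build_post_headers(base_headers, referer, origin='https://github.com'):
    """Return a copy of base_headers with Referer and Origin added.

    Hypothetical helper: the base dict is copied so the headers reused
    for plain GET requests stay unchanged.
    """
    headers = copy.copy(base_headers)
    headers.update({'Referer': referer, 'Origin': origin})
    return headers


post_headers = build_post_headers(BASE_HEADERS, 'https://github.com/login')
print(sorted(post_headers))
```

Copying first keeps the GET headers free of POST-only fields, which is probably what you want when the same dict is shared across requests.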
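The form data has two parts. The fixed fields are unremarkable, but one field, authenticity_token, is returned by the login page and has to be extracted from it, for example with a regular expression (the script below uses an XPath query through lxml instead).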
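A minimal sketch of the regular-expression approach is shown here. SAMPLE_HTML is a made-up fragment, and the pattern assumes the name attribute comes directly before value; the real login page markup may differ, so treat both as assumptions.

```python
import re

# Hypothetical fragment of a login form; the real markup may differ.
SAMPLE_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123==">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''

# Assumes name="..." is immediately followed by value="...".
TOKEN_RE = re.compile(r'name="authenticity_token"\s+value="([^"]+)"')


def extract_token(html):
    """Return the authenticity_token value, or None if it is not found."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


print(extract_token(SAMPLE_HTML))  # abc123==
```

In a real run the argument would be the text of the response from the login page. Returning None instead of raising makes a missing token easy to detect before the POST is sent.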
After a successful login the site redirects to the home page automatically, and from then on the Session keeps the logged-in state.
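One way to check for that redirect is to look at the history and url attributes of the response to the POST. The sketch below assumes the redirect uses status 302 and ends at https://github.com/; the function name and the stand-in objects are hypothetical, used so the example runs without touching the network.

```python
from types import SimpleNamespace


def followed_login_redirect(response, home_url='https://github.com/'):
    """Return True if the POST was redirected and ended on home_url.

    Works on any object exposing the `history` and `url` attributes
    that a requests.Response provides.
    """
    redirected = any(r.status_code == 302 for r in response.history)
    return redirected and response.url == home_url


# Stand-in objects instead of real responses.
ok = SimpleNamespace(url='https://github.com/',
                     history=[SimpleNamespace(status_code=302)])
failed = SimpleNamespace(url='https://github.com/session', history=[])

print(followed_login_redirect(ok))      # True
print(followed_login_redirect(failed))  # False
```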
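The login POST kept returning 422:
This status code means the request is well-formed but contains semantic errors, so the server cannot process it.
It turned out that authenticity_token was misspelled; after correcting it, the login worked.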
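A typo like this can be caught before sending the request by comparing the form keys against the expected field names. The sketch below is only an illustration: check_form_fields and REQUIRED_FIELDS are hypothetical names, and the field list is taken from the form data used in this post.

```python
import difflib

# Field names the login form is expected to carry (from this post).
REQUIRED_FIELDS = ('commit', 'authenticity_token', 'login', 'password')


def check_form_fields(form, required=REQUIRED_FIELDS):
    """Map each missing field to the closest key in form, or None."""
    problems = {}
    for name in required:
        if name not in form:
            close = difflib.get_close_matches(name, list(form), n=1)
            problems[name] = close[0] if close else None
    return problems


form = {
    'commit': 'Sign in',
    'authenticity_tokne': 'abc123==',  # deliberate typo
    'login': 'username',
    'password': 'password',
}
print(check_form_fields(form))
# {'authenticity_token': 'authenticity_tokne'}
```

An empty result means every expected field is present under the right name.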
# coding: utf-8
'''
Simulate logging in to GitHub.
'''
import copy

import requests
from lxml import etree


class Login():
    def __init__(self):
        # Browser-like headers shared by every request.
        self.headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Host': 'github.com',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
        }
        self.login_url = 'https://github.com/login'
        self.post_url = 'https://github.com/session'
        self.profileUrl = 'https://github.com/settings/profile'
        # The Session stores cookies and sends them on later requests.
        self.session = requests.Session()

    def _token(self):
        '''
        Fetch the login page and parse authenticity_token from it.
        '''
        response = self.session.get(self.login_url, headers=self.headers)
        selector = etree.HTML(response.text)
        # Position-based XPath; it depends on the layout of the login page.
        token = selector.xpath('//div//input[2]/@value')[0]
        return token

    def login(self, username, password):
        post_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': self._token(),
            'login': username,
            'password': password
        }
        # Copy the shared headers and add the fields the POST needs.
        header_temp = copy.copy(self.headers)
        header_add = {'Referer': self.login_url, 'Origin': 'https://github.com'}
        header_temp.update(header_add)
        response = self.session.post(self.post_url, headers=header_temp, data=post_data)
        return response

    def islogin(self):
        '''Verify the login by reading the profile settings page.'''
        try:
            response = self.session.get(self.profileUrl, headers=self.headers)
        except requests.RequestException:
            print('get page failed!')
            return
        selector = etree.HTML(response.text)
        flag = selector.xpath('//div[@class="column two-thirds"]/dl/dt/label/text()')
        info = selector.xpath('//div[@class="column two-thirds"]/dl/dd/input/@value')
        textarea = selector.xpath('//div[@class="column two-thirds"]/dl/dd/textarea/text()')
        # Profile settings returned after a successful login.
        print('Profile field labels: %s' % flag)
        print('Profile input values: %s' % info)
        print('Profile textarea text: %s' % textarea)


if __name__ == '__main__':
    login = Login()
    login.login(username='username', password='password')
    login.islogin()