This is an original work. Reposting is permitted, but when reposting you must credit the original source, the author, and this notice with a hyperlink; otherwise legal action will be taken. http://john88wang.blog.51cto.com/2165294/1441495
Part 1: The urllib module
The urllib module can be used to open any URL.
1. urlopen() opens a URL and returns a file-like object on which file-style operations can be performed.
In [308]: import urllib
In [309]: file=urllib.urlopen('http://www.baidu.com/')
In [310]: file.readline()
Out[310]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8...
You can use methods such as read(), readlines(), fileno(), and close(), as well as info(), getcode(), and geturl():
In [337]: file.info()
Out[337]: <httplib.HTTPMessage instance at 0x2394a70>
In [338]: file.getcode()
Out[338]: 200
In [339]: file.geturl()
Out[339]: 'http://www.baidu.com/'
2. urlretrieve() saves the HTML page at a URL to a local file.
In [404]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
In [405]: type(filename)
Out[405]: <type 'tuple'>
In [406]: filename[0]
Out[406]: '/tmp/baidu.html'
In [407]: filename
Out[407]: ('/tmp/baidu.html', <httplib.HTTPMessage instance at 0x23ba878>)
In [408]: filename[1]
Out[408]: <httplib.HTTPMessage instance at 0x23ba878>
3. urlcleanup() clears the cache left behind by urlretrieve().
In [454]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
In [455]: urllib.urlcleanup()
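Because urlretrieve() accepts any URL scheme urllib understands, its (filename, headers) return value can be sketched without network access by retrieving a local file:// URL. This is a minimal sketch, not the blog's original example; the try/except import keeps it runnable on both Python 2 and Python 3 (where the functions moved to urllib.request):

```python
# Retrieve a local file via a file:// URL -- demonstrates urlretrieve()'s
# (filename, headers) return value without touching the network.
try:
    from urllib import urlretrieve, urlcleanup          # Python 2
except ImportError:
    from urllib.request import urlretrieve, urlcleanup  # Python 3

import os
import tempfile

# Create a small source file to stand in for a remote page.
src = tempfile.NamedTemporaryFile(mode='w', suffix='.html', delete=False)
src.write('<html><body>hello</body></html>')
src.close()

dst = src.name + '.copy'
result = urlretrieve('file://' + src.name, filename=dst)

print(result[0])               # the local filename we asked for
print(type(result).__name__)   # tuple

urlcleanup()  # clear any cache urlretrieve may have created
os.remove(src.name)
os.remove(dst)
```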
4. urllib.quote() and urllib.quote_plus() percent-encode a URL. quote() leaves '/' unescaped by default, while quote_plus() escapes '/' as well and encodes spaces as '+'.
In [483]: urllib.quote('http://www.baidu.com')
Out[483]: 'http%3A//www.baidu.com'
In [484]: urllib.quote_plus('http://www.baidu.com')
Out[484]: 'http%3A%2F%2Fwww.baidu.com'
5. urllib.unquote() and urllib.unquote_plus() decode a URL encoded by the functions above.
In [514]: urllib.unquote('http%3A//www.baidu.com')
Out[514]: 'http://www.baidu.com'
In [515]: urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')
Out[515]: 'http://www.baidu.com'
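A small round-trip sketch of the encode/decode pairs; the try/except import makes it runnable on Python 3 too, where these functions live in urllib.parse:

```python
try:
    from urllib import quote, quote_plus, unquote, unquote_plus         # Python 2
except ImportError:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus   # Python 3

url = 'http://www.baidu.com'

encoded = quote(url)            # '/' is in the default safe set, so it survives
print(encoded)                  # http%3A//www.baidu.com

encoded_plus = quote_plus(url)  # '/' is escaped as well
print(encoded_plus)             # http%3A%2F%2Fwww.baidu.com

# Decoding restores the original URL either way.
print(unquote(encoded))
print(unquote_plus(encoded_plus))
```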
6. urllib.urlencode() joins key/value pairs with & into a query string; combined with urlopen(), it can be used to implement the POST and GET methods.
In [560]: import urllib
In [561]: params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
In [562]: f=urllib.urlopen("http://python.org/query?%s" % params)
In [563]: f.readline()
Out[563]: '<!doctype html>\n'
In [564]: f.readlines()
Out[564]: ['<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->\n',
 '<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->\n',
 '\n', ...
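urlencode() accepts a dict or a sequence of pairs and returns the key=value pairs joined with '&', with each value encoded quote_plus-style. A sketch (dual import for Python 2 and 3, where the function moved to urllib.parse; a list of pairs is used so the parameter order is deterministic):

```python
try:
    from urllib import urlencode            # Python 2
except ImportError:
    from urllib.parse import urlencode      # Python 3

# A list of pairs keeps the parameter order explicit.
params = urlencode([('spam', 1), ('eggs', 2), ('bacon', 0)])
print(params)  # spam=1&eggs=2&bacon=0

# Values are encoded like quote_plus(): spaces become '+'.
print(urlencode({'q': 'hello world'}))  # q=hello+world
```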
Part 2: The urllib2 module
urllib2 provides more functionality than urllib, such as basic authentication, redirect handling, and cookie support.
https://docs.python.org/2/library/urllib2.html
https://docs.python.org/2/howto/urllib2.html
In [566]: import urllib2
In [567]: f=urllib2.urlopen('http://www.python.org/')
In [568]: print f.read(100)
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->

This opens the Python home page and prints the first 100 bytes of its content.
HTTP is based on requests and responses: the client sends a request and the server responds to it. urllib2 represents the outgoing request with a Request object; calling urlopen() on a Request returns a response object. The response object is file-like and can be operated on just like a file.
In [630]: import urllib2
In [631]: req=urllib2.Request('http://www.baidu.com')
In [632]: response=urllib2.urlopen(req)
In [633]: the_page=response.read()
In [634]: the_page
Out[634]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu...
In [763]: import urllib
In [764]: import urllib2
In [765]: url='http://xxxxxx/login.php'
In [766]: values={'ver' : '1.7.1', 'email' : 'xxxxx', 'password' : 'xxxx', 'mac' : '111111111111'}
In [767]: data=urllib.urlencode(values)
In [768]: req=urllib2.Request(url,data)
In [769]: response=urllib2.urlopen(req)
In [770]: the_page=response.read()
In [771]: the_page
If the data argument is not passed to urllib2.Request(), urllib2 issues a GET request. The difference between GET and POST is that POST requests often have side effects: they change the state of the system in some way. Data can also be sent with a GET request, by appending it to the URL.
In [55]: import urllib2
In [56]: import urllib
In [57]: url='http://xxx/login.php'
In [58]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxx'}
In [59]: data=urllib.urlencode(values)
In [60]: full_url=url + '?' + data
In [61]: the_page=urllib2.urlopen(full_url)
In [63]: the_page.read()
Out[63]: '{"result":0,"data":0}'
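The GET-versus-POST switch can be seen directly on the Request object: its get_method() reflects whether a data argument was supplied. A minimal sketch (dual import for Python 2/3; example.com is a placeholder host and no request is actually sent):

```python
try:
    from urllib2 import Request             # Python 2
except ImportError:
    from urllib.request import Request      # Python 3

# No data argument: urllib2 would issue a GET.
req_get = Request('http://example.com/login.php')
print(req_get.get_method())   # GET

# With a data argument, the same URL becomes a POST.
req_post = Request('http://example.com/login.php', data=b'ver=1.7.1')
print(req_post.get_method())  # POST
```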
By default, urllib2 identifies itself to the server with the browser type Python-urllib/2.6; this can be changed by adding a User-Agent HTTP header.
In [107]: import urllib
In [108]: import urllib2
In [109]: url='http://xxx/login.php'
In [110]: user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
In [111]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxxx'}
In [112]: headers={'User-Agent' : user_agent}
In [114]: data=urllib.urlencode(values)
In [115]: req=urllib2.Request(url,data,headers)
In [116]: response=urllib2.urlopen(req)
In [117]: the_page=response.read()
In [118]: the_page
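The header can be inspected on the Request object before anything is sent; note that urllib2 normalizes header names by capitalizing them ('User-Agent' is stored as 'User-agent'). A sketch with a placeholder URL, nothing going over the network:

```python
try:
    from urllib2 import Request             # Python 2
except ImportError:
    from urllib.request import Request      # Python 3

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
req = Request('http://example.com/login.php',
              headers={'User-Agent': user_agent})

# Header names are stored capitalized, so look it up as 'User-agent'.
print(req.get_header('User-agent'))
```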
If the given URL cannot be reached, urlopen() raises a URLError exception; if the URL can be reached but its content cannot be accessed, urlopen() raises an HTTPError exception.
#!/usr/bin/python
from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42/index.html')
try:
    response = urlopen(req)
except HTTPError as e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code:', e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason:', e.reason
else:
    print "Everything is fine"
Note that when writing the exception handling, HTTPError must be caught before URLError: HTTPError is a subclass of URLError, so a URLError clause listed first would catch HTTP errors as well.
#!/usr/bin/python
from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42')
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason:', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code:', e.code
else:
    print "Everything is fine"
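The catch-order requirement can be checked directly from the class hierarchy (dual import; in Python 3 these classes live in urllib.error, and the HTTPError instance below is constructed by hand purely for illustration):

```python
try:
    from urllib2 import HTTPError, URLError        # Python 2
except ImportError:
    from urllib.error import HTTPError, URLError   # Python 3

# HTTPError inherits from URLError, so it must be caught first.
print(issubclass(HTTPError, URLError))  # True

# An HTTPError carries the numeric HTTP status in .code.
err = HTTPError('http://example.com', 404, 'Not Found', hdrs=None, fp=None)
print(err.code)  # 404
```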
The hasattr() function checks whether an object has a given attribute.
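A quick illustration of hasattr() on an arbitrary object (the Widget class here is made up for the demo):

```python
# hasattr(obj, name) returns True if obj has an attribute called name.
class Widget(object):
    def __init__(self):
        self.reason = 'connection refused'

w = Widget()
print(hasattr(w, 'reason'))  # True
print(hasattr(w, 'code'))    # False
```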
Using urllib2 to log in to a page that requires basic authentication
Logging in to the RabbitMQ management interface requires username and password verification.
In [63]: url='http://172.30.25.179:15672/api/aliveness-test/%2f'
In [64]: username='guest'
In [65]: password='guest'
In [66]: mgr=urllib2.HTTPPasswordMgrWithDefaultRealm()
In [67]: s=mgr.add_password(None,url,username,password)
In [68]: handler=urllib2.HTTPBasicAuthHandler(mgr)
In [69]: opener=urllib2.build_opener(handler)
In [70]: opener.open(url).read()
Out[70]: '{"status":"ok"}'

The JSON response can then be parsed with json.loads(opener.open(url).read()) (after import json).
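Under the hood, HTTP basic authentication just sends an Authorization header whose value is the base64 encoding of "username:password"; the handler/opener machinery above adds it automatically once the server answers 401. The credential encoding itself can be sketched as:

```python
import base64

username = 'guest'
password = 'guest'

# The Authorization header value a basic-auth handler ultimately sends.
credentials = base64.b64encode(('%s:%s' % (username, password)).encode('ascii'))
auth_header = 'Basic ' + credentials.decode('ascii')
print(auth_header)  # Basic Z3Vlc3Q6Z3Vlc3Q=
```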
References:
https://docs.python.org/2/library/urllib2.html#module-urllib2
This article comes from the "Linux SA John" blog; please keep this attribution: http://john88wang.blog.51cto.com/2165294/1441495