Python urllib urllib2

Date: 2019-11-09


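urllib2 is an extension of urllib.

Similarities and differences:

The most commonly used functions, urllib.urlopen and urllib2.urlopen, are similar, but their parameters differ, for example in how timeouts and proxies are handled.

urllib accepts only a URL string to fetch information, whereas urllib2 accepts either a URL string or a Request object. Headers can be set on a Request object, which urllib cannot do.

urllib has the urlencode method for encoding parameters and urllib2 does not, so the two modules are often used together.

Relatively speaking, urllib2 offers more functionality, including various handlers and openers.

There is also the httplib module, which provides the most basic HTTP request methods, such as GET, POST and PUT.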
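As a minimal offline sketch of this division of labor (nothing is sent over the network), the snippet below encodes parameters with urlencode and attaches a header to a Request object. The try/except import is an assumption added here so that it also runs on Python 3, where urlencode lives in urllib.parse and Request in urllib.request; the example.com URL and the demo-client header value are placeholders.

```python
try:  # Python 2: the two modules discussed in this article
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3: same functions, different module names
    from urllib.parse import urlencode
    from urllib.request import Request

# urlencode comes from urllib; a list of pairs keeps the order predictable.
params = urlencode([('name', 'WHY'), ('location', 'SDU')])

# Request comes from urllib2 and can carry headers.
req = Request('http://www.example.com/search?' + params,
              headers={'User-Agent': 'demo-client'})

print(params)                        # name=WHY&location=SDU
print(req.get_full_url())
print(req.get_header('User-agent'))  # header names are stored capitalized
```

Passing req to urlopen would then send the request with that header.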

Reference: http://blog.csdn.net/column/details/why-bug.html

The most basic usage:

import urllib2  
response = urllib2.urlopen('http://www.baidu.com/')  
html = response.read()  
print html

Using a Request object:

import urllib2    
req = urllib2.Request('http://www.baidu.com')    
response = urllib2.urlopen(req)    
the_page = response.read()    
print the_page

Sending form data:

import urllib    
import urllib2    
  
url = 'http://www.someserver.com/register.cgi'    
    
values = {'name' : 'WHY',    
          'location' : 'SDU',    
          'language' : 'Python' }    
  
data = urllib.urlencode(values)  # encode the form values
req = urllib2.Request(url, data)  # build the request and attach the form data
response = urllib2.urlopen(req)  # send it and receive the response
the_page = response.read()  # read the response body
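Attaching data is what turns the request into a POST. A small sketch, without sending anything, that checks this through get_method(). The compatibility import and the bytes conversion are assumptions added so that the snippet also runs on Python 3, which requires the body to be bytes.

```python
try:  # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://www.someserver.com/register.cgi'
values = [('name', 'WHY'), ('location', 'SDU'), ('language', 'Python')]

data = urlencode(values)
if not isinstance(data, bytes):
    # Python 3 requires the request body to be bytes, not str.
    data = data.encode('ascii')

get_req = Request(url)         # no data: will be sent as GET
post_req = Request(url, data)  # with data: will be sent as POST

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```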

The same data can also be sent with GET by appending it to the URL:

import urllib2
import urllib

data = {}

data['name'] = 'WHY'
data['location'] = 'SDU'
data['language'] = 'Python'

url_values = urllib.urlencode(data)
print url_values
# Prints the encoded query string, e.g.
# name=WHY&language=Python&location=SDU (dict key order is not guaranteed)

url = 'http://www.example.com/example.cgi'
full_url = url + '?' + url_values

response = urllib2.urlopen(full_url)
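To see what urlencode does to the values, here is a short round trip under assumed sample values that contain a space: the space is encoded as "+", and parse_qs recovers the original text from the query string. The imports are written to work on both Python 2 and Python 3, and no request is made.

```python
try:  # Python 2
    from urllib import urlencode
    from urlparse import urlsplit, parse_qs
except ImportError:  # Python 3
    from urllib.parse import urlencode, urlsplit, parse_qs

# Sample values chosen for illustration; a list of pairs keeps the order fixed.
pairs = [('name', 'Somebody Here'),
         ('language', 'Python'),
         ('location', 'Northampton')]

url_values = urlencode(pairs)
print(url_values)  # name=Somebody+Here&language=Python&location=Northampton

full_url = 'http://www.example.com/example.cgi' + '?' + url_values

# Decode the query part again to confirm nothing was lost.
decoded = parse_qs(urlsplit(full_url).query)
print(decoded['name'])  # ['Somebody Here']
```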

Setting headers in an HTTP request:

import urllib    
import urllib2    
  
url = 'http://www.someserver.com/cgi-bin/register.cgi'  
  
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'    
values = {'name' : 'WHY',    
          'location' : 'SDU',    
          'language' : 'Python' }    
  
headers = { 'User-Agent' : user_agent }    
data = urllib.urlencode(values)    
req = urllib2.Request(url, data, headers)    
response = urllib2.urlopen(req)    
the_page = response.read()
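One detail worth knowing when inspecting headers: Request stores each header name capitalized (only the first letter upper case), and has_header() compares names literally. A sketch that shows this without sending the request; the Accept-Language header is a hypothetical extra added for illustration.

```python
try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Headers can be passed to the constructor or added afterwards.
req = Request(url, headers={'User-Agent': user_agent})
req.add_header('Accept-Language', 'en')  # hypothetical extra header

print(req.has_header('User-agent'))  # True: this is the stored spelling
print(req.has_header('User-Agent'))  # False: lookup is literal
print(sorted(req.header_items()))
```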

The following examples use openers and handlers:

from urllib2 import Request, urlopen, URLError, HTTPError  
  
  
old_url = 'http://t.cn/RIxkRnO'  
req = Request(old_url)  
response = urlopen(req)    
print 'Old url :' + old_url  
print 'Real url :' + response.geturl()

The URL returned by response.geturl() differs from old_url here because the request was redirected.

Viewing page information with info():

from urllib2 import Request, urlopen, URLError, HTTPError  
  
old_url = 'http://www.baidu.com'  
req = Request(old_url)  
response = urlopen(req)    
print 'Info():'  
print response.info()

An example of an opener and a handler:

# -*- coding: utf-8 -*-  
import urllib2  
  
# Create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password

top_level_url = "http://example.com/foo/"

# If we know the realm, we can use it instead of ``None``.
# password_mgr.add_password(None, top_level_url, username, password)
password_mgr.add_password(None, top_level_url,'why', '1223')

# Create a new handler
handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# Create an "opener" (an OpenerDirector instance)
opener = urllib2.build_opener(handler)

a_url = 'http://www.baidu.com/'

# Use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# From now on, all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)
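The password manager only hands out credentials for URLs under the registered prefix, so a request to a different host, such as a_url above, would not receive them. A minimal offline sketch that checks this with find_user_password(); the bar.html path is a made-up example, and the import is written to also work on Python 3.

```python
try:  # Python 2
    from urllib2 import HTTPPasswordMgrWithDefaultRealm
except ImportError:  # Python 3
    from urllib.request import HTTPPasswordMgrWithDefaultRealm

password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/foo/', 'why', '1223')

# A URL below the registered prefix matches ...
inside = password_mgr.find_user_password(None, 'http://example.com/foo/bar.html')
# ... while a URL on another host does not.
outside = password_mgr.find_user_password(None, 'http://www.baidu.com/')

print(inside)   # ('why', '1223')
print(outside)  # (None, None)
```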

Below are some tips:

Proxy settings:

import urllib2  
enable_proxy = True  
proxy_handler = urllib2.ProxyHandler({"http" : 'http://some-proxy.com:8080'})  
null_proxy_handler = urllib2.ProxyHandler({})  
if enable_proxy:  
    opener = urllib2.build_opener(proxy_handler)  
else:  
    opener = urllib2.build_opener(null_proxy_handler)  
urllib2.install_opener(opener)
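The proxy switch can also be wrapped in a small function. The sketch below is one possible arrangement: make_opener and proxy_map are hypothetical helper names, and proxy_map reads the handlers list that CPython's OpenerDirector keeps, purely for inspection. Passing ProxyHandler({}) explicitly also keeps build_opener from adding its default ProxyHandler, which would otherwise pick up proxies from the environment.

```python
try:  # Python 2
    from urllib2 import ProxyHandler, build_opener
except ImportError:  # Python 3
    from urllib.request import ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Build an opener that uses proxy_url for http, or no proxy at all."""
    proxies = {'http': proxy_url} if proxy_url else {}
    return build_opener(ProxyHandler(proxies))


def proxy_map(opener):
    """Collect the proxy settings of the ProxyHandlers registered on opener."""
    found = {}
    for handler in opener.handlers:
        if isinstance(handler, ProxyHandler):
            found.update(handler.proxies)
    return found


proxied = make_opener('http://some-proxy.com:8080')
direct = make_opener()

print(proxy_map(proxied))  # {'http': 'http://some-proxy.com:8080'}
print(proxy_map(direct))   # {}
```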

Setting a timeout.

Before Python 2.6:

import urllib2  
import socket  
socket.setdefaulttimeout(10) # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10) # an alternative way to do the same thing

Python 2.6 and later:

import urllib2  
response = urllib2.urlopen('http://www.google.com', timeout=10)
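The difference between the two approaches is scope: socket.setdefaulttimeout changes a process-wide default for every socket created afterwards, while the timeout argument applies to a single urlopen call. If the global form has to be used, a cautious pattern (a sketch, not the only option) is to restore the previous value afterwards:

```python
import socket

previous = socket.getdefaulttimeout()  # usually None, meaning "no timeout"
socket.setdefaulttimeout(10)           # affects all sockets created from now on
try:
    observed = socket.getdefaulttimeout()
    print(observed)  # 10.0
    # ... network calls that should time out after 10 seconds go here ...
finally:
    # Put the old default back so unrelated code is not affected.
    socket.setdefaulttimeout(previous)
```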

Adding a header to a Request:

import urllib2  
request = urllib2.Request('http://www.baidu.com/')  
request.add_header('User-Agent', 'fake-client')  
response = urllib2.urlopen(request)  
print response.read()

Redirects:

import urllib2  
my_url = 'http://www.google.cn'  
response = urllib2.urlopen(my_url)  
# The request was redirected if the final URL differs from the one requested
redirected = response.geturl() != my_url  
print redirected  
  
my_url = 'http://rrurl.cn/b1UZuP'  
response = urllib2.urlopen(my_url)  
redirected = response.geturl() != my_url  
print redirected

import urllib2  
class RedirectHandler(urllib2.HTTPRedirectHandler):  
    # Returning None means this handler does not follow the redirect.
    def http_error_301(self, req, fp, code, msg, headers):  
        print "301"  
    def http_error_302(self, req, fp, code, msg, headers):  
        print "302"  
  
opener = urllib2.build_opener(RedirectHandler)  
opener.open('http://rrurl.cn/b1UZuP')
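For comparison, the stock HTTPRedirectHandler decides what to do with a redirect in its redirect_request() method, which returns the follow-up Request. It can be called directly to see the result without any network access. In this sketch the URLs are placeholders, and the None and {} arguments stand in for the response object and headers that a real redirect would supply.

```python
try:  # Python 2
    from urllib2 import HTTPRedirectHandler, Request
except ImportError:  # Python 3
    from urllib.request import HTTPRedirectHandler, Request

handler = HTTPRedirectHandler()
original = Request('http://www.example.com/old')  # placeholder URL

# Ask the handler what it would request after a "302 Found" pointing to /new.
follow_up = handler.redirect_request(
    original, None, 302, 'Found', {}, 'http://www.example.com/new')

print(follow_up.get_full_url())  # http://www.example.com/new
print(follow_up.get_method())    # GET
```

A subclass can override redirect_request() to refuse or rewrite particular redirects.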

Cookies:

import urllib2  
import cookielib  
cookie = cookielib.CookieJar()  
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))  
response = opener.open('http://www.baidu.com')  
for item in cookie:  
    print 'Name = '+item.name  
    print 'Value = '+item.value

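HTTP PUT and DELETE methods:

import urllib2  
uri = 'http://www.example.com/resource'  # placeholder URL, replace with the real target
data = 'payload'                         # placeholder request body
request = urllib2.Request(uri, data=data)  
request.get_method = lambda: 'PUT' # or 'DELETE'  
response = urllib2.urlopen(request)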
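If the lambda assignment feels too ad hoc, the same idea can be packaged as a subclass that overrides get_method(). MethodRequest is a hypothetical helper name, not part of the library, and the URL and payload are placeholders; the snippet only builds the requests and does not send them.

```python
try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request


class MethodRequest(Request):
    """Request whose HTTP verb is chosen by the caller (hypothetical helper)."""

    def __init__(self, url, verb, data=None, headers=None):
        Request.__init__(self, url, data, headers or {})
        self._verb = verb

    def get_method(self):
        # urlopen asks the request for its method through this call.
        return self._verb


put_req = MethodRequest('http://www.example.com/resource', 'PUT', data=b'payload')
delete_req = MethodRequest('http://www.example.com/resource', 'DELETE')

print(put_req.get_method())     # PUT
print(delete_req.get_method())  # DELETE
```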

Getting the HTTP status code:

import urllib2  
try:  
    response = urllib2.urlopen('http://bbs.csdn.net/why')  
except urllib2.HTTPError, e:  
    print e.code
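The status code is available on both paths: from e.code when an HTTPError is raised, and from response.getcode() when the call succeeds. A sketch that wraps both in one function; status_of and fake_opener are hypothetical names, and fake_opener stands in for urlopen so that the error path can be exercised without a network connection.

```python
import io

try:  # Python 2
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:  # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def status_of(url, opener=urlopen):
    """Return the HTTP status code for url, whether or not it is an error."""
    try:
        response = opener(url)
    except HTTPError as e:
        return e.code
    return response.getcode()


def fake_opener(url):
    # Hypothetical stand-in for urlopen: always answers "404 Not Found".
    raise HTTPError(url, 404, 'Not Found', {}, io.BytesIO(b''))


print(status_of('http://bbs.csdn.net/why', opener=fake_opener))  # 404

# HTTPError is a subclass of URLError, so catch HTTPError first
# when handling both in the same try statement.
print(issubclass(HTTPError, URLError))  # True
```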

Debug logging:

import urllib2  
httpHandler = urllib2.HTTPHandler(debuglevel=1)  
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)  
opener = urllib2.build_opener(httpHandler, httpsHandler)  
urllib2.install_opener(opener)  
response = urllib2.urlopen('http://www.google.com')
