Making HTTP requests in Python 2 with urllib/urllib2

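GET and POST are the two most common HTTP request methods. The examples below implement both with urllib and urllib2.

urllib and urllib2 are two modules in the Python 2 standard library. For HTTP work, urllib2 does most of the job and urllib plays a supporting role. urllib2 provides a basic function, urllib2.urlopen(url), which sends a request to the given URL and returns the response data.

1. GET requests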

    import urllib2

    # The URL needs an explicit scheme; "127.0.0.1:8800" on its own is rejected
    response = urllib2.urlopen("http://127.0.0.1:8800")
    content = response.read()
    print content

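The example above can be split into two steps, building the request and getting the response:

    import urllib2

    # Build a request object
    request = urllib2.Request("http://127.0.0.1:8800")
    # Send the request and read the response
    response = urllib2.urlopen(request)
    content = response.read()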
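
Building the Request object separately also means it can be inspected before anything is sent. The following is a minimal sketch rather than part of the original example: it reuses the local address from above, and the try/except import is only there so the snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent, used here only for portability

# Build the request object; nothing is sent yet
request = urllib2.Request("http://127.0.0.1:8800")

full_url = request.get_full_url()  # the URL the request will be sent to
method = request.get_method()      # "GET", because no data is attached

print(full_url)
print(method)
```

Nothing here touches the network, so it can be run without a server listening on port 8800.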

2. POST requests

    import urllib
    import urllib2

    url = "http://127.0.0.1:8800"
    # Request data
    postdata = {
        'username': 'lxn',
        'password': '888888888'
    }
    # Encode the data as a form body
    data = urllib.urlencode(postdata)
    # Build the request; passing data makes it a POST
    req = urllib2.Request(url, data)
    # Send the request and read the response
    response = urllib2.urlopen(req)
    content = response.read()
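
To see what actually gets sent, it can help to look at the encoded body and the resulting method without contacting a server. This is a small sketch under a few assumptions: the field values are the placeholder credentials from the example, a list of pairs is used instead of a dict so the field order is predictable, and the fallback import exists only so the snippet also runs on Python 3.

```python
try:
    from urllib import urlencode          # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode    # Python 3 equivalents
    from urllib.request import Request

# A list of pairs keeps the field order predictable
postdata = [("username", "lxn"), ("password", "888888888")]

# urlencode produces the application/x-www-form-urlencoded body
body = urlencode(postdata)
print(body)

# Attaching a body is what turns the request into a POST.
# Python 3 requires bytes here; on Python 2 the encode call leaves ASCII text unchanged.
req = Request("http://127.0.0.1:8800", body.encode("ascii"))
print(req.get_method())
```

The first print shows `username=lxn&password=888888888`, and the second shows `POST`.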

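This is a simple POST request. Sometimes, though, the server still rejects the request even when the POST data is correct. Why? The problem lies in the request headers: the server may check them to decide whether the request came from a browser, which is a common anti-crawler measure. We can deal with this by adding header information to the request: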

    import urllib
    import urllib2

    url = "http://127.0.0.1:8800"
    headers = {
        'User-Agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
        'Referer': 'http://127.0.0.1:8800'
    }
    # Request data
    postdata = {
        'username': 'lxn',
        'password': '888888888'
    }
    # Encode the data as a form body
    data = urllib.urlencode(postdata)
    # Build the request and attach the headers
    req = urllib2.Request(url, data, headers)
    # Send the request and read the response
    response = urllib2.urlopen(req)
    content = response.read()

We can also add headers with the add_header method:

    import urllib
    import urllib2

    url = 'http://127.0.0.1:8800/login'
    postdata = {'username': 'lxn',
                'password': '88888888'}
    data = urllib.urlencode(postdata)
    req = urllib2.Request(url)
    # Write User-Agent and Referer into the request headers
    req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
    req.add_header('Referer', 'http://www.xxxxxx.com/')
    # Attach the form body (add_data exists in Python 2's urllib2)
    req.add_data(data)
    response = urllib2.urlopen(req)
    content = response.read()
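
One detail worth knowing when checking headers afterwards: add_header stores the header name in capitalized form, so 'User-Agent' is kept as 'User-agent'. The sketch below illustrates this without sending anything; the URL and header values are the placeholders from the example, and the fallback import is only there so it also runs on Python 3.

```python
try:
    from urllib2 import Request           # Python 2
except ImportError:
    from urllib.request import Request    # Python 3 equivalent

user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"

req = Request("http://127.0.0.1:8800/login")
req.add_header("User-Agent", user_agent)
req.add_header("Referer", "http://www.xxxxxx.com/")

# Header names are stored capitalized ("User-agent"), so look them up that way
print(req.has_header("User-agent"))
print(req.get_header("User-agent"))
print(req.get_header("Referer"))
```

Looking the header up as "User-Agent" with has_header would report it as missing, even though it will be sent.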

3. Setting a timeout

Before Python 2.6, the urllib2 API did not expose a timeout parameter. The only way to set a timeout was to change the global socket default, for example:

    import urllib2
    import socket

    socket.setdefaulttimeout(10)          # time out after 10 seconds
    urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
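
Because this default is process-wide, it applies to every socket created afterwards, not only those opened by urllib2. A minimal sketch of one way to limit the side effect, assuming you want the earlier setting back once the request is done:

```python
import socket

previous = socket.getdefaulttimeout()  # None means "no timeout"

socket.setdefaulttimeout(10)           # applies to every socket created from now on
current = socket.getdefaulttimeout()
print(current)

# ... make the urllib2 request here ...

socket.setdefaulttimeout(previous)     # restore the earlier setting
```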

In Python 2.6 and later, urlopen accepts a timeout argument, for example:

    import urllib2

    request = urllib2.Request('http://127.0.0.1:8800/login')
    response = urllib2.urlopen(request, timeout=2)  # time out after 2 seconds
    content = response.read()

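4. Getting the HTTP status code

For a successful response such as 200 OK, calling getcode() on the response object returned by urlopen gives the HTTP status code. For error status codes (404 or 500, for example), urlopen raises an exception instead, and you need to check the code attribute of the exception object: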

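    import urllib2

    try:
        response = urllib2.urlopen('http://127.0.0.1:8800')
        # Successful response: read the status code from the response
        print response.getcode()
    except urllib2.HTTPError as e:
        # Error response: the status code is on the exception
        if hasattr(e, 'code'):
            print 'Error code:', e.code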
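
The two cases can be folded into one helper that always returns a status code. This is only a sketch: fetch_status and always_404 are hypothetical names introduced here, always_404 is a stand-in for urlopen that simulates an error reply so the example runs without a server, and the fallback import is only there so it also runs on Python 3.

```python
import io

try:
    from urllib2 import urlopen, HTTPError    # Python 2
except ImportError:
    from urllib.request import urlopen        # Python 3 equivalents
    from urllib.error import HTTPError


def fetch_status(url, opener=urlopen):
    """Return the HTTP status code for url, whether or not it is an error."""
    try:
        response = opener(url)
        return response.getcode()
    except HTTPError as e:
        # Error statuses such as 404 arrive as exceptions carrying the code
        return e.code


def always_404(url):
    # Hypothetical stand-in for urlopen that simulates a "404 Not Found" reply
    raise HTTPError(url, 404, "Not Found", {}, io.BytesIO(b""))


status = fetch_status("http://127.0.0.1:8800/missing", opener=always_404)
print(status)
```

In real use the opener argument would be left at its default, so that urlopen performs the request.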

Reference book: 《Python爬虫开发与项目实战》, by 范传辉