1. Importing the package
import urllib.request
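2. Commonly used methods
(1) urllib.request.urlretrieve(url, local file path): downloads a web page directly to a local file
urllib.request.urlretrieve("http://www.baidu.com", r"D:\1.html")  # raw string, otherwise "\1" is read as an escape sequence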
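As a minimal sketch of the download step, the call can be wrapped in a small helper. The name `download` and the choice to create missing parent folders are assumptions made for illustration, not part of urllib.

```python
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> Path:
    """Save the resource at `url` to `dest` and return the destination path.

    `download` is a hypothetical helper name used only for this example.
    """
    target = Path(dest)
    # Create the destination folder if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve returns a (local_filename, headers) tuple
    filename, _headers = urllib.request.urlretrieve(url, str(target))
    return Path(filename)


# Example call (needs network access):
# download("http://www.baidu.com", r"D:\1.html")
```

Using a raw string (or forward slashes) for the Windows path avoids the backslash being treated as an escape character.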
(2) urllib.request.urlcleanup(): cleans up the temporary files (cache) that earlier urlretrieve() calls may have left behind
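(3) Inspecting basic information about a page
file = urllib.request.urlopen("http://www.baidu.com")
print(file.info())     # response headers
print(file.getcode())  # HTTP status code (code is an attribute, not a method, so file.code() fails)
print(file.geturl())   # URL of the page actually retrieved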
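The three calls above can be gathered into one helper, sketched here under the assumption that a plain dict is a convenient return value. The name `describe` and the lower-cased header keys are choices made for this example only.

```python
import urllib.request


def describe(url: str, timeout: float = 10.0) -> dict:
    """Open `url` and collect its status code, final URL, and headers.

    `describe` is a hypothetical helper name used only for this example.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return {
            # Status code for HTTP responses, e.g. 200
            "status": response.getcode(),
            # May differ from `url` if the server redirected the request
            "url": response.geturl(),
            # Header names lower-cased so lookups do not depend on case
            "headers": {k.lower(): v for k, v in response.info().items()},
        }


# Example call (needs network access):
# info = describe("http://www.baidu.com")
# print(info["status"], info["url"], info["headers"].get("content-type"))
```

The `with` block closes the connection once the information has been read.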
(4) Setting a timeout
urllib.request.urlopen("http://www.baidu.com", timeout=1)
timeout is how long, in seconds, to wait before giving up on the request.
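When the timeout expires, urlopen raises an exception rather than returning. A rough sketch of catching it follows. The helper name `fetch_with_timeout` and the decision to return None on a timeout are assumptions for illustration.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url: str, timeout: float = 1.0):
    """Return the response body as bytes, or None if the request timed out.

    `fetch_with_timeout` is a hypothetical helper name used only for this example.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # A timeout while connecting arrives wrapped in URLError
        if isinstance(e.reason, (socket.timeout, TimeoutError)):
            return None
        raise  # any other URL error is not a timeout, so pass it on
    except (socket.timeout, TimeoutError):
        # A timeout while waiting for the response is raised directly
        return None


# Example call (needs network access):
# body = fetch_with_timeout("http://www.baidu.com", timeout=1)
```

A value of 1 second is quite strict for real pages, so a larger timeout may be needed on slow connections.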
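3. POST requests
import urllib.request
import urllib.parse

post_url = "http://www.baidu.com"
# Encode the form fields, then convert them to bytes, which Request expects
post_data = urllib.parse.urlencode({
    "username": "username",
    "password": "password",
}).encode("utf-8")
# Passing data makes this a POST request; send it with urllib.request.urlopen(req)
req = urllib.request.Request(post_url, post_data)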
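A small sketch of the same steps as a reusable function. The name `build_post_request`, the field values, and the explicit Content-Type header are assumptions added for illustration.

```python
import urllib.parse
import urllib.request


def build_post_request(url: str, fields: dict) -> urllib.request.Request:
    """Build a form-encoded POST request without sending it.

    `build_post_request` is a hypothetical helper name used only for this example.
    """
    # urlencode produces "key=value&key2=value2"; Request needs bytes
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,  # a non-None data argument turns the request into a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_post_request(
    "http://www.baidu.com",
    {"username": "alice", "password": "secret"},  # placeholder values
)
print(req.get_method())  # POST
print(req.data)          # b'username=alice&password=secret'

# Sending the request (needs network access):
# with urllib.request.urlopen(req) as response:
#     body = response.read()
```

Building the request separately from sending it makes it easy to check the encoded body before anything goes over the network.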
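4. Exception handling
import urllib.request
import urllib.error

try:
    urllib.request.urlopen("http://www.baidu.com")
except urllib.error.URLError as e:
    # HTTPError (a subclass of URLError) carries a status code
    if hasattr(e, "code"):
        print(e.code)
    # reason describes why the request failed
    if hasattr(e, "reason"):
        print(e.reason)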
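Instead of probing with hasattr, the two error types can be told apart with isinstance, since HTTPError is a subclass of URLError. The sketch below assumes that returning an error message is preferable to printing. The names `describe_error` and `safe_fetch` are hypothetical.

```python
import urllib.error
import urllib.request


def describe_error(exc: urllib.error.URLError) -> str:
    """Turn a urllib error into a short message (hypothetical helper)."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, but with an error status such as 404 or 500
        return f"HTTP {exc.code}: {exc.reason}"
    # No usable response at all: DNS failure, refused connection, and so on
    return f"Connection failed: {exc.reason}"


def safe_fetch(url: str, timeout: float = 10.0):
    """Return (body, None) on success or (None, message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.URLError as exc:
        return None, describe_error(exc)


# Example call (needs network access):
# body, error = safe_fetch("http://www.baidu.com")
# print(error if error else len(body))
```

Catching URLError alone covers both cases, so the order of except clauses does not matter here.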