I. Fetch the Baidu home page and print its content
    from urllib import request

    # Open the URL and get back a response object
    req = request.urlopen("http://www.baidu.com")
    # Read the raw bytes and decode them as UTF-8 text
    print(req.read().decode("utf-8"))
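1. from urllib import request imports the request module. urllib is part of the Python 3 standard library, so there is nothing to install with pip. To confirm it is available, start the Python interpreter from cmd and type from urllib import request; if no error appears, it is ready to use.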
2. req.read().decode("utf-8") reads the body of the response as bytes and decodes it as UTF-8 text.
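Hard-coding utf-8 works for the example above, but not every site uses that encoding. As a rough sketch, the helper below reads the charset the server declares in its Content-Type header and falls back to utf-8 when none is given. The name fetch_text, the 10-second timeout, and the errors="replace" choice are assumptions made for illustration, not part of urllib.

```python
from urllib import request


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Hypothetical helper: fetch url and return its body as text."""
    # The with block closes the response once the body has been read.
    with request.urlopen(url, timeout=timeout) as resp:
        # get_content_charset() returns the charset parameter of the
        # Content-Type header, or None if the server did not send one.
        charset = resp.headers.get_content_charset() or default_charset
        # errors="replace" is an assumption: substitute undecodable bytes
        # instead of raising UnicodeDecodeError.
        return resp.read().decode(charset, errors="replace")
```

With this in place, print(fetch_text("http://www.baidu.com")) should print the same page as the snippet above.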
II. Simulate a real browser
    from urllib import request

    # Build a Request object so that headers can be attached before sending
    resq = request.Request("http://www.baidu.com")
    # Present the request as coming from a mobile Chrome browser
    resq.add_header(
        "User-Agent",
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/63.0.3239.84 Mobile Safari/537.36",
    )
    req = request.urlopen(resq)
    print(req.read().decode("utf-8"))
We do this because some sites refuse requests from crawlers, so we need to make the request look like it comes from a real browser.
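When several requests need the same browser-like headers, it can be handy to set them in one place. The sketch below is one possible way to do that: DEFAULT_HEADERS and build_request are hypothetical names introduced here, and the User-Agent value is the one used above. It only builds the Request object, so the headers can be checked before anything is sent.

```python
from urllib import request

# Hypothetical default headers; the User-Agent string is the one from above.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/63.0.3239.84 Mobile Safari/537.36"
    ),
}


def build_request(url, extra_headers=None):
    """Hypothetical helper: return a Request carrying browser-like headers."""
    # Copy the defaults so that callers never modify the shared dict.
    headers = dict(DEFAULT_HEADERS)
    headers.update(extra_headers or {})
    return request.Request(url, headers=headers)
```

The returned object can be passed to request.urlopen() exactly like resq above. Note that Request stores header names in capitalized form, so the value is looked up with get_header("User-agent").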
III. Send a POST request
1. Import the parse module: from urllib import parse
    from urllib import request
    from urllib import parse

    resq = request.Request("http://www.thsrc.com.tw/tw/TimeTable/SearchResult")
    resq.add_header(
        "User-Agent",
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/63.0.3239.84 Mobile Safari/537.36",
    )
    resq.add_header("Origin", "http://www.thsrc.com.tw")

    # Encode the form fields as key=value pairs joined by &
    postDate = parse.urlencode([
        ("StartStation", "2f940836-cedc-41ef-8e28-c2336ac8fe68"),
        ("EndStation", "977abb69-413a-4ccf-a109-0272c24fd490"),
        ("SearchDate", "2017/12/09"),
        ("SearchTime", "21:30"),
        ("SearchWay", "DepartureInMandarin"),
    ])

    # Passing data turns the request into a POST; the body must be bytes
    req = request.urlopen(resq, data=postDate.encode("utf-8"))
    print(req.read().decode("utf-8"))
2. This simulates a query to the Taiwan High Speed Rail site and retrieves the train timetable information.
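To see what is actually sent in the POST body, the encoding step can be tried on its own without contacting the server. The following is a minimal sketch: build_post_request is a hypothetical helper name, not a urllib function, and the request is only constructed, never sent.

```python
from urllib import parse, request


def build_post_request(url, fields, headers=None):
    """Hypothetical helper: build (but do not send) a form POST request."""
    # urlencode percent-escapes each key and value and joins the pairs
    # with "&"; the result is a str, so it must be encoded to bytes.
    body = parse.urlencode(fields).encode("utf-8")
    # A Request that carries data reports its method as POST.
    return request.Request(url, data=body, headers=headers or {})
```

For example, passing the fields [("SearchDate", "2017/12/09"), ("SearchTime", "21:30")] gives a body of SearchDate=2017%2F12%2F09&SearchTime=21%3A30, which shows that the slashes and the colon are percent-escaped. The resulting object can be handed to request.urlopen() to send it.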