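Over the past couple of days I started working on a web crawler of my own, so I went looking for material. I found one tutorial that explained things well and was built on HttpClient, so I downloaded the jars from Apache and set out to write my own version, but I simply could not find the classes the tutorial used. A look at the Apache site explained why: the tutorial was written against an older Apache open-source project called Commons HttpClient. That project has long since been retired by Apache and no longer receives new releases. It has been replaced by HttpClient and HttpCore from the Apache HttpComponents project, which are said to offer better performance and flexibility, although I have no first-hand experience of that yet. Since the old library is retired, I might as well learn the new one and keep up with the times.
I downloaded the PDF tutorial from the official site, so on to the first example.
First, add three jars to the classpath: httpclient-4.3.3.jar, httpcore-4.3.2.jar, and commons-logging.jar.
Now for the code.
GET request: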
package com.lu.test;

import java.net.URI;

import org.apache.http.Header;
import org.apache.http.HeaderIterator;
import org.apache.http.StatusLine;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class HttpClientTest {
    public static void main(String[] args) throws Exception {
        URI uri = new URIBuilder().setScheme("http").setHost("www.baidu.com")
                .setPath("/").setParameter("name", "****").build();
        // Create the client object, roughly like opening a browser
        CloseableHttpClient client = HttpClients.createDefault();
        try {
            // Create a GET request
            HttpGet httpGet = new HttpGet(uri);
            // Execute the request; this method returns a response object
            CloseableHttpResponse response = client.execute(httpGet);
            try {
                // Print the request method
                System.out.println("request method : " + httpGet.getMethod());
                System.out.println("-------------------------------");
                // Get the status line. StatusLine is an interface;
                // getStatusLine() returns an object that implements it
                StatusLine statusLine = response.getStatusLine();
                System.out.println(statusLine.getProtocolVersion());
                System.out.println(statusLine.getStatusCode());
                System.out.println(statusLine.getReasonPhrase());
                System.out.println("-------------------------------");
                // getAllHeaders() returns all response headers as an array
                // Header[] headers = response.getAllHeaders();
                // for (Header h : response.getAllHeaders()) {
                //     System.out.println(h.getName() + " : " + h.getValue());
                // }
                HeaderIterator iter = response.headerIterator();
                while (iter.hasNext()) {
                    Header header = iter.nextHeader();
                    System.out.println(header.getName() + " : " + header.getValue());
                }
            } finally {
                // Always close the response to release the connection
                response.close();
            }
        } finally {
            client.close();
        }
    }
}
Output:

request method : GET
-------------------------------
HTTP/1.1
200
OK
-------------------------------
Date : Sat, 15 Mar 2014 15:26:21 GMT
Content-Type : text/html
Transfer-Encoding : chunked
Connection : Keep-Alive
Vary : Accept-Encoding
Set-Cookie : BAIDUID=98D1A9B265CFFD5D549FAF1B3AF80EFA:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie : BDSVRTM=11; path=/
Set-Cookie : H_PS_PSSID=5489_5229_1431_5223_5460_4261_5568_4760_5516; path=/; domain=.baidu.com
P3P : CP=" OTI DSP COR IVA OUR IND COM "
Expires : Sat, 15 Mar 2014 15:26:21 GMT
Cache-Control : private
Server : BWS/1.1
BDPAGETYPE : 1
BDQID : 0xe06a0fbd00182376
BDUSERID : 0
POST request:
package com.lu.test;

import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpClientTest {
    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // Holds the form data
        List<NameValuePair> form = new ArrayList<NameValuePair>();
        try {
            // Add the name/value pairs
            form.add(new BasicNameValuePair("username", "****"));
            form.add(new BasicNameValuePair("password", "********"));
            // Convert the form into an entity
            UrlEncodedFormEntity entity = new UrlEncodedFormEntity(form, "UTF-8");
            HttpPost httpPost = new HttpPost(
                    "http://localhost:8080/spiderweb/RirectServlet");
            // Attach the entity to the POST request
            httpPost.setEntity(entity);
            CloseableHttpResponse httpResponse = httpClient.execute(httpPost);
            try {
                // Read the response body as a string and print it
                HttpEntity responseEntity = httpResponse.getEntity();
                String content = EntityUtils.toString(responseEntity);
                System.out.println(content);
            } finally {
                httpResponse.close();
            }
        } finally {
            httpClient.close();
        }
    }
}
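To get a feel for what the form entity puts on the wire, here is a small library-free sketch that builds a body of the general `application/x-www-form-urlencoded` shape using only the JDK. The class name `FormBodySketch`, the `encode` helper, and the sample field values are my own assumptions for illustration; this is not how `UrlEncodedFormEntity` is implemented, and its output may differ in small encoding details.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient.
class FormBodySketch {

    // Percent-encodes one name or value as UTF-8.
    private static String enc(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins the fields as name=value pairs separated by '&'.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the fields.
        Map<String, String> form = new LinkedHashMap<String, String>();
        form.put("username", "alice");      // made-up sample value
        form.put("password", "p@ss word");  // made-up sample value
        // prints username=alice&password=p%40ss+word
        System.out.println(encode(form));
    }
}
```

The point is only that the name/value pairs end up as one encoded string in the request body, which is why the servlet on the other side can read them like ordinary form parameters.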
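You can also process the response with a ResponseHandler. A ResponseHandler guarantees that the underlying HTTP connection is released back to the connection manager in all cases, which simplifies the code.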
package com.lu.test;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.StatusLine;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientTest {
    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        try {
            HttpGet httpGet = new HttpGet("http://www.baidu.com");
            // Create a ResponseHandler to process the response
            ResponseHandler<String> handler = new ResponseHandler<String>() {
                @Override
                public String handleResponse(HttpResponse response)
                        throws ClientProtocolException, IOException {
                    StatusLine statusLine = response.getStatusLine();
                    System.out.println(statusLine.getStatusCode());
                    HttpEntity entity = response.getEntity();
                    // Return the body as a string, or null if there is none
                    if (null != entity) {
                        return EntityUtils.toString(entity);
                    }
                    return null;
                }
            };
            // Execute the request and let the ResponseHandler process the response
            String content = httpClient.execute(httpGet, handler);
            System.out.println(content);
        } finally {
            // The handler releases the connection; the client itself still needs closing
            httpClient.close();
        }
    }
}
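Why can a handler-style API promise that the connection is always released? Conceptually, the method that runs the handler owns the resource and frees it in a `finally` block, so the caller never has to. The sketch below shows that pattern with plain JDK code. `FakeConnection`, `Handler`, and `execute` are hypothetical names I made up for illustration; they are not HttpClient types, and this is not a description of HttpClient's actual internals.

```java
// Library-free illustration of the "handler" pattern; all names are hypothetical.
class HandlerPatternSketch {

    // Stand-in for a pooled connection.
    static class FakeConnection {
        boolean released = false;

        String read() {
            return "hello";
        }

        void release() {
            released = true;
        }
    }

    // Plays the role that ResponseHandler plays in the article.
    interface Handler<T> {
        T handle(FakeConnection connection);
    }

    // Runs the handler, then releases the connection whether or not the handler threw.
    static <T> T execute(FakeConnection connection, Handler<T> handler) {
        try {
            return handler.handle(connection);
        } finally {
            connection.release();
        }
    }

    public static void main(String[] args) {
        FakeConnection connection = new FakeConnection();
        String result = execute(connection, new Handler<String>() {
            @Override
            public String handle(FakeConnection c) {
                return c.read().toUpperCase();
            }
        });
        System.out.println(result);               // prints HELLO
        System.out.println(connection.released);  // prints true
    }
}
```

Compare this with the GET and POST examples, where the calling code has to wrap the response in its own try/finally and call close() by hand.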
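About HttpClient: in a code comment I said that creating an HttpClient object is like opening a browser. That is actually not accurate.
Here is what the official tutorial says: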
HttpClient is NOT a browser. It is a client side HTTP transport library. HttpClient's purpose is
to transmit and receive HTTP messages. HttpClient will not attempt to process content, execute
javascript embedded in HTML pages, try to guess content type, if not explicitly set, or reformat
request / redirect location URIs, or other functionality unrelated to the HTTP transport.
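In other words, HttpClient only sends and receives HTTP messages. It does not interpret the content, run the JavaScript embedded in a page, guess a missing content type, or rewrite request and redirect URIs; anything a browser would do with the page beyond the transport itself is left to the caller.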
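For a crawler this means that once the HTML has been fetched as a string, pulling the links out of it is your own job. A minimal sketch of that step, using only `java.util.regex`, might look like the following. The class name `LinkExtractorSketch`, the `extractLinks` helper, and the sample HTML are assumptions of mine, and a regular expression is a deliberately crude stand-in for a real HTML parser: it will miss unquoted attributes and will happily match `href` inside comments or scripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HttpClient.
class LinkExtractorSketch {

    // Matches href="..." or href='...', ignoring case. Simplistic on purpose.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the raw href values in the order they appear in the HTML.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // In a crawler this string would be the content returned by EntityUtils.toString(...)
        String html = "<a href=\"http://www.baidu.com/a\">A</a> <A HREF='/b'>B</A>";
        // prints [http://www.baidu.com/a, /b]
        System.out.println(extractLinks(html));
    }
}
```

Relative links such as `/b` would still need to be resolved against the page URL before they can be fetched, which again is up to the crawler rather than HttpClient.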
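The remaining methods, including those on the request and response objects, are all quite simple, so I will not go through them in detail.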