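When crawling web pages, some sites apply anti-crawler measures that cause the server to refuse requests. Sending requests through a proxy IP can work around these refusals.
Proxy IPs fall into four types: transparent, anonymous, distorting, and high-anonymity (elite) proxies.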
1. Transparent Proxy: a transparent proxy nominally "hides" your IP address, but the real IP can still be read from HTTP_X_FORWARDED_FOR.
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Your IP
2. Anonymous Proxy: an anonymous proxy goes one step further than a transparent proxy. Others can tell only that you are using a proxy, not who you are.
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Proxy IP
3. Distorting Proxy: with a distorting proxy, others can still tell that you are using a proxy, but they see a fake IP address, which makes the disguise more convincing.
REMOTE_ADDR = Proxy IP
HTTP_VIA = Proxy IP
HTTP_X_FORWARDED_FOR = Random IP address
4. High Anonymity Proxy (Elite Proxy): a high-anonymity proxy keeps others from detecting that a proxy is in use at all.
REMOTE_ADDR = Proxy IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined
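The four header patterns listed above can be turned into a small check. The following is only a sketch: the class `ProxyAnonymity`, its `Level` enum, and the `classify` method are hypothetical names invented for illustration, and it assumes you already know your own real IP and can see the header values a test server received (for example from an endpoint that echoes them back). The IP addresses are documentation placeholders.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of any library. */
class ProxyAnonymity {

    enum Level { TRANSPARENT, ANONYMOUS, DISTORTING, ELITE }

    /**
     * Classifies a proxy from the values the server observed.
     * Assumes the request really went through a proxy and that realIp
     * and remoteAddr are non-null; via and xForwardedFor may be null.
     */
    static Level classify(String realIp, String remoteAddr, String via, String xForwardedFor) {
        boolean hasVia = via != null && !via.trim().isEmpty();
        boolean hasXff = xForwardedFor != null && !xForwardedFor.trim().isEmpty();

        // The real IP leaks through X-Forwarded-For: transparent.
        if (hasXff && listContains(xForwardedFor, realIp)) {
            return Level.TRANSPARENT;
        }
        // Neither proxy header is present: high anonymity.
        if (!hasVia && !hasXff) {
            return Level.ELITE;
        }
        // The proxy is visible but only its own address is reported: anonymous.
        if (!hasXff || listContains(xForwardedFor, remoteAddr)) {
            return Level.ANONYMOUS;
        }
        // The proxy is visible and reports some unrelated address: distorting.
        return Level.DISTORTING;
    }

    /** X-Forwarded-For may hold a comma-separated list; compare each entry exactly. */
    private static boolean listContains(String headerValue, String ip) {
        return Arrays.stream(headerValue.split(","))
                .map(String::trim)
                .anyMatch(ip::equals);
    }

    public static void main(String[] args) {
        String realIp = "203.0.113.7";    // placeholder for your own IP
        String proxyIp = "198.51.100.20"; // placeholder for the proxy IP
        System.out.println(classify(realIp, proxyIp, proxyIp, realIp));       // TRANSPARENT
        System.out.println(classify(realIp, proxyIp, proxyIp, proxyIp));      // ANONYMOUS
        System.out.println(classify(realIp, proxyIp, proxyIp, "192.0.2.55")); // DISTORTING
        System.out.println(classify(realIp, proxyIp, null, null));            // ELITE
    }
}
```

The order of the checks matters: the real-IP leak is tested first, so a proxy that forwards your address is reported as transparent regardless of what else it sends.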
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

/**
 * @author test
 * @Title: JunitHttpClient
 * @ProjectName JunitHttpClient
 * @Description: TODO
 * @date 2018/12/12 16:07
 */
public class JunitHttpClient {

    @Test
    public void test() throws Exception {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("https://www.****.com");
        // Set the request header
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36");

        // Execute the HTTP GET request (POST can be used as well).
        // try-with-resources closes the response and the client even if an exception is thrown.
        try (CloseableHttpClient client = setProxy(httpGet, "192.168.1.1", 8888);
             CloseableHttpResponse response = client.execute(httpGet)) {
            // Read the response entity
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                System.out.println("Page content: " + EntityUtils.toString(entity, "utf-8"));
            }
        }
    }

    /**
     * Configures the proxy on the request.
     * @param httpGet   the request to route through the proxy
     * @param proxyIp   the proxy IP address
     * @param proxyPort the proxy port
     * @return a new HttpClient instance used to execute the request
     */
    public CloseableHttpClient setProxy(HttpGet httpGet, String proxyIp, int proxyPort) {
        // Create the HttpClient instance
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // Set the proxy IP and port
        HttpHost proxy = new HttpHost(proxyIp, proxyPort, "http");
        // Timeouts can be set as well:
        // RequestConfig requestConfig = RequestConfig.custom().setProxy(proxy)
        //         .setConnectTimeout(3000).setSocketTimeout(3000)
        //         .setConnectionRequestTimeout(3000).build();
        RequestConfig requestConfig = RequestConfig.custom().setProxy(proxy).build();
        httpGet.setConfig(requestConfig);
        return httpClient;
    }
}
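For comparison, the same idea can be sketched with only the JDK's built-in `java.net.http.HttpClient` (Java 11 or later), which may be useful where Apache HttpClient is not on the classpath. This is a swapped-in alternative, not the approach used in the example above. The class `JdkProxyFetch` and its methods are hypothetical names, the 3-second timeouts mirror the optional timeout values mentioned above, and the shortened User-Agent string is just a placeholder.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical JDK-only counterpart of the Apache HttpClient example (names are illustrative). */
class JdkProxyFetch {

    /** Builds a client that sends HTTP and HTTPS requests through the given proxy. */
    static HttpClient buildClient(String proxyIp, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyIp, proxyPort)))
                .connectTimeout(Duration.ofSeconds(3))
                .build();
    }

    /** Performs a GET request and returns the response body decoded as UTF-8. */
    static String fetch(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(3))
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: JdkProxyFetch <proxyIp> <proxyPort> <url>");
            return;
        }
        HttpClient client = buildClient(args[0], Integer.parseInt(args[1]));
        System.out.println("Page content: " + fetch(client, args[2]));
    }
}
```

Here the proxy is configured once on the client and applies to every request it sends, whereas the `RequestConfig` approach above attaches the proxy to a single request.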