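Simulating a form submission in code to log in to Baidu.
From a practical standpoint, this kind of problem has little value and is not really suited to a blog post. Baidu's pages keep changing while this post will not be updated to match, so its content cannot be guaranteed to stay correct.
From a learning standpoint, it does make a good study topic. I have been learning Python these past few days and it has left a strong impression. Python really is more flexible than Java, and its syntax has many elegant features. Multi-line strings and raw strings (strings that need no escaping), for example, do not exist in Java at the time of writing, which is painful.
This kind of problem takes patience. Like cracking a password, you have to experiment, understand and guess; it costs time and energy and the payoff is low, so the effort might be better spent learning something else. Study should come first. As Xunzi wrote in 劝学: 终日而思,不如须臾之所学也, meaning that a whole day of thinking is worth less than a moment of study.
In Chrome, Ctrl+U opens the page source and F12 opens the developer tools. Concentrate on the Network tab with "Preserve log" enabled, and clear old cookies and cache before each experiment to rule out interference. The rest of the job is to watch the Network tab while logging in, logging out, posting and commenting, and then to inspect how the cookies change and what comes back. Use some imagination on the data, guess boldly and look for patterns.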
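Three steps to a successful login
1. First, visiting any Baidu page gives you a Baidu id (BAIDUID), which is a cookie;
2. Next, request https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login to obtain a token;
3. Finally, submit the login form to https://passport.baidu.com/v2/api/?login
v2 means version 2. The response of this page depends on the parameters of the GET request. When I first captured the traffic, the parameters shown in the browser were: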
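- getapi: (no value)
- tpl: mn
- apiver: v3
- tt: 1461752974956 (login time, a millisecond timestamp)
- class: login (the action is a login rather than something else)
- gid: 36400D4-3078-460D-ABD8-9DEFBA99604B
- logintype: dialogLogin (login type: logging in through the dialog box)
- callback: bd__cbs__gyljq1 (callback name)
The response looks like this: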
bd__cbs__gyljq1({ "errInfo": { "no": "0" }, "data": { "rememberedUserName": "1661686074@qq.com", "codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622", "cookie": "1", "usernametype": "2", "spLogin": "newuser", "disable": "", "loginrecord": { 'email': [], 'phone': [] } } })
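This is not pure JSON: it is wrapped in the callback and carries some redundancy. An errInfo "no" of 0 means everything went well. data is a JSON object, and the only useful field in it is token.
Many of the GET parameters are redundant. Different parameters give different responses, and you can test this directly in the browser's address bar. After removing callback and trimming the query down to getapi&apiver=v3, the returned JSON becomes much tidier: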
{ "errInfo": { "no": "0" }, "data": { "rememberedUserName": "", "codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622", "cookie": "1", "usernametype": "", "spLogin": "newuser", "disable": "", "loginrecord": { 'email': [], 'phone': [] } } }
Removing apiver=v3 (API version 3) as well, so that the parameters become getapi&class=login, turns the response into a series of key-value assignments:
var bdPass=bdPass||{}; bdPass.api=bdPass.api||{}; bdPass.api.params=bdPass.api.params||{}; bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622'; bdPass.api.params.login_tpl='mn'; document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/pass_api_login.js?v=20131115"></script>');
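So version 2 needs the two parameters getapi and class=login, and version 3 needs getapi and apiver=v3. Without tpl=mn the login fails: the request still returns JSON, but a token that will be used for login must be requested with tpl=mn. The version 2 response looks rather like a log4j properties file.
How do we parse the token out? A regular expression works, the version 3 response can be parsed as JSON, and the version 2 response can be parsed as property assignments. A regular expression probably works best.
There are two principles for this kind of problem (they conflict, so you have to find a balance):
* Remove every parameter you can.
* Don't waste effort testing parameters; send them all, since extra ones do no harm (although the request may look uglier).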
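The parsing options mentioned above can be sketched as follows. This is only an illustration: the function names are made up for this example, and the regular expression assumes the token is a lowercase hexadecimal string, as it is in the sample responses shown earlier.

```python
import json
import re

# Matches token='...' (version 2 style) or "token": "..." (version 3 style).
# Assumption: the token consists of lowercase hex digits, as in the samples.
TOKEN_PATTERN = re.compile(r"""token['"]?\s*[:=]\s*['"]([0-9a-f]+)['"]""")


def token_by_regex(text):
    """Pull the token out of any of the response shapes with one regex."""
    match = TOKEN_PATTERN.search(text)
    return match.group(1) if match else None


def token_by_json(text):
    """Parse a version 3 response, with or without the callback wrapper."""
    start, end = text.find("{"), text.rfind("}")
    if start < 0 or end < start:
        return None
    # The sample responses mix in single-quoted keys, which json.loads rejects.
    body = text[start:end + 1].replace("'", '"')
    return json.loads(body)["data"]["token"]
```

The JSON route breaks if any value ever contains an apostrophe, which is one reason the regular expression may be the more forgiving choice here.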
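The form captured in the browser has many fields, and many of them are useless: login succeeds without them. After repeated rounds of deleting and testing, only the following fields turned out to matter: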
{ "token": token, "tpl": "mn", "loginmerge": True, "username": username, "password": password }
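Remove anything more and it breaks; tpl=mn is still indispensable. This step uses the token obtained in step two.
Logging in to Baidu does not require setting a User-Agent to pose as a browser, nor faking the Referer or other headers: GET, GET, POST and you are logged in. Cookies need no attention either, because the client API (the requests Session, or HttpClient) handles them itself; each request automatically carries the cookies received so far.
Once logged in through the Baidu home page, you can access every part of Baidu (Tieba, Zhidao and so on).
How do you verify that the login succeeded? There are two ways:
1. Visit www.baidu.com and check whether your own name appears in the page.
2. Check whether the cookies include key ones such as PTOKEN and STOKEN.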
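The second check can be sketched in a few lines. The helper name is hypothetical, and treating PTOKEN and STOKEN as the complete set of required cookies is an assumption taken from the text above, not a documented rule.

```python
# Cookie names the text singles out as signs of a successful login.
KEY_COOKIES = ("PTOKEN", "STOKEN")


def looks_logged_in(cookie_names, required=KEY_COOKIES):
    """Return True when every key cookie name is present."""
    present = set(cookie_names)
    return all(name in present for name in required)
```

With a requests session `s`, something like `looks_logged_in(c.name for c in s.cookies)` should work, since iterating the session's cookie jar yields cookie objects that carry a `name` attribute.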
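Notes on Baidu's return values: no is the error code (0 means success), err_code is also an error code, error is the error message and data carries the payload: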
"no": 40, "err_code": 40, "error": null, "data"
    import json

    from requests import Session

    username = 'xxxxx'
    password = 'xxxxx'
    s = Session()

    # Python 2.x and Python 3.x differ a great deal.
    # The old way was urllib/urllib2; now the requests package is used.


    def showCookie(cookies):
        """Debug helper: print every cookie in the jar."""
        for i in cookies:
            print(i)
        print('*' * 20)


    # Step 1: visit Baidu to receive the BAIDUID cookie.
    s.get("http://www.baidu.com")
    # Step 2: request the passport page to get the token. It returns a JSON string.
    # Different parameters give different results; after capturing the traffic,
    # many useless parameters were removed by trial.
    resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
    # json.loads only accepts double-quoted strings, so the single quotes
    # in the response are replaced first.
    t = json.loads(resp.text.replace('\'', '\"'))
    token = t['data']['token']
    # Step 3: submit the form. Testing showed that only these five fields are required.
    data = {
        "token": token,
        "tpl": "mn",
        "loginmerge": True,
        "username": username,
        "password": password
    }
    resp = s.post("https://passport.baidu.com/v2/api/?login", data=data)
The Java version uses the third-party library fastjson for JSON parsing and Apache HttpClient for the network requests.
    static HttpClient client = HttpClients.createDefault();

    static void login(String username, String password)
            throws ClientProtocolException, IOException {
        // Step 1: visit a Baidu page to receive the BAIDUID cookie.
        HttpGet homePage = new HttpGet("http://tieba.baidu.com");
        // Consume the response so the pooled connection is released.
        EntityUtils.consume(client.execute(homePage).getEntity());

        // Step 2: fetch the token.
        HttpGet getToken = new HttpGet(
                "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
        HttpResponse resp = client.execute(getToken);
        String json = EntityUtils.toString(resp.getEntity());
        JSONObject obj = JSON.parseObject(json);
        String token = obj.getJSONObject("data").getString("token");

        // Step 3: submit the login form.
        HttpPost loginPost = new HttpPost(
                "https://passport.baidu.com/v2/api/?login");
        List<NameValuePair> list = new ArrayList<>();
        list.add(new BasicNameValuePair("token", token));
        list.add(new BasicNameValuePair("username", username));
        list.add(new BasicNameValuePair("password", password));
        list.add(new BasicNameValuePair("tpl", "mn"));
        list.add(new BasicNameValuePair("loginmerge", "true"));
        UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list, "utf-8");
        loginPost.setEntity(loginData);
        EntityUtils.consume(client.execute(loginPost).getEntity());
    }
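Login finally works, so the next step is posting. Tieba's data model is very important; if the hierarchy is not clear, everything else gets difficult.
* Forum (贴吧): a forum is one big topic, a fairly broad category. Beneath it are many branches that refine the big topic. For example, the Jin Yong forum contains many threads: Who has the strongest martial arts? Who is stronger, the Sweeper Monk or Dugu Qiubai? ... Every thread triggers many posts, and every post in turn draws comments.
* Thread: starting a thread is like getting ready to put up a building.
* Post: every post adds one floor to the thread.
* Comment: every post can have comments beneath it, which makes replies better targeted. Adding a floor states your own view, like a heavy weapon, a long spear or halberd saying what nobody has said before; a comment is like a dagger, short and more direct.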
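The hierarchy described in the list above can be written down as a small data model. The class and field names are illustrative only and are not taken from Baidu's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    content: str


@dataclass
class Post:
    """One floor of a thread; comments hang beneath it."""
    content: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Thread:
    """One topic inside a forum; each post adds a floor."""
    title: str
    posts: List[Post] = field(default_factory=list)


@dataclass
class Forum:
    """One forum, the broad topic that owns many threads."""
    name: str
    threads: List[Thread] = field(default_factory=list)


forum = Forum(name="金庸")
thread = Thread(title="Who has the strongest martial arts?")
forum.threads.append(thread)
thread.posts.append(Post(content="The Sweeper Monk."))
thread.posts[0].comments.append(Comment(content="Agreed."))
```

Keeping the four levels as separate types makes it harder to confuse "adding a floor" (a new Post on a Thread) with "commenting" (a new Comment on a Post).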
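Common error types:
* Missing tbs: returns 230308, where 308 is the error code and 230 is a prefix.
* 265 is the error code, again with the fixed 230 prefix. It probably means you are posting too frequently and posting has been blocked.
* 40 means a captcha: you posted too frequently and a captcha is now required. Being able to crack the captcha would be ideal. Below is the returned JSON; str_reason reads "请点击验证码完成发贴" (please click the captcha to finish posting).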
{ "no": 40, "err_code": 40, "error": null, "data": { "autoMsg": "", "fid": 1847502, "fname": "\u5927\u5b66\u751f\u52b1\u5fd7", "tid": 0, "is_login": 1, "content": "", "vcode": { "need_vcode": 1, "str_reason": "\u8bf7\u70b9\u51fb\u9a8c\u8bc1\u7801\u5b8c\u6210\u53d1\u8d34", "captcha_vcode_str": "captchaservice303662633978724c7655642b44707538683879667741516b6c2f4262726d36777477486b356749525449362b39495a426642746d6d744d5178716236766c4a575650742f6f4b4b57534d576656385534766158757678644979672b56742f56776237523631766d6e33754a567a654d62767a7238646a6632703447653477673568695544454a2b6146695a6651525763705657396b6c45614a334d6a75375664425452684977702b306d3866306a346350365755634f763835614f72426d4c5478596e41587749525773372b38746c66443949764156423478776a644d37476a746a674b4374396348574636644d617a457043714d796d48644641676c466a55716e5841587162646465624e4b6171356a733041502f456c7649636f5879326177514c67473164636638482f76487a55", "captcha_code_type": 4, "userstatevcode": 0 }, "mute_text": null } }
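A small decoder for these responses might look like the sketch below. It covers only the cases listed above, the function names are made up, and it assumes prefixed codes such as 230308 arrive in the same "no" field as the captcha code does in the sample.

```python
import json

PREFIX = "230"  # prefix the text describes for codes such as 230308


def split_code(code):
    """Split a code like 230308 into (prefix, short_code); others pass through."""
    text = str(code)
    if text.startswith(PREFIX) and len(text) > len(PREFIX):
        return PREFIX, int(text[len(PREFIX):])
    return "", int(text)


def describe(response_text):
    """Summarise a posting response using only the cases listed above."""
    reply = json.loads(response_text)
    prefix, short = split_code(reply["no"])
    if short == 0:
        return "ok"
    if short == 40:
        vcode = (reply.get("data") or {}).get("vcode") or {}
        return "captcha required: " + vcode.get("str_reason", "")
    if prefix and short == 308:
        return "missing tbs"
    if prefix and short == 265:
        return "posting too frequently"
    return "unknown error %s" % reply["no"]
```

Parsing the JSON properly, instead of searching the raw text for a substring, also decodes the \u escapes in str_reason for free.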
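tbs is obtained from http://tieba.baidu.com/dc/common/tbs: a single GET, then parse tbs out of the returned JSON string. tbs acts as a Tieba pass: every comment and every floor you submit must carry it. They can all share the same tbs, namely the one you obtained most recently; the server presumably keeps a hash map from user id to tbs value.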
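Many fields are unnecessary. Once one request succeeds, strip it down by deleting and retesting: the headers turn out to be useless, and many form fields are optional.
The form fields are:
* kw: the name of the forum being posted in (大学生励志 in the code that follows).
* tid: the thread id; it is visible in the address bar.
* fid: it seems to be the same throughout a thread; as far as I can tell it does not change however many posts are made under one thread. Judging by the captcha response above, where fid 1847502 comes with fname 大学生励志, it identifies the forum rather than the thread.
Open a thread page, for example http://tieba.baidu.com/p/4195311174, press Ctrl+U to view the source and Ctrl+F to search for fid; it is easy to see that every fid on the page is identical.
What does Baidu use as a primary key? A long integer. Baidu does not use UUIDs and neither does cnblogs (博客园); you can tell plainly from the pages.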
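The manual search for tid and fid can be automated roughly as follows. The patterns are guesses at how the values appear (tid in a /p/ URL, fid as `fid=...` or `fid:'...'` in the page source) and the helper names are hypothetical; the real markup may differ.

```python
import re

TID_IN_URL = re.compile(r"/p/(\d+)")
# Accepts fid=123 (query-string style) and fid:'123' or "fid": 123 (script style).
FID_IN_HTML = re.compile(r"""fid['"]?\s*[=:]\s*['"]?(\d+)""")


def extract_tid(url):
    """Return the thread id from a thread URL, or None if there is none."""
    match = TID_IN_URL.search(url)
    return match.group(1) if match else None


def extract_fids(html):
    """Return the distinct fid values found in the page source."""
    return sorted(set(FID_IN_HTML.findall(html)))
```

If `extract_fids` returns exactly one value for a page, that matches the observation above that every fid on the page is identical.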
    static void newPost() throws ClientProtocolException, IOException {
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        HashMap<String, String> paramMap = new HashMap<String, String>();
        paramMap.put("kw", "大学生励志");
        paramMap.put("fid", "1847502");
        paramMap.put("tid", "4135933166");
        paramMap.put("tbs", tbs);
        paramMap.put("content", "天下大势为我所控");
        List<NameValuePair> list = new ArrayList<>();
        for (Map.Entry<String, String> i : paramMap.entrySet()) {
            BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                    i.getValue());
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                .setConnectTimeout(5000).build();
        post.setConfig(config);
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
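The Java is long, so let's look at the Python instead: short, punchy and better suited to describing the problem. Java is not always verbose; when simplicity is the goal, a concise API is easy enough to design in it.
Java's complexity comes from three sources:
* Poor library design. The philosophy of "do one thing at a time, one thing per step" feels a bit like assembly and gets verbose, and Java disdains syntactic sugar.
* The problem itself is complex and needs a lot of configuration. Java is more flexible here; Python is short, but some things may be impossible because the wrapping is too tight and too few hooks are exposed.
* Poor library design combined with a complex problem. There is a question of probability here: complex cases are rare, yet the API makes everyone spend a lot of time thinking about them. It would be better to design a simple, incomplete interface up front. Better simple and flawed than complex and complete. For example, when several lines are selected and Tab is pressed, should the editor indent or replace? Indent, of course; if I wanted to replace the text I would not do it that way, and the convenience gained from indenting is enormous.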
    # Continues from the login script: s is the logged-in Session.
    resp = s.get("http://tieba.baidu.com/dc/common/tbs")
    tbs = json.loads(resp.text)['tbs']
    data = {
        "kw": "大学生励志",    # forum name
        "fid": "1847502",     # forum id
        "tid": "4135933166",  # thread id
        "tbs": tbs,           # very important
        "content": "如今下午两点四十二"
    }
    resp = s.post("http://tieba.baidu.com/f/commit/post/add", data=data)
    print(resp.text)
    print("over")
    static void newReply() throws ClientProtocolException, IOException {
        // Form string as captured in the browser; quote_id is the post being replied to.
        String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
                + tbs
                + "&content=五楼的也能够评论floornum无论用吗&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        List<NameValuePair> list = new ArrayList<>();
        // Split the string into name/value pairs on '&' and the first '='.
        for (String i : data.split("&")) {
            int p = i.indexOf('=');
            BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                    i.substring(p + 1));
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
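The Java method above builds its form by splitting a captured string on & and =. In Python the same idea can be sketched with the standard library's `parse_qsl`; the helper name and the shortened sample string are made up for this illustration.

```python
from urllib.parse import parse_qsl

# Shortened, hypothetical sample of a form string copied from the browser.
raw_form = "kw=大学生励志&fid=1847502&tid=4135933166&rich_text=1&anonymous=0"


def form_fields(raw, **extra):
    """Split a captured form string into a dict and add fields such as tbs."""
    fields = dict(parse_qsl(raw, keep_blank_values=True))
    fields.update(extra)
    return fields
```

A call such as `form_fields(raw_form, tbs=tbs, content="...")` yields a dict that could then be passed as the `data` argument of a session's `post` call.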
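Chained together, the code above looks like this:
    String username = "xxxxxx";
    String password = "xxxxxx";
    login(username, password);
    tbs = getTbs();
    newPost();
    newReply();
The key point is that tbs only needs to be fetched once and can then live in a global variable; there is no need to fetch it repeatedly.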
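The fetch-once idea can be sketched as a tiny cache. The class name is hypothetical and the fetch function is injected, so nothing here depends on a particular HTTP library.

```python
class TbsCache:
    """Fetch tbs once through the supplied callable, then reuse it."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable returning a fresh tbs string
        self._value = None

    def get(self):
        """Return the cached tbs, fetching it on first use."""
        if self._value is None:
            self._value = self._fetch()
        return self._value

    def refresh(self):
        """Drop the cached value and fetch a new one."""
        self._value = None
        return self.get()
```

With the Python session shown earlier, the callable could be something like `lambda: json.loads(s.get("http://tieba.baidu.com/dc/common/tbs").text)["tbs"]`, with `refresh()` reserved for the case where the server starts rejecting the old value.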
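The link tieba.baidu.com/f/index/feedlist?tagid=all&limit=2000000&offset=0 is very important. Some links only work when pasted into the address bar and cannot be reached by following a link, possibly because the server does not allow cross-origin access.
Its parameters: tagid=like | all selects which tag list is requested (like returns only what I follow, all returns everything); limit is the number of items and offset is the starting offset. There are many other parameters, such as last_tid, the time of the last item (used for loading more), and &_.
How was this link discovered? Open tieba.baidu.com and trigger "load more"; the page then sends a request to this feedlist.
Parsing the HTML with jsoup yields many threads and their tids; opening one of them gives the fid, and with tid and fid you can add floors.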
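Building the feedlist URL from its three documented parameters might look like this. The http:// scheme, the default page size and the helper names are assumptions for the sketch; only tagid, limit and offset come from the text above.

```python
from urllib.parse import urlencode

# Assumption: the link is served over http like the other Tieba URLs here.
FEEDLIST = "http://tieba.baidu.com/f/index/feedlist"


def feedlist_url(tagid="all", limit=20, offset=0):
    """Build a feedlist URL; tagid is 'like' (followed only) or 'all'."""
    if tagid not in ("like", "all"):
        raise ValueError("tagid must be 'like' or 'all'")
    query = urlencode({"tagid": tagid, "limit": limit, "offset": offset})
    return FEEDLIST + "?" + query


def pages(total, page_size):
    """Yield (limit, offset) pairs that cover `total` items."""
    for offset in range(0, total, page_size):
        yield min(page_size, total - offset), offset
```

Paging with a modest limit, rather than asking for 2000000 items at once, keeps each response small; whether the server honours arbitrary limit values is not something this sketch verifies.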
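Apache's HttpClient project has several components: HttpAsyncClient, for example, issues requests with callbacks, and the fluent module is a fluent-style HttpClient that is a real pleasure to write. See for yourself: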
    Request.Get("http://somehost/")
            .connectTimeout(1000)
            .socketTimeout(1000)
            .execute().returnContent().asString();

    // Execute a POST with the 'expect-continue' handshake, using HTTP/1.1,
    // containing a request body as String and return response content as byte array.
    Request.Post("http://somehost/do-stuff")
            .useExpectContinue()
            .version(HttpVersion.HTTP_1_1)
            .bodyString("Important stuff", ContentType.DEFAULT_TEXT)
            .execute().returnContent().asBytes();

    // Execute a POST with a custom header through the proxy containing a request body
    // as an HTML form and save the result to the file
    Request.Post("http://somehost/some-form")
            .addHeader("X-Custom-header", "stuff")
            .viaProxy(new HttpHost("myproxy", 8080))
            .bodyForm(Form.form().add("username", "vip").add("password", "secret").build())
            .execute().saveContent(new File("result.dump"));
    import json

    from requests import Session

    username = 'xxxxx'
    password = 'xxxxx'
    s = Session()

    # Python 2.x and Python 3.x differ a great deal.
    # The old way was urllib/urllib2; now the requests package is used.


    def showCookie(cookies):
        """Debug helper: print every cookie in the jar."""
        for i in cookies:
            print(i)
        print('*' * 20)


    # Step 1: visit Baidu to receive the BAIDUID cookie.
    s.get("http://www.baidu.com")
    # Step 2: request the passport page to get the token. It returns a JSON string.
    # Different parameters give different results; after capturing the traffic,
    # many useless parameters were removed by trial.
    resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
    # Single quotes must be turned into double quotes, otherwise json cannot
    # parse the text; Python 3.x is stricter and more standards-minded.
    t = json.loads(resp.text.replace('\'', '\"'))
    token = t['data']['token']
    # Step 3: submit the form. Testing showed that only these five fields are required.
    data = {
        "token": token,
        "tpl": "mn",
        "loginmerge": True,
        "username": username,
        "password": password
    }
    resp = s.post("https://passport.baidu.com/v2/api/?login", data=data)

    # Fetch tbs, then add a floor to an existing thread.
    resp = s.get("http://tieba.baidu.com/dc/common/tbs")
    tbs = json.loads(resp.text)['tbs']
    data = {
        "kw": "大学生励志",    # forum name
        "fid": "1847502",     # forum id
        "tid": "4135933166",  # thread id
        "tbs": tbs,           # very important
        "content": "如今下午两点四十二"
    }
    resp = s.post("http://tieba.baidu.com/f/commit/post/add", data=data)
    print(resp.text)
    print("over")
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.http.HttpResponse;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONObject;

    public class Main {
        static HttpClient client = HttpClients.createDefault();
        static String tbs;
        // Headers captured from the browser. They are unused: testing showed
        // that login and posting work without them.
        static String host = "Host: tieba.baidu.com";
        static String useragent = "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36";

        static void login(String username, String password)
                throws ClientProtocolException, IOException {
            // Step 1: visit a Baidu page to receive the BAIDUID cookie.
            HttpGet homePage = new HttpGet("http://tieba.baidu.com");
            // Consume the response so the pooled connection is released.
            EntityUtils.consume(client.execute(homePage).getEntity());

            // Step 2: fetch the token.
            HttpGet getToken = new HttpGet(
                    "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
            HttpResponse resp = client.execute(getToken);
            String json = EntityUtils.toString(resp.getEntity());
            JSONObject obj = JSON.parseObject(json);
            String token = obj.getJSONObject("data").getString("token");

            // Step 3: submit the login form.
            HttpPost loginPost = new HttpPost(
                    "https://passport.baidu.com/v2/api/?login");
            List<NameValuePair> list = new ArrayList<>();
            list.add(new BasicNameValuePair("token", token));
            list.add(new BasicNameValuePair("username", username));
            list.add(new BasicNameValuePair("password", password));
            list.add(new BasicNameValuePair("tpl", "mn"));
            list.add(new BasicNameValuePair("loginmerge", "true"));
            UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list, "utf-8");
            loginPost.setEntity(loginData);
            EntityUtils.consume(client.execute(loginPost).getEntity());
        }

        static String getTbs() throws ClientProtocolException, IOException {
            HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
            HttpResponse resp = client.execute(get);
            String s = EntityUtils.toString(resp.getEntity());
            JSONObject json = JSON.parseObject(s);
            return json.getString("tbs");
        }

        static void newReply() throws ClientProtocolException, IOException {
            // Form string as captured in the browser; quote_id is the post being replied to.
            String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
                    + tbs
                    + "&content=五楼的也能够评论floornum无论用吗&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            List<NameValuePair> list = new ArrayList<>();
            // Split the string into name/value pairs on '&' and the first '='.
            for (String i : data.split("&")) {
                int p = i.indexOf('=');
                BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                        i.substring(p + 1));
                System.out.println(pair.getName() + ":" + pair.getValue());
                list.add(pair);
            }
            post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
            HttpResponse resp = client.execute(post);
            System.out.println(EntityUtils.toString(resp.getEntity()));
        }

        static void newPost() throws ClientProtocolException, IOException {
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            HashMap<String, String> paramMap = new HashMap<String, String>();
            paramMap.put("kw", "大学生励志");
            paramMap.put("fid", "1847502");
            paramMap.put("tid", "4135933166");
            paramMap.put("tbs", tbs);
            paramMap.put("content", "魏印福");
            List<NameValuePair> list = new ArrayList<>();
            for (Map.Entry<String, String> i : paramMap.entrySet()) {
                BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                        i.getValue());
                System.out.println(pair.getName() + ":" + pair.getValue());
                list.add(pair);
            }
            RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                    .setConnectTimeout(5000).build();
            post.setConfig(config);
            post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
            HttpResponse resp = client.execute(post);
            System.out.println(EntityUtils.toString(resp.getEntity()));
        }

        public static void main(String[] args)
                throws ClientProtocolException, IOException {
            String username = "xxxxxx";
            String password = "xxxxxx";
            login(username, password);
            tbs = getTbs();
            newPost();
            newReply();
        }
    }
This code really is well written:
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.http.HeaderIterator;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    class HttpUtils {
        /**
         * Convert a map into a form entity.
         *
         * @param map
         *            the form fields to encode
         * @return the encoded entity
         */
        public static HttpEntity mapToEntity(HashMap<String, String> map)
                throws Exception {
            BasicNameValuePair pair = null;
            List<BasicNameValuePair> params = new ArrayList<BasicNameValuePair>();
            for (Map.Entry<String, String> m : map.entrySet()) {
                pair = new BasicNameValuePair(m.getKey(), m.getValue());
                params.add(pair);
            }
            HttpEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
            return entity;
        }

        /**
         * Extract the text between two marker strings.
         *
         * @param string
         *            source string
         * @param start
         *            opening marker
         * @param end
         *            closing marker
         * @return the substring in between, or null on failure
         */
        public static String mid(String string, String start, String end) {
            int i = string.indexOf(start);
            if (i < 0)
                return null; // opening marker not found
            int s = i + start.length();
            int e = string.indexOf(end, s);
            if (e > s)
                return string.substring(s, e);
            return null;
        }

        /**
         * @param regex
         *            regular expression
         * @param input
         *            string to match against
         * @return list of all matches (there may be several, depending on the
         *         regular expression)
         */
        public static ArrayList<String> myRegex(String regex, String input) {
            ArrayList<String> list = new ArrayList<String>();
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                list.add(m.group());
            }
            return list;
        }
    }

    public class Baidu {
        private CloseableHttpClient httpClient; // simulated client
        private String postFid; // fid used when posting
        private String postName = ""; // name of the forum being posted in
        private CloseableHttpResponse response; // last response received
        private String html; // last HTML page received
        // Whether the "grab the second floor" loop is running; volatile
        // because it is read from the worker thread.
        private volatile boolean isQL = false;

        public boolean isQL() {
            return isQL;
        }

        public void setQL(boolean isQL) {
            this.isQL = isQL;
            if (isQL == true)
                System.out.println("Started grabbing second floors.");
            else
                System.out.println("Stopped grabbing second floors.");
        }

        /**
         * Log in.
         **/
        public boolean login(String username, String password) {
            // Flag recording whether the login succeeded
            boolean isLogin = false;
            httpClient = HttpClients.createDefault();
            try {
                /** 1, BAIDUID **/
                String baiduId = null;
                HttpGet get_main = new HttpGet(
                        "http://tieba.baidu.com/dc/common/tbs/");
                response = httpClient.execute(get_main);
                get_main.abort();
                HeaderIterator it = response.headerIterator("Set-Cookie");
                while (it.hasNext())
                    baiduId = it.next().toString();
                baiduId = HttpUtils.mid(baiduId, ":", ";");
                System.out.println("1,BAIDUID:" + baiduId);

                /** 2, token **/
                HttpGet get_token = new HttpGet(
                        "https://passport.baidu.com/v2/api/?getapi&tpl=mn");
                response = httpClient.execute(get_token);
                String token = EntityUtils.toString(response.getEntity(), "utf-8");
                get_token.abort();
                token = HttpUtils.mid(token, "_token='", "'");
                System.out.println("2,TOKEN:" + token);

                /** 3, login **/
                HashMap<String, String> map = new HashMap<String, String>();
                map.put("username", username);
                map.put("password", password);
                map.put("token", token);
                map.put("isPhone", "false");
                map.put("quick_user", "0");
                map.put("tt", System.currentTimeMillis() + "");
                map.put("loginmerge", "true");
                map.put("logintype", "dialogLogin");
                map.put("splogin", "rate");
                map.put("mem_pass", "on");
                map.put("tpl", "mn");
                map.put("apiver", "v3");
                map.put("u", "http://www.baidu.com/");
                map.put("safeflg", "0");
                map.put("ppui_logintime", "43661");
                map.put("charset", "utf-8");
                // Wrap the fields in a form entity
                HttpEntity entity = HttpUtils.mapToEntity(map);
                HttpPost http_login = new HttpPost(
                        "https://passport.baidu.com/v2/api/?login");
                http_login.setEntity(entity);
                response = httpClient.execute(http_login);
                http_login.abort();
                it = response.headerIterator();
                while (it.hasNext()) {
                    // Login is judged successful if a BDUSS cookie was written
                    if (it.next().toString().contains("BDUSS")) {
                        isLogin = true;
                        break;
                    }
                }
                System.out.println("3,login status: " + isLogin);
                return isLogin;
            } catch (Exception e) {
                // Keep the original exception as the cause
                throw new RuntimeException("unknown error", e);
            }
        }

        /**
         * Start a new thread in a forum.
         *
         * @throws Exception
         */
        public String writeTiebaItem(String tiebaName, String title,
                String content) throws Exception {
            String tbs = null;
            HashMap<String, String> paramMap = new HashMap<String, String>();
            String nowTime = System.currentTimeMillis() + "";
            // Fetch the fid only when the target forum changes; within one
            // forum the fid is fixed, so it is reused.
            if (!postName.equals(tiebaName)) {
                postFid = getFid(tiebaName);
                postName = tiebaName;
            }
            System.out.println("fid:" + postFid);
            if (postFid == null) {
                System.err.println("unknown error");
                return "unknown error";
            }
            /** Get tbs */
            tbs = getTbs();
            System.out.println("tbs:" + tbs);
            paramMap.put("ie", "utf-8");
            paramMap.put("kw", postName);
            paramMap.put("fid", postFid);
            paramMap.put("tid", "0");
            paramMap.put("vcode_md5", "");
            paramMap.put("floor_num", "0");
            paramMap.put("rich_text", "1");
            paramMap.put("tbs", tbs);
            paramMap.put("content", content);
            paramMap.put("title", title);
            paramMap.put("prefix", "");
            paramMap.put("files", URLEncoder.encode("[]", "utf-8"));
            paramMap.put("sign_id", "24179251");
            paramMap.put("mouse_pwd",
                    "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                            + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                            + nowTime + "0");
            paramMap.put("mouse_pwd_t", nowTime);
            paramMap.put("mouse_pwd_isclick", "0");
            paramMap.put("__type__", "thread");
            HttpEntity entity = HttpUtils.mapToEntity(paramMap);
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/thread/add");
            post.setEntity(entity);
            response = httpClient.execute(post);
            html = EntityUtils.toString(response.getEntity());
            if (html.contains("\"no\":0,\"err_code\":0")) {
                return "Posted successfully in forum " + tiebaName;
            } else {
                return "Posting failed, error info: " + html;
            }
        }

        private String getFid(String tiebaName) throws Exception {
            HttpResponse response = null;
            String fid = null;
            ArrayList<String> urllist = getTieziUrl(tiebaName);
            if (urllist.size() == 0) {
                return null;
            }
            // Open any thread to read the fid. The first one is used so that
            // a list with a single entry does not go out of bounds.
            HttpGet get = new HttpGet(urllist.get(0));
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            fid = HttpUtils.myRegex("fid(=|:')[0-9].+?(&|',)", html).get(0);
            if (fid.contains("=")) {
                fid = HttpUtils.mid(fid, "=", "&");
            } else if (fid.contains(":")) {
                fid = HttpUtils.mid(fid, ":'", "',");
            }
            return fid;
        }

        private ArrayList<String> get0Answer(String tiebaName) throws Exception {
            HttpResponse response = null;
            ArrayList<String> urlList = new ArrayList<String>();
            ArrayList<String> topTidList;
            String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                    + URLEncoder.encode(tiebaName, "UTF-8");
            HttpGet get = new HttpGet(tiebaUrl);
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            // Page text shown when a forum is closed; must match the page verbatim
            if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
                return urlList;
            }
            Document doc = Jsoup.parse(html);
            // All threads on the first page except pinned ones
            Elements els = doc.select("li[class= j_thread_list clearfix]");
            for (Element e : els) {
                String str = e.text().toString();
                // System.out.println(str);
                // A leading 0 means zero replies. Example text: 0 测试 陌生人左右丶 00:36
                if (str.startsWith("0")) {
                    Elements els1 = e.getElementsByTag("a");
                    for (Element e1 : els1) {
                        String url = e1.attr("href");
                        topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                        for (int i = 0; i < topTidList.size(); i++) {
                            url = topTidList.get(i);
                            urlList.add("http://tieba.baidu.com" + url);
                        }
                    }
                }
            }
            return urlList;
        }

        /**
         * @param tiebaName
         *            name of the forum whose thread URLs are wanted
         * @return the thread URLs on the first page of that forum
         * @throws IOException
         */
        private ArrayList<String> getTieziUrl(String tiebaName) throws Exception {
            HttpResponse response = null;
            ArrayList<String> urlList = new ArrayList<String>();
            ArrayList<String> topTidList;
            try {
                String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                        + URLEncoder.encode(tiebaName, "UTF-8");
                HttpGet get = new HttpGet(tiebaUrl);
                response = httpClient.execute(get);
                html = EntityUtils.toString(response.getEntity());
                // Page text shown when a forum is closed; must match the page verbatim
                if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
                    return urlList;
                }
                Document doc = Jsoup.parse(html);
                Elements els = doc.select("li[class= j_thread_list clearfix]");
                for (Element e : els) {
                    Elements els1 = e.getElementsByTag("a");
                    for (Element e1 : els1) {
                        // Take each thread link on the first page, such as
                        // "/p/2777392166", and join it into a full URL
                        String url = e1.attr("href");
                        topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                        for (int i = 0; i < topTidList.size(); i++) {
                            url = topTidList.get(i);
                            urlList.add("http://tieba.baidu.com" + url);
                        }
                    }
                }
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
            return urlList;
        }

        /** Get tbs (the URL below is an API that returns it) */
        private String getTbs() throws Exception {
            HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            return HttpUtils.mid(html, ":\"", "\",");
        }

        /** Reply to a thread */
        public String replyPost(String tid, String content, String tiebaName)
                throws Exception {
            /** No way found yet to obtain floor_num **/
            String floor_num = "1";
            if (!postName.equals(tiebaName)) {
                postFid = getFid(tiebaName);
                postName = tiebaName;
            }
            System.out.println("fid:" + postFid);
            if (postFid == null) {
                System.err.println("unknown error");
                return "unknown error";
            }
            String tbs = getTbs();
            String nowTime = System.currentTimeMillis() + "";
            // Build the reply form as a map
            HashMap<String, String> paramMap = new HashMap<String, String>();
            paramMap.put("ie", "utf-8");
            paramMap.put("kw", postName);
            paramMap.put("fid", postFid);
            paramMap.put("tid", tid);
            paramMap.put("vcode_md5", "");
            paramMap.put("floor_num", floor_num);
            paramMap.put("rich_text", "1");
            paramMap.put("tbs", tbs);
            paramMap.put("content", content);
            paramMap.put("files", "[]");
            paramMap.put("mouse_pwd",
                    "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                            + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                            + nowTime + "0");
            paramMap.put("mouse_pwd_t", nowTime);
            paramMap.put("mouse_pwd_isclick", "0");
            paramMap.put("__type__", "reply");
            HttpEntity entity = HttpUtils.mapToEntity(paramMap);
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            // Socket and connect timeouts only. The pause that keeps Baidu from
            // flagging the replies as too fast is the Thread.sleep in
            // TakeTheSecondFloor.
            RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                    .setConnectTimeout(5000).build();
            post.setConfig(config);
            post.setEntity(entity);
            response = httpClient.execute(post);
            html = EntityUtils.toString(response.getEntity());
            System.out.println(html);
            if (html.contains("\"no\":0,\"err_code\":0")) {
                return "Grabbed a second floor in forum " + tiebaName;
            } else {
                return "Reply failed, error info: " + html;
            }
        }

        /** Grab the second floor: be the first to reply to threads with no replies **/
        public void TakeTheSecondFloor(final String tiebaName,
                final String contents[], final int time) {
            final int len = contents.length;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (isQL) {
                        try {
                            Random random = new Random();
                            int index = random.nextInt(len);
                            String tid;
                            ArrayList<String> linksList = get0Answer(tiebaName);
                            for (int i = 0; i < linksList.size(); i++) {
                                // Skip "http://tieba.baidu.com/p/" (25 characters)
                                tid = linksList.get(i).substring(25);
                                String message = replyPost(tid, contents[index],
                                        tiebaName);
                                System.out.println(message);
                            }
                            Thread.sleep(time);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            }).start();
        }
    }