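By convention, I should first explain what Jsoup is, then cover how to use it, and then walk through a demo, and that is exactly the plan. I spent a whole day today going from learning the library to parsing real pages, and it ended well, aha. Still, a programmer who can't write up what he learned is not a handsome programmer, aha, which means I am a handsome one.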
----------------------------------------------------------------------------------------------------------------------
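1. What is Jsoup?
Official site: http://jsoup.org/
You can download the matching jar from the official site.
Put simply, Jsoup is a tool for parsing web pages. Now let's look at the official explanation:
The official explanation sounds a lot fancier~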
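2. Basic usage of Jsoup (http://www.open-open.com/jsoup/parsing-a-document.htm)
The site explains everything in detail, and I think smart readers like you will get it from one look at the docs... right, makes sense, since handsome people can understand anything..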
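For readers who want something concrete before opening the docs, here is a minimal sketch of the basic flow: parse HTML into a Document, pick elements with a CSS selector, then read their text and attributes. The markup, the class name JsoupBasics, the helper linkLines, and the base URI http://example.com/ are all made up for illustration; only Jsoup.parse, select, text and absUrl come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupBasics {
    // Hypothetical markup, used only so the sketch runs without a network call
    static final String HTML =
        "<html><head><title>Demo page</title></head><body>"
        + "<ul>"
        + "<li><a href=\"/post/1/\">First post</a></li>"
        + "<li><a href=\"/post/2/\">Second post</a></li>"
        + "</ul>"
        + "</body></html>";

    // Returns one "text -> absolute url" line per link inside a ul > li
    static List<String> linkLines(String html, String baseUri) {
        // The base URI lets Jsoup resolve relative hrefs via absUrl()
        Document doc = Jsoup.parse(html, baseUri);
        List<String> lines = new ArrayList<>();
        for (Element link : doc.select("ul li a")) {
            lines.add(link.text() + " -> " + link.absUrl("href"));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : linkLines(HTML, "http://example.com/")) {
            System.out.println(line);
        }
    }
}
```

When fetching a live page, Jsoup.connect(url).get() returns the same kind of Document and sets the base URI from the request URL, so the rest of the code stays the same.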
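3. Demo. URL being parsed: http://sex.guokr.com/
A note up front: ignore what the link is about. I just happened to find a decent site~, aha, don't get the wrong idea.
1. Parsing a ul -> li
Let's look at the source of this part of the page:
That gives us the rough structure, so now let's write the code.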
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Parse a URL with Jsoup
 * @tag: url: http://sex.guokr.com/
 * Created by monster on 2015/12/11.
 */
public class JsoupZX {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/";
        try {
            // Fetch the page and narrow down step by step: container -> module-list -> clearfix
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            Elements module = containerDoc.getElementsByClass("module-list");
            Document moduleDoc = Jsoup.parse(module.toString());
            //Elements clearfix = moduleDoc.getElementsByClass("clearfix"); // DOM style
            Elements clearfix = moduleDoc.select(".clearfix"); // selector style
            for (Element clearfixli : clearfix) {
                Document clearfixliDoc = Jsoup.parse(clearfixli.toString());
                Elements kind = clearfixliDoc.select(".board-tag"); // selector style
                Elements title = clearfixliDoc.select(".tit-post");
                Elements author = clearfixliDoc.select("span a");
                System.out.println("Category: " + kind.text());         // category
                System.out.println("Title: " + title.text());           // title
                System.out.println("Author: " + author.text());         // author
                System.out.println("Detail link: " + title.attr("href")); // link on the title
                System.out.println("=====================");
            }
            // String title = clearfixli.getElementsByTag("a").text();
            // System.out.println(clearfix);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Result:
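The demo re-parses each intermediate result with Jsoup.parse(x.toString()). As far as I can tell that step is optional, because select can be called on an Element directly and a descendant selector can express the container -> module-list -> clearfix chain in one go. Below is a sketch of that variant. The HTML is hypothetical and only mimics the class names used in the demo (the real page structure may differ), and ScopedSelectSketch and summarize are names invented here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ScopedSelectSketch {
    // Hypothetical markup that only mimics the class names from the demo
    static final String SAMPLE_HTML =
        "<div class=\"container\"><ul class=\"module-list\">"
        + "<li class=\"clearfix\">"
        + "<a class=\"board-tag\" href=\"/board/1/\">Health</a>"
        + "<a class=\"tit-post\" href=\"/post/1/\">First title</a>"
        + "<span><a href=\"/user/1/\">alice</a></span>"
        + "</li>"
        + "<li class=\"clearfix\">"
        + "<a class=\"board-tag\" href=\"/board/2/\">Science</a>"
        + "<a class=\"tit-post\" href=\"/post/2/\">Second title</a>"
        + "<span><a href=\"/user/2/\">bob</a></span>"
        + "</li>"
        + "</ul></div>";

    // Builds one "category | title | author | link" row per list item
    static List<String> summarize(Document doc) {
        List<String> rows = new ArrayList<>();
        // One descendant selector replaces the chain of re-parsed documents
        for (Element item : doc.select(".container .module-list .clearfix")) {
            // select() on an Element only searches inside that element
            String kind = item.select(".board-tag").text();
            String title = item.select(".tit-post").text();
            String link = item.select(".tit-post").attr("href");
            String author = item.select("span a").text();
            rows.add(kind + " | " + title + " | " + author + " | " + link);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : summarize(Jsoup.parse(SAMPLE_HTML))) {
            System.out.println(row);
        }
    }
}
```

Skipping the re-parse also keeps each element attached to the original document, so its base URI is preserved if absolute links are needed later.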
=================================================================================================
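2. Parsing the detail page and its comments
Link: http://sex.guokr.com/post/1100992/
Above is what the page looks like.
Next, let's look at the source:
Content:
Comments:
With the source in mind, let's write the code: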
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Parse a post's details and comments with Jsoup
 * @tag: url: http://sex.guokr.com/post/1100992/
 * Created by monster on 2015/12/11.
 */
public class JSoupDetail {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/post/1100992/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());

            // Post header: title, author, publish time, author avatar
            String articleTitle = containerDoc.getElementById("articleTitle").text();
            String authorName = containerDoc.getElementById("authorName").text();
            String time = containerDoc.select("span").first().text();
            String imgphotoUrl = containerDoc.select("img").get(1).attr("src");
            System.out.println("Title: " + articleTitle);           // title
            System.out.println("Author: " + authorName);            // author
            System.out.println("Published: " + time);               // publish time
            System.out.println("Author avatar URL: " + imgphotoUrl); // author avatar URL

            // Post body: one line of output per <p>
            Element articleContent = containerDoc.getElementById("articleContent");
            Document articleContentDoc = Jsoup.parse(articleContent.toString());
            int size = articleContentDoc.select("p").size();
            System.out.println("Paragraph count: " + size);
            System.out.println("Post content:");
            for (int i = 0; i < size; i++) {
                String content = articleContentDoc.select("p").get(i).text();
                System.out.println(content);
            }
            System.out.println("================================================");

            // Comments, listed floor by floor
            System.out.println("Comment section (by floor)");
            Elements cmts = containerDoc.getElementsByClass("cmts");
            Document cmtsDoc = Jsoup.parse(cmts.toString());
            System.out.println("Comment floors: " + cmtsDoc.select("span").first().text());
            Elements cmtslist = cmtsDoc.getElementsByClass("cmts-list");
            for (Element clearfix : cmtslist) {
                String user = clearfix.select("a").get(1).text();
                String userPhotoUrl = clearfix.select("img").get(0).attr("src");
                String replyTime = clearfix.select("a").get(3).text();
                String floor = clearfix.select("span").text();
                System.out.println("Commenter: " + user + "\n"
                        + "Commenter avatar URL: " + userPhotoUrl + "\n"
                        + "Reply time: " + replyTime + "\n"
                        + "Floor: " + floor);
                Document replyContentDoc = Jsoup.parse(clearfix.toString());
                Elements replyContent = replyContentDoc.getElementsByClass("cmt-content");
                System.out.println("Comment content:");
                int s = replyContent.select("p").size();
                for (int j = 0; j < s; j++) {
                    String replycontent = replyContent.select("p").get(j).text();
                    System.out.println(replycontent);
                }
                System.out.println("================================================");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
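One thing worth noting about the detail-page code: getElementById returns null when the id is absent, and calls like get(1) or first().text() fail when the page has fewer matches than expected, so a layout change on the site would crash the demo. Here is a small defensive sketch. The helpers textById and paragraphs and the class DetailSketch are hypothetical names, and the sample markup is invented; only the ids articleTitle and articleContent are taken from the demo.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class DetailSketch {
    // Hypothetical markup reusing the ids from the demo
    static final String SAMPLE_HTML =
        "<h1 id=\"articleTitle\">Hello</h1>"
        + "<div id=\"articleContent\"><p>One</p><p>Two</p></div>";

    // Returns the element's text, or an empty string when the id is missing
    static String textById(Document doc, String id) {
        Element el = doc.getElementById(id);
        return el == null ? "" : el.text();
    }

    // Collects the text of every <p> under the element with the given id
    static List<String> paragraphs(Document doc, String id) {
        List<String> out = new ArrayList<>();
        Element content = doc.getElementById(id);
        if (content == null) {
            return out; // nothing to read, return an empty list instead of throwing
        }
        // for-each avoids re-running the selector on every loop iteration
        for (Element p : content.select("p")) {
            out.add(p.text());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println("Title: " + textById(doc, "articleTitle"));
        System.out.println("Author: " + textById(doc, "authorName")); // id absent in the sample
        for (String p : paragraphs(doc, "articleContent")) {
            System.out.println(p);
        }
    }
}
```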
That's my demo. It's on the simple side, so I hope you'll bear with me, aha~
Also: you're welcome to follow my blog, mwah