Advanced Jsoup: Extracting Specific Data

Date: 2019-11-17


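Jsoup makes it very convenient to extract specific data from HTML. It is a powerful and easy-to-use tool, but there seem to be few good examples online, and this article aims to fill that gap. Read the examples in the code below carefully: most Jsoup parsing tasks fall into one of these patterns.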

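Step 1: Add the Jsoup JAR to your project

Step 2: Use the Jsoup API

1. Locating the target

Use an attribute value of a div to locate the div (block) in the HTML that contains the content you need.

For example, the target block might start like this: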

<div class="content">
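To make the locating step concrete, here is a minimal sketch that parses an inline HTML string instead of a live page. The class name LocateDivExample, the method contentText and the sample markup are made up for illustration; only Jsoup.parse, getElementsByAttributeValue, first and text are Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class LocateDivExample {
    // Returns the text of the first element whose class attribute is "content",
    // or null when the document has no such element.
    static String contentText(String html) {
        Document doc = Jsoup.parse(html);
        Element block = doc.getElementsByAttributeValue("class", "content").first();
        return block == null ? null : block.text();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; a real page would come from Jsoup.connect(...).get()
        String html = "<html><body>"
                + "<div class=\"header\">Site header</div>"
                + "<div class=\"content\"><a href=\"/a\">Alpha</a> <a href=\"/b\">Beta</a></div>"
                + "</body></html>";
        System.out.println(contentText(html));
    }
}
```

Checking the result of first() for null avoids a NullPointerException when the page layout changes and the block is no longer present.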

2. Filtering data

a. Filter further inside the div by tag. This may match many elements, so a loop is needed. See eg1.

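// eg1: parse a Baidu Music singer list
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://list.mp3.baidu.com/top/singer/A.html").get(); // fetch and parse the page
// The parsed document is now held in memory; take the first element with class="content"
Element singerListDiv = doc.getElementsByAttributeValue("class", "content").first();
Elements links = singerListDiv.getElementsByTag("a"); // all <a> tags inside the class="content" block
for (Element link : links) { // loop over every matched link
    String linkHref = link.attr("href");
    String linkText = link.text().trim();
    System.out.println(linkText + " -> " + linkHref);
}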

b. Filter inside the div by tag name and take all of the data inside the matching tags. See eg2.

// eg2: parse a perpetual calendar page
Document doc = Jsoup.connect("http://www.nongli.com/item4/index.asp?dt=2012-03-03").get();
Element infoTable = doc.getElementsByAttributeValue("class", "table002").first(); // the information table
Elements tableLineInfos = infoTable.select("tr"); // narrow down further: all tr rows of that table
for (Element lineInfo : tableLineInfos) {
    Element lastCell = lineInfo.select("td").last(); // last td of this row; null if the row has no td
    if (lastCell == null) {
        continue;
    }
    String lineInfoContent = lastCell.text().trim();
    System.out.println("jsoup is :" + lineInfoContent);
}

c. Restrict the match with selector conditions, as in eg5.

// eg5: find HTML elements with CSS selectors (also requires java.io.File)
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://www.oschina.net/");
Elements links = doc.select("a[href]"); // all links that have an href attribute
Elements pngs = doc.select("img[src$=.png]"); // all images whose src ends in .png
Element masthead = doc.select("div.masthead").first(); // first div with class=masthead
Elements resultLinks = doc.select("h3.r > a"); // a elements that are direct children of h3 with class=r
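The three kinds of condition in eg5 (attribute present, attribute suffix, direct child) can be tried without a file on disk. The following is a small sketch under stated assumptions: the class SelectorConditionsExample, its helper select and the sample markup are hypothetical, while Jsoup.parse, select and eachAttr are existing Jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.util.List;

class SelectorConditionsExample {
    // Hypothetical sample markup standing in for /tmp/input.html
    static final String HTML =
            "<div class=\"masthead\"><img src=\"logo.png\"><img src=\"photo.jpg\"></div>"
            + "<h3 class=\"r\"><a href=\"/first\">First</a>"
            + "<span><a href=\"/nested\">Nested</a></span></h3>"
            + "<a>No href</a>";

    // Runs a CSS query and returns the given attribute of every matched element.
    static List<String> select(String cssQuery, String attribute) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).eachAttr(attribute);
    }

    public static void main(String[] args) {
        System.out.println(select("a[href]", "href"));        // only anchors that carry an href
        System.out.println(select("img[src$=.png]", "src"));  // only images whose src ends in .png
        System.out.println(select("h3.r > a", "href"));       // only anchors directly under h3.r
    }
}
```

The nested anchor inside the span is matched by a[href] but not by h3.r > a, because the child combinator only accepts direct children.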

NOTE: <td colspan="2" class="l3">二月15日<br>壬辰年<br>癸卯月<br>丁卯日<br></td>

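In this cell the values separated by <br> cannot be retrieved individually in one step; reading the td gives all of them together as a single result. They can be split apart with a regular expression.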
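One way to do the regular-expression split is to work on the cell's inner HTML and cut it at every <br>. This is only a sketch: the class BrSplitExample and the method splitOnBr are illustrative names, and it assumes the inner HTML has already been obtained as a string (for instance from element.html()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class BrSplitExample {
    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Splits the inner HTML of a cell at each <br> and drops empty pieces.
    static List<String> splitOnBr(String innerHtml) {
        List<String> parts = new ArrayList<>();
        for (String piece : BR.split(innerHtml)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Inner HTML of the td shown in the NOTE above
        String innerHtml = "二月15日<br>壬辰年<br>癸卯月<br>丁卯日<br>";
        for (String part : splitOnBr(innerHtml)) {
            System.out.println(part);
        }
    }
}
```

Trimming each piece also absorbs the line breaks that a pretty-printed HTML string may contain around the <br> tags.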

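3. Retrieving data

Calling element.text() returns the text of the selected element.

plus: Jsoup has a flexible syntax. For example, a div identified by its class value can be found with select("div.value"). See the documentation for more usages.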
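The two ways of locating a block by class are not fully interchangeable, which the following sketch illustrates. The class ClassSelectorExample and the sample markup are made up for this comparison; getElementsByAttributeValue, select and eachText are existing Jsoup calls.

```java
import org.jsoup.Jsoup;
import java.util.List;

class ClassSelectorExample {
    // Hypothetical markup: one div with a single class, one div with two classes,
    // and a non-div element that also uses class="content".
    static final String HTML =
            "<div class=\"content\">A</div>"
            + "<div class=\"content wide\">B</div>"
            + "<p class=\"content\">C</p>";

    // Compares against the whole class attribute value, on any tag.
    static List<String> byAttributeValue() {
        return Jsoup.parse(HTML).getElementsByAttributeValue("class", "content").eachText();
    }

    // Matches div elements that have "content" among their classes.
    static List<String> bySelector() {
        return Jsoup.parse(HTML).select("div.content").eachText();
    }

    public static void main(String[] args) {
        System.out.println(byAttributeValue());
        System.out.println(bySelector());
    }
}
```

The attribute-value lookup skips the div with two classes but includes the p element, while the CSS selector does the opposite, so the selector form is usually the safer choice when an element may carry several classes.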

Reference (Chinese documentation):

Chinese jsoup documentation