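All filters implement the NodeFilter interface, which has a single method, boolean accept(Node node), that decides whether a given node falls within the filter's scope. HtmlParser defines 16 different filters in the org.htmlparser.filters package, and they can be grouped into a few categories.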
Predicate filters:
TagNameFilter
HasAttributeFilter
HasChildFilter
HasParentFilter
HasSiblingFilter
IsEqualFilter
Logical filters:
AndFilter
NotFilter
OrFilter
XorFilter
Other filters:
NodeClassFilter
StringFilter
LinkStringFilter
LinkRegexFilter
RegexFilter
CssSelectorNodeFilter
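The logical filters wrap other filters and combine their accept() results. The following is a minimal sketch of that idea, not HtmlParser's own code: DemoFilter, DemoAnd, DemoOr, DemoNot and LogicSketch are hypothetical stand-ins, and for brevity they test a tag-name string instead of a Node.

```java
// Hypothetical stand-in for NodeFilter; it tests a tag name instead of a Node.
interface DemoFilter {
    boolean accept(String tagName);
}

// Accepts only when both wrapped filters accept (the idea behind AndFilter).
class DemoAnd implements DemoFilter {
    private final DemoFilter left;
    private final DemoFilter right;
    DemoAnd(DemoFilter left, DemoFilter right) { this.left = left; this.right = right; }
    @Override public boolean accept(String tagName) {
        return left.accept(tagName) && right.accept(tagName);
    }
}

// Accepts when at least one wrapped filter accepts (the idea behind OrFilter).
class DemoOr implements DemoFilter {
    private final DemoFilter left;
    private final DemoFilter right;
    DemoOr(DemoFilter left, DemoFilter right) { this.left = left; this.right = right; }
    @Override public boolean accept(String tagName) {
        return left.accept(tagName) || right.accept(tagName);
    }
}

// Inverts the wrapped filter (the idea behind NotFilter).
class DemoNot implements DemoFilter {
    private final DemoFilter inner;
    DemoNot(DemoFilter inner) { this.inner = inner; }
    @Override public boolean accept(String tagName) {
        return !inner.accept(tagName);
    }
}

class LogicSketch {
    // Helper: a filter that matches one tag name, ignoring case.
    static DemoFilter nameIs(String name) {
        return tagName -> tagName.equalsIgnoreCase(name);
    }

    public static void main(String[] args) {
        DemoFilter divOrSpan = new DemoOr(nameIs("DIV"), nameIs("SPAN"));
        DemoFilter spanOnly = new DemoAnd(divOrSpan, new DemoNot(nameIs("DIV")));
        System.out.println(spanOnly.accept("SPAN")); // true
        System.out.println(spanOnly.accept("DIV"));  // false
    }
}
```

Because every combinator is itself a filter, combinations can be nested to any depth.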
Beyond these, you can define your own filters for special filtering requirements.
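A custom filter only needs to supply the single accept method. The sketch below shows the pattern with hypothetical stand-in types (SimpleNode, SimpleNodeFilter, MinLengthFilter, CustomFilterSketch) so that it runs without the library. With HtmlParser itself, the class would implement org.htmlparser.NodeFilter and receive an org.htmlparser.Node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mirror the shape of Node and NodeFilter;
// they are NOT the library's types.
interface SimpleNode {
    String getText();
}

interface SimpleNodeFilter {
    boolean accept(SimpleNode node);
}

// Custom filter: accepts nodes whose trimmed text has at least minLength characters.
class MinLengthFilter implements SimpleNodeFilter {
    private final int minLength;

    MinLengthFilter(int minLength) {
        this.minLength = minLength;
    }

    @Override
    public boolean accept(SimpleNode node) {
        String text = node.getText();
        return text != null && text.trim().length() >= minLength;
    }
}

class CustomFilterSketch {
    // Returns the nodes that the filter accepts, in their original order.
    static List<SimpleNode> keep(List<SimpleNode> nodes, SimpleNodeFilter filter) {
        List<SimpleNode> kept = new ArrayList<>();
        for (SimpleNode node : nodes) {
            if (filter.accept(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = new ArrayList<>();
        nodes.add(() -> "Hello, HtmlParser");
        nodes.add(() -> "  ok ");
        // Only the first node has 5 or more non-blank characters.
        System.out.println(keep(nodes, new MinLengthFilter(5)).size()); // 1
    }
}
```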
Tag classes
Mainly used together with NodeClassFilter.
Remark: comment
AppletTag:
BaseHrefTag:
BodyTag:"BODY"; // getBody() internally calls toPlainTextString()
Bullet:"LI"
BulletList:"UL","OL"
CompositeTag:
DefinitionList:"DL"
DefinitionListBullet:"DD","DT"
Div:"DIV"
DoctypeTag:"!DOCTYPE"
FormTag:
FrameSetTag:
FrameTag:
HeadingTag:"H1","H2","H3","H4","H5","H6"
HeadTag:"HEAD"
Html:"HTML"
ImageTag:
InputTag:"INPUT"
JspTag:"%","%=","%@"
LabelTag:"LABEL"
LinkTag:
MetaTag:
ObjectTag:
OptionTag:
ParagraphTag:"P"
ProcessingInstructionTag:"?"
ScriptTag:
SelectTag:"SELECT"
Span:"SPAN"
StyleTag:"STYLE"
TableColumn:"TD"
TableHeader:"TH"
TableRow:"TR"
TableTag:"TABLE"
TagNode:
TextareaTag:"TEXTAREA"
TitleTag:"TITLE"
TextNode:
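The tag classes above pair with NodeClassFilter because that filter selects nodes by their Java class: with the library, passing LinkTag.class to the NodeClassFilter constructor keeps link tags. The sketch below illustrates the idea with hypothetical stand-in classes (every name prefixed with Sketch, plus ClassFilterSketch). It assumes matching is done with an instance-of check, so a filter on a base class also accepts its subclasses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a node hierarchy; NOT HtmlParser's classes.
interface SketchNode {}

class SketchTag implements SketchNode {
    final String name;
    SketchTag(String name) { this.name = name; }
}

class SketchLinkTag extends SketchTag {
    SketchLinkTag() { super("A"); }
}

class SketchImageTag extends SketchTag {
    SketchImageTag() { super("IMG"); }
}

class SketchText implements SketchNode {}

// Accepts a node when it is an instance of the configured class (assumed semantics).
class SketchClassFilter {
    private final Class<? extends SketchNode> type;

    SketchClassFilter(Class<? extends SketchNode> type) {
        this.type = type;
    }

    boolean accept(SketchNode node) {
        return type.isInstance(node);
    }
}

class ClassFilterSketch {
    // Counts how many nodes the filter accepts.
    static int count(List<SketchNode> nodes, SketchClassFilter filter) {
        int matches = 0;
        for (SketchNode node : nodes) {
            if (filter.accept(node)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = new ArrayList<>();
        nodes.add(new SketchLinkTag());
        nodes.add(new SketchImageTag());
        nodes.add(new SketchText());
        nodes.add(new SketchLinkTag());
        System.out.println(count(nodes, new SketchClassFilter(SketchLinkTag.class))); // 2
        System.out.println(count(nodes, new SketchClassFilter(SketchTag.class)));     // 3
    }
}
```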