The Filters in HtmlParser

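Every filter implements the NodeFilter interface, which has a single method, boolean accept(Node node), that determines whether a given node falls within the scope of that filter. HtmlParser defines 16 different filters in the org.htmlparser.filters package, and they fall into a few groups.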

Predicate filters: TagNameFilter

                  HasAttributeFilter

                  HasChildFilter

                  HasParentFilter

                  HasSiblingFilter

                  IsEqualFilter

Logical filters:

                  AndFilter

                  NotFilter

                  OrFilter

                  XorFilter

Other filters:

                 NodeClassFilter

                 StringFilter

                 LinkStringFilter

                 LinkRegexFilter

                 RegexFilter

                 CssSelectorNodeFilter

Beyond these, you can define your own filters to handle special filtering requirements.
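The single-method contract and the way predicate and logical filters combine can be sketched without the library. This is a minimal model under stated assumptions: the `Node` class, the `NodeFilter` interface, and the helpers `tagName`, `hasAttribute`, `and`, `or`, `not`, and `externalLink` are simplified stand-ins written for this example, not the org.htmlparser classes. `externalLink` plays the role of a custom filter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the org.htmlparser classes.
class FilterSketch {

    /** Hypothetical minimal node: a tag name plus its attributes. */
    static final class Node {
        final String name;
        final Map<String, String> attributes = new HashMap<>();

        Node(String name, String... keyValues) {
            this.name = name;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                attributes.put(keyValues[i], keyValues[i + 1]);
            }
        }
    }

    /** Mirrors the one-method contract: does this node belong to the filter? */
    interface NodeFilter {
        boolean accept(Node node);
    }

    // Predicate-style filters.
    static NodeFilter tagName(String name) {
        return node -> node.name.equalsIgnoreCase(name);
    }

    static NodeFilter hasAttribute(String attribute) {
        return node -> node.attributes.containsKey(attribute);
    }

    // Logical combinators built on top of other filters.
    static NodeFilter and(NodeFilter... filters) {
        return node -> {
            for (NodeFilter f : filters) {
                if (!f.accept(node)) return false;
            }
            return true;
        };
    }

    static NodeFilter or(NodeFilter... filters) {
        return node -> {
            for (NodeFilter f : filters) {
                if (f.accept(node)) return true;
            }
            return false;
        };
    }

    static NodeFilter not(NodeFilter filter) {
        return node -> !filter.accept(node);
    }

    // A custom filter for a special need: nodes whose href starts with "http".
    static NodeFilter externalLink() {
        return node -> {
            String href = node.attributes.get("href");
            return href != null && href.startsWith("http");
        };
    }

    static int count(List<Node> nodes, NodeFilter filter) {
        int matches = 0;
        for (Node node : nodes) {
            if (filter.accept(node)) matches++;
        }
        return matches;
    }

    static List<Node> sample() {
        return Arrays.asList(
            new Node("a", "href", "http://example.com"),
            new Node("a", "name", "top"),
            new Node("img", "src", "logo.png"),
            new Node("a", "href", "/about"));
    }

    public static void main(String[] args) {
        List<Node> nodes = sample();
        System.out.println(count(nodes, and(tagName("a"), hasAttribute("href")))); // 2
        System.out.println(count(nodes, or(tagName("img"), tagName("a"))));        // 4
        System.out.println(count(nodes, not(tagName("a"))));                       // 1
        System.out.println(count(nodes, externalLink()));                          // 1
    }
}
```

Because every combinator returns another filter, the filters nest freely, for example `and(tagName("a"), not(hasAttribute("href")))` for anchors without a link target.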

 

Tag classes

  Mainly used together with NodeClassFilter

         Remark: comments

         AppletTag:

         BaseHrefTag:

         BodyTag:"BODY"; // getBody() internally calls toPlainTextString()

         Bullet:"LI"

         BulletList:"UL","OL"

         CompositeTag:

         DefinitionList:"DL"

         DefinitionListBullet:"DD","DT"

         Div:"DIV"

         DoctypeTag:"!DOCTYPE"

         FormTag:

         FrameSetTag:

         FrameTag:

         HeadingTag:"H1","H2","H3","H4","H5","H6"

         HeadTag:"HEAD"

         Html:"HTML"

         ImageTag:

         InputTag:"INPUT"

         JspTag:"%","%=","%@"

         LabelTag:"LABEL"

        

         LinkTag:

         MetaTag:

         ObjectTag:

         OptionTag:

         ParagraphTag:"P"

         ProcessingInstructionTag:"?"

         ScriptTag:

         SelectTag:"SELECT"

         Span:"SPAN"

         StyleTag:"STYLE"

         TableColumn:"TD"

         TableHeader:"TH"

         TableRow:"TR"

         TableTag:"TABLE"

         TagNode:

         TextareaTag:"TEXTAREA"

         TitleTag:"TITLE"

         TextNode:
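The idea behind pairing these classes with a class-based filter, selecting nodes by their Java type, can be sketched as follows. The types below (`Node`, `TextNode`, `TagNode`, `LinkTag`, `ImageTag`) and the helper `ofClass` are stand-ins defined only for this example, not the library's own classes, and the sketch assumes that subclasses of the requested class also match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; these are NOT the org.htmlparser classes.
class NodeClassSketch {

    static class Node { }

    static class TextNode extends Node {
        final String text;
        TextNode(String text) { this.text = text; }
    }

    static class TagNode extends Node {
        final String tagName;
        TagNode(String tagName) { this.tagName = tagName; }
    }

    static class LinkTag extends TagNode {
        final String link;
        LinkTag(String link) { super("A"); this.link = link; }
    }

    static class ImageTag extends TagNode {
        final String source;
        ImageTag(String source) { super("IMG"); this.source = source; }
    }

    /** Keeps the nodes that are instances of the given class (subclasses included). */
    static <T extends Node> List<T> ofClass(List<Node> nodes, Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Node node : nodes) {
            if (type.isInstance(node)) {
                matches.add(type.cast(node));
            }
        }
        return matches;
    }

    static List<Node> sample() {
        return Arrays.<Node>asList(
            new LinkTag("/home"),
            new TextNode("hello"),
            new ImageTag("logo.png"),
            new LinkTag("/about"));
    }

    public static void main(String[] args) {
        // Selecting by class returns typed results, so tag-specific fields are directly usable.
        for (LinkTag link : ofClass(sample(), LinkTag.class)) {
            System.out.println(link.link);
        }
    }
}
```

Filtering on a base class such as `TagNode` in this sketch returns every tag regardless of its concrete type, while filtering on `LinkTag` returns only links.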
