How to Use BeautifulSoup

Date: 2019-12-08

Tags: beautifulsoup, usage


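BeautifulSoup is a module that takes an HTML or XML string and parses (formats) it into a structured tree. You can then use the methods it provides to quickly locate specific elements, which makes finding elements in HTML or XML simple.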
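A minimal sketch of the parse-then-query idea, assuming the bs4 package is installed. It uses Python's built-in "html.parser" instead of lxml so no extra parser is required, and the HTML fragment is made-up sample data. prettify() returns the re-indented ("formatted") document as a string, and find() then looks elements up in the parsed tree instead of by raw string searching.

```python
from bs4 import BeautifulSoup

# A small, deliberately unindented HTML fragment (hypothetical sample data)
raw = "<html><body><div class='title'><b>Hello</b></div></body></html>"

# Parse the string into a navigable tree with the standard-library parser
soup = BeautifulSoup(raw, "html.parser")

# prettify() re-serializes the tree with one tag per line and nested indentation
pretty = soup.prettify()
print(pretty)

# Once parsed, elements are looked up by tag name and attributes
title_div = soup.find("div", class_="title")
print(title_div.b.string)  # Hello
```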

 
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    asdf
    <div class="title">
    <b>The Dormouse's story总共</b>
    <h1>f</h1>
    </div>
    <div class="story">Once upon a time there were three little sisters; and their names were
    <a  class="sister0" id="link1">Els<span>f</span>ie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</div>
    ad<br/>sf
    <p class="story">...</p>
    </body>
    </html>
    """

    # Parse with the lxml parser (the lxml package must be installed)
    soup = BeautifulSoup(html_doc, features="lxml")

    # Find the first <a> tag (a single Tag, or None if there is no match)
    tag1 = soup.find(name='a')

    # Find all <a> tags (a list of Tags)
    tag2 = soup.find_all(name='a')

    # Find the tag with id="link2" using a CSS selector (returns a list)
    tag3 = soup.select('#link2')
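To see what the three lookups above actually return, here is a self-contained sketch using a trimmed copy of the three links from html_doc. It swaps in the built-in "html.parser" for lxml so only bs4 is needed; select_one() is an extra bs4 method not used in the original, shown for comparison with select().

```python
from bs4 import BeautifulSoup

# Trimmed copy of the three <a> links from html_doc
html_doc = """
<div class="story">
<a class="sister0" id="link1">Els<span>f</span>ie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

tag1 = soup.find(name="a")      # a single Tag (or None if nothing matches)
tag2 = soup.find_all(name="a")  # a ResultSet (a list subclass) of every match
tag3 = soup.select("#link2")    # CSS selector: a list of matches, even for one id

print(tag1["id"], tag1.get_text())       # link1 Elsfie
print([a["id"] for a in tag2])           # ['link1', 'link2', 'link3']
print(tag3[0]["href"])                   # http://example.com/lacie
print(soup.select_one("#link2").string)  # Lacie
```

Note that get_text() joins the text of nested tags, which is why the first link reads "Elsfie" even though "f" sits inside a <span>.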

Here is another commonly used example:

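1. This argument needs a string: response is an object, while response.text is a string.

2. 'lxml' is the parser argument; it tells BeautifulSoup to use lxml as the parser.

3 and 4. Together, these two find the p tag whose class is btnlinks.

5. Extract the URL stored in the tag element's href attribute.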
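The code these steps annotate is not included here, so the following is only a hypothetical reconstruction that follows the same steps. Instead of a live requests.get(url) call, it parses a hard-coded HTML string standing in for response.text; the page content, the btnlinks markup and the URLs are invented, and looking up the <a> inside the <p> is an assumption about where the href lives. It also uses "html.parser" in place of "lxml" so that only bs4 is needed.

```python
from bs4 import BeautifulSoup

# Step 1 (assumed): originally the string comes from an HTTP response, e.g.
#   response = requests.get(url); html = response.text
# A hypothetical page stands in for response.text so the sketch runs offline.
html = """
<html><body>
<p class="btnlinks"><a href="https://example.com/download">Download</a></p>
<p class="other"><a href="https://example.com/ignore">Ignore</a></p>
</body></html>
"""

# Step 2: choose the parser. The original passes "lxml"; the built-in
# "html.parser" is substituted here.
soup = BeautifulSoup(html, "html.parser")

# Steps 3 and 4: find the <p> tag whose class is "btnlinks"
p_tag = soup.find("p", class_="btnlinks")

# Assumption: the link is an <a> nested inside that <p>
link = p_tag.find("a")

# Step 5: read the URL stored in the href attribute (None if it is missing)
url = link.get("href")
print(url)  # https://example.com/download
```

Using .get("href") rather than link["href"] returns None instead of raising KeyError when a tag has no href, as with the first link in the earlier html_doc.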

Note: find returns only the first matching tag element, while find_all returns a list containing every matching tag; individual elements can be accessed by index.
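A short sketch of that difference, using made-up list items and the built-in "html.parser". It also shows what each method returns when nothing matches, which is a common source of AttributeError on None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with three list items
html = "<ul><li>one</li><li>two</li><li>three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")      # only the first matching Tag
items = soup.find_all("li")  # every match, in document order

print(first.string)     # one
print(len(items))       # 3
print(items[1].string)  # two   (index into the result like any list)
print(items[-1].string) # three

# When nothing matches: find returns None, find_all returns an empty list
print(soup.find("table"))           # None
print(len(soup.find_all("table")))  # 0
```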