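Scrapy version: 1.5.0
Scrapy's built-in Selector is built on top of lxml.
Data can be parsed with the xpath() and css() methods; both return a list (a SelectorList of Selector objects):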
sel = Selector(text=body).xpath('//div[@class="ip_list"]/text()').extract()
A Selector also provides the re() method for regular-expression extraction; it is used much like the re library.
class scrapy.selector.Selector(response=None, text=None, type=None)
response is an HtmlResponse or an XmlResponse object that will be used for selecting and extracting data.
text is a unicode string or utf-8 encoded text for cases when a response isn't available. Using text and response together is undefined behavior.
type defines the selector type, it can be "html", "xml" or None (default).
If type is None, the selector automatically chooses the best type based on response type (see below), or defaults to "html" in case it is used together with text.
If type is None and a response is passed, the selector type is inferred from the response type as follows:
"html" for HtmlResponse type
"xml" for XmlResponse type
"html" for anything else
Otherwise, if type is set, the selector type will be forced and no detection will occur.
re(regex)
Apply the given regex and return a list of unicode strings with the matches.
regex can be either a compiled regular expression or a string which will be compiled to a regular expression using re.compile(regex).
extract()
Serialize and return the matched nodes as a list of unicode strings. Percent encoded content is unquoted.
remove_namespaces()
Remove all namespaces, allowing the document to be traversed with namespace-less XPath expressions.
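A SelectorList object is a subclass of the built-in list and can be thought of as a collection of Selector objects. Calling xpath(), css(), extract(), or re() on a SelectorList applies the method to every Selector in the list and then combines the results into one flat list (note: each element's result is not inserted as a single nested item).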