Getting Started with colly

Date 2019-11-13

Tags colly, getting started guide

Before using colly, make sure you have the latest version. See the installation guide for details.

Let's start with some simple examples.

First, import Colly into your codebase:

import "github.com/gocolly/colly"

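Collector

Colly's main entity is a Collector object. The Collector manages network communication and is responsible for executing the attached callbacks while a collector job is running. To work with colly, you must initialize a Collector: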

c := colly.NewCollector()

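Callbacks

You can attach different types of callback functions to a Collector to control a collecting job or to retrieve information. See the relevant section of the package documentation.

Adding callbacks to a Collector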

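// Runs before each request is sent.
c.OnRequest(func(r *colly.Request) {
    fmt.Println("Visiting", r.URL)
})

// Runs if an error occurs during the request.
c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})

// Runs after a response is received.
c.OnResponse(func(r *colly.Response) {
    fmt.Println("Visited", r.Request.URL)
})

// Runs for every link in an HTML response and visits its target.
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    e.Request.Visit(e.Attr("href"))
})

// Runs for the first cell of every table row in an HTML response.
c.OnHTML("tr td:nth-of-type(1)", func(e *colly.HTMLElement) {
    fmt.Println("First column of a table row:", e.Text)
})

// Runs for every element matching the XPath query.
c.OnXML("//h1", func(e *colly.XMLElement) {
    fmt.Println(e.Text)
})

// Runs last, after the other callbacks for a response.
c.OnScraped(func(r *colly.Response) {
    fmt.Println("Finished", r.Request.URL)
})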

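Call order of callbacks

1. OnRequest

Called before a request is made

2. OnError

Called if an error occurs during the request

3. OnResponse

Called after a response is received

4. OnHTML

Called right after OnResponse if the received content is HTML

5. OnXML

Called right after OnHTML if the received content is HTML or XML

6. OnScraped

Called after the OnXML callbacks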