http://www.allitebooks.org/
This is the most generous site I have ever seen: every book on it is free to download.
Bored over a weekend, I tried to crawl all of the site's PDF books.
Tools: Beautiful Soup and requests, in Python.
This is about the simplest crawler possible, with almost no error handling. If you try it yourself, please be gentle and don't bring this generous site down (a throttled fetch helper is sketched after the code).
# www.qingmiaokeji.cn
from bs4 import BeautifulSoup
import requests

siteUrl = 'http://www.allitebooks.org/'

def category():
    # Collect the category links from the site's sub-menu.
    response = requests.get(siteUrl)
    categoryurl = []
    soup = BeautifulSoup(response.text, "html.parser")
    for a in soup.select('.sub-menu li a'):
        categoryurl.append({'name': a.get_text(), 'href': a.get("href")})
    return categoryurl

def bookUrlList(url):
    # Read the pagination to find the last page number, then walk every page.
    response = requests.get(url['href'])
    soup = BeautifulSoup(response.text, "html.parser")
    nums = 0
    for e in soup.select(".pagination a[title='Last Page →']"):
        nums = int(e.get_text())
    for i in range(1, nums + 1):
        bookList(url['href'] + "page/" + str(i))

def bookList(url):
    # Collect the link to each book's detail page from one listing page.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    for a in soup.select(".main-content-inner article .entry-title a"):
        getBookDetail(a.get("href"))

def getBookDetail(url):
    # Scrape the title, cover image and PDF download link, then append them to a file.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.select(".single-title")[0].text
    imgurl = soup.select(".entry-body-thumbnail .attachment-post-thumbnail")[0].get("src")
    downLoadPdfUrl = soup.select(".download-links a")[0].get("href")
    with open('d:/booklist.txt', 'a+', encoding='utf-8') as f:
        f.write(title + " | " + imgurl + " | " + downLoadPdfUrl + "\n")

if __name__ == '__main__':
    for url in category():
        bookUrlList(url)
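The script above fires requests as fast as it can, which is exactly how you take a small site down. Below is a minimal sketch of a politer fetch helper that could replace the bare requests.get calls; the delay and retry values are my own arbitrary picks, not anything from the original script.

import time
import requests

def polite_get(url, delay=1.0, retries=3):
    # Fetch a page, pausing between requests and retrying on failure,
    # so the crawler does not hammer the site. (Hypothetical helper,
    # not part of the original script.)
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            time.sleep(delay)  # fixed pause after every successful request
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear back-off

Swapping every requests.get(...) in the functions above for polite_get(...) would be enough; the parsing logic stays the same.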