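I recently needed to scrape a batch of images from Google and Bing. Different sites follow different rules, so for a crawler novice like me, understanding and adapting the URLs and regular expressions was hard.
I later found some very handy tools on GitHub that are simple and quick to use, so I am sharing them here.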
1. Scraping images from Google: google-images-download
https://github.com/hardikvasa/google-images-download
[Figure: logic of the image download algorithm]
Installation is very simple; use any one of the following methods:
pip install google_images_download
git clone https://github.com/hardikvasa/google-images-download.git && cd google-images-download && sudo python setup.py install
Go to the repo on GitHub => click "Clone or Download" => click "Download ZIP" and save it to your local disk
Once it is installed or downloaded, run it to scrape images:
googleimagesdownload [Arguments...]
python3 google_images_download.py [Arguments...]
or
python google_images_download.py [Arguments...]
Common arguments and commands are shown below:
googleimagesdownload -cf example.json
googleimagesdownload --keywords "Polar bears, balloons, Beaches" --limit 20
googleimagesdownload -k "car" -sk 'red,blue,white' -l 10
googleimagesdownload -k "Polar bears, balloons, Beaches" -l 20
googleimagesdownload --keywords "logo" --format svg
googleimagesdownload -k "playground" -l 20 -co red
googleimagesdownload -k "北极熊" -l 5
googleimagesdownload -k "sample" -u <google images page URL>
googleimagesdownload -k "boat" -o "boat_new"
googleimagesdownload --keywords "balloons" --single_image <URL of the images>
googleimagesdownload --keywords "balloons" --size medium --type animated
googleimagesdownload --keywords "universe" --usage_rights labeled-for-reuse
googleimagesdownload --keywords "flowers" --color_type black-and-white
googleimagesdownload --keywords "universe" --aspect_ratio panoramic
googleimagesdownload -si <image url> -l 10
googleimagesdownload --keywords "universe" --specific_site example.com
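The -cf example.json command above reads its searches from a config file instead of the command line. Below is a minimal Python sketch for generating such a file. It assumes the file holds a top-level "Records" list whose entries use the long argument names (keywords, limit, format); that layout is how I remember the project's README, so check it against the repo before relying on it. The helper name write_config is made up for this example.

```python
import json
from pathlib import Path


def write_config(path, records):
    """Write several searches into one JSON config file and return the config."""
    # Assumption: the tool expects a top-level "Records" list; each entry is
    # one search described with the long argument names (keywords, limit, ...).
    config = {"Records": records}
    # ensure_ascii=False keeps non-English keywords readable in the file.
    text = json.dumps(config, indent=4, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
    return config


if __name__ == "__main__":
    write_config("example.json", [
        {"keywords": "Polar bears", "limit": 20},
        {"keywords": "logo", "format": "svg"},
        {"keywords": "北极熊", "limit": 5},
    ])
```

After running the script, googleimagesdownload -cf example.json should pick up all three searches in one go, provided the layout assumption holds.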
2. Scraping images from Bing: Bulk-Bing-Image-downloader
https://github.com/ostrolucky/Bulk-Bing-Image-downloader
Usage is very simple:
bbid.py [-h] [-s SEARCH_STRING] [-f SEARCH_FILE] [-o OUTPUT] [--adult-filter-on] [--adult-filter-off] [--filters FILTERS] [--limit LIMIT]
./bbid.py -s "hello world"
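To download several search terms in one batch, the command can be assembled from Python. This is only a sketch: it uses just the -s, -o and --limit options from the usage line above, the helper name build_bbid_command is made up for this example, and it assumes bbid.py sits in the current directory.

```python
import shlex
import sys


def build_bbid_command(search, output_dir, limit=None, script="bbid.py"):
    """Build the argument list for one bbid.py run (the command is not executed here)."""
    # Only options that appear in the usage line are used: -s, -o, --limit.
    cmd = [sys.executable, script, "-s", search, "-o", output_dir]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    for term in ["hello world", "polar bear"]:
        # One output folder per search term, spaces replaced for convenience.
        cmd = build_bbid_command(term, term.replace(" ", "_"), limit=10)
        # Print the command; to actually run it, pass cmd to
        # subprocess.run(cmd, check=True).
        print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the arguments as a list rather than a single string means search terms with spaces need no extra quoting when the command is finally run.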
-----------------------More to come-------------------------