Working around wget's inability to limit the size of downloaded files

wget has no way to limit the size of a downloaded file, so if your URL list happens to contain a very large file, the whole download run inevitably drags on. The workaround here is to fetch the file's headers with curl, parse out the Content-Length, and learn how large the file is before downloading it; if that size exceeds a preset threshold, the file is skipped.
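The header check by itself looks roughly like the sketch below. This is only an illustration of the idea: "$url" and "$limitsize" stand in for a real URL and a byte threshold.

# -s silences curl's progress output, -I asks for the response headers only.
# The header match is case-insensitive and the trailing carriage return is stripped.
len=$(curl -sI "$url" | grep -i '^Content-Length:' | awk '{print $2}' | tr -d '\r')

# Skip the download if the reported size exceeds the threshold.
if [ -n "$len" ] && [ "$len" -gt "$limitsize" ]; then
    echo "skip $url: $len bytes is over the $limitsize-byte limit"
fi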
 
Of course, the current bash version does nothing special for URLs that come back without a Content-Length header, so in that case large downloads still cannot be avoided. One way to extend it: while the download is in progress, keep querying the size of the file on disk, and kill the download process once it passes a certain threshold. A rough sketch of that idea follows.
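A minimal sketch of that extension, assuming GNU stat and assuming wget is started without -x so the partial file lands at $outfolder/$(basename "$url"); the variable names match the script further below, everything else is illustrative:

# Start the download in the background and remember its PID.
wget -P "$outfolder" "$url" -o "$logfolder/$filename.txt" &
pid=$!
partfile="$outfolder/$(basename "$url")"

# While wget is still running, poll the size of the partially downloaded file.
while kill -0 "$pid" 2>/dev/null; do
    size=$(stat -c %s "$partfile" 2>/dev/null || echo 0)
    if [ "$size" -gt "$limitsize" ]; then
        echo "$url passed $limitsize bytes, killing the download."
        kill "$pid"
        break
    fi
    sleep 1
done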
 
So the current version still leaves room for improvement:
#!/bin/bash
# Download each URL in a list with wget, skipping any file whose
# Content-Length (obtained via a curl header request) exceeds a byte limit.

if [ $# -eq 4 ]; then
    echo "start downloading..."
    urllist=$1
    limitsize=$2
    outfolder=$3
    logfolder=$4
    echo "url list file: $urllist"
    echo "limited file size: $limitsize bytes"
    echo "output folder: $outfolder"
    echo "log folder: $logfolder"
else
    echo "usage: ./download.sh <url list> <limited file size> <output folder> <log folder>..."
    exit 1
fi

# Create the output and log folders if they do not exist yet.
if [ -d "$outfolder" ]; then
    echo "$outfolder exists..."
else
    echo "make $outfolder..."
    mkdir -p "$outfolder"
fi

if [ -d "$logfolder" ]; then
    echo "$logfolder exists..."
else
    echo "make $logfolder..."
    mkdir -p "$logfolder"
fi

while read -r url; do
    echo "downloading: $url"
    # Fetch only the headers and extract Content-Length
    # (case-insensitive, trailing carriage return stripped).
    len=$(curl -sI "$url" | grep -i '^Content-Length:' | awk '{print $2}' | tr -d '\r')
    # Build a log file name by stripping characters that are unsafe in file names.
    filename=$(echo "$url" | tr -d ':/?\\|*<>')
    if [ -n "$len" ]; then
        echo "length: $len bytes"
        if [ "$len" -gt "$limitsize" ]; then
            echo "$url is greater than $limitsize bytes, can't be downloaded."
        else
            echo "$url is smaller than $limitsize bytes, can be downloaded."
            wget -P "$outfolder" -x -t 3 --save-headers --connect-timeout=10 --read-timeout=10 --level=1 "$url" -o "$logfolder/$filename.txt"
        fi
    else
        # No Content-Length header: the size is unknown, so download anyway.
        echo "$url file size is unknown."
        wget -P "$outfolder" -x -t 3 --save-headers --connect-timeout=10 --read-timeout=10 --level=1 "$url" -o "$logfolder/$filename.txt"
    fi
done < "$urllist"
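Assuming the script is saved as download.sh, an invocation might look like this; urls.txt, the 1 MB (1048576-byte) limit, and the folder names are just example values:

chmod +x download.sh
./download.sh urls.txt 1048576 downloads logs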