def downloadXml(isExists, filedir, filename):
    if not isExists:
        os.mkdir(filedir)
    local = os.path.join(filedir, filename)
    urllib.urlretrieve(url, local)  # url is a global set by the caller
The error:
Traceback (most recent call last):
File "C:\Users\william\Desktop\nova xml\New folder\download_xml.py", line 95, in <module>
downloadXml(isExists,filedir,filename)
File "C:\Users\william\Desktop\nova xml\New folder\download_xml.py", line 80, in downloadXml
urllib.urlretrieve(url,local)
File "E:\Python27\lib\urllib.py", line 98, in urlretrieve
return opener.retrieve(url, filename, reporthook, data)
File "E:\Python27\lib\urllib.py", line 245, in retrieve
fp = self.open(url, data)
File "E:\Python27\lib\urllib.py", line 213, in open
return getattr(self, name)(url)
File "E:\Python27\lib\urllib.py", line 350, in open_http
h.endheaders(data)
File "E:\Python27\lib\httplib.py", line 1053, in endheaders
self._send_output(message_body)
File "E:\Python27\lib\httplib.py", line 897, in _send_output
self.send(msg)
File "E:\Python27\lib\httplib.py", line 859, in send
self.connect()
File "E:\Python27\lib\httplib.py", line 836, in connect
self.timeout, self.source_address)
File "E:\Python27\lib\socket.py", line 575, in create_connection
raise err
IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
I searched Google for an answer with the query: urlretrieve Errno 10060.
An answer at https://segmentfault.com/q/1010000004386726 explains: hitting a site too frequently can be treated as a DoS attack, and a site with rate limiting will stop responding for a while. You can catch the exception, sleep for a bit and then retry, or use exponential backoff based on the number of retries.
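The exponential-backoff suggestion is not what I ended up using below, but a minimal sketch of it, reusing the same urllib.urlretrieve call, might look like this (the helper name, the 5-try cap, and the 2-second base delay are illustrative assumptions, not part of the original script):

import time
import urllib

def retrieve_with_backoff(xml_url, local, max_tries=5):
    # hypothetical helper: name, try cap, and base delay are assumptions
    for attempt in range(max_tries):
        try:
            urllib.urlretrieve(xml_url, local)
            return
        except IOError:
            if attempt == max_tries - 1:
                raise  # out of retries; surface the error
            time.sleep(2 ** (attempt + 1))  # back off: 2, 4, 8, 16... seconds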
A simple first fix came to mind: add a delay between downloads. The code became:
def downloadXml(isExists, filedir, filename):
    if not isExists:
        os.mkdir(filedir)
    local = os.path.join(filedir, filename)
    time.sleep(1)  # pause one second before each download
    urllib.urlretrieve(url, local)
I ran it. Before, it would start timing out at around the 80th record; now it got through more than 2,300 records. Unfortunately, it eventually timed out again.
Lengthening the delay, say from 1 s to 5 s, would probably avoid the error, but it wastes too much time: the 5 s penalty is paid on every download even when nothing fails. Better to delay and retry only after an error actually occurs.
So:
def downloadXml(isExists, filedir, filename):
    if not isExists:
        os.makedirs(filedir)
    local = os.path.join(filedir, filename)
    try:
        urllib.urlretrieve(url, local)
    except Exception as e:
        time.sleep(5)  # wait, then retry once
        urllib.urlretrieve(url, local)
With this version, the script would get stuck on one record and stop making progress, so I changed it to retry a given record at most 10 times.
def downloadXml(flag_exists, file_dir, file_name, xml_url, cur_try=0):
    if not flag_exists:
        os.makedirs(file_dir)
    local = os.path.join(file_dir, file_name)
    try:
        urllib.urlretrieve(xml_url, local)
    except Exception as e:
        print e
        total_try = 10
        if cur_try < total_try:
            time.sleep(15)
            # pass the attempt count along so the 10-try cap actually takes effect
            return downloadXml(flag_exists, file_dir, file_name, xml_url, cur_try + 1)
        else:
            raise Exception(e)
After this change it ran to completion with no more errors. But then I noticed a problem: the URLs whose downloads failed were not recorded anywhere. So I added code to append each failed URL to a local text file, so they can be reviewed and re-downloaded by hand later.
def downloadXml(flag_exists, file_dir, file_name, xml_url, cur_try=0):
    if not flag_exists:
        os.makedirs(file_dir)
    local = os.path.join(file_dir, file_name)
    try:
        urllib.urlretrieve(xml_url, local)
    except Exception as e:
        print 'the first error: ', e
        total_try = 10
        if cur_try < total_try:
            time.sleep(15)
            return downloadXml(flag_exists, file_dir, file_name, xml_url, cur_try + 1)
        else:
            print 'the last error: '
            # test_dir is a global path defined elsewhere in the script
            with open(test_dir + 'error_url.txt', 'a') as f:
                f.write(xml_url + '\n')  # record the failed URL, one per line
            raise Exception(e)
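For reference, a sketch of how the final function could be driven over a batch of records (the directory name, file names, and URLs below are made-up placeholders, not the real script's data source):

import os

file_dir = 'nova_xml'
records = [('a.xml', 'http://example.com/a.xml'),
           ('b.xml', 'http://example.com/b.xml')]  # hypothetical records
for file_name, xml_url in records:
    downloadXml(os.path.exists(file_dir), file_dir, file_name, xml_url)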
Frustratingly, this time there were no failed URLs at all; the site's traffic was probably just light at that moment.