Fixing garbled files scraped with Scrapy, the open-source Python project

Date: 2019-11-11

Tags: python, open-source project, scrapy, fixing garbled scraped files. Category: Python

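When Scrapy scrapes a page, the saved file can come out garbled. Analysis shows that the cause is the character encoding: the page is not encoded as UTF-8. Converting the content to UTF-8 before saving is enough to fix it. Code snippet: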

......

import chardet

......

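# html_content is assumed to hold the raw page body as bytes
content_type = chardet.detect(html_content)
encoding = content_type['encoding']
# print(encoding)

# Re-encode only when an encoding was detected and it is not already UTF-8
# (chardet may report None, and the case of the name varies by version)
if encoding and encoding.lower() != "utf-8":
    html_content = html_content.decode(encoding).encode("utf-8")

# Write the bytes and close the file handle when done
with open(filename, "wb") as f:
    f.write(html_content)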

....

With this change, the Chinese text in the saved file displays correctly.

Steps:

First, decode the GB2312-encoded bytes into Unicode.

Then encode the Unicode text as UTF-8.
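
The two steps above can be sketched in isolation, without Scrapy or chardet, using only Python's built-in codecs. This is a minimal illustration, not the original author's code: the helper name `gb2312_to_utf8` and the sample string are hypothetical, and the source encoding is assumed to be known to be GB2312 rather than detected.

```python
def gb2312_to_utf8(raw: bytes) -> bytes:
    """Convert GB2312-encoded bytes to UTF-8 bytes (hypothetical helper)."""
    # Step 1: decode the GB2312 bytes into a Unicode string
    text = raw.decode("gb2312")
    # Step 2: encode the Unicode string as UTF-8 bytes
    return text.encode("utf-8")


# Sample input: the word "中文" encoded as GB2312, standing in for a page body
sample = "中文".encode("gb2312")
converted = gb2312_to_utf8(sample)
```

In Python 3 the intermediate value is an ordinary `str`, so "converting to Unicode" simply means calling `decode` on the bytes. If the detected encoding is wrong, `decode` raises `UnicodeDecodeError`, which is a useful signal that the detection step needs attention.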