Python 3.x: Fixing Garbled Chinese Text with BeautifulSoup()

Date: 2019-11-17

Problem:

When BeautifulSoup parses a fetched web page, the Chinese text comes out garbled.

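Solution:

The situation was an odd one: I used chardet to detect the page encoding and passed the result to the BeautifulSoup constructor through the from_encoding= parameter, yet the output was still garbled.

Out of options, I searched around online. I never did pin down the cause, but I did find a fix.

I'm sharing it here for anyone who runs into the same problem: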

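If the Chinese page is encoded as gb2312 or gbk, passing from_encoding="gb18030" to the BeautifulSoup constructor fixes the garbled text. GB18030 is a superset of both GBK and GB2312, which is probably why it succeeds where the detected encoding does not.

In my experience, a page that is actually UTF-8 also parsed without garbling when gb18030 was passed, presumably because BeautifulSoup falls back to other candidate encodings when the suggested one fails to decode. That fallback is not guaranteed, so check the output on your own pages.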

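import requests
from bs4 import BeautifulSoup

all_url = ""  # URL of the page to fetch (left blank in the original)
# Hostreferer was not defined in the original; this empty dict is a placeholder for your request headers
Hostreferer = {}
start_html = requests.get(all_url, headers=Hostreferer)
# For Chinese pages encoded as gb2312 or gbk, pass from_encoding="gb18030" to avoid garbled text
soup = BeautifulSoup(start_html.content, "html.parser", from_encoding="gb18030")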
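
One plausible explanation, which the original investigation did not confirm, is that pages labeled gb2312 often contain characters that exist only in the larger GBK/GB18030 sets. The sketch below uses only Python's built-in codecs, not BeautifulSoup; the sample string is an assumption, chosen because 镕 is not part of GB2312.

```python
# Sample text (an assumption for illustration): "镕" exists in GBK but not in GB2312.
text = "朱镕基"

# The bytes a GBK-encoded page would send for this text.
raw = text.encode("gbk")


def try_decode(data, encoding):
    """Return the decoded string, or None if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_gb2312 = try_decode(raw, "gb2312")    # strict GB2312 cannot decode these bytes
as_gb18030 = try_decode(raw, "gb18030")  # the superset codec decodes them

print("gb2312 :", as_gb2312)
print("gb18030:", as_gb18030)
```

If a parser is told gb2312 and that decode fails, it has to guess another encoding, which is one way garbled output can arise.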

Here is the chardet approach as well, for reference:

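import urllib.request
import chardet
from bs4 import BeautifulSoup

all_url = ""  # URL of the page to fetch (left blank in the original)
# Read the response body once so the same bytes are used for detection and parsing
raw = urllib.request.urlopen(all_url).read()
charset1 = chardet.detect(raw)
print(charset1)
# Output: {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}
bmfs = charset1['encoding']
print(bmfs)
# Output: GB2312

# This is the variant that still produced garbled text for me
soup = BeautifulSoup(raw, "html.parser", from_encoding=bmfs)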
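
Since chardet reported GB2312 here and that value still gave garbled text, one way to combine the two approaches is to widen the detected name before handing it to BeautifulSoup. The helper below is a hypothetical sketch: the name `widen_chinese_encoding` comes from neither the original post nor any library. It maps the narrower GB names to gb18030 and leaves everything else unchanged.

```python
def widen_chinese_encoding(detected):
    """Map GB2312/GBK (any letter case) to gb18030; return other values unchanged.

    `detected` is meant to be chardet's result["encoding"], which may be None.
    """
    if detected and detected.lower() in {"gb2312", "gbk"}:
        return "gb18030"
    return detected


print(widen_chinese_encoding("GB2312"))  # gb18030
print(widen_chinese_encoding("utf-8"))   # utf-8
```

The result can then be passed as the encoding hint, for example from_encoding=widen_chinese_encoding(bmfs).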

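Author: 整合侠
Link: http://www.cnblogs.com/lizm166/p/8319919.html
Source: 博客园 (cnblogs)
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, credit the source.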