Encoding in Python 2

Date: 2020-08-06
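# -*- coding: utf-8 -*-
#_author:star
#date:2019/10/29
'''
Character encodings:
ASCII: English and basic Latin characters only
gb2312: about 6,700 Chinese characters, 1980
gbk1.0: more than 20,000 characters, 1995
gb18030: 2000, about 27,000 Chinese characters
unicode, UTF-32: every character takes 4 bytes
unicode, UTF-16: a character takes 2 bytes (code points up to 65535) or 4 bytes
unicode, UTF-8: an English character is stored as ASCII in 1 byte; a Chinese character typically takes 3 bytes

(1) In Python 2, to convert UTF-8 to GBK, first decode UTF-8 to Unicode, then encode Unicode to GBK.
(2) In Python 2, to convert GBK to UTF-8, first decode GBK to Unicode, then encode Unicode to UTF-8.
(3) encode() in Python 3 differs from encode() in Python 2. In Python 2, encode() only encodes.
    In Python 3, encode() also converts the result to the bytes type, and decode() converts
    the bytes type back to a string while decoding.
'''
# Makes print() behave as a function, so print('unicode:', x) does not print a tuple in Python 2
from __future__ import print_function

s = '特斯拉'  # In Python 2 this is a byte string, UTF-8 encoded because of the coding line above

# (1) In Python 2, decode UTF-8 to Unicode first ...
s_to_unicode = s.decode('UTF-8')
# ... then encode Unicode to GBK
unicode_to_gbk = s_to_unicode.encode("gbk")

print(s)                          # UTF-8 bytes; may look garbled on a GBK console
print('unicode:', s_to_unicode)   # unicode
print('gbk:', unicode_to_gbk)     # gbk

# (2) The reverse direction: decode GBK to Unicode, then encode Unicode to UTF-8
gbk_to_unicode = unicode_to_gbk.decode('gbk')
unicode_to_utf8 = gbk_to_unicode.encode('utf-8')
print(gbk_to_unicode)
print(unicode_to_utf8)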
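Point (3) and the byte sizes listed in the notes can be checked with a small Python 3 sketch. This is only an illustration, not part of the original script: the helper `transcode` and the names `text`, `utf8_bytes`, `gbk_bytes`, `round_trip`, and `byte_lengths` are made up for this example. The `-le` codec variants are used so that no byte order mark is added to the length.

```python
# Python 3 sketch; all names below are illustrative assumptions.

def transcode(data, source_encoding, target_encoding):
    """Decode bytes to str (Unicode), then encode the str to the target encoding."""
    return data.decode(source_encoding).encode(target_encoding)

text = '特斯拉'                      # In Python 3, str is already Unicode text
utf8_bytes = text.encode('utf-8')    # encode() returns bytes in Python 3

# UTF-8 -> Unicode -> GBK, then GBK -> Unicode -> UTF-8
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')
round_trip = transcode(gbk_bytes, 'gbk', 'utf-8')

# Number of bytes the same three characters occupy in each encoding
byte_lengths = {
    name: len(text.encode(name))
    for name in ('utf-8', 'gbk', 'utf-16-le', 'utf-32-le')
}

print(type(utf8_bytes).__name__)     # bytes
print(round_trip == utf8_bytes)      # True
print(byte_lengths)
```

For these three characters the lengths come out as 9 bytes in UTF-8, 6 in GBK, 6 in UTF-16, and 12 in UTF-32, which matches the per-character sizes in the notes.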