The iconv Command Explained

Date: 2019-11-10

Original article: https://blog.csdn.net/u012313689/article/details/53033804

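Purpose

The iconv command converts files from one character encoding to another (Convert encoding of given files from one encoding to another). For example, it can convert UTF-8 text to GB18030, and the reverse works too. The JDK provides a similar tool, native2ascii. On Linux, the iconv development library includes C functions such as iconv_open, iconv_close, and iconv, which make it easy to convert character encodings in C/C++ programs. This is quite useful in programs that fetch web pages, and the iconv command comes in handy when debugging such programs.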

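Common options

First, we need to know which character encodings are supported. The -l option prints them (List known coded character sets).

Format: iconv -l

Next, the conversion itself looks like this:

Format: iconv -f from-encoding -t to-encoding inputfile

The invocation above prints the converted text to the screen. To write it to a file instead, use the following form:

Format: iconv -f from-encoding -t to-encoding inputfile -o outputfile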
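As a minimal sketch of the two invocation forms above, the script below writes a small UTF-8 sample, converts it to GB18030 with -o, and converts it back by redirecting standard output. The file names and the sample text are made up for illustration, and the script assumes an iconv that accepts -o, as the glibc version does.

```shell
#!/bin/sh
# Work in a throwaway directory; every file name here is illustrative.
workdir=$(mktemp -d)

# "中文" as UTF-8 bytes (E4 B8 AD E6 96 87), written with octal escapes so
# the script does not depend on the encoding of the script file itself.
printf '\344\270\255\346\226\207\n' > "$workdir/utf8.txt"

# Form 1: write the converted text to a file with -o.
iconv -f UTF-8 -t GB18030 "$workdir/utf8.txt" -o "$workdir/gb18030.txt"

# Form 2: without -o the result goes to standard output, so redirect it.
iconv -f GB18030 -t UTF-8 "$workdir/gb18030.txt" > "$workdir/roundtrip.txt"

# Record the file sizes and whether the round trip reproduced the input.
utf8_size=$(wc -c < "$workdir/utf8.txt" | tr -d ' ')
gb_size=$(wc -c < "$workdir/gb18030.txt" | tr -d ' ')
if cmp -s "$workdir/utf8.txt" "$workdir/roundtrip.txt"; then
  roundtrip=same
else
  roundtrip=different
fi
echo "UTF-8: $utf8_size bytes, GB18030: $gb_size bytes, round trip: $roundtrip"

rm -rf "$workdir"
```

The two Chinese characters take three bytes each in UTF-8 and two bytes each in GB18030, which is why the converted file is smaller.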

Usage examples

Example 1: List the supported character encodings

[root@new55 ~]# iconv -l
The following list contain all the coded character sets known. This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters. One coded character set can be
listed with several different names (aliases).

437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB,
(output omitted)
EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-ROMAN8, HPROMAN8, HU,
(output omitted)
TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, UCS-2, UCS-2BE,
UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE,
UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE,
UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE, UTF16LE,
UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU

That is far too many. I only want to know which Chinese encodings are supported.
[root@new55 ~]# iconv -l | grep GB
CN-GB//
CSGB2312//
CSISO58GB1988//
EBCDIC-CP-GB//
GB//
GB2312//
GB13000//
GB18030//
GBK//
GB_1988-80//
GB_198880//
ISO646-GB//

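Notice anything odd? Each name is now printed on its own line, followed by two slashes. glibc's iconv appears to switch to this layout when its output goes to a pipe rather than a terminal.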
[root@new55 ~]#
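To make the list easier to search regardless of which layout iconv picks, a small wrapper can normalize it to one bare name per line. This is only a sketch: list_encodings and supports_encoding are hypothetical helper names, not part of iconv.

```shell
#!/bin/sh
# Print one encoding name per line, whether iconv -l separated the names
# with commas, spaces, or newlines, and strip any trailing slashes.
list_encodings() {
  iconv -l | tr ', ' '\n\n' | sed -e 's,//*$,,' -e '/^$/d'
}

# Succeed if the given name appears in the list (case-insensitive,
# whole-name match).
supports_encoding() {
  list_encodings | grep -qixF -- "$1"
}

# Show the names that start with GB, without the trailing slashes.
list_encodings | grep -i '^GB'

if supports_encoding GB18030; then
  echo "GB18030 is supported"
fi
```

Matching the whole name avoids hits such as EBCDIC-CP-GB or ISO646-GB, which a plain grep GB also returns.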

Example 2: Convert Google Hong Kong's Big5-encoded page to GBK

[root@new55 ~]# curl -s http://www.google.com.hk/ | iconv -f big5 -t gbk
<!doctype html><html><head><meta http-equiv="content-type" content="text/html; charset=Big5"><title>Google</title><script>window.google={kEI:"tFXZTNHKDcGTkAXpvOHhCA",kEXPI:"26637,27404",kCSI:{e:"26637,27404",ei:"tFXZTNHKDcGTkAXpvOHhCA",expi:"26637,27404"},ml:function(){},kHL:"zh-TW",time:function(){return(new Date).getTime()},log:function(b,d,c){var a=new Image,e=google,g=e.lc,f=e.li;a.onerror=(a.onload=(a.onabort=function(){delete g[f]}));g[f]=a;c=c||"/gen_204?atyp=i&ct="+b+"&cad="+d+"&zx="+google.time();a.src=c;e.li=f+1},lc:[],li:0,Toolbelt:{}};
id=ghead><div id=gbar><nobr><b class="gb1">全部網頁</b> <a onclick=gbar.qs(this) href="http://www.google.com.hk/imghp?hl=zh-tw&tab=wi" class="gb1">圖片</a> <a onclick=gbar.qs(this) href="http://video.google.com.hk/?hl=zh-tw&tab=wv" class="gb1">影片</a> <a onclick=gbar.qs(this) href="http://maps.google.com.hk/maps?hl=zh-tw&tab=wl" class="gb1">地圖</a> <a onclick=gbar.qs(this) f||document.f||document.gs;google.ac.i(form,form.q,'','','',{o:1,sw:1});google.mc = [[14,{}],[64,{}],[105,{}],[22,{"m_error":"\u003Cfont color=red\u003E錯誤：\u003C/font\u003E 伺服器無法完成您的要求。請在 30 秒後再試一次。","m_tip":"按一下以取得詳細資訊。"}],[84,{}]];google.med('init');google.History&&google.History.initialize('/')});if(google.j&&google.j.en&&google.j.xi){window.setTimeout(google.j.xi,0);google.fade=null;}</script></div><script>(function(){
(output omitted)
})();
</script>[root@new55 ~]#
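The same conversion can be reproduced offline on a couple of characters, which makes it easier to see what iconv actually changes. The sketch below feeds the Big5 bytes for "中文" through iconv and prints the result in hex. The hex helper and the sample text are my own additions for illustration.

```shell
#!/bin/sh
# Print standard input as lowercase hex with no separators.
hex() {
  od -An -tx1 | tr -d ' \n'
}

# "中文" in Big5 is the byte sequence A4 A4 A4 E5 (written here in octal).
big5_to_gbk=$(printf '\244\244\244\345' | iconv -f BIG5 -t GBK | hex)
big5_to_utf8=$(printf '\244\244\244\345' | iconv -f BIG5 -t UTF-8 | hex)

echo "GBK:   $big5_to_gbk"    # d6d0cec4
echo "UTF-8: $big5_to_utf8"   # e4b8ade69687
```

Only the byte representation changes. Traditional characters stay traditional, and any charset declaration inside the document is left untouched, which is why the converted page above still declares charset=Big5.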

Example 3: Convert my JavaEye blog home page from UTF-8 to GBK

[root@new55 ~]# curl -s http://codingstandards.iteye.com/ | iconv -f utf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>Bash @ Linux - JavaEye技术网站</title>
    <meta name="description" content="" />
    <meta name="keywords" content="codingstandards Bash @ Linux" />
(output omitted)
<div class="blog_main">
<div class="blog_title">
<div class="date"><span class='year'>2010</span><span class='sep_year'>-</span><span class='month'>10</span><span class='sep_month'>-</span><span class='day'>17</span></div>
<div class="show_full_flag"><a href='?show_full=true'>全文显示</a></div>
<h3><a href='/blog/786653'>[置顶] 我使用过的Linux命令系列总目录</a></h3>
<strong>文章分类:<a href="http://www.iteye.com/blogs/category/os" >操作系统</a></strong>
</div>
<div class="blog_content">

    我使用过的Linux命令系列总目录
本文链接： http://codingstandards.iteye.com/blog/786653

iconv: 未知 3345 处的非法输入序列

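The last line, an illegal input sequence error reported by iconv at position 3345, shows that the conversion failed. Switching the target encoding as below makes it succeed.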
[root@new55 ~]# curl -s http://codingstandards.iteye.com/ | iconv -f utf8 -t gb18030

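The output is omitted here. Interested readers can try it for themselves: the complete source of the page is displayed. This works because GBK is a subset of GB18030, which contains more characters.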

[root@new55 ~]#
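The subset relationship is easy to check on a single character. The sketch below tries to convert U+1F600, a character outside the Basic Multilingual Plane, to both encodings. The character is my own choice for illustration and is not necessarily the one that broke the conversion above.

```shell
#!/bin/sh
# U+1F600 as UTF-8 bytes (F0 9F 98 80), written in octal. GBK has no code
# for it, while GB18030 can represent it.
sample=$(printf '\360\237\230\200')

# Try the conversion to GBK and record whether iconv succeeded.
if printf '%s\n' "$sample" | iconv -f UTF-8 -t GBK >/dev/null 2>&1; then
  gbk_result=ok
else
  gbk_result=failed
fi

# Try the same input with GB18030 as the target.
if printf '%s\n' "$sample" | iconv -f UTF-8 -t GB18030 >/dev/null 2>&1; then
  gb18030_result=ok
else
  gb18030_result=failed
fi

echo "GBK: $gbk_result, GB18030: $gb18030_result"
```

Checking the exit status like this is also a convenient way to decide in a script whether a fallback target encoding is needed.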

Example 4: Convert Dreamdu's UTF-8 page to GBK

[root@new55 ~]# curl -s http://www.dreamdu.com/ | iconv -futf8 -t gbk
iconv: 未知 0 处的非法输入序列

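The error reports an illegal input sequence at position 0, that is, at the very start of the stream. So let's try dropping the first three bytes (quite possibly a UTF-8 byte order mark, EF BB BF, which has no GBK equivalent). Sure enough, the conversion now runs.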

[root@new55 ~]# curl -s http://www.dreamdu.com/ | cut -b 4- | iconv -futf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
ml xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
ead>
meta http-equiv="content-type" content="text/html; charset=utf-8" />
meta http-equiv="content-language" content="zh-CN" />
link rel="stylesheet" type="text/css" href="/style.css?v=1" media="screen" />
script type="text/javascript" src="/js.js"></script>
title>梦之都 - 网站设计与开发教程</title>
head>
ody>

(output omitted)
body>
tml>

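Did you spot the problem? The first few characters of every line have disappeared! cut works line by line, so cut -b 4- removes the first three bytes of each line, not just the first three bytes of the stream.
[root@new55 ~]#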
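One way to avoid this is to trim the stream rather than each line. The sketch below compares cut with tail -c +4, which starts output at the fourth byte of the whole input, and with a small sed filter that removes a UTF-8 byte order mark only if one is present. It assumes the three leading bytes really are a byte order mark. The with_bom sample stands in for the downloaded page, and strip_bom is a hypothetical helper name.

```shell
#!/bin/sh
# Remove a UTF-8 byte order mark (EF BB BF) from the start of the stream
# if it is there. Only the first line is touched.
strip_bom() {
  bom=$(printf '\357\273\277')
  LC_ALL=C sed "1s/^${bom}//"
}

# Sample input: a byte order mark followed by two short lines.
with_bom() {
  printf '\357\273\277<html>\n<head>\n'
}

# cut works per line, so it trims three bytes from every line.
cut_result=$(with_bom | cut -b 4- | iconv -f UTF-8 -t GBK)

# tail -c +4 skips the first three bytes of the stream only.
tail_result=$(with_bom | tail -c +4 | iconv -f UTF-8 -t GBK)

# strip_bom gives the same result and leaves unmarked input unchanged.
sed_result=$(with_bom | strip_bom | iconv -f UTF-8 -t GBK)

printf '%s\n---\n%s\n' "$cut_result" "$tail_result"
```

The tail form is the shortest fix, but it removes three bytes unconditionally. The sed form is safer when some inputs have no byte order mark.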
