Linux下字符编码转换 -- iconv命令
一、获取iconv命令的帮助及命令参数信息[root@CentOS-78 covers]# iconv --helpUsage: iconv [OPTION...] [FILE...]Convert encoding of given files from one encoding to another.Input/Output format specification:-f,
·
一、获取iconv命令的帮助及命令参数信息
[root@CentOS-78 covers]# iconv --help
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.
Input/Output format specification:
-f, --from-code=NAME encoding of original text
-t, --to-code=NAME encoding for output
Information:
-l, --list list all known coded character sets
Output control:
-c omit invalid characters from output
-o, --output=FILE output file
-s, --silent suppress warnings
--verbose print progress information
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
二、得到iconv所支持的字符类型
[root@CentOS-78 covers]# iconv -l
The following list contain all the coded character sets known. This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters. One coded character set can be
listed with several different names (aliases).
437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BRF, BS_4730, CA, CN-BIG5,
CN-GB, CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278,
CP280, CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424,
CP437, CP500, CP737, CP775, CP803, CP813, CP819, CP850, CP851, CP852, CP855,
CP856, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP866NAV,
CP868, CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP901, CP902, CP903,
CP904, CP905, CP912, CP915, CP916, CP918, CP920, CP921, CP922, CP930, CP932,
CP933, CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1008, CP1025,
CP1026, CP1046, CP1047, CP1070, CP1079, CP1081, CP1084, CP1089, CP1097,
CP1112, CP1122, CP1123, CP1124, CP1125, CP1129, CP1130, CP1132, CP1133,
CP1137, CP1140, CP1141, CP1142, CP1143, CP1144, CP1145, CP1146, CP1147,
CP1148, CP1149, CP1153, CP1154, CP1155, CP1156, CP1157, CP1158, CP1160,
CP1161, CP1162, CP1163, CP1164, CP1166, CP1167, CP1250, CP1251, CP1252,
CP1253, CP1254, CP1255, CP1256, CP1257, CP1258, CP1282, CP1361, CP1364,
CP1371, CP1388, CP1390, CP1399, CP4517, CP4899, CP4909, CP4971, CP5347,
CP9030, CP9066, CP9448, CP10007, CP12712, CP16804, CPIBM861, CSA7-1, CSA7-2,
CSASCII, CSA_T500-1983, CSA_T500, CSA_Z243.4-1985-1, CSA_Z243.4-1985-2,
CSA_Z243.419851, CSA_Z243.419852, CSDECMCS, CSEBCDICATDE, CSEBCDICATDEA,
CSEBCDICCAFR, CSEBCDICDKNO, CSEBCDICDKNOA, CSEBCDICES, CSEBCDICESA,
... ...
如果想要查询所支持的某一类字符类型,可以通过grep命令进行过滤。
三、应用举例
1. 文件装换
iconv -f ISO88592 -t UTF8 < input.txt > output.txt
ISO88592:原编码格式
UTF8:要转换的编码格式
2. 转换网页
curl -s http://www.dreamdu.com/ | iconv -f utf8 -t gbk
curl -s http://www.google.com.hk/ | iconv -f big5 -t gbk
============================================
参考文献:
http://linux.die.net/man/1/iconv
http://codingstandards.iteye.com/blog/807077
更多推荐
已为社区贡献3条内容
所有评论(0)