S(1): Detecting the Encoding of a Text File

Date: 2019-11-13


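Detecting the encoding of a text file is a common need in Java, and there are quite a few open-source projects that do it. The one covered here is juniversalchardet, an open-source Java port of Mozilla's universalchardet library.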

A simple test case (the example comes from the official juniversalchardet site, where the jar file can also be downloaded: http://code.google.com/p/juniversalchardet/):

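import org.mozilla.universalchardet.UniversalDetector;

public class TestDetector {
    public static void main(String[] args) throws java.io.IOException {
        byte[] buf = new byte[4096];
        String fileName = args[0];
        java.io.FileInputStream fis = new java.io.FileInputStream(fileName);

        // Create the detector; no listener is needed here.
        UniversalDetector detector = new UniversalDetector(null);

        // Feed the file to the detector until it is confident or the input ends.
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        fis.close();

        // Tell the detector that no more data is coming.
        detector.dataEnd();

        // getDetectedCharset() returns null when no encoding could be determined.
        String encoding = detector.getDetectedCharset();
        if (encoding != null) {
            System.out.println("Detected encoding = " + encoding);
        } else {
            System.out.println("No encoding detected.");
        }

        // Reset the detector so it can be reused.
        detector.reset();
    }
}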

(This is just one recommended way to detect a file's encoding.)
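
Once a charset name has been detected, the usual next step is to decode the file's bytes with it. The sketch below shows one possible way to do that using only the JDK. `DetectedCharsetDecoder`, `resolve`, and `decode` are hypothetical names made up for this illustration and are not part of juniversalchardet. Falling back to UTF-8 when the name is null or unsupported is an assumption, not something the library prescribes.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of juniversalchardet.
final class DetectedCharsetDecoder {

    // Maps a detected charset name (possibly null) to a Charset.
    // Assumption: UTF-8 is an acceptable fallback when the name is
    // missing, syntactically illegal, or not supported by this JVM.
    static Charset resolve(String detectedName) {
        if (detectedName == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.isSupported(detectedName)
                    ? Charset.forName(detectedName)
                    : StandardCharsets.UTF_8;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return StandardCharsets.UTF_8;
        }
    }

    // Decodes raw bytes using the charset resolved from the detected name.
    static String decode(byte[] data, String detectedName) {
        return new String(data, resolve(detectedName));
    }

    public static void main(String[] args) {
        // Stand-in for bytes read from a file plus the name a detector reported.
        byte[] data = "hello".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(data, "UTF-16LE"));
    }
}
```

In the test case above, the value returned by `detector.getDetectedCharset()` could be passed as `detectedName`. Whether every name reported by the detector is supported depends on the charsets installed in the running JVM, which is why the sketch checks `Charset.isSupported` first.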
