Reading Text Files in Java


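I wanted to read the characters in a text file (.txt) with Java, but I wasn't very familiar with Java's file handling. So I started digging through the official documentation and worked out how to read a single line, or all of the data, from a file.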

File

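File represents a directory or a file.
Instances of the File class are immutable; that is, once created, the abstract pathname represented by a File object will never change. Here are some of the File class's methods: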

public File(String pathname)
public File(String parent, String child)
public File(File parent, String child)
public File(URI uri)

public String getName() 
public String getParent()
public String getPath()
public URL toURL()
public boolean canRead()
public boolean canWrite()
public boolean exists()
public boolean isDirectory()
public boolean isFile()
public boolean isHidden()
public long lastModified()
public long length()
public boolean createNewFile()
public boolean delete()
public void deleteOnExit()
....

The File class itself provides no methods for input or output; it only represents a file or directory on the computer.
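
To get a feel for a few of the query methods listed above, here is a minimal sketch. The class name FileInfoDemo and the helper describe are names I made up for illustration, and English.txt is only a placeholder path.

```java
import java.io.File;

public class FileInfoDemo {
    // Builds a one-line summary of what a File object can report about a path.
    // Nothing here reads the file's contents; File only answers questions about the path.
    static String describe(File file) {
        if (!file.exists()) {
            return file.getPath() + " does not exist";
        }
        if (file.isDirectory()) {
            return file.getName() + " is a directory";
        }
        return file.getName() + " is a file of " + file.length() + " bytes";
    }

    public static void main(String[] args) {
        // "English.txt" is a placeholder; point it at a real file to see its size.
        System.out.println(describe(new File("English.txt")));
        System.out.println(describe(new File(".")));
    }
}
```

Note that creating a File object never touches the disk; only calls such as exists() or length() do.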

FileReader

FileReader extends InputStreamReader. In the class file I only saw three new constructors:
public FileReader(String fileName)
public FileReader(File file)
public FileReader(FileDescriptor fd)
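The documentation says FileReader is for reading character files, exposing the file as a character stream. But there is still no read method in sight, so next I looked at its parent class to see whether it has what we need.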

InputStreamReader

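InputStreamReader extends the abstract class Reader. Here are all of InputStreamReader's public methods:

// Every constructor takes an InputStream
  public InputStreamReader(InputStream in) 
  public InputStreamReader(InputStream in, String charsetName) // creates the reader with the named charset
  public InputStreamReader(InputStream in, Charset cs)
  public InputStreamReader(InputStream in, CharsetDecoder dec)
  
  public String getEncoding() // name of the charset in use
  public int read() // reads one character and returns it as an int; returns -1 at end of stream
  public int read(char cbuf[], int offset, int length) // reads up to length characters into cbuf, starting at offset
  public boolean ready() // true if the input buffer is not empty or bytes are available from the underlying stream
  public void close()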

At last, a read() method. Now I know how to read characters from a text file:

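import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";
        char[] chars = new char[10];

        // try-with-resources closes the reader even if read() throws
        try (FileReader fileReader = new FileReader(fileName)) {
            // Try reading up to nine characters; read() returns how many it actually got
            int count = fileReader.read(chars, 0, 9);

            // Print only the characters that were read, not the unused slots
            for (int i = 0; i < count; i++) {
                System.out.print(chars[i]);
            }
        }
    }
}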

[Figure: contents of English.txt]
Output:

insult ��

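The Chinese is garbled; I'll set that aside for now.
Either way, both FileReader and InputStreamReader offer only two methods for reading data:
public int read()
public int read(char[] cbuf, int offset, int length)
These bare-bones methods clearly don't meet my needs. Then I found BufferedReader.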
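
To show why those two methods feel clumsy on their own, here is a rough sketch of reading a whole file with nothing but read(char[], int, int). The class name ReadAllChars, the helper readAll, and the 1024-character buffer are my own choices, and English.txt is a placeholder path.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ReadAllChars {
    // Drains a Reader using only read(char[], int, int).
    // The caller has to manage the buffer and watch for the -1 end-of-stream marker.
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int count;
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                // Only the first 'count' slots hold fresh data on each pass.
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "English.txt" is a placeholder path.
        try (FileReader reader = new FileReader("English.txt")) {
            System.out.print(readAll(reader));
        }
    }
}
```

It works, but there is no notion of a line here; splitting the text into lines would be extra work.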

BufferedReader (the solution is here)

Here is what the BufferedReader documentation (JDK 1.8) says:

Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.
The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.

Here are all of BufferedReader's public methods:

public BufferedReader(Reader in, int sz)
public BufferedReader(Reader in)
public int read()
public int read(char cbuf[], int off, int len)
public String readLine()
public long skip(long n)
public boolean ready()
public boolean markSupported()
public void mark(int readAheadLimit)
public void reset()
public void close()
public Stream<String> lines()

The documentation says FileReader's read methods can be inefficient on their own, and it also gives the fix: wrap the FileReader in a BufferedReader. So I changed my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // Closing the BufferedReader also closes the FileReader it wraps
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            // readLine() returns the next line without its line terminator
            System.out.println(bufferedReader.readLine());
        }
    }
}

Output:

insult ����

That feels much better. To read all of the data in the text file, this is what I did:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // Closing the BufferedReader also closes the FileReader it wraps
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            String line = bufferedReader.readLine();

            // readLine() returns null once the end of the file is reached
            while (line != null) {
                System.out.println(line);
                line = bufferedReader.readLine();
            }
        }
    }
}

Output:

insult ����
harsh �����ġ��̶���
intimidate ����
compromise��Э
executionִ��
novel �����С˵
engage����������
revenue-generating ����-���� ����
sweat ����
ownership ����Ȩ
synchronized ͬ��
asynchronized �첽
employee ְ��
hint ���� ���� ��ʾ
indication ָʾ
denote ָ������������ʾ
portion ����
offset ƫ����

Fixing the garbled Chinese

While going through the documentation I noticed that InputStreamReader has a method public String getEncoding(). JDK 1.8 describes it as follows:

Returns the name of the character encoding being used by this stream.
If the encoding has an historical name then that name is returned; otherwise the encoding's canonical name is returned.
If this instance was created with the InputStreamReader(InputStream, String) constructor then the returned name, being unique for the encoding, may differ from the name passed to the constructor. This method will return null if the stream has been closed.

In short, it returns the name of the encoding the stream is using.

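Note that this method returns the encoding the reader uses to decode bytes, not the encoding the file was actually saved in. A FileReader created with the JDK 1.8 constructors always decodes with the platform default charset.
When I called the method, the console printed UTF8: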

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // Resources are closed in reverse order: bufferedReader first, then fileReader
        try (FileReader fileReader = new FileReader(fileName);
             BufferedReader bufferedReader = new BufferedReader(fileReader)) {

            // Must be called while the reader is still open; it returns null after close()
            System.out.println("Charset: " + fileReader.getEncoding());

            String line = bufferedReader.readLine();

            while (line != null) {
                System.out.println(line);
                line = bufferedReader.readLine();
            }
        }
    }
}
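
Two details from the quoted description are easy to check in isolation: the returned name may differ from the one passed to the constructor, and the method returns null once the reader is closed. This is only a sketch; the class name EncodingNameDemo and the helper encodingName are hypothetical, and the empty in-memory stream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

public class EncodingNameDemo {
    // Returns what getEncoding() reports for a reader created with charsetName,
    // optionally closing the reader before asking.
    static String encodingName(String charsetName, boolean closeFirst) {
        try {
            InputStreamReader reader =
                    new InputStreamReader(new ByteArrayInputStream(new byte[0]), charsetName);
            if (closeFirst) {
                reader.close();
            }
            String name = reader.getEncoding();
            reader.close(); // closing an already closed reader has no effect
            return name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first name may be spelled differently from "UTF-8".
        System.out.println(encodingName("UTF-8", false));
        // The second call closes the reader first, so this prints null.
        System.out.println(encodingName("UTF-8", true));
    }
}
```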

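As far as I remember, Notepad on Windows 10 saves files as ANSI by default, which on a Simplified Chinese system means GBK rather than UTF-8. That mismatch between the file and the reader is what garbled the Chinese. After I re-saved English.txt as UTF-8, the console displayed the Chinese text correctly.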
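
Re-saving the file is not the only option. The InputStreamReader constructors listed earlier accept a charset, so the reader can be told how the file is encoded. The sketch below assumes the file really is GBK, which I did not verify; the class name CharsetReadDemo and the helper readFirstLine are hypothetical, and English.txt is a placeholder path.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetReadDemo {
    // Decodes the stream with an explicit charset instead of the platform default
    // and returns the first line. Closing the BufferedReader also closes the stream.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file was saved by Notepad as ANSI on a Simplified Chinese
        // system, i.e. GBK. Use whatever charset the file was actually saved in.
        Charset gbk = Charset.forName("GBK");
        System.out.println(readFirstLine(new FileInputStream("English.txt"), gbk));
    }
}
```

The design point is that the charset belongs to the reader, not the file: the same bytes decode correctly or turn into garbage depending on which charset the reader is given.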

FileInputStream

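Everything covered so far is a Java API for reading character streams.
FileInputStream is a byte input stream: it reads a file as a stream of bytes.
FileInputStream extends the abstract class InputStream.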

A FileInputStream obtains input bytes from a file in a file system. What files are available depends on the host environment.
FileInputStream is meant for reading streams of raw bytes such as image data. For reading streams of characters, consider using FileReader.
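
To make the byte-level view concrete, here is a small sketch that prints the first few bytes of a stream as hex. The class name ByteReadDemo and the helper hexDump are hypothetical, and English.txt is a placeholder path.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ByteReadDemo {
    // Reads at most maxBytes raw bytes and formats them as two-digit hex values.
    // InputStream.read() returns 0..255 for a byte, or -1 at end of stream.
    static String hexDump(InputStream in, int maxBytes) {
        StringBuilder hex = new StringBuilder();
        try {
            int value;
            int count = 0;
            while (count < maxBytes && (value = in.read()) != -1) {
                hex.append(String.format("%02x ", value));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // "English.txt" is a placeholder path; no charset is involved at this level.
        try (FileInputStream in = new FileInputStream("English.txt")) {
            System.out.println(hexDump(in, 16));
        }
    }
}
```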


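Reading has a counterpart in writing: every InputStream or Reader has a matching OutputStream or Writer. The two sides work much the same way, so I won't go over them here.
Also, Java I/O looks this complicated because it uses the ***decorator pattern***, which aims to offer more flexible extension than inheritance without breaking existing code. In other words: closed for modification, open for extension.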
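
As a rough illustration of the decorator idea, the sketch below adds a new behavior (upper-casing) by wrapping a Reader instead of modifying any existing class, and then wraps that again in a BufferedReader. UpperCaseReader, DecoratorDemo and firstLineUpper are names I invented; only FilterReader, BufferedReader and StringReader come from the JDK.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class DecoratorDemo {
    // A decorator: it is itself a Reader, wraps another Reader,
    // and upper-cases every character that passes through.
    static class UpperCaseReader extends FilterReader {
        UpperCaseReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : Character.toUpperCase((char) c);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            int count = super.read(cbuf, off, len);
            // When count is -1 (end of stream) the loop body never runs.
            for (int i = off; i < off + count; i++) {
                cbuf[i] = Character.toUpperCase(cbuf[i]);
            }
            return count;
        }
    }

    // Stacks two decorators: BufferedReader adds readLine(), UpperCaseReader adds upper-casing.
    static String firstLineUpper(Reader source) {
        try (BufferedReader reader = new BufferedReader(new UpperCaseReader(source))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints INSULT
        System.out.println(firstLineUpper(new StringReader("insult\nharsh")));
    }
}
```

Neither StringReader nor BufferedReader had to change to gain the new behavior, which is the flexibility the pattern is after.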