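This article is an excerpt from "Netkiller Java 手札".
半岛城邦三期, Wanghai Road, Shenzhen, Guangdong, China 518067 +86 13113668890 <netkiller@msn.com>
$Id: book.xml 606 2013-05-29 09:52:58Z netkiller $
Copyright © 2015-2018 Neo Chan
Copyright notice
Please contact the author before reposting. Reposts must credit the original source of the article and the author, and include this notice.
http://netkiller.github.io
http://netkiller.sourceforge.net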
Run the program below to write the integer value 1 to the file netkiller.bin, then observe how the file changes.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(1);
    out.close();
Open a terminal and inspect the binary file with the xxd command.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
    00000000: 00000000 00000000 00000000 00000001 ....
The dump shows the binary string 00000000 00000000 00000000 00000001. The program below converts it from binary to decimal; note that the spaces must be removed first.
    int n = Integer.valueOf("00000000 00000000 00000000 00000001".replaceAll(" ", ""), 2);
    System.out.println(n);
The result is 1. Why are there so many leading zeros? Run the following program.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(Integer.MAX_VALUE);
    out.close();
Now look at the result.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
    00000000: 01111111 11111111 11111111 11111111 ....
    int n = Integer.valueOf("01111111 11111111 11111111 11111111".replaceAll(" ", ""), 2);
    System.out.println(n);
The output is 2147483647, the maximum value of an int. An int always occupies 4 bytes, which is why the value 1 was padded with leading zeros. What happens with 2147483647 + 1?
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(Integer.MAX_VALUE + 1);
    out.close();
    System.out.println(Integer.MAX_VALUE + 1);
The output is -2147483648, whereas the mathematically correct result is 2147483648. This is integer overflow. An int is stored as 4 bytes (32 bits) in two's complement: non-negative values run from 00000000 00000000 00000000 00000000 to 01111111 11111111 11111111 11111111, and the first bit is the sign bit, 0 for non-negative and 1 for negative.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
    00000000: 10000000 00000000 00000000 00000000 ....
Integer overflow demonstration. What if a value exceeds the int range? Use the long type.
    System.out.println(Integer.MAX_VALUE);
    System.out.println(Integer.MAX_VALUE + 1);
    System.out.println(Integer.MIN_VALUE);
    System.out.println(Integer.MIN_VALUE - 1);
The output is as follows:
    2147483647
    -2147483648
    -2147483648
    2147483647
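The overflow shown above can be avoided or at least detected. The following is a minimal sketch, assuming Java 8 or later: it widens the operands to long before adding, and uses Math.addExact to turn a silent wrap-around into an ArithmeticException. The class and method names are illustrative and not part of the original text.

```java
class IntOverflowSketch {
    // Widen to long before adding, so the sum cannot wrap at 32 bits.
    static long addAsLong(int a, int b) {
        return (long) a + b;
    }

    // Math.addExact throws ArithmeticException instead of wrapping silently.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1);           // wraps to -2147483648
        System.out.println(addAsLong(Integer.MAX_VALUE, 1)); // 2147483648
        System.out.println(overflows(Integer.MAX_VALUE, 1)); // true
    }
}
```

Widening is the cheaper option when the result is known to fit in a long; addExact is useful when an out-of-range result should be treated as an error.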
Negative number demonstration.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(-1);
    out.close();
The value -1 is stored as 11111111 11111111 11111111 11111111.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
    00000000: 11111111 11111111 11111111 11111111 ....
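The all-ones pattern for -1 follows from two's complement, where a value is negated by inverting every bit and adding 1. The sketch below prints the 32 bits of an int in the same byte grouping as `xxd -b`; the helper name `bits` is an assumption made for this illustration.

```java
class TwosComplementSketch {
    // Render all 32 bits of an int, grouped into bytes like `xxd -b`.
    static String bits(int value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 31; i >= 0; i--) {
            sb.append((value >>> i) & 1);
            if (i % 8 == 0 && i != 0) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits(1));      // 00000000 00000000 00000000 00000001
        System.out.println(bits(~1));     // every bit of 1 inverted
        System.out.println(bits(~1 + 1)); // inverted plus one, which is -1
        System.out.println(bits(-1));     // 11111111 11111111 11111111 11111111
    }
}
```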
Now store two integer values.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(1);
    out.writeInt(-1);
    out.close();
The two values, 1 and -1, are clearly visible in the file.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
    00000000: 00000000 00000000 00000000 00000001 ....
    00000004: 11111111 11111111 11111111 11111111 ....
Reading int data from the binary file.
    // filename is the netkiller.bin path used in the previous examples
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        int i = in.readInt();
        System.out.println(i);
    } catch (EOFException e) {
        e.printStackTrace();
    }
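Next, write a single byte with writeByte().
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeByte(1);
    out.close();
A byte occupies just one byte (8 bits).
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
    00000000: 00000001
If -1 is written instead, the result is shown below. As with int, the first bit is 0 for non-negative values and 1 for negative values, which gives a value range of -128 to 127. Values outside this range overflow as well.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
    00000000: 11111111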
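The byte range and its overflow behaviour can be checked without touching a file. This is a small sketch with illustrative helper names: narrowing an int keeps only the lowest 8 bits, and masking with 0xFF reinterprets those 8 bits as an unsigned value.

```java
class ByteRangeSketch {
    // Narrowing an int to a byte keeps only the lowest 8 bits.
    static byte narrow(int value) {
        return (byte) value;
    }

    // Reinterpret the same 8 bits as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(narrow(127));            // 127, still in range
        System.out.println(narrow(128));            // -128, overflowed
        System.out.println(narrow(255));            // -1, bit pattern 11111111
        System.out.println(toUnsigned((byte) -1));  // 255
    }
}
```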
Try writing the minimum and maximum values.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeByte(Byte.MIN_VALUE);
    out.writeByte(Byte.MAX_VALUE);
    out.close();
Result
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
    00000000: 10000000 .
    00000001: 01111111 .
Writing a single character
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeBytes("a");
    out.close();
Result
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
    00000000: 01100001 a
The ASCII table shows that 01100001 is decimal 97, hexadecimal 61, which corresponds to the letter a.
Writing a string
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeBytes("http://www.netkiller.cn");
    out.close();
Result
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 01101000 01110100 01110100 01110000 00111010 00101111 00101111 01110111 http://w
    00000008: 01110111 01110111 00101110 01101110 01100101 01110100 01101011 01101001 ww.netki
    00000010: 01101100 01101100 01100101 01110010 00101110 01100011 01101110 ller.cn
Reading the byte string back from the binary file. readAllBytes() (available since Java 9) reads all remaining bytes into a byte[] in one call.
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        System.out.println(new String(in.readAllBytes()));
    } catch (EOFException e) {
        e.printStackTrace();
    }
readByte() reads one byte at a time.
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        char c = ' ';
        while (true) {
            try {
                c = (char) in.readByte();
                System.out.print(c);
            } catch (EOFException e) {
                // end of file reached
                System.out.println();
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
Now that byte handling is clear, here is an example that reads int data. An int is a group of 4 bytes, so we take 4 bytes at a time.
    // This example writes three values to netkiller.bin: 1024, -128 and 2147483647
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(1024);
    out.writeInt(-128);
    out.writeInt(Integer.MAX_VALUE);
    out.close();
The binary file looks like this:
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
    00000000: 00000000 00000000 00000100 00000000 ....
    00000004: 11111111 11111111 11111111 10000000 ....
    00000008: 01111111 11111111 11111111 11111111 ....
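Read the int data back from the binary file.
    String filename = "netkiller.bin";
    FileInputStream stream = new FileInputStream(filename);
    byte[] buffer = new byte[4];
    // Assumes the file length is a multiple of 4 bytes
    while (stream.read(buffer) != -1) {
        // ByteBuffer is big-endian by default, matching DataOutputStream
        ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
        System.out.println(byteBuffer.getInt());
    }
    stream.close();
Result
    1024
    -128
    2147483647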
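To make the "4 bytes per int" grouping concrete, the following sketch assembles an int from four bytes by hand, most significant byte first, which is the order DataOutputStream.writeInt uses. The class and method names are illustrative only.

```java
class BigEndianIntSketch {
    // Combine four bytes, most significant first, into one int.
    // Masking with 0xFF prevents sign extension of negative bytes.
    static int toInt(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0xFF) << 24)
             | ((b1 & 0xFF) << 16)
             | ((b2 & 0xFF) << 8)
             |  (b3 & 0xFF);
    }

    public static void main(String[] args) {
        // 00000000 00000000 00000100 00000000
        System.out.println(toInt((byte) 0x00, (byte) 0x00, (byte) 0x04, (byte) 0x00)); // 1024
        // 11111111 11111111 11111111 10000000
        System.out.println(toInt((byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0x80)); // -128
        // 01111111 11111111 11111111 11111111
        System.out.println(toInt((byte) 0x7F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF)); // 2147483647
    }
}
```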
We write two boolean values to the file, one true and one false.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeBoolean(true);
    out.writeBoolean(false);
    out.close();
The result shows that a boolean takes one byte: the last bit is 1 for true and 0 for false. The smallest unit in a binary file is the byte, so although a boolean needs only 1 bit, it is stored as a whole byte, padded with seven leading zeros (0000000).
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
    00000000: 00000001 .
    00000001: 00000000 .
The ls command (run here through the ll alias) shows that the file occupies 2B (two bytes).
    neo@MacBook-Pro ~/workspace/netkiller % ll netkiller.bin
    -rw-r--r-- 1 neo staff 2B Oct 18 13:47 netkiller.bin
Reading boolean data from the binary file.
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        boolean bool = in.readBoolean();
        System.out.println(bool);
    } catch (EOFException e) {
        e.printStackTrace();
    }
Next, write a long value.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeLong(1);
    out.close();
With the int experience above, the following is easy to understand. A long uses 8 bytes, twice the size of an int. Its range runs from Long.MIN_VALUE to Long.MAX_VALUE, and overflow occurs in the same way.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 ........
Value range
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeLong(Long.MIN_VALUE);
    out.writeLong(Long.MAX_VALUE);
    out.close();
Output file
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ........
    00000008: 01111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 ........
Reading long data from the binary file.
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        long l = in.readLong();
        System.out.println(l);
    } catch (EOFException e) {
        e.printStackTrace();
    }
In C, a signed char ranges from -128 to 127 and an unsigned char from 0 to 255.
A Java char is different: it is an unsigned 16-bit value ranging from 0 to 65535.
char is handled much like byte. First look up the character A in the ASCII table, which gives 65, and write 65 to the binary file. Reading the character back prints A.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeChar(65);
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        char c = in.readChar();
        System.out.println(c);
    } catch (EOFException e) {
        e.printStackTrace();
    }
The binary file shows that a char occupies 2 bytes (16 bits).
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
    00000000: 00000000 01000001 .A
Use writeChars() to write a string to the binary file.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeChars("http://www.netkiller.cn");
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    char c = ' ';
    while (true) {
        try {
            c = in.readChar();
            System.out.print(c);
        } catch (EOFException e) {
            System.out.println();
            break;
        }
    }
The binary file is shown below. The first (high) byte of every char is unused and holds 00000000, so for English text byte is the better fit; char costs twice as many bytes.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 00000000 01101000 00000000 01110100 00000000 01110100 00000000 01110000 .h.t.t.p
    00000008: 00000000 00111010 00000000 00101111 00000000 00101111 00000000 01110111 .:././.w
    00000010: 00000000 01110111 00000000 01110111 00000000 00101110 00000000 01101110 .w.w...n
    00000018: 00000000 01100101 00000000 01110100 00000000 01101011 00000000 01101001 .e.t.k.i
    00000020: 00000000 01101100 00000000 01101100 00000000 01100101 00000000 01110010 .l.l.e.r
    00000028: 00000000 00101110 00000000 01100011 00000000 01101110 ...c.n
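The size difference can be measured in memory instead of reading dumps. The sketch below writes the same string with writeBytes(), writeChars() and writeUTF() into a ByteArrayOutputStream and reports the number of bytes produced. The `mode` strings and the class name are assumptions made for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class EncodedSizeSketch {
    // mode is one of "bytes" (writeBytes), "chars" (writeChars), "utf" (writeUTF).
    static int encodedSize(String mode, String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            switch (mode) {
                case "bytes": out.writeBytes(text); break; // low byte of each char
                case "chars": out.writeChars(text); break; // two bytes per char
                case "utf":   out.writeUTF(text);   break; // 2-byte length + modified UTF-8
                default: throw new IllegalArgumentException("unknown mode: " + mode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        String url = "http://www.netkiller.cn"; // 23 characters
        System.out.println(encodedSize("bytes", url)); // 23
        System.out.println(encodedSize("chars", url)); // 46
        System.out.println(encodedSize("utf", url));   // 25
    }
}
```

Note that writeBytes() discards the high byte of every char, so it is only safe for text whose characters all fit in 8 bits.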
Storing a Chinese character
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    String s = "陈";
    char name = s.charAt(s.length() - 1);
    out.writeChar(name);
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    char c = ' ';
    while (true) {
        try {
            c = in.readChar();
            System.out.print(c);
        } catch (EOFException e) {
            System.out.println();
            break;
        }
    }
The binary file is shown below; two bytes represent one Chinese character.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
    00000000: 10010110 01001000 .H
Converted to hexadecimal, this gives the two values 96 48.
    neo@MacBook-Pro ~/workspace/netkiller % hexdump netkiller.bin
    0000000 96 48
    0000002
Now search for "汉字内码" (Chinese character internal codes) in a search engine and look up the character "陈": its Unicode code point in hexadecimal is 9648 (U+9648).
Try writing a string of Chinese characters.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeChars("陈景峰");
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        char c = ' ';
        while (true) {
            try {
                c = in.readChar();
                System.out.print(c);
            } catch (EOFException e) {
                System.out.println();
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
    00000000: 10010110 01001000 01100110 01101111 01011100 11110000 .Hfo\.
This time we use a new file name, netkiller.txt, and write the string with writeUTF().
    String filename = "netkiller.txt";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeUTF("峰");
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        System.out.println(in.readUTF());
    } catch (EOFException e) {
        e.printStackTrace();
    }
Look at the binary file. Why does a single Chinese character take so many bytes?
    neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.txt
    00000000: 00000000 00000011 11100101 10110011 10110000 .....
Convert it to hexadecimal.
    neo@MacBook-Pro ~/workspace/netkiller % hexdump netkiller.txt
    0000000 00 03 e5 b3 b0
    0000005
Looking up the internal code of "峰" online shows that its UTF-8 encoding is E5 B3 B0; UTF-8 stores this character in three bytes. The leading 00000000 00000011 is not a BOM. writeUTF() first writes the number of encoded bytes that follow as a 2-byte unsigned value (here 3), and then the string itself in Java's modified UTF-8.
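The two leading bytes can be checked directly. As documented for DataOutputStream.writeUTF, the output starts with a two-byte big-endian length of the encoded data, followed by the string in modified UTF-8. The sketch below encodes a string in memory and extracts that prefix; the class and helper names are illustrative, and the strings use \u escapes only to keep the source ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfPrefixSketch {
    // Return exactly the bytes that writeUTF would put in the file.
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the first two bytes as an unsigned big-endian length.
    static int lengthPrefix(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] one = encode("\u5cf0");               // the character 峰
        System.out.println(one.length);              // 5 = 2 prefix bytes + 3 data bytes
        System.out.println(lengthPrefix(one));       // 3
        byte[] three = encode("\u9648\u666f\u5cf0"); // the string 陈景峰
        System.out.println(lengthPrefix(three));     // 9
    }
}
```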
Now try writing a whole string.
    out.writeUTF("陈景峰");
xxd -s 2 -c 3 skips the first two bytes and displays three bytes per line.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -s 2 -c 3 -b netkiller.txt
    00000002: 11101001 10011001 10001000 ...
    00000005: 11100110 10011001 10101111 ...
    00000008: 11100101 10110011 10110000 ...
Text written with writeUTF() can be viewed directly with ordinary text tools.
    neo@MacBook-Pro ~/workspace/netkiller % cat netkiller.txt
    陈景峰
Next, write a short value.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeShort(1);
    out.flush();
    out.close();
As the output shows, a short uses two bytes (16 bits).
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
    00000000: 00000000 00000001 ..
A short can be read back as signed or unsigned. Java's short type itself is always signed; readUnsignedShort() returns the same 16 bits as an int between 0 and 65535.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeShort(1);
    out.writeShort(1);
    out.writeShort(-1);
    out.writeShort(-1);
    out.flush();
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    try {
        System.out.println(in.readShort());
        System.out.println(in.readUnsignedShort());
        System.out.println(in.readShort());
        System.out.println(in.readUnsignedShort());
    } catch (EOFException e) {
        e.printStackTrace();
    }
Result
    1
    1
    -1
    65535
Signed value range
    Minimum: Short.MIN_VALUE = -32768 (-2 to the 15th power)
    Maximum: Short.MAX_VALUE = 32767 (2 to the 15th power, minus 1)
The unsigned value range is 0 to 65535.
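The signed and unsigned readings differ only in how the same 16 bits are interpreted. A minimal sketch, assuming Java 8 or later for Short.toUnsignedInt; the class and helper names are illustrative.

```java
class UnsignedShortSketch {
    // Mask off sign extension so the 16 bits are read as 0..65535.
    static int toUnsigned(short value) {
        return value & 0xFFFF;
    }

    public static void main(String[] args) {
        short minusOne = (short) -1;                          // bits 11111111 11111111
        System.out.println(minusOne);                         // -1
        System.out.println(toUnsigned(minusOne));             // 65535
        System.out.println(Short.toUnsignedInt(minusOne));    // 65535, library equivalent
        System.out.println(toUnsigned(Short.MIN_VALUE));      // 32768
    }
}
```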
Next, write several float values.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeFloat(0);
    out.writeFloat(1.0f);
    out.writeFloat(1.1f);
    out.flush();
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    float c = 0;
    while (true) {
        try {
            c = in.readFloat();
            System.out.println(c);
        } catch (EOFException e) {
            System.out.println();
            break;
        }
    }
A float uses 4 bytes (32 bits) to represent a floating-point number. Unlike the previous data types, the value cannot be read directly off the bits; it has to be computed, which is somewhat involved.
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
    00000000: 00000000 00000000 00000000 00000000 ....
    00000004: 00111111 10000000 00000000 00000000 ?...
    00000008: 00111111 10001100 11001100 11001101 ?...
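Floating-point layout

    /------------- 32 bit ----------------\
    | 1 |    8    |          23           |
    |--------------------------------------|
     31  30     23 22                     0
      ^      ^               ^
    sign  exponent        mantissa

Bit positions are numbered from right to left: bit 0 is the last bit in the dump and bit 31 is the first.
Sign: 0 means positive, 1 means negative.
Exponent: stores the exponent of the scientific notation, offset by a bias of 127.
Mantissa: the fractional part of the significand.

Steps to convert a float stored in memory to decimal (for normal, non-zero values):
(1) Write out bits 22 to 0 and prepend a "1" on the left, giving 24 significant bits. Place the binary point to the right of that leading "1".
(2) Take the value n represented by bits 29 to 23. If bit 30 is "0", invert every bit of n. If bit 30 is "1", add 1 to n.
(3) Move the binary point n places to the left (if bit 30 is "0") or n places to the right (if bit 30 is "1"), giving a real number in binary.
(4) Convert this binary real number to decimal and apply a plus or minus sign according to whether bit 31 is "0" or "1".
Steps (2) and (3) are equivalent to subtracting the bias 127 from the 8-bit exponent field and scaling by that power of two.

    1.0f = 00111111 10000000 00000000 00000000
    Sign      bit 31        0, a positive number
    Exponent  bits 23 to 30 0111111 1
    Mantissa  bits 0 to 22  0000000 00000000 00000000
    which gives | 0 | 0111111 1 | 0000000 00000000 00000000 |

The exponent field 01111111 is 127, and 127 - 127 = 0, so the value is 1.0 x 2^0 = 1.0.
For details see IEEE 754 single precision (R32.24).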
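The decoding can also be done in code. The sketch below uses Float.floatToIntBits to get the raw 32 bits, splits them into the three fields from the diagram, and rebuilds the value using the standard IEEE 754 bias of 127. It handles normal values only (not zero, subnormals, infinities or NaN), and all names are illustrative.

```java
class FloatBitsSketch {
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;          // bit 31
    }

    static int exponentField(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF; // bits 30..23
    }

    static int mantissaField(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;      // bits 22..0
    }

    // Rebuild a normal float: (-1)^sign * 1.mantissa * 2^(exponent - 127).
    static double decodeNormal(float f) {
        double significand = 1.0 + mantissaField(f) / (double) (1 << 23);
        double magnitude = significand * Math.pow(2, exponentField(f) - 127);
        return sign(f) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Float.floatToIntBits(1.0f))); // 3f800000
        System.out.println(exponentField(1.0f));                             // 127
        System.out.println(decodeNormal(1.0f));                              // 1.0
        System.out.println(decodeNormal(-2.5f));                             // -2.5
    }
}
```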
Next, write a double value.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeDouble(12.5d);
    out.flush();
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    double d = 0d;
    while (true) {
        try {
            d = in.readDouble();
            System.out.println(d);
        } catch (EOFException e) {
            System.out.println();
            break;
        }
    }
Binary file
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 01000000 00101001 00000000 00000000 00000000 00000000 00000000 00000000 @)......

    /------------------------- 64 bit ------------------------------\
    | 1 |     11     |                     52                       |
    |----------------------------------------------------------------|
     63  62        52 51                                            0
      ^       ^                            ^
    sign   exponent                     mantissa

As with float, bit positions are numbered from right to left: bit 0 is the last bit in the dump and bit 63 is the first.
Sign: 0 means positive, 1 means negative.
Exponent: stores the exponent of the scientific notation, offset by a bias of 1023.
Mantissa: the fractional part of the significand.
For details see IEEE 754 double precision (R64.53).
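The same field split works for double with Double.doubleToLongBits. The sketch below checks the dump of 12.5 shown above: 12.5 is 1.5625 x 2^3, so the exponent field should hold 1023 + 3 = 1026. Class and method names are illustrative.

```java
class DoubleBitsSketch {
    // Raw 64 bits as 16 hex digits, in the byte order seen in the dump.
    static String hex(double d) {
        return String.format("%016x", Double.doubleToLongBits(d));
    }

    static long exponentField(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FF;      // bits 62..52
    }

    static long mantissaField(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;    // bits 51..0
    }

    public static void main(String[] args) {
        System.out.println(hex(12.5));                               // 4029000000000000
        System.out.println(exponentField(12.5));                     // 1026
        System.out.println(Long.toHexString(mantissaField(12.5)));   // 9000000000000
    }
}
```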
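Finally, an example that writes and reads several types in a single file.
    String filename = "netkiller.bin";
    DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
    out.writeInt(1024);
    out.writeShort(255);
    out.writeLong(100000000000L);
    out.writeFloat(3.14f);
    out.writeDouble(3.141592653579d);
    out.writeBoolean(true);
    out.writeChar(165);
    out.writeChars("陈景峰");
    out.writeUTF("Netkiller Java 手札 - http://www.netkiller.cn");
    out.writeChars("这是最后一行\r\n");
    out.flush();
    out.close();
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
    System.out.println(in.readInt());
    System.out.println(in.readUnsignedShort());
    System.out.println(in.readLong());
    System.out.println(in.readFloat());
    System.out.println(in.readDouble());
    System.out.println(in.readBoolean());
    System.out.println(in.readChar());
    // The name was written with writeChars(), so read exactly 3 chars
    int i = 0;
    String name = "";
    while (i < 3) {
        try {
            char c = in.readChar();
            name += c;
        } catch (EOFException e) {
            break;
        }
        i++;
    }
    System.out.println(name);
    System.out.println(in.readUTF());
    // The last string was also written with writeChars(): it has no length
    // prefix and cannot be read with readUTF(), so read chars until end of file
    StringBuilder last = new StringBuilder();
    while (true) {
        try {
            last.append(in.readChar());
        } catch (EOFException e) {
            break;
        }
    }
    System.out.print(last);
    in.close();
Note that out.writeChars("陈景峰") writes a raw sequence of chars with no length information. When reading, you need to know the length of the string in advance and then read that many chars in a loop.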
Binary file content
    neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
    00000000: 00000000 00000000 00000100 00000000 00000000 11111111 00000000 00000000 ........
    00000008: 00000000 00010111 01001000 01110110 11101000 00000000 01000000 01001000 ..Hv..@H
    00000010: 11110101 11000011 01000000 00001001 00100001 11111011 01010100 01000011 ..@.!.TC
    00000018: 11001110 00101000 00000001 00000000 10100101 10010110 01001000 01100110 .(....Hf
    00000020: 01101111 01011100 11110000 00000000 00101111 01001110 01100101 01110100 o\../Net
    00000028: 01101011 01101001 01101100 01101100 01100101 01110010 00100000 01001010 killer J
    00000030: 01100001 01110110 01100001 00100000 11100110 10001001 10001011 11100110 ava ....
    00000038: 10011100 10101101 00100000 00101101 00100000 01101000 01110100 01110100 .. - htt
    00000040: 01110000 00111010 00101111 00101111 01110111 01110111 01110111 00101110 p://www.
    00000048: 01101110 01100101 01110100 01101011 01101001 01101100 01101100 01100101 netkille
    00000050: 01110010 00101110 01100011 01101110 10001111 11011001 01100110 00101111 r.cn..f/
    00000058: 01100111 00000000 01010100 00001110 01001110 00000000 10001000 01001100 g.T.N..L
    00000060: 00000000 00001101 00000000 00001010 ....
A hexadecimal view is easier to read.
    neo@MacBook-Pro ~/workspace/netkiller % hexdump -C netkiller.bin
    00000000 00 00 04 00 00 ff 00 00 00 17 48 76 e8 00 40 48 |..........Hv..@H|
    00000010 f5 c3 40 09 21 fb 54 43 ce 28 01 00 a5 96 48 66 |..@.!.TC.(....Hf|
    00000020 6f 5c f0 00 2f 4e 65 74 6b 69 6c 6c 65 72 20 4a |o\../Netkiller J|
    00000030 61 76 61 20 e6 89 8b e6 9c ad 20 2d 20 68 74 74 |ava ...... - htt|
    00000040 70 3a 2f 2f 77 77 77 2e 6e 65 74 6b 69 6c 6c 65 |p://www.netkille|
    00000050 72 2e 63 6e 8f d9 66 2f 67 00 54 0e 4e 00 88 4c |r.cn..f/g.T.N..L|
    00000060 00 0d 00 0a |....|
    00000064
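Because writeChars() stores no length, the reader has to know how many chars to expect. One common way around this, sketched below as an assumption rather than something the original example does, is to write the char count as an int in front of the string and read it back first. Class and method names are illustrative, and the test string uses \u escapes for 陈景峰.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LengthPrefixedCharsSketch {
    // Write the number of chars first, then the chars themselves.
    static byte[] write(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(text.length());
            out.writeChars(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the count, then exactly that many chars.
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            StringBuilder text = new StringBuilder(count);
            for (int i = 0; i < count; i++) {
                text.append(in.readChar());
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write("\u9648\u666f\u5cf0");
        System.out.println(data.length); // 10 = 4-byte count + 3 chars * 2 bytes
        System.out.println(read(data));
    }
}
```

writeUTF() offers the same convenience with a built-in 2-byte length, at the cost of a 65535-byte limit on the encoded string.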