The floating-point types are float and double, which are conceptually associated with the single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York).
The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
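The special values described above can be observed directly. The following is a minimal sketch; the class name SpecialValuesDemo and its helper methods are made up for illustration, and only standard java.lang calls are used.

```java
public class SpecialValuesDemo {
    // Hypothetical helper: the invalid operation "zero divided by zero" yields NaN.
    static double zeroOverZero() {
        return 0.0 / 0.0;
    }

    // Hypothetical helper: dividing a nonzero value by zero yields an infinity.
    static double oneOver(double divisor) {
        return 1.0 / divisor;
    }

    public static void main(String[] args) {
        double nan = zeroOverZero();
        System.out.println(Double.isNaN(nan));        // true
        System.out.println(nan == nan);               // false: NaN is not equal to anything, itself included
        System.out.println(oneOver(0.0));             // Infinity
        System.out.println(oneOver(-0.0));            // -Infinity: the sign of zero is observable
        System.out.println(0.0 == -0.0);              // true: == treats the two zeros as equal
        System.out.println(Double.compare(0.0, -0.0) > 0); // true: compare() orders 0.0 above -0.0
    }
}
```

Because NaN never compares equal to itself, Double.isNaN (or Float.isNaN) is the reliable way to test for it.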
Every implementation of the Java programming language is required to support two standard sets of floating-point values, called the float value set and the double value set. In addition, an implementation of the Java programming language may support either or both of two extended-exponent floating-point value sets, called the float-extended-exponent value set and the double-extended-exponent value set. These extended-exponent value sets may, under certain circumstances, be used instead of the standard value sets to represent the values of expressions of type float or double (§5.1.13, §15.4).
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1) - 2) and Emax = 2^(K-1) - 1, inclusive, and where N and K are parameters that depend on the value set. Some values can be represented in this form in more than one way; for example, supposing that a value v in a value set might be represented in this form using certain values for s, m, and e, then if it happened that m were even and e were less than 2^(K-1), one could halve m and increase e by 1 to produce a second representation for the same value v. A representation in this form is called normalized if m ≥ 2^(N-1); otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that m ≥ 2^(N-1), then the value is said to be a denormalized value, because it has no normalized representation.
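To make the form s · m · 2^(e - N + 1) concrete, here is a minimal sketch that splits a float into s, m, and e and rebuilds it. It assumes N = 24 and K = 8 for the float value set (these values are not stated in the text above), and the class and method names are made up for illustration.

```java
public class ValueSetDecompose {
    static final int N = 24;                          // assumed significand width for float
    static final int K = 8;                           // assumed exponent width for float
    static final int E_MIN = -((1 << (K - 1)) - 2);   // -(2^(K-1) - 2) = -126

    /** Returns {s, m, e} such that x = s * m * 2^(e - N + 1), for finite nonzero x. */
    static long[] decompose(float x) {
        if (x == 0f || Float.isNaN(x) || Float.isInfinite(x)) {
            throw new IllegalArgumentException("finite nonzero values only");
        }
        int bits = Float.floatToIntBits(x);
        long s = (bits >>> 31) == 0 ? 1 : -1;
        int expField = (bits >>> 23) & 0xFF;          // raw 8-bit exponent field
        long fraction = bits & 0x7FFFFF;              // raw 23-bit fraction field
        if (expField == 0) {
            // Denormalized value: no implicit leading bit, exponent pinned at Emin.
            return new long[] {s, fraction, E_MIN};
        }
        // Normalized value: restore the implicit leading bit and remove the bias of 127.
        return new long[] {s, fraction | (1L << (N - 1)), expField - 127};
    }

    /** Rebuilds the value from {s, m, e}; Math.scalb multiplies by an exact power of two. */
    static double rebuild(long[] parts) {
        return Math.scalb((double) (parts[0] * parts[1]), (int) parts[2] - N + 1);
    }

    /** A representation is normalized when m >= 2^(N-1). */
    static boolean isNormalized(long[] parts) {
        return parts[1] >= (1L << (N - 1));
    }

    public static void main(String[] args) {
        for (float x : new float[] {0.6f, -1.0f, Float.MIN_VALUE}) {
            long[] p = decompose(x);
            System.out.println(x + ": s=" + p[0] + " m=" + p[1] + " e=" + p[2]
                    + " normalized=" + isNormalized(p) + " rebuilt=" + rebuild(p));
        }
    }
}
```

Float.MIN_VALUE comes out with m = 1 and e = Emin, so it is a denormalized value in the sense defined above.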
The constraints on the parameters N and K (and on the derived parameters Emin and Emax) for the two required and two optional floating-point value sets are summarized in Table 4.1.
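S: sign bit, E: exponent bits, M: mantissa (fraction) bits.
float: S1_E8_M23. The exponent field has 8 bits. Read as a plain signed 8-bit number it would cover -2^7 to 2^7-1 (that is, -128 to 127), but IEEE 754 stores the exponent with a bias of 127 and reserves the all-zeros and all-ones patterns (for zero and denormalized values, and for infinities and NaN), so the exponent of a normalized float runs from -126 to 127. The largest finite float is (2 - 2^-23) · 2^127, just under 2^128, which is on the order of 10^38; the range is symmetric about zero.
double: S1_E11_M52. The exponent field has 11 bits. With a bias of 1023 and the same two reserved patterns, the exponent of a normalized double runs from -1022 to 1023. The largest finite double is (2 - 2^-52) · 2^1023, just under 2^1024, which is on the order of 10^308.
PS: this matches the official documentation, which gives the exponent range as -126 to 127 for float and -1022 to 1023 for double.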
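The exponent limits and the largest finite values can be cross-checked against the constants that java.lang.Float and java.lang.Double expose. This is a sketch under the assumption that the float value set uses N = 24, K = 8 and the double value set uses N = 53, K = 11; the class and method names are made up for illustration.

```java
public class ValueSetLimits {
    /** Emin = -(2^(K-1) - 2) */
    static int eMin(int k) {
        return -((1 << (k - 1)) - 2);
    }

    /** Emax = 2^(K-1) - 1 */
    static int eMax(int k) {
        return (1 << (k - 1)) - 1;
    }

    /** Largest finite value: the largest m, 2^N - 1, scaled by 2^(Emax - N + 1). */
    static double maxFinite(int n, int k) {
        long largestM = (1L << n) - 1;
        return Math.scalb((double) largestM, eMax(k) - n + 1);
    }

    public static void main(String[] args) {
        System.out.println(eMin(8) + " " + Float.MIN_EXPONENT);     // -126 -126
        System.out.println(eMax(8) + " " + Float.MAX_EXPONENT);     // 127 127
        System.out.println(eMin(11) + " " + Double.MIN_EXPONENT);   // -1022 -1022
        System.out.println(eMax(11) + " " + Double.MAX_EXPONENT);   // 1023 1023
        System.out.println(maxFinite(24, 8) == Float.MAX_VALUE);    // true
        System.out.println(maxFinite(53, 11) == Double.MAX_VALUE);  // true
    }
}
```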
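float: S1_E8_M23. The mantissa field has 23 bits, and 2^23 = 8388608 has 7 decimal digits. A float therefore carries at most 7 significant decimal digits, of which only 6 are guaranteed; in other words, the precision of float is 6 to 7 significant digits.
double: S1_E11_M52. The mantissa field has 52 bits, and 2^52 = 4503599627370496 has 16 decimal digits. By the same reasoning, the precision of double is 15 to 16 significant digits.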
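The precision limit can be seen where consecutive integers stop being representable. The sketch below relies on the fact that float has 24 significant bits once the implicit leading bit is counted, and double has 53; the class and method names are made up for illustration.

```java
public class PrecisionDemo {
    // Hypothetical helper: true when adding 1 to x is lost to rounding in float arithmetic.
    static boolean absorbsOne(float x) {
        return x + 1f == x;
    }

    // Same check in double arithmetic.
    static boolean absorbsOne(double x) {
        return x + 1.0 == x;
    }

    public static void main(String[] args) {
        float f = 16777216f;                       // 2^24
        System.out.println(absorbsOne(f));         // true: 2^24 + 1 is not a float
        System.out.println(absorbsOne(f - 1f));    // false: below 2^24 every integer is a float
        System.out.println(Math.ulp(f));           // 2.0: neighbouring floats here are 2 apart

        double d = 9007199254740992.0;             // 2^53
        System.out.println(absorbsOne(d));         // true: 2^53 + 1 is not a double
        System.out.println(absorbsOne(d - 1.0));   // false

        System.out.println(123456789f);            // 1.23456792E8: digits past the 7th or 8th are not preserved
    }
}
```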
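Converting a decimal fraction to binary: multiply the number by 2 and take the integer part as the 1st binary digit (1 if the product is greater than or equal to 1, 0 if it is less than 1); then multiply the remaining fractional part by 2 and take the integer part as the 2nd binary digit; and so on, until the fractional part is 0.
Special case: if the fractional part starts to repeat, the process never terminates, so no finite number of binary digits can represent the decimal exactly. This is why decimal fractions pick up errors in programming languages.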
0.6 * 2 = 1.2  ->  1
0.2 * 2 = 0.4  ->  0
0.4 * 2 = 0.8  ->  0
0.8 * 2 = 1.6  ->  1
0.6 * 2 = 1.2  ->  1
0.2 * 2 = 0.4  ->  0
...
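The multiply-by-2 procedure can be sketched in a few lines. BigDecimal is used here so that the decimal input itself is held exactly; the class name, method name, and the maxBits cutoff are made up for illustration.

```java
import java.math.BigDecimal;

public class DecimalToBinary {
    /**
     * Returns up to maxBits binary digits of a decimal fraction in [0, 1),
     * stopping early if the fractional part reaches 0.
     */
    static String fractionToBinary(String decimal, int maxBits) {
        BigDecimal x = new BigDecimal(decimal);
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder bits = new StringBuilder();
        while (x.signum() != 0 && bits.length() < maxBits) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');                  // integer part is 1: emit 1, keep the fraction
                x = x.subtract(BigDecimal.ONE);
            } else {
                bits.append('0');                  // integer part is 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionToBinary("0.6", 12));   // 100110011001 (the pattern 1001 repeats)
        System.out.println(fractionToBinary("0.625", 12)); // 101 (terminates: 0.625 = 1/2 + 1/8)
    }
}
```

The cutoff is needed precisely because of the special case: for 0.6 the loop would otherwise never end.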
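Converting a binary fraction back to decimal: going from left to right, add up v[i] * 2^(-i), where i is the position counted from the left (starting at 1) and v[i] is the value of the bit at that position. The example makes this concrete: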
1 * 2^-1 + 0 * 2^-2 + 0 * 2^-3 + 1 * 2^-4 + 1 * 2^-5 + ...
= 1 * 0.5 + 0 + 0 + 1 * 1/16 + 1 * 1/32 + ...
= 0.5 + 0.0625 + 0.03125 + ...
= a sum that gets arbitrarily close to 0.6
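The weighted sum can be sketched as follows. The class and method names are made up for illustration; for short bit strings like these, every partial sum is exactly representable in a double, so the printed values are exact.

```java
public class BinaryToDecimal {
    /** Sums v[i] * 2^(-i) over the bits of a binary fraction, with i starting at 1 on the left. */
    static double binaryFractionToDecimal(String bits) {
        double sum = 0.0;
        double weight = 0.5;                       // 2^-1 for the leftmost bit
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                sum += weight;
            }
            weight /= 2;                           // next bit is worth half as much
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(binaryFractionToDecimal("1001"));         // 0.5625
        System.out.println(binaryFractionToDecimal("10011"));        // 0.59375
        System.out.println(binaryFractionToDecimal("100110011001")); // closer to 0.6, still below it
    }
}
```

Each extra group of 1001 moves the sum closer to 0.6 without ever reaching it.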
float f = 0.6f;
double d1 = 0.6d;
double d2 = f;
System.out.println((d1 == d2) + " " + f + " " + d2); // false 0.6 0.6000000238418579
double d = 0.6;
System.out.println((float) d + " " + d); // 0.6 0.6
// Note: the cases below were picked deliberately; NOT every calculation shows this kind of problem
System.out.println(0.05 + 0.01); // 0.060000000000000005
System.out.println(1.0 - 0.42); // 0.5800000000000001
System.out.println(4.015 * 100); // 401.49999999999994
System.out.println(123.3 / 100); // 1.2329999999999999
// Requires: import java.math.BigDecimal; import java.math.RoundingMode;
double addValue = BigDecimal.valueOf(0.05).add(BigDecimal.valueOf(0.01)).doubleValue();
System.out.println("0.05+0.01=" + (0.05 + 0.01) + " " + addValue); // 0.05+0.01=0.060000000000000005 0.06
double subtractValue = BigDecimal.valueOf(1.0).subtract(BigDecimal.valueOf(0.42)).doubleValue();
System.out.println("1.0-0.42=" + (1.0 - 0.42) + " " + subtractValue); // 1.0-0.42=0.5800000000000001 0.58
double multiplyValue = BigDecimal.valueOf(4.015).multiply(BigDecimal.valueOf(100)).doubleValue();
System.out.println("4.015*100=" + (4.015 * 100) + " " + multiplyValue); // 4.015*100=401.49999999999994 401.5
// RoundingMode.HALF_UP replaces the deprecated int constant BigDecimal.ROUND_HALF_UP
double divideValue = BigDecimal.valueOf(123.3).divide(BigDecimal.valueOf(100), 10, RoundingMode.HALF_UP).doubleValue();
System.out.println("123.3/100=" + (123.3 / 100) + " " + divideValue); // 123.3/100=1.2329999999999999 1.233
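The BigDecimal calls above all go through BigDecimal.valueOf(double), and that choice matters. The sketch below contrasts it with the new BigDecimal(double) constructor, which keeps the exact binary value of the double, representation error included. The class and method names are made up for illustration.

```java
import java.math.BigDecimal;

public class BigDecimalConstruction {
    // Hypothetical helper: exact decimal addition starting from double inputs.
    static BigDecimal addExactly(double a, double b) {
        // valueOf(double) goes through Double.toString, so 0.05 becomes exactly "0.05"
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }

    public static void main(String[] args) {
        // The constructor exposes the binary approximation stored in the double.
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625

        System.out.println(BigDecimal.valueOf(0.1));   // 0.1
        System.out.println(new BigDecimal("0.1"));     // 0.1 (the String constructor is exact as well)

        System.out.println(addExactly(0.05, 0.01));    // 0.06
    }
}
```

When the inputs are available as text, the String constructor avoids passing through double at all.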
// Requires: import java.text.DecimalFormat; import java.math.RoundingMode;
String pattern = "#,##0.00"; // always two decimal places; integer part grouped in threes with commas; at least one integer digit
DecimalFormat format = new DecimalFormat(pattern);
format.setRoundingMode(RoundingMode.HALF_UP); // the default is HALF_EVEN, not round-half-up
System.out.println(format.format(0.05 + 0.01)); // 0.06
System.out.println(format.format(1.0 - 0.42)); // 0.58
System.out.println(format.format(4.015 * 100)); // 401.50
System.out.println(format.format(123.3 / 100)); // 1.23
double d = 0.06; // a decimal literal in Java is double by default; the suffix "d" or "D" may be omitted
float f = 0.06f; // to declare a float, the suffix "f" or "F" must be written explicitly
System.out.println((0.05 + 0.01) + " " + (0.05f + 0.01f)); // 0.060000000000000005 0.060000002
System.out.println((d == (0.05 + 0.01)) + " " + (f == (0.05f + 0.01f))); // false false
System.out.println(d + " " + f + " " + (float) d + " " + (double) f); // 0.06 0.06 0.06 0.05999999865889549
System.out.println((d == f) + " " + (d == (double) f) + " " + ((float) d == f)); // false false true
// Narrowing the double to float does make the two compare equal, but deliberately throwing away precision is rarely what you want!