Base64

Date: 2020-07-26

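1: Origin of the Base64 algorithm

Base64 was first used to solve a problem in email transmission. In the early days, for historical reasons, email allowed only ASCII characters. When a message containing non-ASCII characters passed through a legacy gateway, the gateway might alter the binary form of those characters by clearing the highest bit of each 8-bit byte to 0. The recipient then received a message that was pure garbage. Base64 was created to address this.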

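2: Definition of the Base64 algorithm

Base64 is an encoding algorithm built on 64 characters. RFC 2045 (http://www.ietf.org/rfc/rfc2045.txt) defines it this way: "The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable." Base64-encoded data is somewhat longer than the original data, about 4/3 of its size, and the number of characters in the padded encoded string is always a multiple of 4.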
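The 4/3 growth and the multiple-of-4 rule can be checked with a little arithmetic. The sketch below assumes padded output with no line breaks; `Base64Length` and `encodedLength` are hypothetical names used only for illustration.

```java
final class Base64Length {
    // Padded Base64 emits 4 characters for every group of up to 3 input bytes,
    // so the output length is 4 * ceil(n / 3).
    static int encodedLength(int inputBytes) {
        return 4 * ((inputBytes + 2) / 3);
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(1));   // 4
        System.out.println(encodedLength(11));  // 16
        System.out.println(encodedLength(300)); // 400
    }
}
```

For inputs whose length is a multiple of 3 the ratio is exactly 4/3; otherwise padding makes the output slightly longer than that.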

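RFC 2045 also specifies that, in email, encoded lines may be at most 76 characters long and that each line ends with a carriage return and line feed ("\r\n"). The line break is required even when the last line is shorter than 76 characters. In practice, this requirement is often ignored, depending on actual needs.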

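RFC 2045 gives the following character mapping table:

Base64 character mapping table (figure not preserved): values 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, 62 to "+", 63 to "/", and "=" is the padding character.

In this table, Value is the decimal index and Encoding is the character. It maps 64 characters in total, which is where the name Base64 comes from. The last entry in the table is the equals sign, which is used for padding, so a string that ends in "=" usually makes us think of Base64.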

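Base64 has a few siblings, Base32 and Base16. To pass binary data in HTTP GET requests, the URL Base64 variant was derived from Base64.

URL Base64 mainly replaces the characters at indexes 62 and 63 of the Base64 mapping table, that is, "+" and "/" become "-" and "_". For the padding character "=", one suggestion is to use "~" and another is to use ".". The "~" character conflicts with file systems and is not recommended, while two consecutive "." characters are treated as an error. On the padding question, Commons Codec avoids padding characters altogether, while Bouncy Castle uses "." as the padding character.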
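To make the character swap concrete, here is a small sketch. It uses the JDK's `java.util.Base64` (Java 8+) rather than Commons Codec or Bouncy Castle, so its padding behavior is the JDK's own: the URL encoder keeps "=" unless `withoutPadding()` is requested. The class and method names are illustrative only.

```java
import java.util.Base64;

final class UrlBase64Demo {
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    static String urlSafeNoPadding(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes produce the 6-bit values 62, 63 and 60.
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data));         // +/8=
        System.out.println(urlSafe(data));          // -_8=
        System.out.println(urlSafeNoPadding(data)); // -_8
    }
}
```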

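3: Relationship between Base64 and encryption algorithms

Base64 has encoding and decoding operations that can stand in for encryption and decryption, and a character mapping table that stands in for the key. It borrows from single-table substitution ciphers: the plaintext is converted to binary and then mapped through the character table to produce the "ciphertext". Base64 is often used as a simple form of "encryption" to protect certain data.

Strictly speaking, Base64 is not an encryption algorithm. The character mapping table that plays the role of the key is public, and under Kerckhoffs's principle a system's security must rest solely on the secrecy of its key, so Base64 offers no secrecy at all. Its strength is also far too low for it to count as a modern encryption algorithm. Looking at it another way, though: if we change the character mapping table to a private, custom one, could it serve as a simple way to protect data? We come back to this at the end of the article.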

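4: How Base64 works

Base64 takes the decimal values that a character encoding (such as ASCII or UTF-8) assigns to the given characters as its starting point and encodes them as follows: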

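    1) Convert the given string, character by character, into its character codes (for example ASCII codes).

    2) Convert those character codes into binary.

    3) Regroup the binary: take every three 8-bit bytes as one group and split it into four 6-bit groups (a final group with fewer than 6 bits is padded with zeros in the low-order bits). This only changes the grouping; three 8-bit bytes and four 6-bit groups are both 24 bits long.

    4) Pad each of the four 6-bit groups with two high-order zero bits to form four 8-bit bytes.

    5) Convert the four 8-bit bytes into decimal values.

    6) Convert each decimal value into the corresponding character in the Base64 character table.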
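The six steps above can be sketched directly with bit operations. This is a minimal illustration written for this article, not the Commons Codec implementation; `ManualBase64` is a hypothetical class name, and the sketch emits no line breaks.

```java
final class ManualBase64 {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length; i += 3) {
            int remaining = Math.min(3, input.length - i);
            // Steps 1-3: pack up to three 8-bit bytes into one 24-bit group,
            // filling missing bytes with zero bits.
            int group = 0;
            for (int j = 0; j < 3; j++) {
                group <<= 8;
                if (j < remaining) {
                    group |= input[i + j] & 0xFF;
                }
            }
            // Steps 4-6: read four 6-bit values (0..63) and map each to a character.
            // Positions that carry no input bits become the padding character '='.
            for (int k = 0; k < 4; k++) {
                if (k <= remaining) {
                    int index = (group >> (18 - 6 * k)) & 0x3F;
                    out.append(ALPHABET.charAt(index));
                } else {
                    out.append('=');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "hello world".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(encode(bytes)); // aGVsbG8gd29ybGQ=
    }
}
```

Masking with `0x3F` plays the role of step 4: the 6-bit value is held in a wider integer whose upper bits are zero.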

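4.1: Encoding ASCII characters

We Base64-encode the string "A" as shown below:

    Character         A
    ASCII code        65
    Binary            01000001
    6-bit groups      010000    010000
    8-bit groups      00010000  00010000
    Decimal           16        16
    Mapped character  Q         Q

So the string "A" becomes "QQ==" after Base64 encoding. The single input byte fills only two of the four output positions, and the remaining two are filled with the padding character "=".

Base64 decoding is the inverse of encoding; working back through the steps above easily recovers the original text.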
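As an illustration of that inverse process, the following sketch maps each character back to its 6-bit value and reassembles 8-bit bytes. It is a simplified example under stated assumptions (standard alphabet, no line breaks, padding only at the end), and `ManualBase64Decoder` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

final class ManualBase64Decoder {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static byte[] decode(String encoded) {
        // Ignore the trailing '=' padding; it carries no data bits.
        int end = encoded.length();
        while (end > 0 && encoded.charAt(end - 1) == '=') {
            end--;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0; // holds bits that have not yet formed a full byte
        int bits = 0;   // number of valid bits currently in buffer
        for (int i = 0; i < end; i++) {
            int value = ALPHABET.indexOf(encoded.charAt(i));
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base64 character: " + encoded.charAt(i));
            }
            buffer = (buffer << 6) | value;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out.write((buffer >> bits) & 0xFF);
                buffer &= (1 << bits) - 1; // keep only the leftover bits
            }
        }
        // Any leftover bits are the zero fill added during encoding and are dropped.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] decoded = decode("QQ==");
        System.out.println(decoded.length + " byte, value " + decoded[0]); // 1 byte, value 65
    }
}
```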

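4.2: Encoding non-ASCII characters

Base64 handles the transmission of non-ASCII characters well, for example Chinese characters.

Because ASCII covers only a limited range of characters, we use UTF-8 for the character encoding here:

    Character             密
    UTF-8 bytes (signed)  -27       -81       -122
    Binary                11100101  10101111  10000110
    6-bit groups          111001    011010    111110    000110
    8-bit groups          00111001  00011010  00111110  00000110
    Decimal               57        26        62        6
    Mapped character      5         a         +         G

The string "密" becomes "5a+G" after Base64 encoding. With a different character encoding, the result takes a different form.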
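The dependence on the character encoding can be seen by encoding the same character with two charsets. This sketch uses the JDK's `java.util.Base64` and picks UTF-16BE as the second charset purely as an example; the character is written as the escape `\u5bc6` so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CharsetBase64Demo {
    static String encode(String text, Charset charset) {
        // The charset decides which bytes are fed into Base64.
        return Base64.getEncoder().encodeToString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "\u5bc6"; // the character 密
        System.out.println(encode(text, StandardCharsets.UTF_8));    // 5a+G
        System.out.println(encode(text, StandardCharsets.UTF_16BE)); // W8Y=
    }
}
```

The decoder therefore has to use the same charset as the encoder when turning the decoded bytes back into text.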

5: Commons Codec (http://commons.apache.org/proper/commons-codec/)

Apache Commons Codec (TM) software provides implementations of common encoders and decoders such as Base64, Hex, Phonetic and URLs. It implements Base64 as defined in RFC 2045, and it also supports the general form of Base64 that does not apply the RFC 2045 line breaks.

package com.orange.encoder;

import org.apache.commons.codec.binary.Base64;
import java.io.UnsupportedEncodingException;

public class Base64Coder {

    // Character encoding used for all String <-> byte[] conversions
    public static final String ENCODING = "UTF-8";

    /**
     * General Base64 encoding on a single line (does not apply the RFC 2045 line breaks).
     * @param data data to encode
     * @return encoded data
     * @throws UnsupportedEncodingException if the charset is not supported
     */
    public static String encode(String data) throws UnsupportedEncodingException {
        byte[] bytes = Base64.encodeBase64(data.getBytes(ENCODING));
        return new String(bytes, ENCODING);
    }

    /**
     * Base64 encoding that follows RFC 2045: the output is chunked into
     * 76-character lines, each ending with CRLF.
     * @param data data to encode
     * @return encoded data
     * @throws UnsupportedEncodingException if the charset is not supported
     */
    public static String encodeSafe(String data) throws UnsupportedEncodingException {
        byte[] bytes = Base64.encodeBase64(data.getBytes(ENCODING), true);
        return new String(bytes, ENCODING);
    }

    /**
     * Base64 decoding.
     * @param data Base64 text to decode
     * @return decoded data
     * @throws UnsupportedEncodingException if the charset is not supported
     */
    public static String decode(String data) throws UnsupportedEncodingException {
        byte[] bytes = Base64.decodeBase64(data.getBytes(ENCODING));
        return new String(bytes, ENCODING);
    }

}

package com.orange;

import com.orange.encoder.Base64Coder;
import org.junit.Assert;
import org.junit.Test;

import java.io.UnsupportedEncodingException;

public class Base64CoderTest {

    @Test
    public void test() throws UnsupportedEncodingException {
        String str = "hello world";
        System.out.println(String.format("Original: %s", str));

        String encodeData = Base64Coder.encode(str);
        System.out.println(String.format("Encoded: %s", encodeData));

        String decodeData = Base64Coder.decode(encodeData);
        System.out.println(String.format("Decoded: %s", decodeData));

        // JUnit's argument order is (expected, actual)
        Assert.assertEquals(str, decodeData);
    }
}

The output is:

Original: hello world
Encoded: aGVsbG8gd29ybGQ=
Decoded: hello world

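Commons Codec also supports further input styles, such as stream-based input and output, and it provides customizable Base64 implementations in which the number of characters per line and the line separator can be set. See the Commons Codec documentation for details.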
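To show what a custom line length and line separator look like in the output, here is a sketch that uses the JDK's `java.util.Base64.getMimeEncoder(int, byte[])` as a stand-in, not the Commons Codec API itself. The class name, the 16-character line length and the all-zero input are arbitrary choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ChunkedBase64Demo {
    static String encodeChunked(byte[] data, int lineLength, String separator) {
        // The JDK rounds lineLength down to a multiple of 4 and inserts the
        // separator between lines, but not after the last line.
        Base64.Encoder encoder =
                Base64.getMimeEncoder(lineLength, separator.getBytes(StandardCharsets.US_ASCII));
        return encoder.encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[30]; // 30 zero bytes encode to 40 'A' characters
        System.out.println(encodeChunked(data, 16, "\r\n"));
        // Prints three lines of 16, 16 and 8 characters.
    }
}
```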

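The sun.misc package is an internal API that Sun provided for its own use, so the Base64 implementation in that package is not recommended.

Closing: the Commons Codec Base64 source code is attached below. Consider this: if we rearrange the positions in the encoding table, could that give us a private Base64 encoding?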

 /**
     * This array is a lookup table that translates 6-bit positive integer index values into their "Base64 Alphabet"
     * equivalents as specified in Table 1 of RFC 2045.
     *
     * Thanks to "commons" project in ws.apache.org for this code.
     * http://svn.apache.org/repos/asf/webservices/commons/trunk/modules/util/
     */
    private static final byte[] STANDARD_ENCODE_TABLE = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
    };

    /**
     * This is a copy of the STANDARD_ENCODE_TABLE above, but with + and /
     * changed to - and _ to make the encoded Base64 results more URL-SAFE.
     * This table is only used when the Base64's mode is set to URL-SAFE.
     */
    private static final byte[] URL_SAFE_ENCODE_TABLE = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
    };

    /**
     * This array is a lookup table that translates Unicode characters drawn from the "Base64 Alphabet" (as specified
     * in Table 1 of RFC 2045) into their 6-bit positive integer equivalents. Characters that are not in the Base64
     * alphabet but fall within the bounds of the array are translated to -1.
     *
     * Note: '+' and '-' both decode to 62. '/' and '_' both decode to 63. This means decoder seamlessly handles both
     * URL_SAFE and STANDARD base64. (The encoder, on the other hand, needs to know ahead of time what to emit).
     *
     * Thanks to "commons" project in ws.apache.org for this code.
     * http://svn.apache.org/repos/asf/webservices/commons/trunk/modules/util/
     */
    private static final byte[] DECODE_TABLE = {
            -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
            -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
            -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, 62, -1, 63, 52, 53, 54,
            55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, 3, 4,
            5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24, 25, -1, -1, -1, -1, 63, -1, 26, 27, 28, 29, 30, 31, 32, 33, 34,
            35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51
    };
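The question raised in the closing note, whether rearranging the encoding table yields a private Base64 variant, can be explored with a small sketch. It does not modify Commons Codec; it encodes with the JDK's `java.util.Base64` and then substitutes characters through a rearranged table. The class name and the choice of rotating the standard table by 13 positions are hypothetical, picked only to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PrivateAlphabetBase64 {
    private static final String STANDARD =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Hypothetical private table: the standard table rotated by 13 positions.
    private static final String CUSTOM = STANDARD.substring(13) + STANDARD.substring(0, 13);

    // Replace every character found in 'from' with the character at the same index in 'to'.
    private static String translate(String text, String from, String to) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            int index = from.indexOf(c);
            out.append(index >= 0 ? to.charAt(index) : c); // '=' passes through unchanged
        }
        return out.toString();
    }

    static String encode(String plain) {
        String standard = Base64.getEncoder()
                .encodeToString(plain.getBytes(StandardCharsets.UTF_8));
        return translate(standard, STANDARD, CUSTOM);
    }

    static String decode(String encoded) {
        String standard = translate(encoded, CUSTOM, STANDARD);
        return new String(Base64.getDecoder().decode(standard), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("A");
        System.out.println(encoded);         // dd==
        System.out.println(decode(encoded)); // A
    }
}
```

A standard decoder no longer recovers the text, but a fixed substitution table like this is only light obfuscation, consistent with the earlier point that Base64 is not an encryption algorithm.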