Inserting Emoji Characters with Jade without Changing the MySQL Server Character Set (character_set_server=utf8mb4)
MySQL server character set settings:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8                                  |
| character_set_connection | utf8                                  |
| character_set_database   | latin1                                |
| character_set_filesystem | binary                                |
| character_set_results    | utf8                                  |
| character_set_server     | latin1                                |
| character_set_system     | utf8                                  |
| character_sets_dir       | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+

mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `data` varchar(10) CHARACTER SET utf8mb4 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
DAO:
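import net.paoding.rose.jade.annotation.DAO;
import net.paoding.rose.jade.annotation.SQL;

@DAO(catalog = "temp")
public interface Utf8mb4TestDAO {

    // Inserts the string into t.data (a utf8mb4 column)
    @SQL("insert into t select :1")
    public void insertEmoji(String data);

    // Switches the session's client, connection and results character sets to utf8mb4
    @SQL("set names utf8mb4")
    public void setNamesUtf8mb4();
}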
Unit test:
Config:
String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&useUnicode=true&characterEncoding=UTF-8", timeout);
// This also works:
//String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&characterEncoding=UTF-8", timeout);
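The options string is appended to an ordinary Connector/J URL after the `?`. The following is a minimal sketch. It assumes the host `localhost`, port `3306` and database `temp` from the JDBC URL shown later in this article. `JdbcUrlDemo`, `options` and `url` are hypothetical names.

```java
// Hypothetical helper that assembles a Connector/J URL around the options string above.
final class JdbcUrlDemo {

    // Same option string as in the article, with the timeout filled in.
    static String options(int timeout) {
        return String.format(
                "connectTimeout=%s&generateSimpleParameterMetadata=true&useUnicode=true&characterEncoding=UTF-8",
                timeout);
    }

    // host, port and database are placeholders for illustration.
    static String url(String host, int port, String database, int timeout) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database + "?" + options(timeout);
    }

    public static void main(String[] args) {
        // prints jdbc:mysql://localhost:3306/temp?connectTimeout=1000&generateSimpleParameterMetadata=true&useUnicode=true&characterEncoding=UTF-8
        System.out.println(url("localhost", 3306, "temp", 1000));
    }
}
```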
Summary:
You must specify characterEncoding=UTF-8 in the JDBC URL. Before inserting emoji characters, you must also first execute: set names utf8mb4
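The `set names utf8mb4` step only matters for characters outside the Basic Multilingual Plane (code points above U+FFFF), such as emoji. These take four bytes in UTF-8, while MySQL's `utf8` charset stores at most three bytes per character. The sketch below is a hypothetical helper (`Utf8mb4Check` is an illustrative name, not part of Jade or Connector/J) that reports whether a string contains such characters.

```java
final class Utf8mb4Check {

    // True if any code point lies outside the BMP (> U+FFFF). Such a code point needs
    // 4 bytes in UTF-8 and cannot pass through a 3-byte utf8 connection.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F601)); // U+1F601 GRINNING FACE WITH SMILING EYES
        System.out.println(needsUtf8mb4("hello"));        // false
        System.out.println(needsUtf8mb4("\u4e2d\u6587")); // false: CJK characters fit in 3 bytes
        System.out.println(needsUtf8mb4("hi " + emoji));  // true
    }
}
```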
Other failure cases:
1. If setNamesUtf8mb4 is not called and only the insert is executed, the insert fails with:
Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x81' for column 'data' at row 1
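The four bytes in this error message are the UTF-8 encoding of a single emoji, U+1F601 (😁). Four bytes per character is exactly what MySQL's 3-byte `utf8` connection charset rejects. This small sketch reproduces the byte sequence. `EmojiBytes` and `toMysqlHex` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class EmojiBytes {

    // Formats bytes the way MySQL's error message shows them: \xF0\x9F...
    static String toMysqlHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F601)); // U+1F601
        byte[] utf8 = emoji.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);      // 4
        System.out.println(toMysqlHex(utf8)); // \xF0\x9F\x98\x81
    }
}
```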
2. characterEncoding=UTF-8 must be specified in the JDBC URL parameters. Otherwise the row is inserted successfully, but the stored value is garbled.
3. By default, putting set names and the insert into a single SQL statement is not supported, as shown below:
@SQL("set names utf8mb4; insert into t select :1")
public void insertEmojiWithSetNamesUtf8mb4(String data);
It fails with:
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'insert into t select '????'' at line 1
Solution:
Add the parameter allowMultiQueries=true to the JDBC URL, as shown below:
String options = String.format("connectTimeout=%s&generateSimpleParameterMetadata=true&characterEncoding=UTF-8&allowMultiQueries=true", timeout);
Official documentation:
allowMultiQueries: Allow the use of ';' to delimit multiple queries during one statement (true/false), defaults to 'false', and does not affect the addBatch() and executeBatch() methods, which instead rely on rewriteBatchStatements.
Default: false
Since version: 3.1.1
Additional notes:
1. characterEncoding=utf8mb4 cannot be specified in the JDBC URL, because:
Connector/J did not support utf8mb4 for servers 5.5.2 and newer.
(Source: http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-13.html)
Otherwise, creating the connection fails with:
2015-05-01 16:38:00 ERROR com.alibaba.druid.pool.DruidDataSource create connection error java.sql.SQLException: Unsupported character encoding 'utf8mb4'.
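In the Connector/J 5.1.x drivers discussed here, `characterEncoding` takes a Java-style encoding name (hence the MySQL-to-Java translation table below). `utf8mb4` is a MySQL charset name that the JDK does not recognise, whereas `UTF-8` is. The following check uses the standard `java.nio.charset.Charset` API and illustrates the difference. `EncodingNameCheck` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingNameCheck {

    // True if the JDK recognises the name as a Java charset (canonical name or alias).
    static boolean isJavaEncoding(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically invalid names, e.g. containing spaces
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "Cp1252", "utf8mb4"}) {
            System.out.println(name + " -> " + isJavaEncoding(name));
        }
        // UTF-8 -> true
        // Cp1252 -> true
        // utf8mb4 -> false
    }
}
```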
The character sets currently supported by the JDBC driver are listed in:
Table 5.3 MySQL to Java Encoding Name Translations
(Source: http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html)
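The translation table itself is not reproduced here. The sketch below models a few of its rows as a lookup map. This is an illustrative subset only, so check the linked page for the authoritative list. The `latin1` → `Cp1252` row is the one behind the cp1252 default discussed in Question 1. `MysqlToJavaEncoding` and `javaEncodingFor` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class MysqlToJavaEncoding {

    // Illustrative subset of the Connector/J MySQL-to-Java table; not the full list.
    private static final Map<String, String> MYSQL_TO_JAVA = Map.of(
            "ascii", "US-ASCII",
            "gbk", "GBK",
            "latin1", "Cp1252",
            "utf8", "UTF-8");

    // Java encoding name for a MySQL charset, or null if it is not in this subset.
    static String javaEncodingFor(String mysqlCharset) {
        return MYSQL_TO_JAVA.get(mysqlCharset);
    }

    public static void main(String[] args) {
        String javaName = javaEncodingFor("latin1");           // character_set_server in this article
        System.out.println(javaName);                         // Cp1252
        System.out.println(Charset.forName(javaName).name()); // windows-1252
    }
}
```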
2. If the MySQL server character set is configured as character_set_server=utf8mb4, the driver automatically executes set names utf8mb4 when establishing a connection:
Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)
(Source: http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-13.html)
Solution 2:
Store the raw binary content of the emoji characters directly, i.e., use blob as the column type, as shown below:
CREATE TABLE `t2` (
  `data` blob
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8                                  |
| character_set_connection | utf8                                  |
| character_set_database   | latin1                                |
| character_set_filesystem | binary                                |
| character_set_results    | utf8                                  |
| character_set_server     | latin1                                |
| character_set_system     | utf8                                  |
| character_sets_dir       | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
jdbc url:
jdbc:mysql://localhost:3306/temp?connectTimeout=1000&generateSimpleParameterMetadata=true
DAO:
@SQL("insert into t2 select :1")
public void insertEmojiAsBlob(byte[] data);

@SQL("select data from t2 limit 1")
public byte[] getEmojiFromT2();
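With a blob column, the bytes are stored without any character-set conversion, so encoding and decoding happen entirely in Java. The sketch below shows the conversion around `insertEmojiAsBlob` and `getEmojiFromT2`. It assumes UTF-8 is used in both directions, and `EmojiBlobCodec` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EmojiBlobCodec {

    // Value to pass to insertEmojiAsBlob(byte[]).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Turns the byte[] returned by getEmojiFromT2() back into a String.
    static String decode(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F601)); // U+1F601
        byte[] blob = encode(emoji);
        System.out.println(Arrays.toString(blob));      // [-16, -97, -104, -127]
        System.out.println(decode(blob).equals(emoji)); // true
    }
}
```

Both sides must agree on the charset, because the database never interprets these bytes. The trade-off is that MySQL then treats the column as opaque bytes rather than text.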
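Unit test:
Question 1:
Why is it still necessary to specify characterEncoding=UTF-8 explicitly in the JDBC URL, even though set names utf8mb4 is executed explicitly before inserting emoji characters?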
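If characterEncoding is not explicitly set to UTF-8, the default character set is cp1252 (because character_set_server=latin1). The driver then encodes the emoji with SingleByteCharsetConverter. In Java the emoji is stored as two UTF-16 chars (a surrogate pair), and the single-byte converter emits one byte per char. As a result, a character that should occupy four bytes is sent as two bytes. The net effect is the same as running the following command in the mysql client: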
mysql> insert into t select '??';
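The following sketch imitates that per-char behaviour with the JDK's windows-1252 charset. Each UTF-16 char is encoded on its own, and anything cp1252 cannot represent becomes '?' (0x3F). It is not Connector/J's actual `SingleByteCharsetConverter` code, and `SingleByteSimulation` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SingleByteSimulation {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes each UTF-16 char separately, yielding exactly one byte per char.
    static byte[] encodePerChar(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            // A lone surrogate cannot be encoded; getBytes substitutes '?' (0x3F).
            out[i] = String.valueOf(s.charAt(i)).getBytes(CP1252)[0];
        }
        return out;
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F601)); // U+1F601
        System.out.println(emoji.length());                                  // 2 (surrogate pair)
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);   // 4
        byte[] sent = encodePerChar(emoji);
        System.out.println(sent.length);                                     // 2
        System.out.println(new String(sent, CP1252));                        // ??
    }
}
```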
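By then the data has already been damaged on the client side, so executing set names utf8mb4 does not help. The relevant program code is shown below: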
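If characterEncoding=UTF-8 is specified explicitly, the driver encodes the emoji as UTF-8:
Note: the value is then not garbled but consists of the correct four bytes: f0, 9f, 98, 81.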