#1, 字符集相关的参数名和概念 mysql
MySQL的字符集设置比较自由。能够设置不少种组合,相关的变量和参数有: sql
(root@localhost)[sample3]> show global variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec)
其中global variables表示全局变量。也就是默认状况下,新创建的数据库若是不显式的指定字符集相关参数,将使用这些参数。也叫作全局字符集变量数据库
其中variables没带global,表示当前session生效的的参数。所谓当前session,表示若是更改过相关的参数,离开这个session之后,就会恢复默认的参数。也叫作链接时字符集变量安全
各个变量的概念:服务器
1,character_set_client:客户端字符集,即数据在client端时字符集状态。session
2,character_set_connection:链接时转换字符集,即客户端和服务端链接时,字符集装换成的字符集ide
3,character_set_server:服务端处理时候使用的字符集编码
4,character_set_database:数据库层面存储默认使用的字符集spa
5,character_set_results:数据返回时所用的字符集code
指定字符集参数有多种方式,
1,在编译时指定,主要是:
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8-general_ci \
其中CHARSET是指字符集,COLLATION是指相关的校对规则(也称排序规则)
2,参数文件,也就是my.cnf中设定
character_set_server=utf8
collation_server=utf8_general_ci
参数人间的设定将会覆盖编译时设定的字符集和校对规则。
3,启动MySQL服务的时候指定:
--character_set_server: 指定全局粒度的默认字符集
--collation_server:指定全局粒度的默认校对规则
启动时指定的参数将覆盖参数文件以及编译时指定的字符集和校对规则。
这些参数,若是从大方向分的话,能够分为两类:
1,链接时使用的字符集,即为show variable like '%character%' 显示的那些字符集
2,存储时使用的字符集,分为4个级别
1)SERVER,全局级别
2)DATABASE,数据库级别
3)TABLE,表级别
4)column,列级别
2# MySQL查询的基本过程和乱码的造成以及如何避免乱码
1)查询的基本过程
MySQL查询基本过程以下:
1, 程序将字符转换成二进制格式
2,MySQL 客户端发出查询(client端字符集)====>到达server端链接器(connection字符集)====>
内部转换并查询(列字符集、表字符集、数据库字符集、server端字符集,转换优先级逐级递减)=====>
查询结果返回给result(result字符集)
这里先要回答一个latin字符集为什么能存放汉字的问题(汉字每一个字符占用2个字节长度,latin不支持双字节长度)。
其实是由于OS/APP层已经将汉字转换为单字符串的形式。通常来讲,为了正常显示和处理汉字,
OS层面和程序层面也必定设定了字符集,这个字符集就会将汉字先一步处理成二进制。好比OS和程序层面设置的是UTF8,
那么实际上输入一个汉字,咱们在mysql client端其实是得到3个单字节,而非双字节,这样latin字符集就能够处理了。
这些字符集的转换,其实是字符长度的转换。好比若是CLIENT端时LATIN1,connection是UTF8, 那么就会发生3个字节长度转换成6个字节长度。每一个单字符后面都会被填0,这样变长了天然是没事的, 顶多显示的时候再转回来,把后面填的零都切掉。可是反过来,若是是client段是utf8,connection是latin1, 那么3个字节长度的utf8就会被转成1个字节长度的latin1,后两位丢失,之后是怎么都变不回来的。 这种丢失非零位的过程是不可逆的。因此咱们必须得保证设定字符集的时候 (列字符集、表字符集、数据库字符集、server段字符集)>= connection字符集 >= client字符集来避免字符编码丢失问题。 一种可能发生的情况: client(latin1)==>connection(utf8)==>服务端内部的存储时字符集(latin1), 并不会发生字符编码丢失,由于connection转换到存储时字符集时,只是切掉了client转connection时后面填的零,至关于转回来了。 3,实验: 1)client字符集为utf8,connection 为latin1,存储字符集为latin1 (root@localhost)[sample3]> show create table test2; +-------+---------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------+ | test2 | CREATE TABLE `test2` ( `id` int(11) DEFAULT NULL, `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> insert into test2 values (1,'中国'); Query OK, 1 row affected, 1 warning (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 这种状况,已经丢失了字符,转换result是没有用的。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+------+ | id | name | +------+------+ | 1 | ?? | +------+------+ 1 row in set (0.00 sec) 2)client字符集为utf8,connection为utf8,存储字符集为latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> (root@localhost)[sample3]> insert into test2 values (1,'中国'); ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1 (root@localhost)[sample3]> 这里直接报错,5.6之后增强了数据库数据的安全性,由于会丢失数据,因此不容许插入。 3)client字符集为latin1,connection为utf8,存储字符集为latin1 (root@localhost)[sample3]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+---------------+ | id | name | +------+---------------+ | 1 | ä¸å›½ | +------+---------------+ 1 row in set (0.00 sec) 乱码了,这里为什么会乱码呢?按理说转换过程当中只会切掉填充的零才对。其实是result的问题。这个result是utf8, 而client存的时候就是3位,到connection转到6位,到存储时转回3位,这时候到result又转到6位,天然是乱码的。 只要result转回latin1,就能够了。 (root@localhost)[sample3]> set character_set_results=latin1; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample3]> select * from test2; +------+--------+ | id | name | +------+--------+ | 1 | 中国 | +------+--------+ 1 row in set (0.00 sec) 4)不丢失字符,可是字符集不一样的乱码. (root@localhost)[sample2]> show create table test; +-------+-------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `name` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+-------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> show variables like '%cha%'; +-------------------------------+----------------------------+ | Variable_name | Value | +-------------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | | innodb_change_buffer_max_size | 25 | | innodb_change_buffering | all | +-------------------------------+----------------------------+ 10 rows in set (0.00 sec) (root@localhost)[sample2]> insert into test values('中国'); Query OK, 1 row affected (0.01 sec) (root@localhost)[sample2]> (root@localhost)[sample2]> select * from test; +------+ | name | +------+ | | +------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> set character_set_results=utf8; Query OK, 0 rows affected (0.00 sec) (root@localhost)[sample2]> select * from test; +--------+ | name | +--------+ | 中国 | +--------+ 1 row in set (0.00 sec) (root@localhost)[sample2]> 修改客户端字符集的5中方法: 一、 运行,set names <字符集>; 二、 在SQL文件中指定set names <字符集>;,用source命令导入sql文件 如: mysql> source test.sql 三、 在SQL文件中指定set names <字符集>;,而后经过重定向符,或者-e参数来执行 [root@mysql01 3307]# vi test.sql set names utf8; select * from mysql.user; [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock <test.sql [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock -e "set names <字符集>; select * from <database>.<table>;" 四、 经过指定mysql命令的字符集参数实现 --default-character-set=<字符集> [root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock --default-chratacter-set=utf8 <database> <test.sql 五、 在配置文件里设置客户端即服务器端相关参数,此设置永久生效, [client] default-character-set=utf8 ###对于client参数,退出从新登陆,便可生效。 更改服务端的方法: [mysqld] default-character-set=utf8 ####5.1 character-set-server=utf8 ####5.5 [root@mysql01 3307]# mysql -uroot -p -S /data/3307/mysql.sock -e "show variables like '%chara%';" Warning: Using a password on the command line interface can be insecure. +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/share/charsets/ | +--------------------------+----------------------------+ [root@mysql01 3307]# 这些参数中,其中client,connection,results默认会跟随系统的字符集设置,/etc/systemconfig/i18n 迷思。。。 character-set-server=utf8 collation-server=utf8_unicode_ci skip-character-set-client-handshake 设置 [client] default-character-set=utf8 [mysqld] character-set-server=utf8 [mysql] default-character-set=utf8 不能锁定client端字符集 [client] default-character-set=utf8 [mysqld] character-set-server=utf8 skip-character-set-client-handshake ##加入忽略客户端设置,使用服务端设置 [mysql] default-character-set=utf8 这样设置之后,mysql客户端字符集锁定为utf8