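This post continues the character set story from the previous one. It focuses on the various character set settings in MySQL; for the background theory, see the earlier reference.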
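1. MySQL system variables
– character_set_server: the default character set for internal operations
– character_set_client: the character set of the data sent by the client
– character_set_connection: the connection-layer character set
– character_set_results: the character set of query results
– character_set_database: the default character set of the currently selected database
– character_set_system: the character set of system metadata (column names and so on)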
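Put simply, for those of us who use the MySQL C API, three character sets matter most: character_set_client, character_set_connection and character_set_results. From my own experience as a user, though, character_set_connection always feels somewhat redundant.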
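To make the relationship between these three variables concrete: the MySQL manual describes SET NAMES x as equivalent to assigning x to character_set_client, character_set_results and character_set_connection. The sketch below only builds those three statements as strings. ExpandSetNames is a hypothetical helper introduced for illustration; it is not part of the MySQL C API or of the DbComm wrapper used later in this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lists the three assignments that
// "SET NAMES <charset>" stands for, in the order the manual gives them.
std::vector<std::string> ExpandSetNames(const std::string& charset) {
    assert(!charset.empty());
    const char* const variables[] = {
        "character_set_client",
        "character_set_results",
        "character_set_connection",
    };
    std::vector<std::string> statements;
    for (const char* variable : variables) {
        statements.push_back(std::string("SET ") + variable + " = " + charset);
    }
    return statements;
}
```

Calling ExpandSetNames("utf8") yields three statements; overriding one of them afterwards (as experiment 1 does for character_set_client) leaves the other two untouched. When working directly with the C API, mysql_set_character_set() is usually a better choice than sending SET NAMES by hand, because it also updates the character set the client library uses for escaping.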
2. The character set conversion process in MySQL
This section is copied wholesale from http://www.laruence.com/2008/01/05/12.html. It is reproduced here for ease of reading.
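1) When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;
2) Before performing internal operations, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
• use the CHARACTER SET value of each column;
• if that is not set, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not standard SQL);
• if that is not set, use the DEFAULT CHARACTER SET value of the corresponding database;
• if that is not set, use the character_set_server value.
3) It converts the result of the operation from the internal operation character set to character_set_results.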
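The lookup order in step 2 is a simple fallback chain, which can be sketched as follows. This is only a model of the rule as stated above, not MySQL's actual implementation; CharsetSettings and ResolveInternalCharset are hypothetical names, and an empty string stands for "not set".

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the four settings involved; "" means "not set".
struct CharsetSettings {
    std::string column;    // CHARACTER SET of the column
    std::string table;     // DEFAULT CHARACTER SET of the table
    std::string database;  // DEFAULT CHARACTER SET of the database
    std::string server;    // character_set_server
};

// Returns the first setting that is present, from most to least specific.
std::string ResolveInternalCharset(const CharsetSettings& s) {
    if (!s.column.empty()) return s.column;
    if (!s.table.empty()) return s.table;
    if (!s.database.empty()) return s.database;
    assert(!s.server.empty());  // the server always has a default
    return s.server;
}
```

For example, with server = "latin1" and table = "utf8" and nothing set on the column, the function returns "utf8", which is the case the following paragraph relies on.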
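The conversion from character_set_connection to the internal operation character set looks fairly complicated, but if we specify the table's character set when creating the table, and no column overrides it, we can simply treat the "internal operation character set" as that table's character set. For this reason I recommend always adding "DEFAULT CHARSET=xxx" to the CREATE TABLE statement, where the valid values of xxx can be listed with "select character_set_name from information_schema.CHARACTER_SETS". My suggestion is "UTF8". (Note that MySQL's utf8 stores at most 3 bytes per character; utf8mb4 is needed for characters outside the Basic Multilingual Plane.)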
3. Character set conversion experiments in MySQL
My environment is as follows.
CREATE TABLE `tbl_test` (
`id` int ,
name varchar(20000),
uptime date,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
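Experiment 1: handling Chinese text correctly
The flow of this experiment is roughly as follows.
The point to note is that I first convert the char* string that is hard-coded in the binary (UTF-8 encoded) into a wchar_t* string and then modify the Chinese text. Before sending it out, I convert the wchar_t* string into a GBK-encoded char* string. In my tests, the code below runs correctly.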
#include <clocale>
#include <cstdlib>
#include <cwchar>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <tr1/memory>
#include "common/dbcomm/DbComm.h"

using namespace std;

COMMON::DbLocation dbLocation1;

void InsertBySqlStatmentTest1();

int main()
{
    dbLocation1.SetDbId("TEST_DB1");
    dbLocation1.SetIp("127.0.0.1");
    dbLocation1.SetPort("3306");
    dbLocation1.SetUser("cup_dba");
    dbLocation1.SetPassword("123456");
    InsertBySqlStatmentTest1();
    return 0;
}

void InsertBySqlStatmentTest1()
{
    try
    {
        vector<COMMON::DbLocation> dbLocations_array;
        dbLocations_array.push_back(dbLocation1);

        tr1::shared_ptr<COMMON::IDbTasks> mysqlTasks(
            new COMMON::MysqlDbTasks(dbLocations_array, true));
        mysqlTasks->Connect();
        cout << "Connect success" << endl;

        {
            COMMON::DbExecuteAction* char_action = mysqlTasks->Execute();
            COMMON::ExecuteFilter char_filter("set names utf8");
            char_action->Do(&char_filter, &dbLocation1);
            // change the character_set_client to gbk
            COMMON::ExecuteFilter char_filter2("SET character_set_client = gbk");
            char_action->Do(&char_filter2, &dbLocation1);
            char_action->EndAction();
        }

        COMMON::DbExecuteAction* insert_action = mysqlTasks->Insert(5000);
        stringstream ss;
        ss << "INSERT INTO tbl_test(id, name, uptime) VALUES"
           << "(" << 100 << "," << "'你好'," << "'20130101')";
        string statement = ss.str();

        // use mbstowcs to change the sql statement to wide-char-string
        // we use the default value of fexec-charset, which is utf-8, to compile this file with gcc.
        if (setlocale(LC_ALL, "zh_CN.utf8") == NULL)
        {
            cout << "locale zh_CN.utf8 is not available" << endl;
            return;
        }
        size_t wcs_size = mbstowcs(NULL, statement.c_str(), 0);
        if (wcs_size == static_cast<size_t>(-1))
        {
            cout << "mbstowcs failed: invalid multibyte sequence" << endl;
            return;
        }
        // the third argument is the capacity of dest in wide characters, not in bytes
        vector<wchar_t> dest(wcs_size + 1, L'\0');
        mbstowcs(&dest[0], statement.c_str(), wcs_size + 1);

        // change the last '好' to '饕'
        wchar_t* tmp = wcsrchr(&dest[0], L'好');
        if (tmp != NULL)
        {
            *tmp = L'饕';
        }

        // change the sql statement to the charset that corresponds to the character_set_client of mysql
        if (setlocale(LC_ALL, "zh_CN.gbk") == NULL)
        {
            cout << "locale zh_CN.gbk is not available" << endl;
            return;
        }
        size_t mbs_size = wcstombs(NULL, &dest[0], 0);
        if (mbs_size == static_cast<size_t>(-1))
        {
            cout << "wcstombs failed: character not representable in gbk" << endl;
            return;
        }
        // the third argument is the capacity of buf_mbs in bytes
        vector<char> buf_mbs(mbs_size + 1, '\0');
        wcstombs(&buf_mbs[0], &dest[0], mbs_size + 1);

        // try to insert into mysql
        COMMON::InsertFilter insertFilter(&buf_mbs[0]);
        insert_action->Do(&insertFilter);
        insert_action->EndAction();
        cout << "EndAction success" << endl;

        mysqlTasks->Disconnect();
        cout << "Disconnect success" << endl;
    }
    catch (COMMON::ThrowableException& e)
    {
        cout << e.What() << endl;
    }
    catch (...)
    {
        cout << "unknown exception" << std::endl;
    }
}
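The core idea of experiment 1 is "decode to whole characters, edit, then encode again". The experiment relies on mbstowcs/wcstombs and on the zh_CN.utf8 and zh_CN.gbk locales being installed. As a locale-independent illustration of the same idea, here is a minimal sketch that decodes UTF-8 by hand, replaces the last occurrence of a code point (like the wcsrchr call above), and re-encodes as UTF-8. This is a swapped-in technique, not the author's code: DecodeUtf8, EncodeUtf8 and ReplaceLastCodePoint are hypothetical helpers, the input is assumed to be well-formed UTF-8, and the final conversion to GBK is not covered (that still needs wcstombs, iconv or similar).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes UTF-8 into code points. Assumes well-formed input; no validation.
std::vector<uint32_t> DecodeUtf8(const std::string& in) {
    std::vector<uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;
        uint32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        assert(i + len <= in.size());  // sequence must not be truncated
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encodes code points back to UTF-8.
std::string EncodeUtf8(const std::vector<uint32_t>& cps) {
    std::string out;
    for (uint32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Replaces the last occurrence of code point `from` with `to`.
std::string ReplaceLastCodePoint(const std::string& utf8, uint32_t from, uint32_t to) {
    std::vector<uint32_t> cps = DecodeUtf8(utf8);
    for (std::size_t i = cps.size(); i-- > 0;) {
        if (cps[i] == from) {
            cps[i] = to;
            break;
        }
    }
    return EncodeUtf8(cps);
}
```

With '好' being U+597D and '饕' being U+9955, ReplaceLastCodePoint(statement, 0x597D, 0x9955) swaps all three bytes of the character at once, which is exactly what the wide-character round trip in the experiment achieves.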
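Experiment 2: handling Chinese text incorrectly
Now let's make some changes. First we simplify the setup: we no longer maliciously run set character_set_client=gbk, and only run set names utf8. Then, once the SQL statement has been assembled, we use string::find to locate '你' and use the returned index to overwrite it with '饕' directly. The code is as follows.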
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <tr1/memory>
#include "common/dbcomm/DbComm.h"

using namespace std;

COMMON::DbLocation dbLocation1;

void InsertBySqlStatmentTest1();

int main()
{
    dbLocation1.SetDbId("TEST_DB1");
    dbLocation1.SetIp("127.0.0.1");
    dbLocation1.SetPort("3306");
    dbLocation1.SetUser("cup_dba");
    dbLocation1.SetPassword("123456");
    InsertBySqlStatmentTest1();
    return 0;
}

void InsertBySqlStatmentTest1()
{
    try
    {
        vector<COMMON::DbLocation> dbLocations_array;
        dbLocations_array.push_back(dbLocation1);

        tr1::shared_ptr<COMMON::IDbTasks> mysqlTasks(
            new COMMON::MysqlDbTasks(dbLocations_array, true));
        mysqlTasks->Connect();
        cout << "Connect success" << endl;

        {
            // ************ this time we do not mischievously change character_set_client to gbk ************
            COMMON::DbExecuteAction* char_action = mysqlTasks->Execute();
            COMMON::ExecuteFilter char_filter("set names utf8");
            char_action->Do(&char_filter, &dbLocation1);
            char_action->EndAction();
        }

        COMMON::DbExecuteAction* insert_action = mysqlTasks->Insert(5000);
        stringstream ss;
        ss << "INSERT INTO tbl_test(id, name, uptime) VALUES"
           << "(" << 100 << "," << "'你好'," << "'20130101')";

        // ************ modify the string directly (deliberately wrong) ************
        // '你' and '饕' are multi-character constants here; with gcc each one is
        // narrowed to a single char, so find() matches one byte and the
        // assignment overwrites one byte.
        string statement = ss.str();
        size_t pos = statement.find('你');
        statement[pos] = '饕';

        // try to insert into mysql
        COMMON::InsertFilter insertFilter(statement);
        insert_action->Do(&insertFilter);
        insert_action->EndAction();
        cout << "EndAction success" << endl;

        mysqlTasks->Disconnect();
        cout << "Disconnect success" << endl;
    }
    catch (COMMON::ThrowableException& e)
    {
        cout << e.What() << endl;
    }
    catch (...)
    {
        cout << "unknown exception" << std::endl;
    }
}
The result is that '你' does not become '饕'.
To track down the cause, let's look at the bytes in hexadecimal.
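As can be seen, the two statements
size_t pos = statement.find('你');
statement[pos] = '饕';
in fact changed only a single byte. In UTF-8, '你' is E4BDA0 and the result is E4BD95, which is '何'. The byte we wrote is that 95, which is just one byte of '饕' (E9A595 in UTF-8). This matches what we know about string: it operates on individual bytes (char), not on characters.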
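The byte arithmetic above can be reproduced without a database. The sketch below uses explicit byte escapes so that it does not depend on the source file encoding. ToHex, OverwriteOneByte and ReplaceFirst are hypothetical helpers written for this illustration: the first two mimic what experiment 2 effectively does (assuming gcc narrows the multi-character constants to their low byte), and the third shows a byte-safe alternative that searches for the whole encoded character as a string and replaces all of its bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hex dump of the bytes held by a std::string, e.g. "E4BDA0".
std::string ToHex(const std::string& s) {
    std::string out;
    char buf[3];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// What experiment 2 effectively does: overwrite exactly one byte.
std::string OverwriteOneByte(std::string s, std::size_t pos, unsigned char b) {
    assert(pos < s.size());
    s[pos] = static_cast<char>(b);
    return s;
}

// Byte-safe alternative: match the whole encoded character and
// replace every byte of it in one step.
std::string ReplaceFirst(std::string s, const std::string& from, const std::string& to) {
    std::size_t pos = s.find(from);
    if (pos != std::string::npos) {
        s.replace(pos, from.size(), to);
    }
    return s;
}
```

Taking "\xE4\xBD\xA0" ('你') as input, OverwriteOneByte(ni, 2, 0x95) gives E4BD95 ('何'), while ReplaceFirst(ni, ni, "\xE9\xA5\x95") gives E9A595 ('饕'). The string overload of find/replace works here because both the needle and the replacement are complete UTF-8 sequences.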
4. Summary and recommendations