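I ran into the scenario in the title: I needed to add a unique index (idx_col1_u) to an InnoDB table (tableA) in MySQL, but the table already contained a large amount of duplicate data. For a given key (col1), some values were repeated in 2 rows and others in N rows.
Cleaning the data by hand, or with hand-written SQL, would clearly be very time-consuming.
As I recalled, MySQL has its own ALTER IGNORE ... ADD UNIQUE INDEX syntax.
The syntax is as follows: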
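Before choosing a cleanup strategy, it can help to see how bad the duplication actually is. The query below is a minimal sketch that uses the table and column names from this post (tableA, col1); the alias dup_rows and the LIMIT value are arbitrary choices for illustration.

```sql
-- List the col1 values that occur more than once, worst offenders first.
SELECT col1, COUNT(*) AS dup_rows
FROM tableA
GROUP BY col1
HAVING COUNT(*) > 1
ORDER BY dup_rows DESC
LIMIT 20;
```

If this returns only a handful of keys, deleting the extra rows by hand may still be practical; with many keys, an automated approach is more attractive.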
ALTER [ONLINE | OFFLINE] [IGNORE] TABLE tbl_name
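Its behavior is similar to INSERT IGNORE: rows that conflict on a unique key are silently discarded instead of raising an error. In the case of adding a unique index, this amounts to creating an empty table, adding the unique index to it, and then copying the old data into the new table with INSERT IGNORE semantics, dropping any row that conflicts.
The documentation's note on ALTER IGNORE (see http://dev.mysql.com/doc/refman/5.1/en/alter-table.html):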
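The copy-and-ignore behavior described above can also be carried out by hand, which is useful on servers where ALTER IGNORE is unavailable (the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4). The following is a rough sketch, not a drop-in procedure: tableA_new and tableA_old are hypothetical names, and it assumes nothing writes to tableA while the copy runs and that the table has no foreign keys (CREATE TABLE ... LIKE does not copy them).

```sql
-- 1. Create an empty table with the same columns and indexes as tableA.
CREATE TABLE tableA_new LIKE tableA;

-- 2. Add the unique index while the new table is still empty.
ALTER TABLE tableA_new ADD UNIQUE INDEX idx_col1_u (col1);

-- 3. Copy the rows; a row whose col1 is already present is skipped
--    with a warning instead of aborting the statement.
INSERT IGNORE INTO tableA_new SELECT * FROM tableA;

-- 4. Swap the tables in one atomic rename, keeping the original as a backup.
RENAME TABLE tableA TO tableA_old, tableA_new TO tableA;
```

Which of the duplicate rows survives depends on the order in which the SELECT returns them, so add an ORDER BY if a specific row must be kept.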
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
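However, after running alter ignore table tableA add unique index idx_col1_u (col1), I still got the following error:
#1062 - Duplicate entry '111' for key 'col1'.
Wasn't it supposed to discard the duplicate rows automatically? My worldview was shaken. After some digging, it turned out that the ALTER IGNORE behavior is not honored by InnoDB in this case.
Whether ALTER IGNORE takes effect depends entirely on the storage engine's internal implementation and is not enforced by the server. The description reads: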
For ALTER TABLE with the IGNORE keyword, IGNORE is now part of the information provided to the storage engine. It is up to the storage engine whether to use this when choosing between the in-place or copy algorithm for altering the table. For InnoDB index operations, IGNORE is not used if the index is unique, so the copy algorithm is used
See: http://bugs.mysql.com/bug.php?id=40344
Of course, there is still a tricky workaround, and it is fairly blunt. It goes like this:
ALTER TABLE tableA ENGINE MyISAM;
ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);
ALTER TABLE tableA ENGINE InnoDB;
Updated on 2013-09-26:
@jyzhou pointed out that there is no need to switch to MyISAM; you can simply use set old_alter_table = 1; instead. The steps are:
set old_alter_table = 1;
ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);
How it works: http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_old_alter_table
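A slightly more defensive version of the steps above might scope the setting to the current session, restore it afterwards, and then check the result. This is only a sketch: it assumes old_alter_table was at its default (off) beforehand, and the aliases total_rows and distinct_keys are illustrative.

```sql
-- Force the table-copy ALTER path for this session only.
SET SESSION old_alter_table = 1;

ALTER IGNORE TABLE tableA ADD UNIQUE INDEX idx_col1_u (col1);

-- Restore the assumed default so later ALTERs in this session are unaffected.
SET SESSION old_alter_table = 0;

-- Sanity check: if col1 has no NULLs, both counts should now match.
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT col1) AS distinct_keys
FROM tableA;

-- Confirm the index exists; Non_unique should be 0 for a unique index.
SHOW INDEX FROM tableA WHERE Key_name = 'idx_col1_u';
```

Because ALTER IGNORE permanently deletes the conflicting rows, taking a backup of tableA first is a sensible precaution.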