常常要对数据库中的数据进行去重,有时还须要使用外部表填冲数据,本文档记录数据去重与外表填充数据。
date:2016/8/17
author:wangxl数据库
对user_info1表去重,并添加age项。code
user_info1: +----+----------+------+------+ | id | name | sex | age | +----+----------+------+------+ | 1 | xiaolong | 1 | NULL | | 2 | xiaoyun | 1 | NULL | | 3 | xiaoqin | 2 | NULL | | 4 | xiaolong | 1 | NULL | | 5 | xiaodong | 1 | NULL | | 6 | xiaokai | 1 | NULL | | 7 | xiaohong | 2 | NULL | | 8 | xiaolong | 1 | NULL | | 9 | xiaohong | 2 | NULL | | 10 | xiaofen | 2 | NULL | +----+----------+------+------+ user_info2: +----------+------+ | name | age | +----------+------+ | xiaolong | 26 | | xiaoyun | 28 | | xiaoqin | 27 | | xiaodong | 27 | | xiaokai | 27 | | xiaohong | 24 | | xiaofen | 22 | +----------+------+
(1) 找出有重复字段 select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1); (2) 找出要删除的记录,重复记录是根据单个字段(name)来判断,只留有id最小的记录 select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1); (3) 删除表中多余的重复记录 delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1); 报错:ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
更换思路:找出每组中非最小id并删除,以下:ci
(4) 找出每组最小id select min(id) from user_info1 group by name (5) 找出每组非最小id select * from user_info1 where id not in (select min(id) from user_info1 group by name); (6) 删除每组中非最小id所在行 delete from user_info1 where id not in (select id from select min(id) from user_info1 group by name); ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause 更正: delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);、 结果展现: +----+----------+------+------+ | id | name | sex | age | +----+----------+------+------+ | 1 | xiaolong | 1 | NULL | | 2 | xiaoyun | 1 | NULL | | 3 | xiaoqin | 2 | NULL | | 5 | xiaodong | 1 | NULL | | 6 | xiaokai | 1 | NULL | | 7 | xiaohong | 2 | NULL | | 10 | xiaofen | 2 | NULL | +----+----------+------+------+
对于没有primary key的话,怎么去重呢?文档
(7) 建立表test (8) insert into test select distinct(name),sex,age from user_info1 group by name; 暂时没想出一句话解决方案.
update user_info1 t set age=(select age from user_info2 where name=t.name); 结果以下: +----+----------+------+------+ | id | name | sex | age | +----+----------+------+------+ | 1 | xiaolong | 1 | 26 | | 2 | xiaoyun | 1 | 28 | | 3 | xiaoqin | 2 | 27 | | 5 | xiaodong | 1 | 27 | | 6 | xiaokai | 1 | 27 | | 7 | xiaohong | 2 | 24 | | 10 | xiaofen | 2 | 22 | +----+----------+------+------+