MySQL数据库查询中的重复记录过滤

    今天帮别人修改程序,需要分组去最大值。也就是所谓,某一字段有重复字,过滤掉有重复的数据。弄了半天,最后总结一下。

    首先搞点模拟数据出来

create table t2 (
  id int primary key,
  gid char,
  col1 int,
  col2 int
) engine=myisam;

insert into t2 values 
(1,'A',31,6),
(2,'B',25,83),
(3,'C',76,21),
(4,'D',63,56),
(5,'E',3,17),
(6,'A',29,97),
(7,'B',88,63),
(8,'C',16,22),
(9,'D',25,43),
(10,'E',45,28),
(11,'A',2,78),
(12,'B',30,79),
(13,'C',96,73),
(14,'D',37,40),
(15,'E',14,86),
(16,'A',32,67),
(17,'B',84,38),
(18,'C',27,9),
(19,'D',31,21),
(20,'E',80,63),
(21,'A',89,9),
(22,'B',15,22),
(23,'C',46,84),
(24,'D',54,79),
(25,'E',85,64),
(26,'A',87,13),
(27,'B',40,45),
(28,'C',34,90),
(29,'D',63,8),
(30,'E',66,40),
(31,'A',83,49),
(32,'B',4,90),
(33,'C',81,7),
(34,'D',11,12),
(35,'E',85,10),
(36,'A',39,75),
(37,'B',22,39),
(38,'C',76,67),
(39,'D',20,11),
(40,'E',81,36);

期望结果
1) N=1 取GID每组 COL2最大的记录
  +----+------+------+------+
  | id | gid | col1 | col2 |
  +----+------+------+------+
  | 6 | A | 29 | 97 |
  | 15 | E | 14 | 86 |
  | 24 | D | 54 | 79 |
  | 28 | C | 34 | 90 |
  | 32 | B | 4 | 90 |
  +----+------+------+------+
2) N=3 取GID每组 COL2最大的3条记录
  +----+------+------+------+
  | id | gid | col1 | col2 |
  +----+------+------+------+
  | 6 | A | 29 | 97 |
  | 11 | A | 2 | 78 |
  | 36 | A | 39 | 75 |
  | 32 | B | 4 | 90 |
  | 2 | B | 25 | 83 |
  | 12 | B | 30 | 79 |
  | 28 | C | 34 | 90 |
  | 23 | C | 46 | 84 |
  | 13 | C | 96 | 73 |
  | 24 | D | 54 | 79 |
  | 4 | D | 63 | 56 |
  | 9 | D | 25 | 43 |
  | 15 | E | 14 | 86 |
  | 25 | E | 85 | 64 |
  | 20 | E | 80 | 63 |
  +----+------+------+------+

    过程不阐述了,得到的最优算法是

SELECT gid,col2,1 INTO @i,@j,@mc FROM t2v LIMIT 1;
SELECT id,gid,col1,col2 FROM (
SELECT a.id,a.gid,a.col1,a.col2,
IF(@i<>gid,@mc:=1,@mc) AS a2,
IF(@i<>gid,@i:=gid,@i) AS b1,
IF(@i=gid AND col2>=@j,@mc:=@mc+1,@mc) AS a1

 FROM t2v a
 ORDER BY gid,col2 DESC) aa
 WHERE a2<=3

    如果还加上索引,这个语句的查询效果还是非常给力的,该语句支持单一字段数据重复,或者多字段重复,并且支持重复数据显示N条,例子语句为3条

    不过后来发现,在PHP的函数mysql_query中悲剧了,不支持一次执行多条语句,简单的很,数组搞定,如下

$query = ‘delete from ecs_goods_attr where attr_id=138 and goods_id=442;Insert into ecs_goods_attr (goods_attr_id,goods_id,attr_id,attr_value,attr_price)values(Null,442,138,”欧版 白色”,0);update ecs_goods set goods_number=10,shop_price=955 where goods_id=442;’

$query_e = explode(';','$query');
foreach ($query_e as $k =>$v)
 {
  mysql_query($query_e[$k]);
 }

最后在来一个变通了一下SQL语句,一句话搞定,不错,效率比上面的那个来稍微查一点。

SELECT a.id,a.gid,a.col1,a.col2 FROM t2v a
LEFT JOIN t2v b
ON a.gid=b.gid AND a.col2<=b.col2
GROUP BY a.id,a.gid,a.col1,a.col2
HAVING COUNT(b.id)<=3
ORDER BY a.gid,a.col2 desc

我们不Hack软件,我们只是优秀软件的搬运工。
麦氪搜(iMacSO.com) » MySQL数据库查询中的重复记录过滤