How to order distinct tuples in a PostgreSQL query

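I'm trying to write a query in Postgres that returns only distinct tuples. In my sample query, I don't want duplicate rows when an entry exists multiple times for the same cluster_id / Feed_id combination. If I do something simple like: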

select distinct on (cluster_info.cluster_id,Feed_id) 
   cluster_info.cluster_id,num_docs,Feed_id,url_time 
   from url_info 
   join cluster_info on (cluster_info.cluster_id = url_info.cluster_id) 
   where Feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16';

I get just that, but I'd also like to order the results by num_docs. So when I do the following:

select distinct on (cluster_info.cluster_id,Feed_id) 
   cluster_info.cluster_id,num_docs,Feed_id,url_time 
   from url_info join cluster_info 
   on (cluster_info.cluster_id = url_info.cluster_id) 
   where Feed_id in (select pot_seeder from potentials) 
   and num_docs > 5 and url_time > '2012-04-16' 
   order by num_docs desc;

I get the following error:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id,Feed_id) cluste...

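I think I understand why I'm getting the error (I can't group by tuples unless I explicitly describe the group somehow), but how do I do that? Or, if my interpretation of the error is wrong, is there another way to achieve my original goal?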

Solution

The leftmost ORDER BY items must match the items of the DISTINCT ON clause. I quote the manual about DISTINCT:

The DISTINCT ON expression(s) must match the leftmost ORDER BY
expression(s). The ORDER BY clause will normally contain additional
expression(s) that determine the desired precedence of rows within
each DISTINCT ON group.
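
To see the rule in isolation, here is a minimal sketch on a made-up temporary table. `demo_urls` and its columns are hypothetical, not the asker's schema. The ORDER BY starts with the DISTINCT ON expressions, and the extra `url_time DESC` decides which row survives in each group.

```sql
-- Hypothetical toy table, for illustration only.
CREATE TEMP TABLE demo_urls (
    cluster_id int,
    feed_id    int,
    url_time   date
);

INSERT INTO demo_urls VALUES
    (1, 10, '2012-04-17'),
    (1, 10, '2012-04-20'),
    (2, 10, '2012-04-18');

-- Works: ORDER BY begins with the DISTINCT ON expressions.
-- url_time DESC keeps the latest row per (cluster_id, feed_id).
SELECT DISTINCT ON (cluster_id, feed_id)
       cluster_id, feed_id, url_time
FROM   demo_urls
ORDER  BY cluster_id, feed_id, url_time DESC;

-- Fails with "SELECT DISTINCT ON expressions must match initial
-- ORDER BY expressions", because the leading ORDER BY item is not
-- one of the DISTINCT ON expressions:
-- SELECT DISTINCT ON (cluster_id, feed_id)
--        cluster_id, feed_id, url_time
-- FROM   demo_urls
-- ORDER  BY url_time DESC;
```

The working query returns one row per group: (1, 10, 2012-04-20) and (2, 10, 2012-04-18).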

Try:

SELECT *
FROM  (
    SELECT DISTINCT ON (c.cluster_id,Feed_id) 
           c.cluster_id,num_docs,Feed_id,url_time 
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
    WHERE  Feed_id IN (SELECT pot_seeder FROM potentials) 
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    ORDER  BY c.cluster_id,Feed_id,num_docs,url_time
           -- first columns match DISTINCT
           -- the rest to pick certain values for dupes
           -- or did you want to pick random values for dupes?
    ) x
ORDER  BY num_docs DESC;
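
The two-step pattern (deduplicate in a subquery, then sort the survivors) can be tried on toy data. This is only a sketch: `demo_joined` is a hypothetical table standing in for the already-joined rows, and keeping the earliest `url_time` per group is an assumption about which duplicate you want.

```sql
-- Hypothetical toy data: one row per URL, already joined with its cluster.
CREATE TEMP TABLE demo_joined (
    cluster_id int,
    feed_id    int,
    num_docs   int,
    url_time   date
);

INSERT INTO demo_joined VALUES
    (1, 10, 6, '2012-04-17'),
    (1, 10, 6, '2012-04-20'),
    (2, 10, 9, '2012-04-18'),
    (3, 11, 7, '2012-04-19');

-- Inner query: keep the earliest row per (cluster_id, feed_id).
-- Outer query: sort the surviving rows by num_docs.
SELECT *
FROM  (
    SELECT DISTINCT ON (cluster_id, feed_id)
           cluster_id, feed_id, num_docs, url_time
    FROM   demo_joined
    ORDER  BY cluster_id, feed_id, url_time
    ) x
ORDER  BY num_docs DESC;
```

On this data the result has three rows, with num_docs 9, 7 and 6 in that order. The duplicate (1, 10) pair is reduced to its 2012-04-17 row.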

Or use GROUP BY:

SELECT c.cluster_id,num_docs,Feed_id,url_time 
FROM   url_info u
JOIN   cluster_info c ON (c.cluster_id = u.cluster_id) 
WHERE  Feed_id IN (SELECT pot_seeder FROM potentials) 
AND    num_docs > 5
AND    url_time > '2012-04-16'
GROUP  BY c.cluster_id,Feed_id 
ORDER  BY num_docs DESC;

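If c.cluster_id, Feed_id are the primary key columns of all (both, in this case) tables whose columns appear in the SELECT list, then this just works with PostgreSQL 9.1 or later.

Otherwise, you need to GROUP BY the rest of the columns, aggregate them, or provide more information.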
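
If the grouped columns are not a primary key, one option is to aggregate the remaining columns. The sketch below uses a made-up temporary table (`demo_rows` is hypothetical) and assumes you want the largest num_docs and the latest url_time per group. Pick whichever aggregates fit your data.

```sql
-- Hypothetical toy table without a primary key on the grouped columns.
CREATE TEMP TABLE demo_rows (
    cluster_id int,
    feed_id    int,
    num_docs   int,
    url_time   date
);

INSERT INTO demo_rows VALUES
    (1, 10, 7,  '2012-04-17'),
    (1, 10, 7,  '2012-04-20'),
    (2, 10, 12, '2012-04-18');

-- Every selected column is either grouped or aggregated,
-- so this does not rely on primary-key functional dependency.
SELECT cluster_id,
       feed_id,
       max(num_docs) AS max_num_docs,
       max(url_time) AS latest_url_time
FROM   demo_rows
GROUP  BY cluster_id, feed_id
ORDER  BY max_num_docs DESC;
```

Unlike DISTINCT ON, each aggregate is computed independently, so the values in one output row may come from different input rows.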
