Question:
I am using Rails 4.2 with PostgreSQL. I have a Product model and a Purchase model, where Product has many Purchases. I want to find the distinct recently purchased products. Initially I tried:
Product.joins(:purchases)
.select("DISTINCT products.*, purchases.updated_at") #postgresql requires order column in select
.order("purchases.updated_at DESC")
This, however, results in duplicates because it returns every row where the pair (products.id, purchases.updated_at) is unique. I just want the products with a distinct id after the join: if a product id appears multiple times in the join, only the first row should be selected. So I also tried:
Product.joins(:purchases)
.select("DISTINCT ON (products.id) purchases.updated_at, products.*")
.order("products.id, purchases.updated_at") #postgres requires that DISTINCT ON must match the leftmost order by clause
This doesn't work because that constraint forces me to put products.id first in the order clause, which produces an unexpected order.
What is the rails way to achieve this?
Answer 1:
Use a subquery and add a different ORDER BY clause in the outer SELECT:
SELECT *
FROM (
SELECT DISTINCT ON (pr.id)
pu.updated_at, pr.*
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id -- guessing
ORDER BY pr.id, pu.updated_at DESC NULLS LAST
) sub
ORDER BY updated_at DESC NULLS LAST;
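To see what the two ORDER BY clauses are each doing, here is a rough in-memory Ruby sketch of the same logic (not the query itself). The sample rows and the names `joined_rows`, `latest_per_product` and `result` are made up for illustration; `uniq` with a block, which keeps the first element per key, stands in for DISTINCT ON.

```ruby
require "time"

# Hypothetical rows, as if produced by joining products to purchases.
joined_rows = [
  { id: 1, name: "apple",  updated_at: Time.utc(2015, 1, 1) },
  { id: 2, name: "banana", updated_at: Time.utc(2015, 1, 3) },
  { id: 1, name: "apple",  updated_at: Time.utc(2015, 1, 5) },
  { id: 2, name: "banana", updated_at: Time.utc(2015, 1, 2) }
]

# Inner query: ORDER BY id, updated_at DESC, then keep the first row per id.
# Array#uniq with a block keeps the first occurrence, much like DISTINCT ON.
latest_per_product = joined_rows
  .sort_by { |row| [row[:id], -row[:updated_at].to_f] }
  .uniq { |row| row[:id] }

# Outer query: re-sort the de-duplicated rows by updated_at DESC.
result = latest_per_product.sort_by { |row| -row[:updated_at].to_f }

result.each do |row|
  puts "#{row[:id]} #{row[:name]} #{row[:updated_at].iso8601}"
end
```

The inner sort decides which purchase survives for each product; the outer sort only decides the order in which the surviving rows are returned.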
Details for DISTINCT ON:
- Select first row in each GROUP BY group?
Or some other query technique:
- Optimize GROUP BY query to retrieve latest record per user
But if all you need from Purchases is updated_at, you can get this cheaper with a simple aggregate in a subquery before you join:
SELECT *
FROM Product pr
JOIN (
SELECT product_id, max(updated_at) AS updated_at
FROM Purchases
GROUP BY 1
) pu ON pu.product_id = pr.id -- guessing
ORDER BY pu.updated_at DESC NULLS LAST;
About NULLS LAST:
- PostgreSQL sort by datetime asc, null first?
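The aggregate-then-join idea can be pictured with a small in-memory Ruby sketch. This only mirrors the logic of the SQL and is not a replacement for it; the sample data and the names `products`, `purchases`, `latest_by_product` and `recent_products` are assumptions for illustration.

```ruby
# Hypothetical sample data standing in for the two tables.
products = [
  { id: 1, name: "apple" },
  { id: 2, name: "banana" },
  { id: 3, name: "cherry" } # never purchased
]
purchases = [
  { product_id: 1, updated_at: Time.utc(2015, 1, 1) },
  { product_id: 2, updated_at: Time.utc(2015, 1, 3) },
  { product_id: 1, updated_at: Time.utc(2015, 1, 2) }
]

# Subquery: SELECT product_id, max(updated_at) ... GROUP BY 1
latest_by_product = purchases
  .group_by { |purchase| purchase[:product_id] }
  .transform_values { |rows| rows.map { |row| row[:updated_at] }.max }

# Inner join drops products without purchases; then ORDER BY updated_at DESC.
recent_products = products
  .select { |product| latest_by_product.key?(product[:id]) }
  .map { |product| product.merge(updated_at: latest_by_product[product[:id]]) }
  .sort_by { |product| -product[:updated_at].to_f }

recent_products.each { |product| puts "#{product[:name]} #{product[:updated_at]}" }
```

Because the purchases are collapsed to one row per product before the join, each product can match at most once, so no de-duplication is needed afterwards.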
Or even simpler, but not as fast while retrieving all rows:
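SELECT pr.*, max(pu.updated_at) AS last_purchased_at -- qualified and renamed: pr may have its own updated_at
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id
GROUP BY pr.id -- must be primary key
ORDER BY max(pu.updated_at) DESC NULLS LAST; -- an ordinal like 2 would point at a column of pr.*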
Product.id needs to be defined as primary key for this to work. Details:
- PostgreSQL - GROUP BY clause
- Return a grouped list with occurrences using Rails and PostgreSQL
If you fetch only a small selection (with a WHERE clause restricting to just one or a few pr.id, for instance), this will be faster.
Answer 2:
Building on @ErwinBrandstetter's answer, I finally found the right way of doing this. The query to find distinct recent purchases is:
SELECT *
FROM (
SELECT DISTINCT ON (pr.id)
pu.updated_at, pr.*
FROM Product pr
JOIN Purchases pu ON pu.product_id = pr.id
) sub
ORDER BY updated_at DESC NULLS LAST;
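The ORDER BY isn't needed inside the subquery to get the final ordering, since we order in the outer query anyway. Be aware, though, that without it DISTINCT ON keeps an arbitrary row per product, so updated_at is not guaranteed to be the latest purchase.
The Rails way of doing this is: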
inner_query = Product.joins(:purchases)
.select("DISTINCT ON (products.id) products.*, purchases.updated_at as date") #This selects all the unique purchased products.
result = Product.from("(#{inner_query.to_sql}) as unique_purchases")
.select("unique_purchases.*").order("unique_purchases.date DESC")
The second (and better) way to do this as suggested by @ErwinBrandstetter is
SELECT *
FROM Product pr
JOIN (
SELECT product_id, max(updated_at) AS updated_at
FROM Purchases
GROUP BY 1
) pu ON pu.product_id = pr.id
ORDER BY pu.updated_at DESC NULLS LAST;
which can be written in Rails as
join_query = Purchase.select("product_id, max(updated_at) as date")
.group(1) #This selects most recent date for all purchased products
result = Product.joins("INNER JOIN (#{join_query.to_sql}) as unique_purchases ON products.id = unique_purchases.product_id")
.order("unique_purchases.date DESC NULLS LAST") #DESC to match the SQL above
Answer 3:
To build on erwin-brandstetter's answer, this is how you could do this with ActiveRecord (should be close at least):
Product
.select('*')
.joins('INNER JOIN (SELECT product_id, max(updated_at) AS updated_at FROM purchases GROUP BY 1) pu ON pu.product_id = products.id') # products.id: no "pr" alias exists in this query
.order('pu.updated_at DESC NULLS LAST')
Answer 4:
I ended up with this:
Product.joins(:purchases)
.select("DISTINCT ON (products.id) products.*, purchases.updated_at as date")
.sort_by(&:date)
.reverse
Still looking for a better way to do this.
Answer 5:
Try to do this:
Product.joins(:purchases)
.select("DISTINCT ON (purchases.product_id) purchases.product_id, purchases.updated_at, products.*")
.order("purchases.product_id, purchases.updated_at") #postgres requires that DISTINCT ON must match the leftmost order by clause