Joining two tables with aggregates

Posted 2019-08-05 10:14

Question:

I've got two tables described below:

CREATE TABLE categories
(
  id integer NOT NULL,
  category integer NOT NULL,
  name text,
  CONSTRAINT kjhfskfew PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE TABLE products_
(
  id integer NOT NULL,
  date date,
  id_employee integer,
  CONSTRAINT grh PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Now I need to build a report with the following information: categories.category; categories.name (all of them, since many names can be assigned to one category, so string_agg is fine); and products_.id_employee. For id_employee, however, I don't want a comma-separated list like the names. I only want the employee of the product with the newest date, and that is where I'm stuck.

I've already tried constructions such as:

SELECT
  DISTINCT ON (category ) category,
  string_agg(name, ','),
  (SELECT
     id_employee
   FROM products_
   WHERE date = (SELECT
                   max(date)
                 FROM products_
                 WHERE id IN (SELECT
                                id
                              FROM categories
                              WHERE id = c.id)))
FROM categories c
ORDER BY category;

But PostgreSQL complains that the subquery returns too many rows. Please help!
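A scalar subquery in the select list may return at most one row, and matching on date = max(date) does not guarantee that once several products share a date. One way to keep the shape of the attempt is sketched below, assuming PostgreSQL: GROUP BY builds the name list, and the subquery, correlated on the grouped category, picks a single employee with ORDER BY ... LIMIT 1. The example rows from the question are inlined as CTEs named after its tables so the block runs on its own; drop the WITH clause to run it against the real tables. The aliases names, c2 and id_employee are illustrative.

```sql
-- Example rows from the question, inlined so the query is self-contained.
-- Against the real tables, remove this WITH clause.
WITH categories(id, category, name) AS (
  VALUES (1, 22, 'car'), (2, 22, 'bike'), (3, 22, 'boat'),
         (4, 33, 'soap'), (5, 44, 'chicken')
), products_(id, date, id_employee) AS (
  VALUES (1, DATE '2009-11-09', 11), (2, DATE '2010-09-09', 2),
         (3, DATE '2013-01-01', 4),  (5, DATE '2014-09-01', 90)
)
SELECT c.category,
       string_agg(c.name, ',' ORDER BY c.id) AS names,
       -- Correlated on the grouped column c.category; LIMIT 1 guarantees
       -- the scalar subquery never returns more than one row.
       (SELECT p.id_employee
          FROM products_ p
          JOIN categories c2 ON c2.id = p.id   -- tables are linked by id, as in the question
         WHERE c2.category = c.category
         ORDER BY p.date DESC NULLS LAST        -- newest dated product first
         LIMIT 1) AS id_employee
FROM categories c
GROUP BY c.category
ORDER BY c.category;
```

On the example rows this should give car,bike,boat with employee 4 for category 22, soap with NULL for 33, and chicken with 90 for 44. A category without any product still appears with a NULL employee, because the subquery then returns no row.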

EXAMPLE INSERTS:

INSERT INTO categories(
            id, category, name)
    VALUES (1,22,'car'),(2,22,'bike'),(3,22,'boat'),(4,33,'soap'),(5,44,'chicken');

INSERT INTO products_(
            id, date, id_employee)
    VALUES (1,'2009-11-09',11),(2,'2010-09-09',2),(3,'2013-01-01',4),(5,'2014-09-01',90);
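Since the two tables are linked by id, it may help to look at the joined rows before any aggregation. The sketch below, assuming PostgreSQL, lists every category row with its matching product, if any, newest first within each category. The example rows are inlined as CTEs named after the question's tables so it runs on its own; drop the WITH clause to use the real tables.

```sql
-- Example rows from the question, inlined so the query is self-contained.
WITH categories(id, category, name) AS (
  VALUES (1, 22, 'car'), (2, 22, 'bike'), (3, 22, 'boat'),
         (4, 33, 'soap'), (5, 44, 'chicken')
), products_(id, date, id_employee) AS (
  VALUES (1, DATE '2009-11-09', 11), (2, DATE '2010-09-09', 2),
         (3, DATE '2013-01-01', 4),  (5, DATE '2014-09-01', 90)
)
-- LEFT JOIN keeps category rows that have no product with the same id.
SELECT c.category, c.name, p.date, p.id_employee
FROM categories c
LEFT JOIN products_ p ON p.id = c.id
ORDER BY c.category, p.date DESC NULLS LAST;
```

With these rows, the first line of each category carries the employee the report should show: 4 for category 22 (boat, 2013-01-01), none for 33 (soap has no product), and 90 for 44.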

OK, I've solved this problem. This one works just fine:

WITH max_date AS (
    SELECT
      category,
      max(date)             AS date,
      string_agg(name, ',') AS names
    FROM test.products_
      JOIN test.categories c
      USING (id)
    GROUP BY c.category
)
SELECT
  max(id_employee) AS id_employee,
  md.category,
  names
FROM test.products_ p
  LEFT JOIN max_date md
  USING (date)
  LEFT JOIN test.categories
  USING (category)
WHERE p.date = md.date AND p.id IN (SELECT
                                      id
                                    FROM test.categories
                                    WHERE category = md.category)
GROUP BY category, names;
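The posted solution builds max_date with an inner join, so a category without a matching product (33, soap, in the example rows) drops out of the report, and ties on the newest date are resolved by max(id_employee). Since the first attempt already reached for DISTINCT ON, a sketch that uses it to pick the newest product per category and then left-joins that onto the aggregated names might look like this, assuming PostgreSQL. The CTE name latest and the alias names are illustrative, and the example rows are inlined as CTEs without the test schema.

```sql
-- Example rows from the question, inlined so the query is self-contained.
WITH categories(id, category, name) AS (
  VALUES (1, 22, 'car'), (2, 22, 'bike'), (3, 22, 'boat'),
         (4, 33, 'soap'), (5, 44, 'chicken')
), products_(id, date, id_employee) AS (
  VALUES (1, DATE '2009-11-09', 11), (2, DATE '2010-09-09', 2),
         (3, DATE '2013-01-01', 4),  (5, DATE '2014-09-01', 90)
),
-- DISTINCT ON keeps the first row of each category in ORDER BY order,
-- i.e. the product with the newest date.
latest AS (
  SELECT DISTINCT ON (c.category) c.category, p.id_employee
  FROM categories c
  JOIN products_ p ON p.id = c.id
  ORDER BY c.category, p.date DESC NULLS LAST
)
SELECT c.category,
       string_agg(c.name, ',' ORDER BY c.id) AS names,
       l.id_employee
FROM categories c
LEFT JOIN latest l ON l.category = c.category  -- keeps categories without products
GROUP BY c.category, l.id_employee
ORDER BY c.category;
```

If two products in a category share the newest date, DISTINCT ON returns one of them without a guaranteed choice; adding a tiebreaker such as p.id to the ORDER BY inside latest makes the pick deterministic.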

Answer 1:

It seems that id is used to join the two tables, which strikes me as strange.

In any case, the base query for the category names is:

SELECT c.category, string_agg(c.name, ',')
FROM categories c
group by c.category;

The remaining question is how to get the employee with the most recent date. This approach uses the row_number() function:

SELECT c.category, string_agg(c.name, ','), cp.id_employee
FROM categories c left outer join
     (select c.category, c.name, p.id_employee,
             -- nulls last: rows without a matching product must not rank first
             row_number() over (partition by c.category order by p.date desc nulls last) as seqnum
      from categories c left outer join
           products_ p
           on c.id = p.id
     ) cp
     on cp.category = c.category and
        cp.seqnum = 1
group by c.category, cp.id_employee;
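In a ranking like this one, the sort direction matters for rows that the inner LEFT JOIN produced without a matching product. PostgreSQL sorts NULLs first under DESC unless NULLS LAST is given, so in a category that mixes matched and unmatched rows, seqnum = 1 could land on a NULL date and report a NULL employee. The difference shows up on a three-row VALUES list (the column names are illustrative):

```sql
-- Default DESC puts the NULL date first (rn_default = 1);
-- DESC NULLS LAST moves it to the end (rn_nulls_last = 3).
SELECT d,
       row_number() OVER (ORDER BY d DESC)            AS rn_default,
       row_number() OVER (ORDER BY d DESC NULLS LAST) AS rn_nulls_last
FROM (VALUES (DATE '2013-01-01'), (NULL::date), (DATE '2009-11-09')) AS t(d)
ORDER BY rn_nulls_last;
```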