Hey everyone. I believe this is a 'greatest-n-per-group' question but even after looking at several questions on StackOverflow, I'm unsure how to apply this to my situation...
I'm using a MySQL database and have a basic blog-type system set up about Computer Applications... The tables look like this:
POSTS
post_id
post_created
post_type -- could be article, review, feature, whatever
post_status -- 'a' approved or 'd' for draft
APPS
app_id
app_name
app_platform -- Windows, linux, unix, etc..
APP_TO_POST -- links my posts to its relevant application
atp_id
atp_app_id
atp_post_id
I'm using the following basic query to pull all articles for the application with the name 'Photoshop' where the post type is an 'Article' and the status of the article is 'a' for approved:
SELECT apps.app_name, apps.app_platform, posts.post_created, posts.post_id
FROM apps
JOIN app_to_post ON app_to_post.atp_app_id = apps.app_id
JOIN posts ON app_to_post.atp_post_id = posts.post_id
WHERE apps.app_name = 'Photoshop'
AND
posts.post_type = 'Article'
AND
posts.post_status = 'a'
Which gets me these expected results:
app_name app_platform post_created post_id
Photoshop Windows Oct. 20th, 2009 1
Photoshop Windows Dec. 1, 2009 3
Photoshop Macintosh Nov. 10th, 2009 2
Would anyone be able to lend a hand on how I could alter that query to only pull the most recent article per application platform? So for example, I'd like my results to look like this:
app_name app_platform post_created post_id
Photoshop Windows Dec. 1, 2009 3
Photoshop Macintosh Nov. 10th, 2009 2
And omit one of the 'Photoshop Windows'
articles because it isn't the most recent one.
If I simply tack on a MAX(post_created)
and a GROUP BY app_platform
my results don't always group correctly. From how I understand it I need to perform some kind of inner join of a sub query?
Since you have plenty of JOIN
s, I suggest creating a VIEW
first:
CREATE VIEW articles AS
SELECT a.app_name, a.app_platform, p.post_created, p.post_id
FROM apps a
JOIN app_to_post ap ON ap.atp_app_id = a.app_id
JOIN posts p ON ap.atp_post_id = p.post_id
WHERE p.post_type = 'Article' AND p.post_status = 'a';
Then you can use a NULL-self-join:
SELECT a1.app_name, a1.app_platform, a1.post_created, a1.post_id
FROM articles a1
LEFT JOIN articles a2 ON
a2.app_platform = a1.app_platform AND a2.post_created > a1.post_created
WHERE a2.post_id IS NULL;
Test case:
CREATE TABLE posts (
post_id int,
post_created datetime,
post_type varchar(30),
post_status char(1)
);
CREATE TABLE apps (
app_id int,
app_name varchar(40),
app_platform varchar(40)
);
CREATE TABLE app_to_post (
atp_id int,
atp_app_id int,
atp_post_id int
);
INSERT INTO posts VALUES (1, '2010-10-06 05:00:00', 'Article', 'a');
INSERT INTO posts VALUES (2, '2010-10-06 06:00:00', 'Article', 'a');
INSERT INTO posts VALUES (3, '2010-10-06 07:00:00', 'Article', 'a');
INSERT INTO posts VALUES (4, '2010-10-06 08:00:00', 'Article', 'a');
INSERT INTO posts VALUES (5, '2010-10-06 09:00:00', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows');
INSERT INTO apps VALUES (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1, 1);
INSERT INTO app_to_post VALUES (1, 1, 2);
INSERT INTO app_to_post VALUES (1, 2, 3);
INSERT INTO app_to_post VALUES (1, 2, 4);
INSERT INTO app_to_post VALUES (1, 1, 5);
Result:
+-----------+--------------+---------------------+---------+
| app_name | app_platform | post_created | post_id |
+-----------+--------------+---------------------+---------+
| Photoshop | Macintosh | 2010-10-06 08:00:00 | 4 |
| Photoshop | Windows | 2010-10-06 09:00:00 | 5 |
+-----------+--------------+---------------------+---------+
2 rows in set (0.00 sec)
As a side-note, in general you don't need a surrogate key for your junction table. You might as well set up a composite primary key (and ideally foreign keys to the referenced tables):
CREATE TABLE app_to_post (
atp_app_id int,
atp_post_id int,
PRIMARY KEY (atp_app_id, atp_post_id),
FOREIGN KEY (atp_app_id) REFERENCES apps (app_id),
FOREIGN KEY (atp_post_id) REFERENCES posts (post_id)
) ENGINE=INNODB;
Let's first consider how to obtain rows with maximum value from your query result and your desirable result:
Your result: (Let's call it table T)
app_name app_platform post_created post_id
Photoshop Windows Oct. 20th, 2009 1
Photoshop Windows Dec. 1, 2009 3
Photoshop Macintosh Nov. 10th, 2009 2
The result you want:
app_name app_platform post_created post_id
Photoshop Windows Dec. 1, 2009 3
Photoshop Macintosh Nov. 10th, 2009 2
In order to get the result, you should:
- Calculate the maximum post_id for each platform for table T.
- Join maximum result with the origin table T to obtain values in other columns of the row.
The query is below:
SELECT
t1.app_name,t1.app_platform,t1.post_created,t1.post_id
FROM
(SELECT app_platform, MAX(post_created) As MaxPostCreated
FROM T
GROUP BY app_platform) AS t2 JOIN
T AS t1
WHERE
t1.app_platform = t2.app_platform1
AND t2.MaxPostCreated = t1.post_created
In this query, the subquery performed the first step, and join performs the second step.
The final result combining with your partial answer is show below (with a view):
CREATE VIEW T
SELECT a.app_name, a.app_platform, p.post_created, p.post_id
FROM apps a
JOIN app_to_post ap ON ap.atp_app_id = a.app_id
JOIN posts p ON ap.atp_post_id = p.post_id
WHERE p.post_type = 'Article' AND p.post_status = 'a';
SELECT
t1.app_name,t1.app_platform,t1.post_created,t1.post_id
FROM
(SELECT app_platform, MAX(post_created) As MaxPostCreated
FROM T
GROUP BY app_platform) AS t2 JOIN
T AS t1
WHERE
t1.app_platform = t2.app_platform1
AND t2.MaxPostCreated= t1.post_created
By the way, our team is actually now developing a tool trying to automatically help users to write queries, and users can provide input-output examples to the tool, and the tool will produce a query. (The first part of the query is actually generated by the tool! The link to our prototype is https://github.com/Mestway/Scythe)
Hope this can help you. :)
You're on the right track.
Try adding
group by app_name,app_platform
having post_created=max(post_created)
Or if your post_id is sequential where a higher value will always reflect a later post, use this having clause: having post_id=max(post_id)