There is a table messages that contains data as shown below:
Id   Name   Other_Columns
-------------------------
1    A      A_data_1
2    A      A_data_2
3    A      A_data_3
4    B      B_data_1
5    B      B_data_2
6    C      C_data_1
If I run the query select * from messages group by name, I will get this result:
1    A      A_data_1
4    B      B_data_1
6    C      C_data_1
What query will return the following result?
3    A      A_data_3
5    B      B_data_2
6    C      C_data_1
That is, the last record in each group should be returned.
At present, this is the query that I use:
SELECT *
FROM (
    SELECT *
    FROM messages
    ORDER BY id DESC
) AS x
GROUP BY name;
But this looks highly inefficient. Any other ways to achieve the same result?
I've not yet tested this with a large DB, but I think this could be faster than joining tables:
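Something like the following fits that description (a sketch only, assuming id determines the last row within each name; not necessarily the exact snippet originally posted):
SELECT m.*
FROM messages m
WHERE m.id = (
    -- pick the highest id for this row's name
    SELECT MAX(m2.id)
    FROM messages m2
    WHERE m2.name = m.name
);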
Clearly there are lots of different ways of getting the same results; your question seems to be about an efficient way of getting the last result in each group in MySQL. If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.
We sometimes need to do this with tables containing more than 60 million rows.
For these examples I will use data with only about 1.5 million rows, where the queries need to find results for all groups in the data. In our actual cases we would often need to return data for about 2,000 groups (which hypothetically would not require examining very much of the data).
I will use the following tables:
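Roughly like the following (a sketch consistent with the description; apart from id, groupID and recordedTimestamp, the column names and the index are assumptions):
CREATE TABLE temperature (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    groupID INT UNSIGNED NOT NULL,
    recordedTimestamp TIMESTAMP NOT NULL,
    recordedValue INT NOT NULL,                        -- assumed payload column
    PRIMARY KEY (id),
    INDEX groupIndex (groupID, recordedTimestamp)      -- assumed index to support per-group lookups
) ENGINE = InnoDB;

CREATE TEMPORARY TABLE selected_group (
    groupID INT UNSIGNED NOT NULL,
    PRIMARY KEY (groupID)
) ENGINE = InnoDB;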
The temperature table is populated with about 1.5 million random records spread across 100 different groups. The selected_group table is populated with those 100 groups (in our real cases this would normally be less than 20% of all the groups).
Because this data is random, multiple rows can have the same recordedTimestamp. What we want is a list of all of the selected groups, in order of groupID, with the last recordedTimestamp for each group; if the same group has more than one row with that timestamp, then the last matching id of those rows.
If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:
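For example (last() is purely hypothetical; recordedValue comes from the sketched table definition above):
SELECT
    last(t1.id) AS id,                                 -- hypothetical last() function
    t1.groupID,
    last(t1.recordedTimestamp) AS recordedTimestamp,
    last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.groupID
ORDER BY t1.recordedTimestamp, t1.id                   -- hypothetical syntax: this ORDER BY feeds last(), not the final result (see note below)
GROUP BY t1.groupID;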
which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. This would execute in 0 seconds and hence be highly efficient. Note that normally in MySQL we would see an ORDER BY clause following the GROUP BY clause; here, however, the ORDER BY clause is used to determine the order for the last() function. If it came after the GROUP BY, it would be ordering the GROUPS instead. If no GROUP BY clause were present, the last values would be the same in all of the returned rows.
However, MySQL does not have this, so let's look at different ideas using what it does have and show that none of them are efficient.
Example 1
This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 2
This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 3
This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 4
This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 5
This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc
Example 6
This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc
Example 7
This one was taking forever so I had to kill it.
You can view these on SQL Fiddle as well: http://sqlfiddle.com/#!9/ef42b/9
FIRST SOLUTION
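A sketch of one classic form of such a solution, a LEFT JOIN anti-join that keeps only the rows with no later row in the same group (not necessarily the exact query in the fiddle):
SELECT m1.*
FROM messages m1
LEFT JOIN messages m2
    ON m2.name = m1.name
   AND m2.id > m1.id           -- a later row in the same group
WHERE m2.id IS NULL;            -- keep only rows with no later row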
SECOND SOLUTION
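A sketch of another common form, joining back to a per-name MAX(id) derived table (again, not necessarily the exact fiddle query):
SELECT m1.*
FROM messages m1
INNER JOIN (
    SELECT name, MAX(id) AS id  -- last id per group
    FROM messages
    GROUP BY name
) latest ON latest.id = m1.id;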
I arrived at a different solution: get the IDs of the last post within each group, then select from the messages table using the result of that first query as the argument for a WHERE x IN construct, as sketched below. I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows (4-second execution with 1,200+ results).
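Something along these lines (a sketch, assuming the highest id within each name is the last post):
SELECT *
FROM messages
WHERE id IN (
    SELECT MAX(id)              -- the ID of the last post per group
    FROM messages
    GROUP BY name
);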
This should work both on MySQL and SQL Server.
If performance is really your concern, you can introduce a new column on the table called IsLastInGroup, of type BIT. Set it to true on the rows which are last in their group and maintain it on every insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case, and I recommend it only if you're read-focused.
So your query will look like:
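For example (assuming 1 represents true for the BIT flag):
SELECT *
FROM messages
WHERE IsLastInGroup = 1;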