I have a table of sensor data. Each row has a sensor id, a timestamp, and other fields. I want to select a single row with the latest timestamp for each sensor, including some of the other fields.
I thought that the solution would be to group by sensor id and then order by max(timestamp) like so:
SELECT sensorID,timestamp,sensorField1,sensorField2
FROM sensorTable
GROUP BY sensorID
ORDER BY max(timestamp);
This gives me an error saying that "sensorField1 must appear in the group by clause or be used in an aggregate."
What is the correct way to approach this problem?
You can only select columns that are in the GROUP BY clause or used in an aggregate function. You can use a join to get this working.
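A minimal sketch of such a join, assuming the table and column names from the question: a subquery finds the latest timestamp per sensor, and the outer query joins back to it to fetch the remaining fields. The aliases s, m and latest are made up for illustration.

```sql
-- Latest timestamp per sensor, joined back to the full rows.
-- Aliases (s, m, latest) are illustrative, not from the original post.
SELECT s.sensorID, s.timestamp, s.sensorField1, s.sensorField2
FROM sensorTable AS s
JOIN (
    SELECT sensorID, MAX(timestamp) AS latest
    FROM sensorTable
    GROUP BY sensorID
) AS m
  ON m.sensorID = s.sensorID
 AND m.latest = s.timestamp;
```

If a sensor has two rows with the same latest timestamp, both rows are returned.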
This can be done in a relatively elegant way using SELECT DISTINCT.
This works for PostgreSQL (some more info here), and I think for other engines as well. In case it's not obvious, what this does is sort the table by sensor ID and timestamp (newest to oldest), and then return the first row (i.e. latest timestamp) for each unique sensor ID.
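The approach described above could look like the following sketch. It assumes the PostgreSQL-specific DISTINCT ON form, which is an extension rather than standard SQL, so other engines may need a different query. Table and column names are taken from the question.

```sql
-- PostgreSQL only: DISTINCT ON keeps the first row of each sensorID group,
-- and the ORDER BY makes that first row the one with the newest timestamp.
SELECT DISTINCT ON (sensorID)
       sensorID, timestamp, sensorField1, sensorField2
FROM sensorTable
ORDER BY sensorID, timestamp DESC;
```

The ORDER BY must start with the DISTINCT ON expression (sensorID here); the columns after it decide which row of each group is kept.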
In my use case I have ~10M readings from ~1K sensors, so trying to join the table with itself on a timestamp-based filter is very resource-intensive; the above takes a couple of seconds.
I had mostly the same problem and ended up with a different solution that makes this type of problem trivial to query.
I have a table of sensor data (one-minute data from about 30 sensors),
and a sensor table that holds lots of mostly static information about each sensor; the relevant fields here include tvLastupdate and tvLastValue.
The tvLastupdate and tvLastValue fields are set by a trigger on inserts to the SensorReadings table, so I always have direct access to these values without needing to do any expensive queries. This does denormalize slightly. The query is trivial, since it only needs to read the sensor table.
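A rough sketch of this trigger-and-cache setup, written for PostgreSQL (an assumption; the answer does not name its database). Only SensorReadings, tvLastupdate and tvLastValue come from the text. The Sensors table name, the other columns, and the function and trigger names are hypothetical.

```sql
-- Hypothetical schema: only SensorReadings, tvLastupdate and tvLastValue
-- are names from the answer; everything else is assumed for illustration.
CREATE TABLE Sensors (
    sensorID     integer PRIMARY KEY,
    sensorName   text,
    tvLastupdate timestamp,          -- time of the newest reading
    tvLastValue  double precision    -- value of the newest reading
);

CREATE TABLE SensorReadings (
    sensorID    integer NOT NULL REFERENCES Sensors (sensorID),
    readingTime timestamp NOT NULL,
    reading     double precision
);

-- Copy each new reading into the cache columns, unless a newer
-- reading has already been recorded for that sensor.
CREATE FUNCTION cache_last_reading() RETURNS trigger AS $$
BEGIN
    UPDATE Sensors
    SET tvLastupdate = NEW.readingTime,
        tvLastValue  = NEW.reading
    WHERE sensorID = NEW.sensorID
      AND (tvLastupdate IS NULL OR tvLastupdate <= NEW.readingTime);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER trg_cache_last_reading
AFTER INSERT ON SensorReadings
FOR EACH ROW EXECUTE FUNCTION cache_last_reading();

-- The "latest reading per sensor" query is now a plain read.
SELECT sensorID, tvLastupdate, tvLastValue
FROM Sensors;
```

The trade-off is one extra UPDATE per inserted reading in exchange for a read that never touches the large readings table.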
I use this method for data that is queried often. In my case I have a sensor table, and a large event table, that have data coming in at the minute level AND dozens of machines are updating dashboards and graphs with that data. With my data scenario the trigger-and-cache method works well.
You can left-join the table with itself (on sensor id), and add left.timestamp < right.timestamp as a join condition. Then you pick the rows where right.id is null. Voila, you've got the latest entry per sensor. http://sqlfiddle.com/#!9/45147/37
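A sketch of that self-join using the table and columns from the question. LEFT and RIGHT are reserved words in many SQL dialects, so the aliases cur and newer are used here instead; they are illustrative names, and sensorID stands in for the id column.

```sql
-- Anti-join: a row is the latest for its sensor when no row with the
-- same sensorID and a strictly newer timestamp exists.
SELECT cur.sensorID, cur.timestamp, cur.sensorField1, cur.sensorField2
FROM sensorTable AS cur
LEFT JOIN sensorTable AS newer
       ON newer.sensorID = cur.sensorID
      AND cur.timestamp < newer.timestamp
WHERE newer.sensorID IS NULL;
```

Every row is compared against all newer rows of the same sensor, which is where the cost comes from when each sensor has many readings.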
But please note that this will be very resource-intensive if you have a small number of ids and many values! So I wouldn't recommend this for measurement data, where each sensor collects a value every minute. However, in a use case where you need to track "revisions" of something that changes only occasionally, it works fine.
For the sake of completeness, here's another possible solution:
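One query of this kind is a correlated subquery, the pattern the MySQL manual uses for rows holding the group-wise maximum of a column. That this is the pattern meant here is an assumption; the sketch uses the table and columns from the question, and the aliases s1 and s2 are illustrative.

```sql
-- Keep a row only if its timestamp equals the maximum timestamp
-- found among all rows of the same sensor.
SELECT s1.sensorID, s1.timestamp, s1.sensorField1, s1.sensorField2
FROM sensorTable s1
WHERE s1.timestamp = (
    SELECT MAX(s2.timestamp)
    FROM sensorTable s2
    WHERE s2.sensorID = s1.sensorID
);
```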
Pretty self-explanatory, I think, but here's more info if you wish, as well as other examples. It's from the MySQL manual, but the above query works with every RDBMS that implements the SQL-92 standard.