I would like to know if there is support for OFFSET in AWS Athena. For mysql the following query is running but in athena it is giving me error. Any example would be helpful.
select * from employee where empSal >3000 LIMIT 300 OFFSET 20
I would like to know if there is support for OFFSET in AWS Athena. For mysql the following query is running but in athena it is giving me error. Any example would be helpful.
select * from employee where empSal >3000 LIMIT 300 OFFSET 20
Athena is basically managed Presto. Since Presto 311 you can use OFFSET m LIMIT n
syntax or ANSI SQL equivalent: OFFSET m ROWS FETCH NEXT n ROWS ONLY
.
You can read more in Beyond LIMIT, Presto meets OFFSET and TIES.
For older versions (and this includes AWS Athena as of this writing), you can use row_number()
window function to implement OFFSET + LIMIT.
For example, instead of
SELECT * FROM elb_logs
OFFSET 5 LIMIT 5 -- this doesn't work, obviously
You can execute
SELECT * FROM (
SELECT row_number() over() AS rn, * FROM elb_logs)
WHERE rn BETWEEN 5 AND 10;
Note: the execution engine will still need to read offset+limit rows from the underlying table, but this is still much better than sending all these rows back to the client and taking a sublist there.
Warning: see https://stackoverflow.com/a/45114359/65458 for explanation why avoiding OFFSET in queries is generally a good idea.
OFFSET Is not supported by AWS Athena. You can see all the supported SELECT parameters here: SELECT
It seems that the current accepted solution does not work properly with the ORDER BY
keyword as the row_number()
is applied before ordering. I believe an exact solution allowing you to use ORDER BY
would be as follows:
SELECT * FROM (
SELECT row_number() over() AS rn, *
FROM (
SELECT *
FROM elb_logs
ORDER BY id ASC
)
)
WHERE rn BETWEEN 5 AND 10;
You could limit and filter by a natural key of the data.
For example, if you had an id
column in your dataset you could do the following:
SELECT id, * FROM elb_logs
WHERE id > __LAST_SEEN_ID__
ORDER BY id
LIMIT 500
So your offset would be defined implicitly, using the filter, based on the last id that you have processed.