AWS Athena (Presto) OFFSET support

Posted 2020-07-10 08:35

Question:

I would like to know whether AWS Athena supports OFFSET. The following query runs in MySQL, but in Athena it gives me an error. Any example would be helpful.

select * from employee where empSal > 3000 LIMIT 300 OFFSET 20

Answer 1:

Athena is basically managed Presto. Since Presto 311 you can use the OFFSET m LIMIT n syntax or its ANSI SQL equivalent, OFFSET m ROWS FETCH NEXT n ROWS ONLY.
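On an engine that supports it (Presto 311 or later), the question's query could be rewritten roughly as sketched below. Note that OFFSET comes before LIMIT, the reverse of MySQL's LIMIT ... OFFSET order. The ORDER BY clause and the empId column are assumptions added for illustration (empId is a hypothetical unique column used as a tiebreaker): without a deterministic ordering, which rows land on a given page is not guaranteed.

```sql
-- Presto 311+: OFFSET m LIMIT n (skip 20 rows, return the next 300).
-- empId is a hypothetical unique column used only to break ties.
SELECT *
FROM employee
WHERE empSal > 3000
ORDER BY empSal, empId
OFFSET 20 LIMIT 300;

-- The same page using the ANSI SQL form.
SELECT *
FROM employee
WHERE empSal > 3000
ORDER BY empSal, empId
OFFSET 20 ROWS FETCH NEXT 300 ROWS ONLY;
```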

You can read more in "Beyond LIMIT, Presto meets OFFSET and TIES".

For older versions (and this includes AWS Athena as of this writing), you can use the row_number() window function to emulate OFFSET + LIMIT.

For example, instead of

SELECT * FROM elb_logs
OFFSET 5 LIMIT 5 -- this doesn't work, obviously

You can execute

SELECT * FROM (
    SELECT row_number() over() AS rn, * FROM elb_logs)
WHERE rn BETWEEN 6 AND 10;
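The same pattern can be applied to the question's query. Because row_number() starts at 1, OFFSET m LIMIT n corresponds to keeping rn from m + 1 through m + n, so OFFSET 20 LIMIT 300 becomes rn BETWEEN 21 AND 320. This is a sketch assuming the question's employee table; the ORDER BY inside OVER (...) and the hypothetical unique column empId are added so that the numbering is deterministic.

```sql
-- Emulates: ... WHERE empSal > 3000 OFFSET 20 LIMIT 300
-- empId is a hypothetical unique tiebreaker column.
SELECT *
FROM (
    SELECT row_number() OVER (ORDER BY empSal, empId) AS rn, *
    FROM employee
    WHERE empSal > 3000
)
WHERE rn BETWEEN 21 AND 320  -- rows offset + 1 .. offset + limit
ORDER BY rn;
```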

Note: the execution engine will still need to read offset+limit rows from the underlying table, but this is still much better than sending all these rows back to the client and taking a sublist there.

Warning: see https://stackoverflow.com/a/45114359/65458 for an explanation of why avoiding OFFSET in queries is generally a good idea.



Answer 2:

OFFSET is not supported by AWS Athena. You can see all the supported SELECT parameters on the SELECT page of the Athena documentation.



Answer 3:

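It seems that the currently accepted solution does not work properly with ORDER BY, because row_number() is assigned before an outer ORDER BY is applied. Sorting in a subquery and numbering the result outside it is not reliable either, since a subquery's ORDER BY is not guaranteed to be preserved by the enclosing query. Putting the ordering inside the window definition numbers the rows in the intended order:

SELECT * FROM (
  SELECT row_number() OVER (ORDER BY id ASC) AS rn, *
  FROM elb_logs
)
WHERE rn BETWEEN 6 AND 10
ORDER BY rn;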


Answer 4:

You could paginate by filtering on a natural key of the data and applying LIMIT.

For example, if your dataset has a unique id column, you could do the following:

SELECT * FROM elb_logs
WHERE id > __LAST_SEEN_ID__
ORDER BY id
LIMIT 500

So the offset is defined implicitly by the filter, based on the last id that you have processed.
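Applied to the question's table, this approach (often called keyset pagination) could look like the sketch below. It assumes a hypothetical unique, sortable column empId, and uses a placeholder in the same style as above that the client replaces with the largest empId returned by the previous page. Since each page starts from a filter value rather than a row count, there is no need to number and then discard the rows of earlier pages.

```sql
-- First page: no lower bound yet.
SELECT *
FROM employee
WHERE empSal > 3000
ORDER BY empId
LIMIT 300;

-- Following pages: replace __LAST_SEEN_EMP_ID__ with the largest empId
-- from the previous page (empId is a hypothetical unique key).
SELECT *
FROM employee
WHERE empSal > 3000
  AND empId > __LAST_SEEN_EMP_ID__
ORDER BY empId
LIMIT 300;
```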