AWS Athena (Presto) OFFSET support

Posted 2020-07-10 08:33

I would like to know whether AWS Athena supports OFFSET. The following query runs in MySQL, but in Athena it gives me an error. Any example would be helpful.

select * from employee where empSal >3000 LIMIT 300 OFFSET 20

4 answers
太酷不给撩
Post #2 · 2020-07-10 09:02

It seems that the currently accepted solution does not work properly with ORDER BY, because row_number() numbers the rows before any ordering is applied. Ordering a subquery does not fix this either, since SQL does not guarantee that the outer query preserves a subquery's ORDER BY. I believe an exact solution that respects ORDER BY is to put the ordering inside the window definition itself:

SELECT * FROM (
  SELECT row_number() over (ORDER BY id ASC) AS rn, *
  FROM elb_logs
)
WHERE rn BETWEEN 5 AND 10
ORDER BY rn;
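A minimal local sketch of ordered row numbering, run through Python's built-in sqlite3 module against an in-memory database. SQLite (3.25 or later, for window functions) is only a stand-in for Athena/Presto, and the toy elb_logs table with id and path columns is hypothetical. The ordering sits inside over (ORDER BY id), which is what ties the rn values to id order; the rows are inserted in scrambled order so that this is visible.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for Athena/Presto (hypothetical toy table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elb_logs (id INTEGER NOT NULL, path TEXT)")

# Insert ids 1..20 in a scrambled order (7 is coprime to 20, so this is a permutation).
ids = [(i * 7) % 20 + 1 for i in range(20)]
conn.executemany(
    "INSERT INTO elb_logs (id, path) VALUES (?, ?)",
    [(i, f"/page/{i}") for i in ids],
)

# The ORDER BY inside over() decides which row gets which rn.
query = """
SELECT * FROM (
  SELECT row_number() over (ORDER BY id ASC) AS rn, *
  FROM elb_logs
)
WHERE rn BETWEEN 5 AND 10
ORDER BY rn
"""
rows = conn.execute(query).fetchall()
print([(rn, id_) for rn, id_, _path in rows])
# prints [(5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
```

The final ORDER BY rn only fixes the order in which the selected rows come back; which rows are selected is decided by the window's own ORDER BY.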
在下西门庆
Post #3 · 2020-07-10 09:04

Athena is basically managed Presto. Since Presto 311 you can use the OFFSET m LIMIT n syntax or its ANSI SQL equivalent, OFFSET m ROWS FETCH NEXT n ROWS ONLY.

You can read more in the blog post "Beyond LIMIT, Presto meets OFFSET and TIES".

For older versions (and this includes AWS Athena as of this writing), you can use the row_number() window function to implement OFFSET + LIMIT.

For example, instead of

SELECT * FROM elb_logs
OFFSET 5 LIMIT 5 -- this doesn't work, obviously

You can execute

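SELECT * FROM (
    -- over() with no ORDER BY numbers the rows in an unspecified order
    SELECT row_number() over() AS rn, * FROM elb_logs)
WHERE rn BETWEEN 6 AND 10; -- rn starts at 1, so OFFSET 5 LIMIT 5 means rows 6 to 10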
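Because row_number() starts at 1, OFFSET m LIMIT n corresponds to rn BETWEEN m + 1 AND m + n; the question's LIMIT 300 OFFSET 20 becomes rn BETWEEN 21 AND 320. Below is a minimal sketch with two hypothetical helpers, row_number_bounds and paginate_sql, checked against SQLite's native LIMIT/OFFSET. SQLite is again only a local stand-in for Athena, and the employee table is a hypothetical toy version of the one in the question. The sketch adds an explicit ordering key inside over(), because over() alone numbers rows in an unspecified order. It also applies the question's filter inside the subquery, before numbering.

```python
import sqlite3


def row_number_bounds(offset: int, limit: int) -> tuple[int, int]:
    """Map OFFSET m LIMIT n to inclusive row_number() bounds (row_number starts at 1)."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit >= 1")
    return offset + 1, offset + limit


def paginate_sql(inner_query: str, order_by: str, offset: int, limit: int) -> str:
    """Hypothetical helper: emulate ORDER BY ... OFFSET ... LIMIT ... with row_number().

    The strings are interpolated without escaping, so use trusted input only.
    """
    lo, hi = row_number_bounds(offset, limit)
    return (
        "SELECT * FROM (\n"
        f"  SELECT row_number() over (ORDER BY {order_by}) AS rn, * FROM ({inner_query})\n"
        ")\n"
        f"WHERE rn BETWEEN {lo} AND {hi}\n"
        "ORDER BY rn"
    )


# Hypothetical employee table: empSal = 1000 + 10 * empId, so empSal > 3000 means empId > 200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empId INTEGER, empSal INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(i, 1000 + 10 * i) for i in range(1, 1001)],
)

# The WHERE filter runs inside the subquery, before the rows are numbered.
inner = "SELECT * FROM employee WHERE empSal > 3000"
emulated = conn.execute(paginate_sql(inner, "empId", offset=20, limit=300)).fetchall()
native = conn.execute(inner + " ORDER BY empId LIMIT 300 OFFSET 20").fetchall()

# Drop the rn column and compare with SQLite's native LIMIT/OFFSET.
assert [row[1:] for row in emulated] == native
print(len(emulated), emulated[0][1], emulated[-1][1])  # prints 300 221 520
```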

Note: the execution engine still needs to read at least offset + limit rows from the underlying table, but this is still much better than sending all of those rows back to the client and taking a sublist there.

Warning: see https://stackoverflow.com/a/45114359/65458 for an explanation of why avoiding OFFSET in queries is generally a good idea.

chillily
Post #4 · 2020-07-10 09:17

OFFSET is not supported by AWS Athena. You can see all the supported SELECT clauses in the Athena documentation for SELECT.

Explosion°爆炸
Post #5 · 2020-07-10 09:22

You could limit and filter by a natural key of the data.

For example, if you had an id column in your dataset you could do the following:

SELECT id, * FROM elb_logs
WHERE id > __LAST_SEEN_ID__
ORDER BY id
LIMIT 500 

Your offset is then defined implicitly by the filter, based on the last id that you have processed.
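A minimal sketch of that keyset ("seek") loop, again run against an in-memory SQLite table as a local stand-in for Athena. The elb_logs schema and the fetch_all_pages helper are hypothetical, and the id column is assumed to be unique and non-null. The bound ? parameter plays the role of __LAST_SEEN_ID__.

```python
import sqlite3


def fetch_all_pages(conn, page_size):
    """Hypothetical helper: yield pages ordered by id, seeking past the last seen id instead of using OFFSET."""
    last_seen_id = None
    while True:
        if last_seen_id is None:
            # First page: nothing seen yet, so no filter.
            page = conn.execute(
                "SELECT id, path FROM elb_logs ORDER BY id LIMIT ?", (page_size,)
            ).fetchall()
        else:
            # Later pages: the filter replaces OFFSET.
            page = conn.execute(
                "SELECT id, path FROM elb_logs WHERE id > ? ORDER BY id LIMIT ?",
                (last_seen_id, page_size),
            ).fetchall()
        if not page:
            return
        yield page
        last_seen_id = page[-1][0]  # id of the last row in this page


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elb_logs (id INTEGER NOT NULL, path TEXT)")
conn.executemany(
    "INSERT INTO elb_logs VALUES (?, ?)",
    [(i, f"/page/{i}") for i in range(1, 1201)],
)

pages = list(fetch_all_pages(conn, page_size=500))
print([len(p) for p in pages])  # prints [500, 500, 200]
```

The ordering key has to be unique. If several rows shared the boundary id, the id > ? filter would skip the ones that did not fit on the previous page.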
