Can I delete data (rows in tables) from Athena?

2019-07-14 04:44发布

Is it possible to delete data stored in S3 through an Athena query? I have some rows I have to delete from a couple of tables (they point to separate buckets in S3).

I couldn't find a way to do it in the Athena User Guide: https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf and DELETE FROM isn't supported, but I'm wondering if there is an easier way than trying to find the files in S3 and deleting them.

3条回答
神经病院院长
2楼-- · 2019-07-14 04:56

You can leverage Athena to find out all the files that you want to delete and then delete them separately. There is a special variable "$path".

Select "$path" from <table> where <condition to get row of files to delete>

To automate this, you can have iterator on Athena results and then get filename and delete them from S3.

查看更多
来,给爷笑一个
3楼-- · 2019-07-14 04:59

I also would like to add that after you find the files to be updated you can filter the rows you want to delete, and create new files using CTAS: https://docs.aws.amazon.com/athena/latest/ug/ctas.html

Later you can replace the old files with the new ones created by CTAS. I think it is the most simple way to go

查看更多
做个烂人
4楼-- · 2019-07-14 05:17

I would just like to add to Dhaval's answer.

You can find out the path of the file with the rows that you want to delete and instead of deleting the entire file, you can just delete the rows from the S3 file which I am assuming would be in the Json format.

The process is to download the particular file which has those rows, remove the rows from that file and upload the same file to S3.

This just replaces the original file with the one with modified data (in your case, without the rows that got deleted). After the upload, Athena would tranform the data again and the deleted rows won't show up.

查看更多
登录 后发表回答