We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure:
<report-name>
|--reportDate-<date-stamp>
|-- part0.csv.gz
|-- part1.csv.gz
We want to be able to run reports partitioned by daily export.
According to this page, you can partition data in Redshift Spectrum by a key which is based on the source S3 folder where your Spectrum table sources its data. However, from the example, it looks like you need an ALTER
statement for each partition:
alter table spectrum.sales_part
add partition(saledate='2008-01-01')
location 's3://bucket/tickit/spectrum/sales_partition/saledate=2008-01/';
alter table spectrum.sales_part
add partition(saledate='2008-02-01')
location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/';
Is there any way to set the table up so that data is automatically partitioned by the folder it comes from, or do we need a daily job to ALTER
the table to add that day's partition?