I did the following in my AWS Redshift cluster to read a Parquet file from S3.
create external schema s3_external_schema
from data catalog
database 'dev'
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>'
create external database if not exists;
Then I created the external table:
CREATE external table s3_external_schema.SUPPLIER_PARQ_1 (
S_SuppKey BIGINT ,
S_Name varchar(64) ,
S_Address varchar(64) ,
S_NationKey int ,
S_Phone varchar(18) ,
S_AcctBal decimal(13, 2) ,
S_Comment varchar(105))
partitioned by (Supplier bigint, nation int)
stored as PARQUET
location 's3://<My Bucket>/<File partition>/';
Both of the above create statements succeeded. But when I ran the query below, it returned 0 rows.
select * from s3_external_schema."supplier_parq_1"
limit 10;
No rows returned.
Am I missing any grants or access permissions needed to make the query return output?
In addition to what Jon Scott said: if an external table is created with partitions, no data will be shown unless you run ALTER TABLE ... ADD PARTITION to register each partition. This has to be done after the table is created, and again whenever new partitions are added.
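A sketch of registering one partition for the table above, assuming hypothetical partition values Supplier=1, nation=1 and keeping the bucket and prefix placeholders from the question:

alter table s3_external_schema.supplier_parq_1
add if not exists partition (Supplier=1, nation=1)
location 's3://<My Bucket>/<File partition>/Supplier=1/nation=1/';

You can then confirm which partitions Redshift actually knows about by querying the SVV_EXTERNAL_PARTITIONS system view:

select * from svv_external_partitions
where tablename = 'supplier_parq_1';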
To check the error messages for external table queries, use the SVL_S3LOG system table.
Steps to debug a non-working Redshift Spectrum query:
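The SVL_S3LOG check mentioned above can be sketched as follows (the ordering and row limit are illustrative):

select query, segment, step, message
from svl_s3log
order by query desc
limit 20;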
If steps 1 or 2 are working, then your table definition in Spectrum is likely not right. Instead, just use the definition as created within the Hive metastore, like this.
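A sketch, assuming the table is already defined in the Glue/Hive metastore in a database named spectrum_db (a hypothetical name; the IAM role placeholder is from the question):

create external schema spectrum_metastore
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>';

Redshift then picks up the table definitions directly from the metastore instead of using a definition declared in Redshift itself.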
You can then just use the newly defined Redshift Spectrum schema without any further table definitions.