I ran the statements below in an AWS Redshift cluster to read a Parquet file from S3.
create external schema s3_external_schema
from data catalog
database 'dev'
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>'
create external database if not exists;
Then:
CREATE external table s3_external_schema.SUPPLIER_PARQ_1 (
S_SuppKey BIGINT,
S_Name varchar(64),
S_Address varchar(64),
S_NationKey int,
S_Phone varchar(18),
S_AcctBal decimal(13, 2),
S_Comment varchar(105))
partitioned by (Supplier bigint, nation int)
stored as PARQUET
location 's3://<My Bucket>/<File partition>/';
Both of the above CREATE statements succeeded. But when I ran the query below, it returned nothing.
select * from s3_external_schema."supplier_parq_1"
limit 10;
No rows returned.
Am I missing any grants or access permissions needed to make the query return output?
Steps to debug a non-working Redshift Spectrum query:
- Try the same query using Athena: the easiest way is to run a Glue crawler against the S3 folder. It should create a Hive metastore table that you can query straight away in Athena, using the same SQL you already have (see the sketch after this list).
- Use S3 Select: in the AWS console, navigate to one of your Parquet files, right-click and choose "Select", then choose Parquet (it may already be the default) and see what you get.
If either of the above works, then your table definition in Spectrum is likely wrong. Instead, just use the definition as created in the Hive metastore, like this:
create external schema some_schema from data catalog
database 'the_name_you_gave_the_hive_db'
iam_role 'whatever'
create external database if not exists;
You can then query the newly defined Redshift Spectrum schema directly, without any further table definitions.
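For example, a query along these lines should then return data (the schema name some_schema comes from the statement above; the table name is hypothetical and will be whatever the crawler created):
select * from some_schema.supplier_parq_1
limit 10;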
In addition to what Jon Scott said: if an external table is created with partitions, no data will be shown unless you run ALTER TABLE to add those partitions. After the table is created, and again whenever new partitions are added, you need to run:
ALTER TABLE s3_external_schema.SUPPLIER_PARQ_1
ADD PARTITION (supplier=..., nation=...)
LOCATION 's3://<My Bucket>/<File partition>/';
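As a concrete sketch, assuming your S3 data is laid out in Hive-style key=value prefixes (the partition values 1 and 5 below are made up for illustration):
-- Assumed layout: s3://<My Bucket>/supplier=1/nation=5/<parquet files>
ALTER TABLE s3_external_schema.SUPPLIER_PARQ_1
ADD IF NOT EXISTS PARTITION (supplier=1, nation=5)
LOCATION 's3://<My Bucket>/supplier=1/nation=5/';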
To check the error messages for external table queries, query the SVL_S3LOG system view.
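For example, a minimal sketch that surfaces the most recent Spectrum messages:
select query, segment, step, eventtime, message
from svl_s3log
order by eventtime desc
limit 20;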