Using Redshift Spectrum to read the data in extern

I did the below in AWS Redshift cluster to read the Parquet file from S3.

create external schema s3_external_schema 
from data catalog 
database 'dev' 
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>'
create external database if not exists;

then

CREATE external table s3_external_schema.SUPPLIER_PARQ_1 (
S_SuppKey BIGINT ,
S_Name varchar(64) ,
S_Address varchar(64) ,
S_NationKey int ,
S_Phone varchar(18) ,
S_AcctBal decimal(13, 2) ,
S_Comment varchar(105))
partitioned by (Supplier bigint, nation int)
stored as PARQUET
location 's3://<My Bucket>/<File partition>/';

Both the above create statements were successful. When the ran the below query it is returning 0.

select * from s3_external_schema."supplier_parq_1"
limit 10;

No rows returned.

Am I missing any grants or access permission to make it return the query output.

标签： amazon-redshift amazon-redshift-spectrum

2条回答

何必那么认真

2楼-- · 2019-08-18 12:38

In addition to what Jon Scott said, if an external table is created with partitions unless one runs ALTER TABLE to add partitions, no data will be shown

What needs to be run after the table is created and subsequent partitions are added is

ALTER TABLE s3_external_schema.SUPPLIER_PARQ_1 ADD PARTITION ('...')
LOCATION 's3://<My Bucket>/<File partition>/'

To check the error messages for external table queries SVL_S3LOG table should be used.

0人赞添加讨论(0) 举报

够拽才男人

3楼-- · 2019-08-18 12:44

Steps to debug a non-working Redshift-Spectrum query

try same query using athena: easiest way is to run a glue crawler against the s3 folder, it should create a hive metastore table that you can straight away query (using same sql as you have already) in athena.
use s3 select : using aws console navigate to one of your parquet files, right click and "select", then click parquet (may be already defaulted) and see what you get.

If 1. or 2. are working, then it is likely your definition is not right in spectrum. instead just use the definition as created within the hive metastore like this.

create external schema some_schema from data catalog
database 'the_name_you_gave_the_hive_db'
iam_role 'whatever'
create external database if not exists;

You can then just use the newly defined redshift spectrum schema without further definition.

0人赞添加讨论(0) 举报

Using Redshift Spectrum to read the data in extern

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间