Using Redshift Spectrum to read the data in an external table

Published 2019-08-18 12:22

Question:

I did the following in an AWS Redshift cluster to read a Parquet file from S3.

create external schema s3_external_schema 
from data catalog 
database 'dev' 
iam_role 'arn:aws:iam::<MyuniqueId>:role/<MyUniqueRole>'
create external database if not exists;

then

CREATE external table s3_external_schema.SUPPLIER_PARQ_1 (
S_SuppKey BIGINT ,
S_Name varchar(64) ,
S_Address varchar(64) ,
S_NationKey int ,
S_Phone varchar(18) ,
S_AcctBal decimal(13, 2) ,
S_Comment varchar(105))
partitioned by (Supplier bigint, nation int)
stored as PARQUET
location 's3://<My Bucket>/<File partition>/';

Both of the above create statements were successful. However, when I ran the query below, it returned nothing.

select * from s3_external_schema."supplier_parq_1"
limit 10;

No rows returned.

Am I missing any grants or access permissions needed to make this query return output?

Answer 1:

Steps to debug a non-working Redshift Spectrum query:

  1. Try the same query using Athena: the easiest way is to run a Glue crawler against the S3 folder. It should create a Hive metastore table that you can query straight away in Athena, using the same SQL you already have.
  2. Use S3 Select: in the AWS console, navigate to one of your Parquet files, right-click and choose "Select", then pick Parquet as the format (it may already be the default) and see what you get (see the sketch after this list).
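
If S3 Select is unfamiliar: it simply runs a small SQL expression against a single object, where S3Object is the fixed alias S3 Select gives the file being queried. A minimal sketch of the kind of expression it accepts (the LIMIT value here is only an assumption):

-- S3 Select expression run against one Parquet object
SELECT * FROM S3Object s LIMIT 10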

If 1 or 2 works, then it is likely that your table definition in Spectrum is not right. Instead, just use the definition as created in the Hive metastore, like this:

create external schema some_schema from data catalog
database 'the_name_you_gave_the_hive_db'
iam_role 'whatever'
create external database if not exists;

You can then just use the newly defined Redshift Spectrum schema without any further table definition.
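
For example, assuming the Glue crawler created a table named supplier_parq_1 inside that database (the actual table name depends on what the crawler generated from your S3 folder structure), you could query it directly:

-- hypothetical: schema name follows the example above, table name comes from the crawler
select * from some_schema."supplier_parq_1"
limit 10;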



Answer 2:

In addition to what Jon Scott said, if an external table is created with partitions, no data will be shown unless you run ALTER TABLE to add those partitions.

What needs to be run after the table is created, and again whenever new partitions are added, is:

ALTER TABLE s3_external_schema.SUPPLIER_PARQ_1
ADD PARTITION (supplier=..., nation=...)
LOCATION 's3://<My Bucket>/<File partition>/';
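
As a concrete sketch, assuming the files are laid out in Hive-style folders such as supplier=1/nation=1/ under the table location (the partition values and folder layout here are assumptions, not taken from the question), adding and then verifying one partition would look like this:

-- hypothetical partition values and S3 folder layout
ALTER TABLE s3_external_schema.SUPPLIER_PARQ_1
ADD PARTITION (supplier=1, nation=1)
LOCATION 's3://<My Bucket>/<File partition>/supplier=1/nation=1/';

-- confirm the partition was registered
select * from svv_external_partitions
where tablename = 'supplier_parq_1';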

To check the error messages for external table queries, use the SVL_S3LOG system table.
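
For example, a minimal sketch of how you might look at those messages right after a failing query, using PG_LAST_QUERY_ID() to pick up the last query run in the current session:

-- inspect Spectrum errors for the query that just ran in this session
select query, segment, slice, message
from svl_s3log
where query = pg_last_query_id();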