Maximum Number of Columns in Hive External Tables

Published 2019-02-20 16:02

Question:

I'm trying to set up Hive on Amazon EMR to pull data from a DynamoDB table and dump it to S3. I've followed the instructions found here and had success with most of our tables. With one DynamoDB table, however, I get an error (shown below).

The table in question has a lot of columns (more than 100), and cutting the mapping down to a subset of them allows the script to run, so I assume the column count is the problem, but I can't find any documentation about such a limit.

Is there some sort of hard limit on the number of columns I can define? Or is there some other limit that I'm likely to be hitting here? Is there a way to work around this?


The error I'm getting looks like:

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO `TABLE_PARAMS` (`PARAM_VALUE`,`TBL_ID`,`PARAM_KEY`) VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO `TABLE_PARAMS` (`PARAM_VALUE`,`TBL_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

The script I'm trying to run looks like:

CREATE EXTERNAL TABLE hive_WSOP_DEV_STATS_input (col1 string, col2 string...)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ( "dynamodb.table.name" = "DYNAMO_TABLE_NAME",
        "dynamodb.column.mapping" = "col1:col1,col2:col2...");

Answer 1:

I ran into a similar problem a couple of years ago. If I recall correctly, the issue is that Hive places a limit on the length of the text it writes into the metastore database. If you look at the call stack, you can probably find out whether that limit is configurable and, if not, where to edit the code.
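
For what it's worth, here is where that limit usually lives, sketched under the assumption of a stock MySQL-backed metastore (which is what EMR uses by default unless you've pointed Hive at an external one): the TABLE_PARAMS table named in the error defines PARAM_VALUE as VARCHAR(4000) in the standard metastore schema, and a dynamodb.column.mapping string covering 100+ columns can easily exceed that. If you can reach the metastore database directly, you could inspect and widen the column; the exact target type (TEXT vs. MEDIUMTEXT) is a judgment call:

    -- Run against the Hive metastore database (MySQL assumed), not in Hive itself.
    -- In the stock metastore schema, PARAM_VALUE is VARCHAR(4000).
    DESCRIBE TABLE_PARAMS;

    -- Widen the column so long TBLPROPERTIES values, such as a large
    -- "dynamodb.column.mapping" string, fit in a single row.
    ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;

After that, re-running the CREATE EXTERNAL TABLE statement should store the full mapping. If you can't modify the metastore schema, the fallback is the one you already found: map only a subset of the columns per external table.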