I'm trying to set up Hive on Amazon EMR to pull data from a DynamoDB table and dump it to S3. I've followed the instructions found here and had success with most of our tables. With one DynamoDB table, however, I get an error (shown below).
The table in question has a lot of columns (>100). Cutting the column mapping down to a subset of them allows the script to run (see the trimmed-down example at the end), so I'm assuming the number of columns is the problem, but I can't find any documentation on such a limit.
Is there some sort of hard limit on the number of columns I can define? Or is there some other limit that I'm likely to be hitting here? Is there a way to work around this?
The error I'm getting looks like:
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO `TABLE_PARAMS` (`PARAM_VALUE`,`TBL_ID`,`PARAM_KEY`) VALUES (?,?,?)
NestedThrowables:
org.datanucleus.store.mapped.exceptions.MappedDatastoreException: INSERT INTO `TABLE_PARAMS` (`PARAM_VALUE`,`TBL_ID`,`PARAM_KEY`) VALUES (?,?,?)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
The script I'm trying to run looks like:
CREATE EXTERNAL TABLE hive_WSOP_DEV_STATS_input (col1 string, col2 string...)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ( "dynamodb.table.name" = "DYNAMO_TABLE_NAME",
"dynamodb.column.mapping" = "col1:col1,col2:col2...");