Creating a hive table with ~40K columns

2019-09-04 06:34发布

问题:

I'm trying to create a fairly large table. ~3 millions rows and ~40K columns using hive. To begin, I'm creating an empty table and inserting the data into the table.

However, I hit an error when trying this.

Unable to acquire IMPLICIT, SHARED lock default after 100 attempts. FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquire. retry after some time

The query is pretty straightforward:

create external table database.dataset (
var1 decimal(10,2),
var2 decimal(10,2),
...
var40000 decimal(10,2)
) location 'hdfs://nameservice1/root/user1/project1';

Anybody seen this error before? Cloudera says there are no limits on number of columns, but clearly hitting some system limitation here.

Additionally, I can create a smaller hive table in the specified location.

回答1:

Ran across this blog post which appears to identify and fix the problem: http://gbif.blogspot.com/2014/03/lots-of-columns-with-hive-and-hbase.html

Short answer: there is a limit on the number of characters hive will pass in a query, but you can increase that with the following option change:

alter table "SERDE_PARAMS" alter column "PARAM_VALUE" type text;

Untested as I went with a different tool to handle the data (for the problem above) since hive was failing for unknown reasons. If you come across something similar, try this out and give an update please.