Change column type in hive

2019-05-28 13:39发布

问题:

I just started learning hive yesterday and I am stuck on changing the type of a column in hive. I wanted to ask if the changing of column type has some kind of restriction on them because I can only do specific type of changes like I can convert int to double, string to double, double to string but I cant change string to int, double to int.

ALTER TABLE student CHANGE rollno rollno int;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
rollno

hive> DESCRIBE FORMATTED student
> ;
OK
# col_name              data_type               comment             

rollno                  int                                         
name                    string                                      
phno                    string                                      
city                    string   

    ALTER TABLE student CHANGE rollno rollno double;
OK
Time taken: 0.144 seconds

ALTER TABLE student CHANGE rollno rollno int;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :

rollno

Cant do with strings either. So please guide me towards the answer as I cant seem to find anything through google.

回答1:

Databases ( I worked on Teradata ) usually will not allow to modify a column datatype from a higher order to lower order or cases where the datatypes may clash with the data that is already stored/will be stored in the table.

Eg-1: We can modify an int column to a double column as the new modified datatype is capable of handing the already existing data in the table. where as modifying a double column to int might create issues because an int column is not capable of holding a double data.

Eg-2: Any(almost) column can be modified to String datatype, as this datatype is capable of holding a wide range of data. Any data with type double,float,int can be depicted as a string and hence this conversion will be accepted most of the time.

Eg-3: Another case that I came across is that a CHAR to CHAR conversion fails at times. When and Why? It fails only if you are trying to degrade the length( CHAR(20) to CHAR(10) ) but will go fine when you try to upgrade ( CHAR(10) to CHAR(20) ). This check will maintain data sanity where the data would not be let to truncate, in case the data length in the table is greater than the new datatype length.

I am a little surprised as to why Hive would accept the conversion from string to double. I tried this and it is forcing NULL for string values when the datatype is converted from string to double.



回答2:

If you just want to change the column type,not caring about implicit conversions for example,the old columns type is wrong.

You can try this:

set hive.metastore.disallow.incompatible.col.type.changes=false;


回答3:

String to int or double to int conversion is not possible.

You can refer the Allowed Implicit Conversions in Hive.

It gives you the entire list whatever is possible or not.



回答4:

Follow this link to the end of the page, there there is a table with the implicit conversions you can do :

Language Types https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

Simply put you cannot convert a 'larger' type to a 'smaller' one without specific code.



回答5:

String to int conversion is not possible because string contains chars and not able to convert them to int. Double to int conversion is not possible in hive because the data might loss in the conversion.

eg: lets say if the column type is int and the values are

1

2

3

If we want to convert the column type to double, the values will be converted as follows.

1.0

2.0

3.0

There is no loss in data even after conversion.

But the column data type is double and let the values are

1.2

2.0

3.5

4.1

and we want the column to int these values will be converted to

1

2

3

4

so clearly there will be loss in the data. that's why hive is not allowed to double to int conversion.



标签: hadoop hive