I just started learning Hive yesterday and I am stuck on changing the type of a column. I wanted to ask if changing a column's type has some kind of restriction, because I can only do specific kinds of changes: I can convert int to double, string to double, and double to string, but I can't change string to int or double to int.
ALTER TABLE student CHANGE rollno rollno int;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
rollno
hive> DESCRIBE FORMATTED student;
OK
# col_name    data_type    comment
rollno        int
name          string
phno          string
city          string
ALTER TABLE student CHANGE rollno rollno double;
OK
Time taken: 0.144 seconds
ALTER TABLE student CHANGE rollno rollno int;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
rollno
I can't do it with strings either. Please guide me towards the answer, as I can't seem to find anything through Google.
Databases (I have worked on Teradata) usually will not allow you to modify a column's datatype from a higher order to a lower order, or in cases where the new datatype may clash with the data that is already stored or will be stored in the table.
Eg-1: We can modify an int column to a double column, as the new datatype is capable of handling the data already in the table, whereas modifying a double column to an int column might create issues, because an int column is not capable of holding double data.
Eg-2: Almost any column can be modified to the string datatype, as string is capable of holding a wide range of data. Any double, float, or int value can be represented as a string, hence this conversion is accepted most of the time.
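As a quick sketch (t and amount are hypothetical names here; the statement form is the same ALTER used in the question):

-- int to string: widening, so Hive accepts it
ALTER TABLE t CHANGE amount amount string;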
Eg-3: Another case that I came across is that a CHAR-to-CHAR conversion fails at times. When and why? It fails only if you are trying to shrink the length (CHAR(20) to CHAR(10)), but goes fine when you widen it (CHAR(10) to CHAR(20)). This check maintains data sanity: the data is not allowed to be truncated in case the data length already in the table is greater than the new datatype's length.
I am a little surprised that Hive accepts the conversion from string to double. I tried it, and values that cannot be parsed as numbers are forced to NULL once the column's type is changed from string to double.
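You can see this with the student table from the question (a sketch only, assuming the name column holds non-numeric text):

ALTER TABLE student CHANGE name name double;
-- The data files are unchanged; values that cannot be parsed as doubles now read back as NULL
SELECT name FROM student;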
If you just want to change the column type and do not care about implicit conversions (for example, because the old column's type is simply wrong), you can try this:
set hive.metastore.disallow.incompatible.col.type.changes=false;
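For example, against the student table from the question (a sketch; depending on your setup the property may need to be set on the Metastore/HiveServer2 side rather than in the client session):

set hive.metastore.disallow.incompatible.col.type.changes=false;

-- The previously rejected change now goes through; it only updates the metadata,
-- the data files themselves are not converted
ALTER TABLE student CHANGE rollno rollno int;

-- Existing values that cannot be read with the new type will come back as NULL
SELECT rollno FROM student;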
String to int or double to int conversion is not possible.
You can refer to the Allowed Implicit Conversions table in the Hive documentation; it gives you the entire list of what is possible and what is not. Follow this link to the end of the page, where there is a table with the implicit conversions you can do:
Language Types https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
Simply put, you cannot convert a 'larger' type to a 'smaller' one without writing explicit conversion code.
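For instance, you can do the conversion explicitly at query time, or materialize a converted copy of the table (a sketch against the question's student table; student_int is a name introduced here just for illustration):

-- Explicit cast in a query: the double rollno is truncated to an int
SELECT CAST(rollno AS int) AS rollno, name FROM student;

-- Or build a new table with the desired schema from the converted data
CREATE TABLE student_int AS
SELECT CAST(rollno AS int) AS rollno, name, phno, city FROM student;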
String to int conversion is not possible because a string can contain characters that cannot be converted to an int. Double to int conversion is not possible in Hive because data might be lost in the conversion.
E.g.: let's say the column type is int and the values are
1
2
3
If we want to convert the column type to double, the values will be converted as follows:
1.0
2.0
3.0
There is no loss in data even after conversion.
But if the column's data type is double and the values are
1.2
2.0
3.5
4.1
and we want to convert the column to int, these values will become
1
2
3
4
so clearly there will be a loss of data. That's why Hive does not allow the double to int conversion.
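You can see the truncation with an explicit cast (on Hive versions that allow SELECT without a FROM clause):

-- Explicit casts drop the fractional part; this is exactly the loss the ALTER check prevents
SELECT CAST(1.2 AS int), CAST(2.0 AS int), CAST(3.5 AS int), CAST(4.1 AS int);
-- returns 1, 2, 3, 4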