In one of my use cases, I am trying to migrate traditional SQL based warehousing application into Hive. Neither I am an expert in SQL not in HiveQL. However, I have had decent success in creating tables and importing csv into hive tables. But then I hit a road block while replicating SQL based analytics query in Hive QL. In one of the analytics algorithms, we need to create multiple variables and calculate the sum of those variables, like below
var x = query1
var y = query2
var z = query3
var count = x + y + z
I am struggling to get this done in HiveQL.
In fact, all these variables form a column, and one more column is total that carries the sum of var1, var2, and var3. Is there any aggregate function something like sum that i can use to add values in the column of a particular row namely var1, var2, and var3, and store them in column total for that row?
Is it possible to do something like below. Is it not possible to tell Hive during the creation of table TEMP itself?
INSERT INTO TABLE TEMP VALUES (1, 2, 1+2);
If the above is not possible, I tried the below workaround
DROP TABLE T_TEMP;
DROP TABLE VARI;
CREATE TABLE VARI(value INT);
INSERT INTO TABLE VARI VALUES(1);
INSERT INTO TABLE VARI VALUES(2);
CREATE TABLE T_TEMP(var1 INT, var2 INT, total INT);
INSERT INTO TABLE T_TEMP
SELECT
1,
2,
sum(t.value)
from VARI as t;
I am using Apache Hue and prefer to write all the queries and statements in Hive editor that Hue proviedes, and fire the query in the end.
Is there any other clean approaches available, other than, a)writing into temp column, b) adding the values in INSERT directly. I prefer to have variables x, y, z and be able to sum them count = x + y + z.