I need to create a temporary table in HIVE using an existing table that has 7 columns. I just want to get rid of duplicates with respect to first three columns and also retain the corresponding values in the other 4 columns. I don't care which row is actually dropped while de-duplicating using first three rows alone.
相关问题
-
hive: cast array
> into map - Find function in HIVE
- Hive Tez reducers are running super slow
- Set parquet snappy output file size is hive?
- Hive 'cannot alter table' error
相关文章
- 在hive sql里怎么把"2020-10-26T08:41:19.000Z"这个字符串转换成年月日
- SQL query Frequency Distribution matrix for produc
- Cloudera 5.6: Parquet does not support date. See H
- converting to timestamp with time zone failed on A
- Hive error: parseexception missing EOF
- ClassNotFoundException: org.apache.spark.SparkConf
- How to get previous day date in Hive
- Hive's hour() function returns 12 hour clock v
You could use something as below if you are not considered about ordering
Below is another approach, which gives much control over ordering but slower than above approach
rank() over(..) function returns more than one column with rank as '1' if order by columns are all equal. In our case if there are 2 columns with exact same values for all seven columns then there will be duplicates when we use filter as col_rank =1. These duplicates can be eleminated using max and group by clauses as written in above query.