In one of my use case, i have two tables namely flow and conf. The flow table contains list of all flight data. It has columns creationdate,datafilename,aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from conf table that was created just before the datafilename was created. So, i tried this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x left join
(
select c.config_date, c.aircraft_id, c.configurationfrom t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.configuration;
This picks up all the configurations created for the aircraft which is obvious as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x join
(
select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT where y.config_date < x.f_file_creation_date
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;
This time failed with the error
required (...)+ loop did not match anything at input 'where' in statement
Can someone give me a hint or two where i am going wrong and on how to fix this?
select f.f_file_creation_date
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft
,c.config_date
,c.aircraft_id
,c.filefilter
from t_flow as f
join (select config_date
,aircraft_id
,filefilter
,lead (config_date,1,date '3000-01-01') over
(
partition by aircraft_id
order by config_date
) as next_config_date
from t_conf
) c
on c.aircraft_id =
f.k_aircraft
where f.f_file_creation_date >= c.config_date
and f.f_file_creation_date < c.next_config_date
Please read carefully
Posting a question
When you post a data related question -
- Supply a data sample: source data + required results.
It is going to be more clear than any explanation you give.
It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
- Supply the size properties (records/volume) of the tables.
It is important for performance considerations ans might impact the given solution.
SQL
- Hive currently does not support any JOIN condition type other than equijoin (e.g.
t1.X = t2.X and t1.Y = t2.Y
). This is why you get an error.
If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
- Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
You gain nothing from esoteric syntax except for esoteric error messages.
- There is no reason what so ever to use sub-queries in order to narrow the columns list.
Just to make it perfectly clear - There isn't any performance gain doing that. More than that, if it would have work as you assume (and it does not) the performance would have been worse, not better.
I can't reproduce your error. I guess your query is valid.
What version do you use for Hive ? I tested this query with hive 2.1.1.
DROP TABLE IF EXISTS t_flow;
CREATE TABLE IF NOT EXISTS t_flow (
f_file_creation_date DATE
, f_file_archived_relative_path STRING
, f_file_archived_name STRING
, k_aircraft STRING
);
-- Conf table contains configuration information.
-- It has columns configdate, aircraftid, configurationame
DROP TABLE IF EXISTS t_conf;
CREATE TABLE IF NOT EXISTS t_conf (
config_date DATE
, aircraft_id STRING
, filefilter STRING
);
SELECT
x.f_file_creation_date,
x.f_file_archived_relative_path,
x.f_file_archived_name,
x.k_aircraft,
y.config_date,
y.aircraft_id,
y.filefilter
FROM
(SELECT
f_file_creation_date,
f_file_archived_relative_path,
f_file_archived_name,
k_aircraft
FROM t_flow f) x
JOIN
(SELECT
c.config_date,
c.aircraft_id,
c.filefilter
FROM t_conf c) y on y.aircraft_id = x.k_aircraft where y.config_date < x.f_file_creation_date;