I have two tables one in mysql test.employee and other in hive default.dept
I want to pass empid of test.employee table as a parameter to query in hive table and store data into HDFS
ExecuteSQL -> select empid from test.employee (gives 10 records)
SelectHiveQL -> SELECT * FROM default.dept where empid = ${empid} (should retrieve 10 records)
image description here
You could do the following:
- ExecuteSQL - to retrieve the employee records
- ConvertAvroToJson - for later processing of empid
- SplitJson - to split into one flow file per row
- EvaluateJsonPath - to get the value of empid into a flow file attribute
- ReplaceText - to set the content to the HiveQL statement (using expression language as you have done above)
- SelectHiveQL - to get the department records
Note that this executes a Hive SELECT for each of the empid values, so each execution of SelectHiveQL will produce a single record. I'm not sure (given HiveQL semantics for the IN clause, for example) how to get a single HiveQL statement since it's kind of a join between a "table of constants" and the Hive table, not to mention the NiFi processing is more complex as you won't want the SplitJson and would likely have to process all the records at once (with ExecuteScript, e.g.)
ExecuteSQL ->SplitAvro ->ConvertAvroToJson -> EvaluateJsonPath -> ReplaceText ->SelectHiveQL -> PutHDFS
ExecuteSQL ==> Query -> select empid from employees
EvaluateHsonPath ==> destination ->flowfile-attribute
==> empid = $.empid
ReplaceText ==> ReplacementValue -> ${empid}
SelectHiveQL -> select * from dept where empid = ${empid}