可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have two tables one in mysql test.employee and other in hive default.dept I want to pass empid of test.employee table as a parameter to query in hive table and store data into HDFS

ExecuteSQL -> select empid from test.employee (gives 10 records)

SelectHiveQL -> SELECT * FROM default.dept where empid = ${empid} (should retrieve 10 records)

image description here

回答1:

You could do the following:

ExecuteSQL - to retrieve the employee records
ConvertAvroToJson - for later processing of empid
SplitJson - to split into one flow file per row
EvaluateJsonPath - to get the value of empid into a flow file attribute
ReplaceText - to set the content to the HiveQL statement (using expression language as you have done above)
SelectHiveQL - to get the department records

Note that this executes a Hive SELECT for each of the empid values, so each execution of SelectHiveQL will produce a single record. I'm not sure (given HiveQL semantics for the IN clause, for example) how to get a single HiveQL statement since it's kind of a join between a "table of constants" and the Hive table, not to mention the NiFi processing is more complex as you won't want the SplitJson and would likely have to process all the records at once (with ExecuteScript, e.g.)

回答2:

ExecuteSQL ->SplitAvro ->ConvertAvroToJson -> EvaluateJsonPath -> ReplaceText ->SelectHiveQL -> PutHDFS

ExecuteSQL       ==> Query -> select empid from employees
EvaluateHsonPath ==> destination ->flowfile-attribute
                 ==> empid = $.empid
ReplaceText      ==> ReplacementValue  -> ${empid}
SelectHiveQL   -> select * from dept where empid = ${empid}