Below is my flow:
GetFile > ExecuteSparkInteractive > PutFile
I want to read the files picked up by the GetFile processor in the ExecuteSparkInteractive processor, apply some transformations, and put the result in some location. I wrote the following Spark Scala code under the Code section of the Spark processor:
val sc1 = sc.textFile("local_path")
sc1.foreach(println)
Nothing happens in the flow, so how can I read the files from the GetFile processor in the Spark processor?
2nd Part:
I tried the below flow just for practice:
ExecuteScript > PutFile > LogMessage
and I have the following code in the ExecuteScript processor:
import re

readFile = open("/home/cloudera/Desktop/sample/data", "r")
for line in readFile:
    lines = line.strip()
    finalline = re.sub(pattern='((?<=[0-9])[0-9]|(?<=\.)[0-9])', repl='X', string=lines)
readFile = open("/home/cloudera/Desktop/sample/data", "w")
readFile.write(finalline)
The code works fine, but it doesn't write the formatted data into the destination folder, so where am I going wrong here? Also, I installed pandas on my local machine and ran pandas code from the ExecuteScript processor, but NiFi doesn't find the pandas module. Why is that? I have tried my best, but I couldn't find any relevant links that show a basic flow for this.
This is not really how it works... GetFile picks up files local to the NiFi node and brings them into the NiFi flow for processing. ExecuteSparkInteractive kicks off a Spark job on a remote Spark cluster; it does not transfer data to Spark. So you would likely want to put the data somewhere Spark can access it first, maybe GetFile -> PutHDFS -> ExecuteSparkInteractive, and have the Spark code read from that location, as in the sketch below.
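As a rough illustration only: once PutHDFS has landed the file somewhere like /user/nifi/input/data (a placeholder path, adjust to your cluster), the Code section of ExecuteSparkInteractive could read it from HDFS rather than from a local path. This sketch assumes the usual sc SparkContext is available in the Livy session backing the processor:

// read the file PutHDFS wrote (placeholder path)
val lines = sc.textFile("hdfs:///user/nifi/input/data")
// example transformation: mask digits, similar in spirit to the regex in the ExecuteScript attempt
val masked = lines.map(_.replaceAll("((?<=[0-9])[0-9]|(?<=\\.)[0-9])", "X"))
// write the result back to HDFS so something downstream can pick it up (placeholder path)
masked.saveAsTextFile("hdfs:///user/nifi/output/data_masked")

If you then need the results back inside the NiFi flow, you could list and fetch them from HDFS (for example with ListHDFS/FetchHDFS) before your PutFile.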