I want to filter my DataFrame using values from an external file.
This is how I currently filter it:
val Insert=Append_Ot.filter(col("Name2").equalTo("brazil") || col("Name2").equalTo("france") || col("Name2").equalTo("algeria")|| col("Name2").equalTo("tunisia") || col("Name2").equalTo("egypte") )
The set of countries I want to filter on changes, so I created this external file:
1 brazil
2 france
3 algeria
4 tunisia
5 egypte
I want to create a UDF that filters my DataFrame using this file.
Thank you
You need to create a Seq from the file with which you want to filter.
Something that looks like this:
val l = List("Brasil", "Algeria", "Tunisia", "Egypt")
You can use the textFile method. Suppose your file contains:
1 Algeria
2 Tunisia
3 Brasil
4 Egypt
You can use:
val countries = sc.textFile("hdfs://namenode/user/cloudera/file").map(_.split(" ")(1)).collect
which will give you:
countries : Array[String] = Array(Algeria, Tunisia, Brasil, Egypt)
And then use the isin function on your column Name2:
val Insert = Append_Ot.where($"Name2".isin( countries : _* ) )
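
Putting the steps together, here is a minimal end-to-end sketch rather than a definitive implementation. It assumes a local SparkSession, and the names FilterByCountryFile, parseCountries, fileLines and appendOt are hypothetical stand-ins (appendOt plays the role of Append_Ot, fileLines the contents of the external file). It also lower-cases both sides, on the assumption that the file and the column may differ in case.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

object FilterByCountryFile {
  // Extracts the country name from lines shaped like "1 brazil":
  // drops the leading index, keeps the rest, and lower-cases it.
  // Blank lines and lines without a name are skipped.
  def parseCountries(lines: Seq[String]): Seq[String] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+", 2))
      .collect { case Array(_, name) => name.trim.toLowerCase }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("filter-by-country-file")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the file contents; with a real file this could be
    // spark.sparkContext.textFile(path).collect().toSeq
    val fileLines = Seq("1 brazil", "2 france", "3 algeria", "4 tunisia", "5 egypte")
    val countries = parseCountries(fileLines)

    // Hypothetical stand-in for Append_Ot
    val appendOt = Seq("brazil", "Spain", "France", "italy").toDF("Name2")

    // Case-insensitive match against the list read from the file
    val insert = appendOt.where(lower(col("Name2")).isin(countries: _*))
    insert.show() // keeps the "brazil" and "France" rows

    spark.stop()
  }
}
```

Since isin is a built-in column expression, no UDF is needed for this. Splitting with a limit of 2 also keeps multi-word country names intact, which split(" ")(1) would truncate.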