How to use isin function with values from text file

I'd like to filter a dataframe using an external file.

This is how I use the filter now:

val Insert = Append_Ot.filter(
  col("Name2").equalTo("brazil") ||
  col("Name2").equalTo("france") ||
  col("Name2").equalTo("algeria") ||
  col("Name2").equalTo("tunisia") ||
  col("Name2").equalTo("egypte"))
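Chaining `equalTo` conditions with `||` and calling `isin` on a list of values select the same rows. Both compare strings exactly, including case. Below is a minimal plain-Scala model of the two filters that needs no Spark. `names` is hypothetical sample data standing in for the `Name2` column, and `matchesChained` and `matchesIsin` are illustrative helpers.

```scala
// Hypothetical sample data standing in for the Name2 column.
val names = Seq("brazil", "Algeria", "france", "spain", "egypte")

// Counterpart of chaining col("Name2").equalTo(...) conditions with ||.
def matchesChained(n: String): Boolean =
  n == "brazil" || n == "france" || n == "algeria" || n == "tunisia" || n == "egypte"

// Counterpart of col("Name2").isin(wanted: _*): an exact, case-sensitive membership test.
val wanted = Seq("brazil", "france", "algeria", "tunisia", "egypte")
def matchesIsin(n: String): Boolean = wanted.contains(n)

val chained = names.filter(matchesChained)
val viaIsin = names.filter(matchesIsin)
println(viaIsin) // List(brazil, france, egypte)
```

Both versions drop `Algeria` because the comparison is case-sensitive.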

Instead of using hardcoded string literals, I'd like to create an external file with the values to filter by.

So I created this file and read its values into an array:

val filter_numfile = sc.textFile("/user/zh/worskspace/filter_nmb.txt")
  .map(_.split(" ")(1))
  .collect

This gives me:

filter_numfile: Array[String] = Array(brazil, france, algeria, tunisia, egypte)

Then I use the isin function on the Name2 column:

val Insert = Append_Ot.where($"Name2".isin(filter_numfile: _*))

But this gives me an empty dataframe. Why?

I am just adding some information to philantrovert's answer in filter dataframe from external file.

His answer is perfect, but the values in the file and in the column might differ in case, so you have to check for a case mismatch as well.

tl;dr Make sure that the letters use consistent case, i.e. they are all upper case or all lower case. Simply use the standard upper or lower functions.
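Here is a minimal plain-Scala sketch of why consistent case matters. It uses hypothetical file and column values. An exact comparison, which is what `isin` does, misses names with different case. Lowercasing both sides finds them. In Spark, the counterpart of the second filter is `lower($"Name2")` on the column combined with lowercased file values.

```scala
import java.util.Locale

// Hypothetical values: what the file contains vs. what the Name2 column contains.
val fileValues   = Seq("brazil", "france", "algeria", "tunisia", "egypte")
val columnValues = Seq("Brazil", "france", "ALGERIA", "spain")

// Exact comparison: only identically cased strings match.
val exactMatches = columnValues.filter(v => fileValues.contains(v))

// Lowercase both sides before comparing.
val wanted = fileValues.map(_.toLowerCase(Locale.ROOT)).toSet
val caseInsensitiveMatches =
  columnValues.filter(v => wanted.contains(v.toLowerCase(Locale.ROOT)))

println(exactMatches)           // List(france)
println(caseInsensitiveMatches) // List(Brazil, france, ALGERIA)
```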

Let's say you have an input file like this:

1 Algeria
2 tunisia
3 brazil
4 Egypt

You read the text file and convert all the countries to lowercase:

val countries = sc.textFile("path to input file").map(_.split(" ")(1).trim)
  .collect.toSeq
val array = Array(countries.map(_.toLowerCase) : _*)
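The same parsing step can be sketched on an in-memory `Seq` of lines instead of `sc.textFile`. This makes it easy to check what ends up in the array. `parseCountries` is a hypothetical helper.

Splitting on runs of whitespace (`\\s+`) instead of a single space is an assumption added here. With `split(" ")`, a line such as `2  tunisia` (two spaces) yields an empty string at index 1, and that empty string then matches nothing.

```scala
import java.util.Locale

// Hypothetical helper: parses lines shaped like "<id> <country>" into lowercased names.
def parseCountries(lines: Seq[String]): Seq[String] =
  lines
    .map(_.trim)
    .filter(_.nonEmpty)   // skip blank lines
    .map(_.split("\\s+")) // split on runs of whitespace, not a single space
    .collect { case Array(_, name, _*) => name.toLowerCase(Locale.ROOT) } // lines without a second token are dropped

val countries = parseCountries(Seq("1 Algeria", "2  tunisia", "3 brazil ", "", "4 Egypt"))
println(countries) // List(algeria, tunisia, brazil, egypt)
```

On a cluster, the lines could come from `sc.textFile("path to input file").collect.toSeq` before being passed to the helper.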

Then you have your dataframe:

val Append_Ot = sc.parallelize(Seq(("brazil"),("tunisia"),("algeria"),("name"))).toDF("Name2")

where you apply the following condition:

import org.apache.spark.sql.functions._
val Insert = Append_Ot.where(lower($"Name2").isin(array : _* ))
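Here is a self-contained sketch that combines the steps above so they can run outside spark-shell. It assumes Spark 2.x or later, because it relies on `SparkSession`. It replaces `sc.textFile` with an in-memory `Seq` of the sample lines. The names `lines`, `wanted`, `appendOt` and `insert` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

// A local session for illustration; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("isin-from-file").getOrCreate()
import spark.implicits._ // needed for toDF, $"..." and .as[String]

// Stand-in for the collected lines of the input file (same sample as above).
val lines = Seq("1 Algeria", "2 tunisia", "3 brazil", "4 Egypt")
val wanted: Seq[String] = lines.map(_.split(" ")(1).trim.toLowerCase)

val appendOt = Seq("brazil", "tunisia", "algeria", "name").toDF("Name2")

// Lowercase the column as well, so both sides of the comparison use the same case.
val insert = appendOt.where(lower($"Name2").isin(wanted: _*))
insert.show(false)
```

Passing a `Seq[String]` with `: _*` avoids the varargs conversion from `Array` that newer Scala versions warn about.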

You should get the following output:

+-------+
|Name2  |
+-------+
|brazil |
|tunisia|
|algeria|
+-------+

The empty dataframe might also be due to a spelling mismatch (for example, egypte vs. Egypt).
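To find out which file values are responsible, you can run a simple diagnostic. The sketch below lists the file values that match no column value, even after trimming and lowercasing both sides. `unmatched` is a hypothetical helper.

In Spark, the column values could be collected with `Append_Ot.select("Name2").distinct.as[String].collect.toSeq`. This requires `import spark.implicits._` and is only reasonable when the column has few distinct values.

```scala
import java.util.Locale

// Hypothetical helper: file values that match no column value after trimming and lowercasing.
def unmatched(fileValues: Seq[String], columnValues: Seq[String]): Seq[String] = {
  def norm(s: String): String = s.trim.toLowerCase(Locale.ROOT)
  val present = columnValues.map(norm).toSet
  fileValues.filterNot(v => present.contains(norm(v)))
}

val missing = unmatched(
  Seq("brazil", "france", "algeria", "tunisia", "egypte"),
  Seq("Brazil", "Tunisia", "Algeria", "Egypt")
)
println(missing) // List(france, egypte)
```

Whatever remains in the result is either absent from the data or spelled differently, as with `egypte` here.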
