I am trying to run an EMR Scalding job in which the Scala code is supposed to fetch the content of a text file located in an S3 bucket. The scala.io.Source
library is mangling the S3 path.
I pass the runidfile parameter to the EMR job:
--runidfile s3://my-bucket/input.txt
The Scala code does the following:
import scala.io.Source
val runid_path = args("runidfile")
val runid = Source.fromFile(runid_path).getLines().mkString
The "//" in the S3 path appears to be collapsed into a single "/", and I get this error:
Caused by: java.io.FileNotFoundException: s3:/my-bucket/input.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:90)
at scala.io.Source$.fromFile(Source.scala:75)
at scala.io.Source$.fromFile(Source.scala:53)
at com.move.scalding.userEvents.RecommenderValidator.<init>(RecommenderValidator.scala:37)
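As far as I can tell from the trace, Source.fromFile ends up in java.io.FileInputStream, so the string is treated as a local filesystem path rather than a URI, and java.io.File normalizes the repeated separator. A minimal sketch that seems to reproduce the collapse outside EMR, assuming a Unix-like filesystem (the object name PathCollapseDemo and the helper normalized are made up for illustration):

```scala
import java.io.File

object PathCollapseDemo {
  // java.io.File interprets its argument as a local path, not a URI,
  // and normalizes duplicate separators when the object is constructed.
  def normalized(path: String): String = new File(path).getPath

  def main(args: Array[String]): Unit = {
    // On Linux/macOS this prints: s3:/my-bucket/input.txt
    println(normalized("s3://my-bucket/input.txt"))
  }
}
```

If that is right, the "//" is lost before any file is opened, which would explain the path shown in the FileNotFoundException.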
Is there a solution or workaround for this? I tried using Source.fromURL, but s3 is not a protocol that java.net.URL recognizes, so it is rejected.
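To show what I mean about Source.fromURL, here is a small sketch of the failure on a plain JVM with no custom URL stream handler registered (that is an assumption about the environment; the names S3UrlDemo and parseError are made up for illustration). Source.fromURL with a string argument constructs a java.net.URL, and that constructor is what rejects the s3 scheme:

```scala
import java.net.{MalformedURLException, URL}
import scala.util.{Failure, Success, Try}

object S3UrlDemo {
  // Returns the parse error message, or None if the URL is accepted.
  def parseError(spec: String): Option[String] =
    Try(new URL(spec)) match {
      case Failure(e: MalformedURLException) => Some(e.getMessage)
      case Failure(other)                    => throw other
      case Success(_)                        => None
    }

  def main(args: Array[String]): Unit = {
    // The s3 scheme has no registered handler, so parsing fails.
    println(parseError("s3://my-bucket/input.txt"))
    // A built-in scheme such as http parses without error.
    println(parseError("http://example.com/input.txt"))
  }
}
```

So neither the file-based nor the URL-based entry point of scala.io.Source seems to understand S3 locations on its own.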