I am using Spark Streaming to fetch tweets from Twitter by creating a StreamingContext as:
val ssc = new StreamingContext("local[3]", "TwitterFeed",Minutes(1))
and creating the Twitter stream as:
val tweetStream = TwitterUtils.createStream(ssc, Some(new OAuthAuthorization(Util.config)),filters)
and then saving it as text files:
tweets.repartition(1).saveAsTextFiles("/tmp/spark_testing/")
The problem is that the tweets are saved in separate folders, one per batch time, but I need the data from every batch in the same folder.
Is there any workaround for it?
Thanks
We can do this using Spark SQL's new DataFrame saving API, which allows appending to an existing output. By default, saveAsTextFile won't be able to save to a directory with existing data (see https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes ). https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations covers how to set up a Spark SQL context for use with Spark Streaming.
Assuming you copy the SQLContextSingleton part from the guide, the resulting code would look something like:
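A minimal sketch of that approach, under a few assumptions: the stream is the DStream of twitter4j Status objects from the question, only the tweet text is kept, Spark 1.4 or later is used (for the DataFrame write API), and the names AppendTweets, saveAppending and outputDir are made up for illustration. SQLContextSingleton follows the pattern from the streaming guide.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.streaming.dstream.DStream
import twitter4j.Status

// Lazily instantiated singleton SQLContext, following the streaming guide
object SQLContextSingleton {
  @transient private var instance: SQLContext = _

  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

// Hypothetical helper: appends every batch to the same output directory
object AppendTweets {
  def saveAppending(tweetStream: DStream[Status], outputDir: String): Unit = {
    tweetStream.foreachRDD { rdd =>
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      // Assumption: only the tweet text is needed, stored in a "text" column
      val df = rdd.map(_.getText).toDF("text")

      // SaveMode.Append adds new files to outputDir instead of failing
      // because the directory already exists
      df.write.mode(SaveMode.Append).json(outputDir)
    }
  }
}
```

With the question's setup, this would be called as AppendTweets.saveAppending(tweetStream, "/tmp/spark_testing/") before ssc.start(). Each batch still writes its own part files, but they all land in the one directory.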
(Note that the above example uses JSON to save the result, but you can use other output formats too.)