I am trying to process a file that contains many special characters, such as German umlauts (ä, ü, ö), as follows:
sc.hadoopConfiguration.set("textinputformat.record.delimiter", "\r\n\r\n")
sc.textFile("/file/path/samele_file.txt")
But upon reading the contents, these special characters are not recognized.
I suspect the file is not being decoded with the right encoding (UTF-8 or similar). Is there a way to set the encoding on the textFile method, something like:
sc.textFile("/file/path/samele_file.txt",mode="utf-8")
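As far as I can tell, the Scala `textFile` method takes no encoding argument and always decodes each record as UTF-8, so one possible workaround is to read the raw Hadoop `Text` records and decode the bytes myself. The following is only a minimal sketch: `textFileWithCharset` is a hypothetical helper name (not part of Spark), and the charset passed to it is an assumption that has to match the real encoding of the file.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper (not a Spark API): read records like textFile does,
// but decode each record's bytes with an explicitly chosen charset.
def textFileWithCharset(sc: SparkContext, path: String, charset: String): RDD[String] =
  sc.hadoopFile[LongWritable, Text, TextInputFormat](path)
    .map { case (_, text) =>
      // Text.getBytes returns the backing array; only the first
      // getLength bytes belong to the current record.
      new String(text.getBytes, 0, text.getLength, charset)
    }
```

It would be called as `textFileWithCharset(sc, "/file/path/samele_file.txt", "ISO-8859-1")`, where ISO-8859-1 is just a guess at the file's encoding. Since this goes through the same Hadoop input format as `textFile`, the `textinputformat.record.delimiter` setting above should still apply.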