The result of a correlation in Spark MLlib is of type org.apache.spark.mllib.linalg.Matrix (see http://spark.apache.org/docs/1.2.1/mllib-statistics.html#correlations).
val data: RDD[Vector] = ...
val correlMatrix: Matrix = Statistics.corr(data, "pearson")
I would like to save the result into a file. How can I do this?
The answer by Dylan Hogg was great; to enhance it slightly, add an index column. (In my use case, the file I created and downloaded was not sorted, because the rows are written by parallel tasks.)
ref: https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch10s12.html
Substitute the line that builds the output strings with one that prefixes each line with a sequence number (starting at 0), which makes the file easier to sort when you view it.
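A minimal sketch of that idea, assuming the rows have already been rendered as strings. The names `RowIndex` and `addRowIndex` are made up for illustration and are not part of Spark or of the earlier answer.

```scala
object RowIndex {
  // Prefix every line with its zero-based position so the original
  // row order can be restored after a parallel write.
  def addRowIndex(lines: Seq[String], sep: String = " "): Seq[String] =
    lines.zipWithIndex.map { case (line, idx) => s"$idx$sep$line" }
}
```

When you sort the downloaded file, sort on the first field numerically (for example `sort -n`), since a plain text sort would place 10 before 2.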
Since Matrix is Serializable, you can write it out with plain Scala, using standard Java serialization.
You can find an example here.
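In case the linked example is unavailable, here is a rough sketch of what writing a Serializable object with standard Java serialization could look like. `MatrixSerialization`, `save` and `load` are hypothetical helper names, and the sketch takes any `AnyRef` so that it runs without Spark on the classpath; you would pass the `Matrix` as `obj`.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

object MatrixSerialization {
  // Write any Serializable object (e.g. an MLlib Matrix) to a local file.
  def save(obj: AnyRef, path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(obj) finally out.close()
  }

  // Read the object back; the caller states the expected type.
  def load[T](path: String): T = {
    val in = new ObjectInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Note that the result is a binary file on the local filesystem, not readable text, and reading it back needs the same classes available.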
Thank you for your suggestion. I came up with this solution, thanks to Ignacio's suggestions.
Here is a simple and effective approach to save the Matrix to HDFS and specify the separator.
(The transpose is used because .toArray returns the values in column-major order.)
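To illustrate why the reordering matters, here is a small sketch in plain Scala that works on the column-major array directly, so it does not depend on the Spark version. `ColumnMajorDemo`, `toRowMajor` and `lines` are illustrative names of my own; the inputs correspond to what `numRows`, `numCols` and `toArray` return on a Matrix.

```scala
object ColumnMajorDemo {
  // Reorder a column-major array into row-major order. This has the same
  // effect as transposing the matrix before reading its column-major array.
  def toRowMajor(numRows: Int, numCols: Int, colMajor: Array[Double]): Array[Double] = {
    require(numRows > 0 && numCols > 0, "matrix must be non-empty")
    require(colMajor.length == numRows * numCols, "array size must match dimensions")
    Array.tabulate(numRows * numCols) { k =>
      val i = k / numCols // target row
      val j = k % numCols // target column
      colMajor(j * numRows + i)
    }
  }

  // One text line per matrix row, joined with the chosen separator.
  def lines(numRows: Int, numCols: Int, colMajor: Array[Double], sep: String): List[String] =
    toRowMajor(numRows, numCols, colMajor).grouped(numCols).map(_.mkString(sep)).toList
}
```

With a real Matrix you would pass `correlMatrix.numRows`, `correlMatrix.numCols` and `correlMatrix.toArray`, then hand the lines to `sc.parallelize(...)` and `saveAsTextFile` with an HDFS path of your choice. `saveAsTextFile` writes a directory of part files, so repartition to 1 first if you want a single file.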