I have a CSV file that contains a data matrix. The first column holds a label and the remaining columns hold the values associated with that label. I want to read this CSV file and put the data into a Map[String, Array[String]] in Scala. The key of the Map should be the label (the first column) and the value should be the remaining columns of that row. To read the CSV file I use opencsv.
val isr: InputStreamReader = new InputStreamReader(getClass.getResourceAsStream("test.csv"))
val data: IndexedSeq[Array[String]] = new CSVReader(isr).readAll.asScala.toIndexedSeq
Now I have all the data in an IndexedSeq[Array[String]]. Can I use this functional approach here, or should I choose an iterative one, because reading all the data at once could become a problem? Next I need to create the Map from this IndexedSeq. To do that, I map the IndexedSeq to an IndexedSeq of Tuple2[String, Array[String]] to separate the label from the rest of the values, and then I create the Map from that.
val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head, e.tail)).toMap
This works for small examples, but when I use it to read the content of my CSV file it throws a java.lang.RuntimeException. I also tried to create the map with groupBy, and to create several Maps (one per line) and reduce them afterwards into one big Map, but without success. I also read another post on Stack Overflow in which somebody suspects that toMap has a complexity of O(n²). This is the end of my stack trace (the whole stack trace is quite long):
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.runSingleTest(JavaSpecs2Runner.java:130)
at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.main(JavaSpecs2Runner.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.RuntimeException: can not create specification: com.test.MyClassSpec
at scala.sys.package$.error(package.scala:27)
at org.specs2.specification.SpecificationStructure$.createSpecification(BaseSpecification.scala:96)
at org.specs2.runner.ClassRunner.createSpecification(ClassRunner.scala:64)
at org.specs2.runner.ClassRunner.start(ClassRunner.scala:35)
at org.specs2.runner.ClassRunner.main(ClassRunner.scala:28)
at org.specs2.runner.NotifierRunner.main(NotifierRunner.scala:24)
... 11 more
Process finished with exit code 1
Does anybody know another way to create a Map from the data in a CSV file?
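For reference, the iterative variant mentioned in the question could be sketched roughly as follows. This is only an illustration under assumptions: the object name `IterativeCsvMap` and the method `toLabelMap` are made up for this example, and the rows are passed in as a plain `Iterator[Array[String]]` so that the snippet does not depend on opencsv.

```scala
object IterativeCsvMap {
  // Builds the label -> values map one row at a time. The caller can pass any
  // Iterator, so the rows do not have to be collected into a sequence first.
  def toLabelMap(rows: Iterator[Array[String]]): Map[String, Array[String]] =
    rows
      .filter(_.nonEmpty) // skip rows that have no columns at all
      .foldLeft(Map.empty[String, Array[String]]) { (acc, row) =>
        // A label that occurs twice overwrites the earlier row.
        acc + (row.head -> row.tail)
      }
}
```

With opencsv, such an iterator could presumably be obtained from repeated calls to `readNext()`, for example `Iterator.continually(reader.readNext()).takeWhile(_ != null)`; that part is untested here.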
Not quite what you asked for, but here's how to do it using my own dogfood, product-collections:
This worked for me:
The split breaks up each line of the CSV file into simple records. The count is only there to check that the file is really read. So now we can use this to read in a real CSV file (although I only tested it with a small file):
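A minimal sketch of this split-based approach, not the answer's own code: the names `SplitCsv`, `parseLine` and `toLabelMap` are hypothetical, and the input is assumed to be simple CSV without quoted fields, embedded commas, or line breaks inside a field.

```scala
object SplitCsv {
  // Splits one line on commas. The limit -1 keeps trailing empty fields.
  // Assumption: no quoting and no commas inside a field.
  def parseLine(line: String): Array[String] =
    line.split(",", -1).map(_.trim)

  // First field becomes the key, the remaining fields become the value.
  def toLabelMap(lines: Iterator[String]): Map[String, Array[String]] =
    lines
      .filter(_.trim.nonEmpty) // ignore blank lines
      .map(parseLine)
      .map(fields => fields.head -> fields.tail)
      .toMap
}
```

For a real file, the lines could come from `scala.io.Source.fromFile("test.csv").getLines()`, which returns an `Iterator[String]`.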
This works quite well with simple CSV files. If you have more complex ones (e.g. entries split over several lines), you might have to use a more capable CSV parser (e.g. Apache Commons CSV). But usually such a parser will also give you some kind of iterator, and you can use the same map(... zip ...) function on it.
You could skip the intermediary List of tuples and just build the map directly like this:
Not sure if this will fix your issue though, but you did ask if there was another way to build the map. You can read more about collection.breakOut here: Scala: List[Tuple3] to Map[String,String]
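As a rough sketch of building the map directly without an intermediate collection of tuples: `collection.breakOut` exists only up to Scala 2.12 and was removed in 2.13, so the example below swaps in `iterator` followed by `toMap`, which has a similar effect and compiles on both older and newer versions. The names `DirectMap` and `toLabelMap` are made up for illustration.

```scala
object DirectMap {
  // Maps lazily over an iterator and lets toMap build the result, so no
  // intermediate IndexedSeq of tuples is allocated.
  // On Scala 2.12 and earlier, a similar effect was available with
  //   data.map(e => (e.head, e.tail))(collection.breakOut)
  def toLabelMap(data: IndexedSeq[Array[String]]): Map[String, Array[String]] =
    data.iterator
      .filter(_.nonEmpty) // rows without columns have no label
      .map(row => row.head -> row.tail)
      .toMap
}
```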