I would like to access CSV files in Scala in a strongly typed manner. For example, as I read each line of the CSV, it is automatically parsed and represented as a tuple with the appropriate types. I could specify the types beforehand in some sort of schema that is passed to the parser. Are there any libraries for doing this? If not, how could I go about implementing this functionality on my own?
I built my own approach that strongly types the final product rather than the reading stage itself, which, as pointed out, might be better handled as stage one with something like Apache Commons CSV, with what I've done as stage two. Here's the code; you are welcome to it. The idea is to parameterize CSVReader[T] with type T; upon construction, you must also supply the reader with a factory object for type T. The class itself (or, in my example, a helper object) decides the construction details, which decouples them from the actual reading. You could use implicit objects to pass the helper around, but I've not done that here. The only downside is that every row of the CSV must be of the same class type, but you could expand this concept as needed.
Next, the example helper factory and an example "Main":
Example CSV (tab-separated; you might need to repair the tabs if you copy from an editor):
And finally the writer (notice that the factory methods require this as well, with "makerow"):
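As a minimal sketch of the reader, factory and writer idea described above (this is not the original code; `RowFactory`, `Person`, `PersonFactory` and `CSVWriter` are hypothetical names, and the splitting is naive, with no quoting support):

```scala
// Hypothetical factory: knows how to build a T from one row
// and how to turn a T back into a row.
trait RowFactory[T] {
  def fromRow(fields: Array[String]): T
  def makeRow(value: T): Array[String]
}

// Hypothetical row type for the example.
final case class Person(name: String, age: Int)

object PersonFactory extends RowFactory[Person] {
  def fromRow(fields: Array[String]): Person =
    Person(fields(0), fields(1).trim.toInt)
  def makeRow(p: Person): Array[String] =
    Array(p.name, p.age.toString)
}

// The reader only splits lines; building a T is delegated to the factory.
class CSVReader[T](factory: RowFactory[T], separator: Char = '\t') {
  def read(lines: Iterator[String]): Iterator[T] =
    lines
      .filter(_.trim.nonEmpty) // skip blank lines
      .map(line => factory.fromRow(line.split(separator)))
}

// The writer uses the same factory to go from T back to a line.
class CSVWriter[T](factory: RowFactory[T], separator: Char = '\t') {
  def write(rows: Iterable[T]): String =
    rows.map(r => factory.makeRow(r).mkString(separator.toString)).mkString("\n")
}
```

Because the reader and writer only see `RowFactory[T]`, supporting another row type means writing another factory, not touching the reading code.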
I've created a strongly-typed CSV helper for Scala, called object-csv. It is not a fully fledged framework, but it can be adjusted easily. With it you can do this:
Where Person is a case class, defined like this:
Read more about it in GitHub, or in my blog post about it.
This is made more complicated than it ought to be because of the nontrivial quoting rules for CSV. You should probably start with an existing CSV parser, e.g. OpenCSV or one of the projects called scala-csv. (There are at least three.)
Then you end up with some sort of collection of collections of strings. If you don't need to read massive CSV files quickly, you can just try to parse each line into each of your types and take the first one that doesn't throw an exception. For example,
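A minimal sketch of that "first parser that doesn't throw" approach, assuming the cells have already been split into strings by a CSV parser; `FirstMatch`, `parseCell` and the particular list of candidate types are hypothetical choices for illustration:

```scala
import scala.util.{Success, Try}

object FirstMatch {
  // Candidate conversions, tried in order. Each may throw on bad input.
  private val parsers: List[String => Any] =
    List(_.toInt, _.toDouble, _.toBoolean)

  // Returns the result of the first conversion that does not throw,
  // or the trimmed string itself when none of them applies.
  def parseCell(s: String): Any = {
    val t = s.trim
    parsers.iterator
      .map(p => Try(p(t)))
      .collectFirst { case Success(v) => v }
      .getOrElse(t)
  }

  def parseRow(fields: Seq[String]): Seq[Any] = fields.map(parseCell)
}
```

The result is only typed as `Any`, so this suits exploratory reading more than a fixed schema; relying on exceptions for control flow is also why it is not the fast option.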
If you do need to parse them fairly quickly and you don't know what might be there, you should probably use some sort of matching (e.g. regexes) on the individual items. Either way, if there's any chance of error you probably want to use `Try` or `Option` or somesuch to package errors.

If you know the number and types of fields, maybe like this?:
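A minimal sketch for a known, fixed layout, assuming three columns of types `Int`, `String` and `Double` (the schema and the names `KnownSchema` and `parse` are hypothetical):

```scala
import scala.util.Try

object KnownSchema {
  // Hypothetical schema: (id, name, score).
  type Row = (Int, String, Double)

  // None when the field count is wrong or a conversion fails.
  def parse(fields: Array[String]): Option[Row] = fields match {
    case Array(id, name, score) =>
      Try((id.trim.toInt, name, score.trim.toDouble)).toOption
    case _ => None
  }
}
```

Pattern matching on `Array(id, name, score)` checks the number of fields, and `Try(...).toOption` packages any conversion error as `None`.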
You can use kantan.csv, which is designed with precisely that purpose in mind.
Imagine you have the following input:
Using kantan.csv, you could write the following code to parse it:
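A minimal sketch of such a call, assuming a reasonably recent kantan.csv release on the classpath (one that provides the `rfc` configuration value) and a hypothetical two-line input held in a string rather than a file:

```scala
import kantan.csv._     // brings `rfc` into scope
import kantan.csv.ops._ // adds asUnsafeCsvReader to strings, files, etc.

object KantanExample {
  // Hypothetical input: the last column is a float on one line, a boolean on the other.
  val input = "1,Foo,2.0\n2,Bar,false"

  // The "unsafe" reader yields decoded values directly and throws on a bad row;
  // asCsvReader would wrap each entry in a result type instead.
  val rows: List[(Int, String, Either[Float, Boolean])] =
    input.asUnsafeCsvReader[(Int, String, Either[Float, Boolean])](rfc).toList
}
```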
And you'd get an iterator where each entry is of type `(Int, String, Either[Float, Boolean])`. Note the bit where the last column in your CSV can be of more than one type, but this is conveniently handled with `Either`.

This is all done in an entirely type-safe way, no reflection involved, validated at compile time.
Depending on how far down the rabbit hole you're willing to go, there's also a shapeless module for automated case class and sum type derivation, as well as support for scalaz and cats types and type classes.
Full disclosure: I'm the author of kantan.csv.
If your content uses double quotes to enclose other double quotes, commas and newlines, I would definitely use a library like opencsv that deals properly with special characters. Typically you end up with `Iterator[Array[String]]`. Then you use `Iterator.map` or `collect` to transform each `Array[String]` into your tuples, dealing with type conversion errors there. If you need to process the input without loading it all into memory, you keep working with the iterator; otherwise you can convert it to a `Vector` or `List` and close the input stream.

So it may look like this:
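A minimal sketch of that transformation, assuming the rows already arrive from a CSV library as an `Iterator[Array[String]]`; the three-column layout and the names `RowsToTuples` and `toTuples` are hypothetical:

```scala
import scala.util.Try

object RowsToTuples {
  // `rows` stands in for the iterator produced by the CSV library.
  def toTuples(rows: Iterator[Array[String]]): Iterator[(Int, String, Double)] =
    rows
      .map {
        case Array(id, name, amount) =>
          Try((id.trim.toInt, name, amount.trim.toDouble)).toOption
        case _ => None // wrong number of fields
      }
      .collect { case Some(tuple) => tuple } // malformed rows are dropped here
}
```

The result is still lazy, so call `.toVector` or `.toList` on it before closing the underlying input stream.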
Depending on how you need to deal with errors, you can return `Left` for errors and `Right` for success tuples to separate the errors from the correct rows. Also, I sometimes wrap all of this using scala-arm for closing resources, so my data may be wrapped in the `resource.ManagedResource` monad so that I can use input coming from multiple files.

Finally, although you want to work with tuples, I have found that it is usually clearer to have a case class that is appropriate for the problem and then write a method that creates that case class object from an `Array[String]`.
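A minimal sketch combining both suggestions, a case class built from an `Array[String]` with `Left` for bad rows and `Right` for good ones; `Order`, `fromRow` and `OrderParsing` are hypothetical names:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical record type for the example.
final case class Order(id: Int, customer: String, amount: Double)

object Order {
  // Left describes the bad row, Right carries the parsed record.
  def fromRow(row: Array[String]): Either[String, Order] = row match {
    case Array(id, customer, amount) =>
      Try(Order(id.trim.toInt, customer, amount.trim.toDouble)) match {
        case Success(order) => Right(order)
        case Failure(e) =>
          val raw = row.mkString(",")
          Left(s"bad row [$raw]: ${e.getMessage}")
      }
    case _ => Left(s"expected 3 fields, got ${row.length}")
  }
}

object OrderParsing {
  // Splits the parsed rows into (errors, orders).
  def parseAll(rows: Iterator[Array[String]]): (Vector[String], Vector[Order]) = {
    val results = rows.map(Order.fromRow).toVector
    (results.collect { case Left(e) => e },
     results.collect { case Right(o) => o })
  }
}
```

Keeping the conversion in the companion object means the reading code never needs to know the column layout.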