In a Rails app, I'm accepting and parsing CSV files that may use any of three possible line terminators: `\n` (LF), `\r\n` (CR+LF), or `\r` (CR). Ruby's `File` and `CSV` libraries seem to handle the first two cases just fine, but the last case ("Mac classic" `\r` line endings) isn't treated as a newline. I need to accept this format along with the others, because Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (although exporting to "Windows Comma Separated" produces the easier-to-handle `\r\n`).
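To make the problem concrete, here is a small sketch (the sample strings are made up, and modern Ruby is assumed) showing that Ruby's default record separator `$/` is `"\n"`. Splitting text into lines therefore never breaks on a bare `\r`:

```ruby
# Ruby's default record separator ($/) is "\n", so String#lines, like
# IO#each_line, only starts a new line after a "\n".
unix_lines    = "a,1\nb,2\n".lines
windows_lines = "a,1\r\nb,2\r\n".lines
mac_lines     = "a,1\rb,2\r".lines

p unix_lines    # => ["a,1\n", "b,2\n"]
p windows_lines # => ["a,1\r\n", "b,2\r\n"]  (chomp strips the "\r\n")
p mac_lines     # => ["a,1\rb,2\r"]          (the whole input is one "line")
```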
Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?
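You could use `:row_sep => :auto`. With that setting, `CSV` reads ahead in the data for the next `\r\n`, `\n`, or `\r` and uses whichever it finds as the row separator. There are some caveats, of course (for instance, the detected sequence is used even if it first appears inside a quoted field); see the `CSV` documentation for details.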
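A minimal sketch of the `:auto` option on Classic-Mac-style input; the sample rows are invented for illustration:

```ruby
require "csv"

# Data in the style Excel for Mac exports: every row ends in a bare "\r".
data = "name,qty\rapple,3\rpear,5\r"

# :auto makes CSV detect the row separator from the data; here it finds "\r".
rows = CSV.parse(data, :row_sep => :auto)
rows.each { |row| p row }
# ["name", "qty"]
# ["apple", "3"]
# ["pear", "5"]
```

For an uploaded file, the same option can be passed to `CSV.read` or `CSV.foreach`. Because only one separator is detected for the whole input, a file that mixes line endings is still a problem with this approach.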
You could also manually clean up the EOLs with a bit of `gsub`-ing before handing the data to `CSV` for parsing. I'd probably take this route and convert all `\r\n`s and `\r`s to single `\n`s before attempting to parse the CSV. OTOH, this won't work that well if there is embedded binary data in your CSV where `\r`s mean something. On the gripping hand, this is CSV we're dealing with, so who knows what sort of crazy broken nonsense you'll end up dealing with.
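One way to sketch the `gsub` route; `normalize_eols` is a hypothetical helper name and the sample input is made up:

```ruby
require "csv"

# Hypothetical helper: rewrite every "\r\n" and every bare "\r" as "\n".
# /\r\n?/ matches a CR optionally followed by an LF, so a CR+LF pair
# collapses into one "\n" instead of becoming two.
def normalize_eols(text)
  text.gsub(/\r\n?/, "\n")
end

# A deliberately messy sample that mixes all three conventions.
raw   = "name,qty\r\napple,3\rpear,5\n"
clean = normalize_eols(raw)
rows  = CSV.parse(clean)
p rows # => [["name", "qty"], ["apple", "3"], ["pear", "5"]]
```

Unlike auto-detection, this copes with files that mix line endings. The trade-off is the one mentioned above: it also rewrites any `\r` that appears inside a quoted field.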