Universal newline support in Ruby that includes \r

Posted 2019-04-24 09:24

Question:

In a Rails app, I'm accepting and parsing CSV files that may come formatted with any of three possible line termination characters: \n (LF), \r\n (CR+LF), or \r (CR). Ruby's File and CSV libraries seem to handle the first two cases just fine, but the last case ("Mac classic" \r line endings) isn't handled as a newline. It's important to be able to accept this format as well as the others, since Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (although exporting to "Windows Comma Separated" produces the easier-to-handle \r\n).

Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?

Answer 1:

You could use :row_sep => :auto:

:row_sep
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.

There are some caveats, of course; see the Ruby CSV documentation for details.
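As a rough sketch of the :row_sep => :auto approach, assuming the whole CSV is available as a String: the helper name parse_with_auto_row_sep and the sample data are made up for illustration and are not part of the CSV library.

```ruby
require "csv"

# Hypothetical helper: parse CSV text and let CSV discover the row
# separator from the data instead of naming it up front.
def parse_with_auto_row_sep(text)
  CSV.parse(text, row_sep: :auto)
end

# Made-up sample data, one string per line-ending style.
lf_text   = "name,qty\napple,3\n"      # Unix (LF)
crlf_text = "name,qty\r\napple,3\r\n"  # Windows (CR+LF)
cr_text   = "name,qty\rapple,3\r"      # Mac classic (CR)

[lf_text, crlf_text, cr_text].each do |text|
  p parse_with_auto_row_sep(text)
end
```

Auto-discovery settles on a single separator for the whole input, so this should be fine for a file that uses one style consistently, but a file that mixes styles is probably better handled by normalizing the line endings first.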

You could also clean up the EOLs manually with a bit of gsub-ing before handing the data to CSV for parsing. I'd probably take this route and convert every \r\n and lone \r to a single \n before attempting to parse the CSV. OTOH, this won't work well if your CSV contains embedded binary data where a \r means something. On the gripping hand, this is CSV we're dealing with, so who knows what sort of crazy broken nonsense you'll end up dealing with.
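A minimal sketch of the gsub route, assuming the data fits in memory as a String: normalize_newlines is a hypothetical helper name, and the sample input is invented to show mixed line endings.

```ruby
require "csv"

# Hypothetical helper: turn CR+LF and bare CR into LF.
# The regex matches a CR optionally followed by an LF, so a CR+LF pair
# becomes one LF rather than two.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Made-up input that mixes all three line-ending styles.
mixed = "name,qty\r\napple,3\rpear,5\n"

rows = CSV.parse(normalize_newlines(mixed))
p rows
```

Note that this rewrites every \r in the input, including any that sit inside quoted fields, which is the embedded-data caveat mentioned above.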