I am using Ruby's CSV.read with massive data. From time to time the library encounters a poorly formatted line and raises an error, for instance:
"Illegal quoting in line 53657."
It would be easier to ignore such a line and skip it than to go through each CSV file and fix the formatting. How can I do this?
I had this problem with a line like
123,456,a"b"c
The problem is that the CSV parser expects " characters, if they appear, to entirely surround the comma-delimited text. My solution was to use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
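To see the difference on the sample line above, here is a minimal sketch. It uses CSV.parse_line on a string instead of CSV.read on a file, purely so it runs without any input file; the variable names are my own.

```ruby
require "csv"

# The problem line from above: quotes appear inside an unquoted field.
line = %(123,456,a"b"c)

# With the default quote character ("), the stray quotes are illegal.
default_error = nil
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  default_error = e
  puts "default quote_char: #{e.message}"
end

# With a quote character that never occurs in the data, " is ordinary text.
row = CSV.parse_line(line, quote_char: "|")
p row
```

Keep in mind the trade-off: with a different quote character, fields that really are wrapped in " are no longer treated as quoted, so a comma inside them will split the field.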
Don't let CSV both read and parse the file. Just read the file yourself, hand each line to CSV.parse_line, and then rescue any exceptions it throws.
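A rough sketch of that read-it-yourself approach might look like the following. The helper name read_csv_skipping_bad_lines is made up for illustration, and the sketch assumes every record sits on a single physical line (a quoted field containing a newline would be split and rejected).

```ruby
require "csv"

# Hypothetical helper, not part of the CSV library.
# Returns [rows, skipped]: the parsed rows, plus [line_number, message]
# pairs for every line that CSV could not parse.
def read_csv_skipping_bad_lines(path)
  rows = []
  skipped = []

  File.foreach(path).with_index(1) do |line, lineno|
    begin
      row = CSV.parse_line(line)
      rows << row unless row.nil? # parse_line returns nil for an empty string
    rescue CSV::MalformedCSVError => e
      # Each line is parsed on its own, so the line number inside the
      # message is not the file position; record the real one here.
      skipped << [lineno, e.message]
    end
  end

  [rows, skipped]
end
```

Calling rows, skipped = read_csv_skipping_bad_lines("data.csv") (the filename is a placeholder) gives you the good rows and a list of what was dropped, so you can check afterwards how much data was lost.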
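liberal_parsing is an option available starting in Ruby 2.4 for cases like this. Per the documentation, it makes CSV attempt to parse input that does not conform to RFC 4180, such as double quotes in unquoted fields. To enable it, pass liberal_parsing: true as an option to the CSV read/parse/new methods.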
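For example, a small sketch using the sample line from the earlier answer. It parses a string so it runs without a file; with a real file the same option should go to CSV.read, as in the comment, where "data.csv" is only a placeholder name.

```ruby
require "csv"

# Quotes inside an unquoted field: rejected by the strict parser.
line = %(123,456,a"b"c)

# liberal_parsing asks CSV to accept this kind of non-conformant input.
row = CSV.parse_line(line, liberal_parsing: true)
p row

# The same option works when reading a whole file (requires Ruby 2.4+):
#   rows = CSV.read("data.csv", liberal_parsing: true)
```

Unlike changing quote_char, this keeps normal handling of properly quoted fields, though lines that are broken in other ways can still raise an error.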