CSV.read Illegal quoting in line x

Posted 2019-01-22 01:29

I am using Ruby's CSV.read on a very large data set. From time to time the library encounters a poorly formatted line and fails with an error such as:

"Illegal quoting in line 53657."

It would be easier to ignore and skip such a line than to go through each CSV file and fix the formatting. How can I do this?

3 Answers
冷血范
Reply #2 · 2019-01-22 02:16

I had this problem in a line like 123,456,a"b"c

The problem is that the CSV parser expects double quotes ("), if they appear at all, to surround the entire comma-delimited field.
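The difference can be reproduced with CSV.parse_line. This is a minimal sketch using only Ruby's standard csv library and the sample line from this answer; the exact wording of the error message may vary between csv versions, so only the exception class is relied on.

```ruby
require "csv"

# A field that is entirely wrapped in double quotes may contain a comma.
quoted = CSV.parse_line('123,456,"a,b"')
p quoted # => ["123", "456", "a,b"]

# A double quote in the middle of an unquoted field is rejected by default.
error = begin
  CSV.parse_line('123,456,a"b"c')
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts "#{error.class}: #{error.message}" if error
```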

Solution: use a quote character other than " that I was sure would not appear in my data:

CSV.read(filename, :quote_char => "|")
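A quick sketch of this workaround on the same sample line, using CSV.parse_line with the same quote_char option. It also shows the trade-off: once | is the quote character, " is an ordinary character, so a field that really was quoted with " and contains a comma is no longer kept together.

```ruby
require "csv"

# With "|" as the quote character, the stray double quotes are ordinary data.
relaxed = CSV.parse_line('123,456,a"b"c', quote_char: "|")
p relaxed # => ["123", "456", "a\"b\"c"]

# Trade-off: a field that really was quoted with " is now split at its comma.
broken = CSV.parse_line('123,"a,b"', quote_char: "|")
p broken # => ["123", "\"a", "b\""]
```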

SAY GOODBYE
Reply #3 · 2019-01-22 02:27

Don't let CSV both read and parse the file.

Just read the file yourself, hand each line to CSV.parse_line, and rescue the CSV::MalformedCSVError it raises for a badly formatted line.
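A minimal sketch of this approach. The method name read_skipping_bad_lines is hypothetical, and the sketch assumes that no quoted field spans several physical lines: reading line by line would cut such a field into fragments, so that record would be skipped or misread. Blank lines are ignored, and the numbers of skipped lines are collected so they can be reported.

```ruby
require "csv"

# Hypothetical helper: parse a CSV file one physical line at a time and skip
# any line that CSV.parse_line rejects as malformed.
# Assumes no quoted field contains an embedded newline.
# Returns [rows, skipped], where skipped holds 1-based line numbers.
def read_skipping_bad_lines(path)
  rows = []
  skipped = []
  File.foreach(path).with_index(1) do |line, lineno|
    next if line.strip.empty? # ignore blank lines
    begin
      rows << CSV.parse_line(line)
    rescue CSV::MalformedCSVError
      skipped << lineno
    end
  end
  [rows, skipped]
end

# Example (the file name "data.csv" is hypothetical):
#   rows, skipped = read_skipping_bad_lines("data.csv")
#   warn "Skipped lines: #{skipped.inspect}" unless skipped.empty?
```

Only CSV::MalformedCSVError is rescued, so other problems (a missing file, invalid byte sequences) still surface as errors rather than being silently skipped.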

看我几分像从前
Reply #4 · 2019-01-22 02:34

The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:

When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.

To enable it, pass it as an option to the CSV read/parse/new methods:

CSV.read(filename, liberal_parsing: true)
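For comparison, a small sketch of liberal_parsing on the problem line from the first answer (it needs Ruby 2.4 or later, as noted above). The stray quotes in an unquoted field are kept as literal characters, while a properly quoted field is still unwrapped as usual.

```ruby
require "csv"

# The problem line, parsed leniently: stray quotes are kept as data.
row = CSV.parse_line('123,456,a"b"c', liberal_parsing: true)
p row # => ["123", "456", "a\"b\"c"]

# A properly quoted field is still handled normally in liberal mode.
mixed = CSV.parse_line('123,"a,b",x"y"z', liberal_parsing: true)
p mixed # => ["123", "a,b", "x\"y\"z"]
```

Unlike changing quote_char, this keeps ordinary "-quoted fields working, and unlike skipping lines, it keeps the rows instead of dropping them.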