Ruby: How can I process a CSV file with “bad comma

I need to process a CSV file from FedEx.com containing shipping history. Unfortunately, FedEx doesn't seem to actually test its CSV files, as it doesn't quote strings that have commas in them.

For instance, a company name might be "Dog Widgets, Inc." but the CSV doesn't quote that string, so any CSV parser thinks that comma before "Inc." is the start of a new field.

Is there any way I can reliably parse those rows using Ruby?

The only differentiating characteristic that I can find is that the commas that are part of a string have a space after them. Commas that separate fields have no spaces. No clue how that helps me parse this, but it is something I noticed.

Tags: ruby parsing csv

4 Answers

倾城　Initia

#2 · 2020-04-21 04:06

Well, here's an idea: You could replace each instance of comma-followed-by-a-space with a unique character, then parse the CSV as usual, then go through the resulting rows and reverse the replace.
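A minimal sketch of that idea in Ruby, using the standard csv library. The placeholder character and the method name parse_bad_comma_line are assumptions made for illustration; the placeholder must be something that never occurs in the real data.

```ruby
require "csv"

# Hypothetical placeholder: any character guaranteed not to appear in the data.
PLACEHOLDER = "\u0001"

# Parse one unquoted line in which embedded commas are always followed by a space.
def parse_bad_comma_line(line)
  # Hide every comma-plus-space so the CSV parser only sees real separators.
  protected_line = line.gsub(", ", PLACEHOLDER)
  fields = CSV.parse_line(protected_line) || []
  # Restore the hidden commas; empty fields come back from CSV as nil.
  fields.map { |field| field&.gsub(PLACEHOLDER, ", ") }
end

p parse_bad_comma_line("123,Dog Widgets, Inc.,Memphis")
# => ["123", "Dog Widgets, Inc.", "Memphis"]
```

This only works as long as no real field begins with a space, since such a field would look the same as an embedded comma.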


来，给爷笑一个

#3 · 2020-04-21 04:07

You can use a negative lookahead, so that only commas not followed by a space or tab act as separators:

>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]
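Wrapped up for use on whole lines, the same regex might look like the sketch below. The names FIELD_SEPARATOR and split_fields are made up for this example, and it assumes the file contains no quoted fields at all, since a plain split knows nothing about quoting.

```ruby
# Field separators are commas NOT followed by a space or tab.
FIELD_SEPARATOR = /,(?![ \t])/

def split_fields(line)
  # A limit of -1 keeps trailing empty fields, which a plain split would drop.
  line.chomp.split(FIELD_SEPARATOR, -1)
end

p split_fields("foo,bar,baz,pop, blah,foobar\n")
# => ["foo", "bar", "baz", "pop, blah", "foobar"]
p split_fields("foo,,bar,\n")
# => ["foo", "", "bar", ""]
```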


beautiful°

#4 · 2020-04-21 04:07

Perhaps something along these lines.

Use gsub to change the ', ' to something else:

ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
    [0] "foo",
    [1] "bar",
    [2] "baz",
    [3] "pop| blah",
    [4] "foobar"
]

and then change the | back to a comma afterwards.


Ridiculous、

#5 · 2020-04-21 04:18

If you are so lucky as to only have one field like that, you can parse the leading fields off the start, the trailing fields off the end, and assume whatever is left is the offending field. In Python (no habla Ruby) this would look something like:

fields = line.split(',') # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]

Whatever you do, you should be able, at a minimum, to determine the number of offending commas, and that should give you something (a sanity check if nothing else).
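For reference, a rough Ruby equivalent of the Python snippet, which also returns the number of offending commas for the sanity check. The field counts (5 leading, 3 trailing) are simply copied from the Python indices and are assumptions about the layout, as is the method name rejoin_offending_field.

```ruby
# Hypothetical layout: 5 fields before the offending one and 3 after it,
# mirroring the indices in the Python snippet.
LEADING = 5
TRAILING = 3

def rejoin_offending_field(line)
  # Like the Python version, this does not work if some fields are quoted.
  fields = line.chomp.split(",", -1)
  extra_commas = fields.length - (LEADING + 1 + TRAILING)
  raise ArgumentError, "too few fields: #{line.inspect}" if extra_commas.negative?

  # Everything between the leading and trailing fields is the offending field.
  middle = fields[LEADING...-TRAILING].join(",")
  [fields.first(LEADING) + [middle] + fields.last(TRAILING), extra_commas]
end

row, extra = rejoin_offending_field("1,2,3,4,5,Dog Widgets, Inc.,x,y,z")
p row   # => ["1", "2", "3", "4", "5", "Dog Widgets, Inc.", "x", "y", "z"]
p extra # => 1
```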
