CSV writing strings of text that need a unique delimiter

Posted 2020-04-30 03:43

I wrote an HTML parser in Python that extracts data and writes it to a CSV file in this form:

    itemA, itemB, itemC, Sentence that might contain commas, or colons: like this,\n

so I used the delimiter ":::::", assuming it would never appear in the data:

    itemA, itemB, itemC, ::::: Sentence that might contain commas, or colons: like this,::::\n

This works for most of the thousands of lines; however, a colon (:) in the text apparently threw the columns off when I imported the CSV into Calc.

My question is: what is the best delimiter (or a guaranteed-unique one) to use when creating a CSV from many different sentences that need to be separated from the other fields? Am I understanding delimiters correctly, in that they separate the values within a CSV?

3 answers
一纸荒年 Trace。
Post #2 · 2020-04-30 03:56

Yes, delimiters separate values within each line of a CSV file. There are two strategies for delimiting text that contains a lot of punctuation. First, you can quote the values, e.g.:

    Value 1,Value 2,"This value has a comma, <- right there",Value 4
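A minimal sketch of the quoting strategy with Python 3's csv module (an in-memory io.StringIO buffer stands in for a real file). The default writer quotes only the field that contains a comma:

```python
import csv
import io

fields = ["Value 1", "Value 2", "This value has a comma, <- right there", "Value 4"]

# Write one row into an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)  # default "excel" dialect: comma delimiter, minimal quoting
writer.writerow(fields)
line = buffer.getvalue()
print(line, end="")  # Value 1,Value 2,"This value has a comma, <- right there",Value 4

# Reading the line back splits on the unquoted commas only.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == fields)  # True
```

The writer puts no space after each comma. With the reader's default settings, a space before an opening quote stops the quote from being recognized, so the comma inside would split the field; `csv.reader(..., skipinitialspace=True)` tolerates such spaces.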

The second strategy is to use tabs (i.e., '\t') as the delimiter.
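A sketch of the tab strategy, again with the csv module (the sample row is made up). The same delimiter has to be passed to the reader; the module also ships an "excel-tab" dialect that does the same thing:

```python
import csv
import io

row = ["itemA", "itemB", "itemC", "Sentence with commas, or colons: like this"]

buffer = io.StringIO()
csv.writer(buffer, delimiter="\t").writerow(row)
tsv_line = buffer.getvalue()
print(repr(tsv_line))  # 'itemA\titemB\titemC\tSentence with commas, or colons: like this\r\n'

# Commas and colons are ordinary characters here, so nothing needed quoting.
parsed = next(csv.reader(io.StringIO(tsv_line), delimiter="\t"))
print(parsed == row)  # True
```

If a value ever does contain a tab, the writer falls back to quoting that field, so the file stays parseable.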

Python's built-in csv module can both read and write CSV files that use quotes; check out the example code under the csv.reader function. The module handles quoting correctly, e.g. by default it doubles any quote characters that appear inside a value, and the reader undoes this.
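A short round trip showing what handling quotes looks like in practice (Python 3; the sentence is the question's sample with quotes added). The writer wraps the field in quotes and doubles the embedded ones, and the reader reverses both:

```python
import csv
import io

row = ["itemA", "itemB", "itemC",
       'Sentence that might contain commas, colons: or even "quotes".']

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
text = buffer.getvalue()
print(text, end="")
# itemA,itemB,itemC,"Sentence that might contain commas, colons: or even ""quotes""."

parsed = next(csv.reader(io.StringIO(text)))
print(parsed == row)  # True
```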

做自己的国王
Post #3 · 2020-04-30 03:58

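CSV files usually use double quotes (") to wrap fields that might contain a field separator such as a comma. If the field itself contains a double quote, the standard convention (RFC 4180, and the default in Python's csv module) is to escape it by doubling it: "". Some tools escape it with a backslash instead: \".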
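Python's csv module can produce either convention. A sketch (the value and the helper `write_row` are made up for illustration); whichever convention a file uses, the reader has to be configured to match:

```python
import csv
import io

value = 'He said "hi", then left'  # sample value with a quote and a comma


def write_row(**fmt):
    """Hypothetical helper: write ["a", value] with the given csv formatting options."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmt).writerow(["a", value])
    return buffer.getvalue()


# Default (RFC 4180 style): embedded quotes are doubled.
doubled = write_row()
# Backslash style: turn doubling off and name an escape character instead.
backslashed = write_row(doublequote=False, escapechar="\\")

print(doubled, end="")      # a,"He said ""hi"", then left"
print(backslashed, end="")  # a,"He said \"hi\", then left"

# Each string parses correctly only with the settings it was written with;
# both lines below print ['a', 'He said "hi", then left'].
print(next(csv.reader(io.StringIO(doubled))))
print(next(csv.reader(io.StringIO(backslashed), doublequote=False, escapechar="\\")))
```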

We Are One
Post #4 · 2020-04-30 04:10

As I suggested informally in a comment, "unique" just means you need to use some character that won't appear in the data; chr(255) might be a good choice. For example:

Note: The code shown is for Python 2.x; see the comments for a Python 3 version.

    import csv

    # chr(255) serves as a delimiter that should never occur in the data
    DELIMITER = chr(255)
    data = ["itemA", "itemB", "itemC",
            "Sentence that might contain commas, colons: or even \"quotes\"."]

    # Python 2's csv module expects files opened in binary mode
    with open('data.csv', 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=DELIMITER)
        writer.writerow(data)

    with open('data.csv', 'rb') as infile:
        reader = csv.reader(infile, delimiter=DELIMITER)
        for row in reader:
            print row

Output:

    ['itemA', 'itemB', 'itemC', 'Sentence that might contain commas, colons: or even "quotes".']
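A Python 3 sketch of the same idea, assuming UTF-8 output. Files are opened in text mode with newline="" as the csv documentation recommends. In Python 3, chr(255) is the character "ÿ" (U+00FF), which UTF-8 stores as two bytes, so any program importing the file must decode it as UTF-8 and use that same character as the separator. This prints the same list as the output above:

```python
import csv

# Assumption: UTF-8 output. In Python 3, chr(255) is the text character "ÿ" (U+00FF).
DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        'Sentence that might contain commas, colons: or even "quotes".']

# Text mode with newline="" so the csv module controls the line endings.
with open("data.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile, delimiter=DELIMITER)
    writer.writerow(data)

with open("data.csv", newline="", encoding="utf-8") as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
```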

If you're not using the csv module and are instead writing and/or reading the data manually, then it would go something like this:

    with open('data.csv', 'wb') as outfile:
        outfile.write(DELIMITER.join(data) + '\n')

    with open('data.csv', 'rb') as infile:
        # strip only the newline so trailing spaces in the last field survive
        row = infile.readline().rstrip('\n').split(DELIMITER)
        print row
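Writing the data manually only works while the delimiter truly never occurs in a value, and a line break inside a value would also split the record. A Python 3 sketch, assuming UTF-8 text files, that checks for both before writing; `join_row` is a hypothetical helper, not part of any library:

```python
# Assumption: UTF-8 text files; DELIMITER and data as in the csv-module example.
DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        'Sentence that might contain commas, colons: or even "quotes".']


def join_row(values, delimiter=DELIMITER):
    """Hypothetical helper: join values, refusing any that would corrupt the record."""
    for value in values:
        if any(ch in value for ch in (delimiter, "\n", "\r")):
            raise ValueError("value contains the delimiter or a line break: %r" % value)
    return delimiter.join(values) + "\n"


with open("data.csv", "w", encoding="utf-8") as outfile:
    outfile.write(join_row(data))

with open("data.csv", encoding="utf-8") as infile:
    # Strip only the newline, so trailing spaces in the last field are kept.
    row = infile.readline().rstrip("\n").split(DELIMITER)
    print(row)
```

Failing loudly on a bad value is a deliberate choice here: unlike the csv module, the manual format has no quoting to fall back on.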