I wrote an HTML parser in python used to extract data to look like this in a csv file:
itemA, itemB, itemC, Sentence that might contain commas, or colons: like this,\n
so I used a delmiter ":::::" thinking that it wouldn't be mined in the data
itemA, itemB, itemC, ::::: Sentence that might contain commas, or colons: like this,::::\n
This works for most of the thousands of lines, however, apparently a colon : offset this when I imported the csv in Calc.
My question is, what is the best or a unique delimiter to use when creating a csv with many variations of sentences that need to be separated with some delimiter? Am I understanding delimiters correctly in that they separate the values within a CSV?
Yes, delimiters separate values within each line of a CSV file. There are two strategies to delimiting text that has a lot of punctuation marks. First, you can quote the values, e.g.:
The second strategy is to use tabs (i.e.,
'\t'
).Python's built-in CSV module can both read and write CSV files that use quotes. Check out the example code under the
csv.reader
function. The built-in csv module will handle quotes correctly, e.g. it will escape quotes that are in the value itself.CSV files usually use double quotes
"
to wrap long fields that might contain a field separator like a comma. If the field contains a double quote it's escaped with a backslash:\"
.As I suggested informally in a comment, unique just means you need to use some character that won't be in the data —
chr(255)
might be a good choice. For example:Note: The code shown is for Python 2.x — see comments for a Python 3 version.
Output:
If you're not using the
csv
module and instead are writing and/or reading the data manually, then it would go something like this: