I have a CSV file that has fields that contain newlines e.g.:
A, B, C, D, E, F
123, 456, tree
, very, bla, indigo
(In this case, the third field in the second row is "tree\n".)
I tried the following:
import csv
catalog = csv.reader(open('test.csv', 'rU'), delimiter=",", dialect=csv.excel_tab)
for row in catalog:
    print "Length: ", len(row), row
and the result I got was this:
Length: 6 ['A', ' B', ' C', ' D', ' E', ' F']
Length: 3 ['123', ' 456', ' tree']
Length: 4 [' ', ' very', ' bla', ' indigo']
Does anyone have any idea how I can quickly remove extraneous newlines?
Thanks!
Suppose you have this Excel spreadsheet:
Note:
Saving that as CSV in Excel, you will get this csv file:
Presumably, you will want to read that into Python with the blank cells still having meaning and the embedded comma treated correctly.
So, this:
correctly produces the 4x4 list-of-lists matrix represented in Excel:
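The spreadsheet and CSV listings are not shown here, so what follows is only a sketch with made-up data: a hypothetical 4x4 sheet with some blank cells and one cell containing a comma, written in the quoted form Excel typically produces. It assumes Python 3 and uses io.StringIO in place of a real file.

```python
import csv
import io

# Hypothetical stand-in for the Excel export: blank cells become empty
# fields, and the cell containing a comma is wrapped in double quotes.
excel_csv = (
    'a,b,c,d\r\n'
    '1,,"3, three",4\r\n'
    ',,,\r\n'
    'w,x,,z\r\n'
)

# newline='' leaves line endings untouched so the csv module can handle
# them itself, which matters for quoted fields that span lines.
with io.StringIO(excel_csv, newline='') as f:
    matrix = list(csv.reader(f))

for row in matrix:
    print(row)
```

Every row comes back with four fields, the blank cells as empty strings and "3, three" as a single field, because the quotes tell the reader where that field starts and ends.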
The example CSV file you posted lacks quotes around the field with an 'extra newline', which makes the meaning of that newline ambiguous. Is it a new row or a multi-line field?
Therefore, you can only interpret this CSV file:
as a one-dimensional list, like so:
Which produces this one-dimensional list:
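The listings for this step are not reproduced here. As a minimal sketch (Python 3 assumed, with the question's sample held in a string rather than in test.csv), the flattening could look like this:

```python
import csv
import io

# The sample from the question, held in memory instead of test.csv.
sample = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"

data = []
with io.StringIO(sample, newline='') as f:
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(f, skipinitialspace=True):
        # Keep only non-empty fields, so the stray leading comma on the
        # third line does not add an empty item.
        data.extend(field.strip() for field in row if field.strip())

print(data)
```

For the sample this gives twelve items, from 'A' through 'indigo', with no record of where the original rows began or ended.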
This can then be interpreted and regrouped into any sub grouping as you wish.
The idiomatic regrouping method in Python uses zip, like so:
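A sketch of the zip idiom in Python 3, assuming six columns and a flat list like the one obtained from the question's sample:

```python
data = ['A', 'B', 'C', 'D', 'E', 'F',
        '123', '456', 'tree', 'very', 'bla', 'indigo']

# Repeating one and the same iterator six times makes zip pull six
# consecutive items for each tuple.
rows = list(zip(*[iter(data)] * 6))

for row in rows:
    print(row)
```

Note that zip stops at the last complete group, so leftover items that do not fill a whole row are silently dropped.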
Or, if you want a list of lists, this is also idiomatic:
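A sketch of the list-of-lists form using slicing (Python 3; the column count of six is taken from the question's header row):

```python
data = ['A', 'B', 'C', 'D', 'E', 'F',
        '123', '456', 'tree', 'very', 'bla', 'indigo']
cols = 6

# Step through the flat list in strides of `cols` and slice out each row.
rows = [data[i:i + cols] for i in range(0, len(data), cols)]

for row in rows:
    print(row)
```

Unlike the zip idiom, slicing keeps a short final row if the number of items is not a multiple of the column count.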
If you can change how your CSV file is created, it will be less ambiguous to interpret.
If you know the number of columns, the best way is to ignore the line endings and then split.
Something like this
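The original listing is missing, so this is only one possible reading of the idea. The function name rows_from_text is made up, and the sketch assumes Python 3, no quoted fields, and no cells that are legitimately blank:

```python
import re

def rows_from_text(text, ncols):
    """Split on commas and line breaks alike, then regroup into rows."""
    # Treat every line break as just another field separator.
    fields = [f.strip() for f in re.split(r'[,\r\n]', text)]
    # Drop the empty pieces left behind by stray commas and blank lines.
    fields = [f for f in fields if f]
    return [fields[i:i + ncols] for i in range(0, len(fields), ncols)]

sample = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"
for row in rows_from_text(sample, 6):
    print(row)
```

Because empty pieces are discarded, a genuinely blank cell would shift every later field one position to the left.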
You can convert it easily into a generator if you prefer:
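A generator version might look like the sketch below. The name iter_rows is hypothetical, and the same assumptions apply (Python 3, no quoted fields, no blank cells). It reads one line at a time, so it can be fed an open file object directly.

```python
def iter_rows(lines, ncols):
    """Yield rows of ncols fields from an iterable of text lines."""
    row = []
    for line in lines:
        for field in line.split(','):
            field = field.strip()
            if not field:
                continue  # skip blanks left by stray commas and empty lines
            row.append(field)
            if len(row) == ncols:
                yield row
                row = []
    if row:
        yield row  # trailing partial row, if the input ran short

sample = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"
for row in iter_rows(sample.splitlines(), 6):
    print(row)
```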
This works with the CSV module and cleans blank fields and lines:
Prints:
If you want that in 6 col chunks:
Prints:
This will work if none of the cells are blank:
Output:
If you have blank cells in the input, this will work most of the time:
Output:
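The code and output for this approach are not reproduced here. One heuristic that fits the description, offered as an assumption rather than as the answer's actual code, is to glue physical lines together until a row reaches the expected number of fields. The name merge_broken_rows is made up; Python 3 and unquoted fields are assumed.

```python
def merge_broken_rows(lines, ncols):
    """Glue physical lines together until a row has ncols fields."""
    rows = []
    pending = None
    for line in lines:
        line = line.rstrip('\r\n')
        if pending is None and not line:
            continue  # ignore empty lines between complete rows
        # A continuation line belongs to the field that was cut short,
        # so rejoin it with the newline that split it.
        pending = line if pending is None else pending + '\n' + line
        fields = pending.split(',')
        if len(fields) >= ncols:
            # Strip only spaces, so embedded newlines survive.
            rows.append([f.strip(' ') for f in fields])
            pending = None
    if pending is not None:
        # Input ended in the middle of a row; keep what there is.
        rows.append([f.strip(' ') for f in pending.split(',')])
    return rows

sample = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"
rows = merge_broken_rows(sample.splitlines(), 6)
for row in rows:
    print(row)
```

On the question's sample the second row comes out with 'tree\n' as its third field, and blank cells are kept because nothing is filtered out. The heuristic cannot detect a newline inside the last field of a row, since such a row already looks complete.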
However, even the second solution will fail with input such as:
In this case, the input is ambiguous and no algorithm will be able to guess if you meant:
(or the input given above)
If this could be the case for you, you'll have to go back to the person saving the data and make them save it in a cleaner format (by the way, OpenOffice quotes newlines in CSV files far better than Excel).
This should work. (Warning: Brain compiled code)
If the number of fields in each row is the same and fields can't be empty:
* grouper recipe
Output
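The listing and its output are missing here. The sketch below shows how the grouper recipe from the itertools documentation could be applied, assuming Python 3 (where the helper it relies on is called zip_longest) and six columns as in the question's sample:

```python
import csv
import io
from itertools import chain, zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length groups, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

sample = "A, B, C, D, E, F\n123, 456, tree\n, very, bla, indigo\n"

with io.StringIO(sample, newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    # Flatten all physical lines into one stream and discard empty fields.
    fields = [x for x in chain.from_iterable(reader) if x]

rows = list(grouper(fields, 6))
for row in rows:
    print(row)
```

Since zip_longest pads instead of truncating, a final row that comes up short is filled with None, which makes malformed input easier to spot.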