I have a text file containing tabular data. I need to automate writing a new text file that is comma-delimited instead of space-delimited, keeping only a few of the existing columns and reordering them.
This is a snippet of the first 4 lines of the original data:
Number of rows: 8542
Algorithm |Date |Time |Longitude |Latitude |Country
1 2000-01-03 215926.688 -0.262 35.813 Algeria
1 2000-01-03 215926.828 -0.284 35.817 Algeria
Here is what I want in the end:
Longitude,Latitude,Country,Date,Time
-0.262,35.813,Algeria,2000-01-03,215926.688
Any tips on how to approach this?
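One way to approach it, as a minimal sketch using only plain string splitting. It assumes the column order shown in the question and that every data row starts with an integer algorithm id; the names `pick`, `convert`, and `sample` are made up for illustration.

```python
from operator import itemgetter

# Input column positions: 0 Algorithm, 1 Date, 2 Time, 3 Longitude, 4 Latitude, 5 Country
# Desired output order: Longitude, Latitude, Country, Date, Time
pick = itemgetter(3, 4, 5, 1, 2)

def convert(lines):
    """Yield comma-delimited output lines from an iterable of input lines."""
    yield "Longitude,Latitude,Country,Date,Time"
    for line in lines:
        fields = line.split()  # no arguments: split on any run of whitespace
        # Skip the "Number of rows" line, the |-separated header, and blank lines.
        # Assumption: real data rows have at least 6 fields and a numeric first field.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        yield ",".join(pick(fields))

sample = [
    "Number of rows: 8542",
    "Algorithm |Date |Time |Longitude |Latitude |Country",
    "1 2000-01-03 215926.688 -0.262 35.813 Algeria",
    "1 2000-01-03 215926.828 -0.284 35.817 Algeria",
]
result = list(convert(sample))
print("\n".join(result))
```

To convert a real file, pass the open input file to `convert` and write each yielded line, plus a newline, to the output file. Note that this sketch keeps only the first word of a multi-word country name.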
str.split() without any arguments will split by any length of whitespace. operator.itemgetter() takes multiple arguments, and will return a tuple.

I guess the file is separated by tabs, not spaces.
If so, you can try something like:
This code is untested; any bugs are left as an exercise for you.
I guess the important idea is that you have to use '\t' as the delimiter @Paulo Scardine.
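A minimal sketch of the tab-delimited idea, assuming the data rows really are tab-separated (the question does not confirm this). The function name `tabs_to_csv` and the in-memory sample, including the "South Africa" row, are made up for illustration.

```python
import csv
import io

def tabs_to_csv(src, dst):
    """Read tab-delimited rows from src and write reordered CSV rows to dst."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["Longitude", "Latitude", "Country", "Date", "Time"])
    for row in reader:
        # Skip the "Number of rows" line and the header.
        # Assumption: data rows have exactly 6 cells and a numeric first cell.
        if len(row) != 6 or not row[0].strip().isdigit():
            continue
        algorithm, date, time, lon, lat, country = (cell.strip() for cell in row)
        writer.writerow([lon, lat, country, date, time])

# Hypothetical sample; with real files, open both with newline="".
src = io.StringIO(
    "Number of rows: 8542\n"
    "Algorithm\t|Date\t|Time\t|Longitude\t|Latitude\t|Country\n"
    "1\t2000-01-03\t215926.688\t-0.262\t35.813\tAlgeria\n"
    "1\t2000-01-03\t215926.828\t-0.284\t35.817\tSouth Africa\n"
)
dst = io.StringIO()
tabs_to_csv(src, dst)
output = dst.getvalue()
print(output, end="")
```

With a tab delimiter, a country name containing spaces stays in one cell, so no special handling is needed for it.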
I just wanted to add that pandas is a very good library for handling column data.
You could use the csv module and a reader with the ' ' delimiter to read your data in, and use a writer from the same module (with a comma delimiter) to produce the output. In fact, the first example in the csv module documentation uses delimiter=' '.

You can use a DictReader/DictWriter and specify the order of the columns in its constructor (the fieldnames list: different for the reader and the writer if you want to re-order) to output the entries in the order you wish. (You may need to skip/ignore your first two rows when producing the output.)
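The DictReader/DictWriter approach could be sketched roughly as follows. This assumes single-word country names and that the first two lines of the input are the row count and the header; `reorder`, `in_fields`, and `out_fields` are names chosen for illustration.

```python
import csv
import io

in_fields = ["Algorithm", "Date", "Time", "Longitude", "Latitude", "Country"]
out_fields = ["Longitude", "Latitude", "Country", "Date", "Time"]

def reorder(src, dst):
    """Copy space-delimited rows from src to dst as CSV, in out_fields order."""
    # Skip the "Number of rows" line and the |-separated header line.
    next(src)
    next(src)
    reader = csv.DictReader(
        src, fieldnames=in_fields, delimiter=" ", skipinitialspace=True
    )
    # extrasaction="ignore" drops keys (here Algorithm) not listed in out_fields.
    writer = csv.DictWriter(
        dst, fieldnames=out_fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# In-memory stand-in for the input file; with real files, open both with newline="".
src = io.StringIO(
    "Number of rows: 8542\n"
    "Algorithm |Date |Time |Longitude |Latitude |Country\n"
    "1 2000-01-03 215926.688 -0.262 35.813 Algeria\n"
    "1 2000-01-03 215926.828 -0.284 35.817 Algeria\n"
)
dst = io.StringIO()
reorder(src, dst)
output = dst.getvalue()
print(output, end="")
```

Because the writer's fieldnames list fixes the output order, reordering or dropping a column only requires editing `out_fields`.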
EDIT:
Here is an example for dealing with multi-word country names:
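As a minimal sketch of what such an example might look like: the row containing "South Africa" is made up for illustration, and `reorder_multiword` is a hypothetical name. Only the restkey parameter of DictReader comes from the csv module itself.

```python
import csv
import io

in_fields = ["Algorithm", "Date", "Time", "Longitude", "Latitude", "Country"]
out_fields = ["Longitude", "Latitude", "Country", "Date", "Time"]

def reorder_multiword(src, dst):
    """Like a plain DictReader/DictWriter copy, but rejoins multi-word countries."""
    # Skip the "Number of rows" line and the header line.
    next(src)
    next(src)
    # Fields beyond in_fields are collected into a list under the key "rest".
    reader = csv.DictReader(src, fieldnames=in_fields, delimiter=" ", restkey="rest")
    writer = csv.DictWriter(
        dst, fieldnames=out_fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["Country"] is None:  # short or malformed row: skip it
            continue
        extra = row.pop("rest", [])  # leftover words of the country name, if any
        row["Country"] = " ".join([row["Country"]] + extra)
        writer.writerow(row)

# Hypothetical input; the second data row is invented to show a two-word country.
src = io.StringIO(
    "Number of rows: 8542\n"
    "Algorithm |Date |Time |Longitude |Latitude |Country\n"
    "1 2000-01-03 215926.688 -0.262 35.813 Algeria\n"
    "1 2000-01-03 215926.900 28.000 -26.000 South Africa\n"
)
dst = io.StringIO()
reorder_multiword(src, dst)
output = dst.getvalue()
print(output, end="")
```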
Use restkey= and concatenate the dict entry for that value, which is a list of what's left over (here restkey='rest'). This prints: