Converting a space-delimited file to a CSV

Posted 2020-07-18 05:26

I have a text file containing tabular data. I need to automate writing a new text file that is comma-delimited instead of space-delimited, extracting a few columns from the existing data and reordering them.

This is a snippet of the first 4 lines of the original data:

Number of rows: 8542
 Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    
 1  2000-01-03  215926.688  -0.262  35.813  Algeria 
 1  2000-01-03  215926.828  -0.284  35.817  Algeria

Here is what I want in the end:

Longitude,Latitude,Country,Date,Time
-0.262,35.813,Algeria,2000-01-03,215926.688

Any tips on how to approach this?

Tags: python
4 answers
够拽才男人
Post #2 · 2020-07-18 05:56

str.split() with no arguments splits on runs of whitespace of any length. operator.itemgetter() accepts multiple indices and returns a tuple of the corresponding items.
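Putting those two hints together, a minimal sketch might look like this. It assumes, as in the sample, that the first two lines (the "Number of rows" line and the "|" header) are skipped and that every country name is a single word; convert_lines and pick are illustrative names, not part of the answer.

```python
from itertools import islice
from operator import itemgetter

# Positions after str.split() on a data row:
# 0=Algorithm, 1=Date, 2=Time, 3=Longitude, 4=Latitude, 5=Country
pick = itemgetter(3, 4, 5, 1, 2)  # -> Longitude, Latitude, Country, Date, Time

def convert_lines(lines):
    """Yield CSV lines; assumes two header lines and single-word country names."""
    yield "Longitude,Latitude,Country,Date,Time"
    for line in islice(lines, 2, None):  # skip "Number of rows" and the header
        fields = line.split()  # any run of spaces or tabs counts as one separator
        if fields:  # ignore blank lines
            yield ",".join(pick(fields))

sample = [
    "Number of rows: 8542",
    " Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    ",
    " 1  2000-01-03  215926.688  -0.262  35.813  Algeria ",
    " 1  2000-01-03  215926.828  -0.284  35.817  Algeria",
]
for out in convert_lines(sample):
    print(out)
```

With a real file you could pass the open file object instead of the list and write each yielded line followed by a newline.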

The star"
Post #3 · 2020-07-18 06:03

I guess the file is separated by tabs, not spaces.

If so, you can try something like:

with open('some_tab_separated_file.txt', 'r') as input_file, \
        open('some_tab_separated_file.csv', 'w') as output_file:
    input_file.readline()  # skip the "Number of rows" line
    input_file.readline()  # skip the column-header line
    output_file.write('Longitude,Latitude,Country,Date,Time\n')
    for line in input_file:
        if not line.strip():
            continue  # ignore blank lines
        (a, date, time, lon, lat, country) = line.strip().split('\t')
        output_file.write(','.join([lon, lat, country, date, time]) + '\n')

This code is untested; any bugs are left to you as an exercise.
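One quick way to check the tab guess is to look at repr() of a data line, which shows each tab as \t, or to count the tabs and spaces between fields. The helper name separator_counts and the tab-separated row below are hypothetical, for illustration only.

```python
def separator_counts(line):
    """Count tabs and spaces between fields (leading/trailing whitespace ignored)."""
    body = line.strip()
    return {"tabs": body.count("\t"), "spaces": body.count(" ")}

# Hypothetical tab-separated row; repr() makes each tab visible as \t.
row = " 1\t2000-01-03\t215926.688\t-0.262\t35.813\tAlgeria \n"
print(repr(row))
print(separator_counts(row))  # → {'tabs': 5, 'spaces': 0}
```

On the real file you could skip the first two lines with next(f) and pass the third line in. Keep in mind that a row with a multi-word country will show spaces even in a tab-separated file.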

一夜七次
Post #4 · 2020-07-18 06:04

I think the key point, as @Paulo Scardine says, is that you have to use '\t' as the delimiter.

I just wanted to add that pandas is a very good library for handling columnar data.

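>>> import pandas as pd
>>> src = 'path/to/file'
>>> dest = 'path/to/dest_csv'
>>> column_names = ['Algorithm', 'Date', 'Time', 'Longitude', 'Latitude', 'Country']

>>> # skiprows=2 skips the "Number of rows" line and the original header line
>>> df = pd.read_csv(src, delimiter='\t', names=column_names, skiprows=2)

# Do something in pandas if you need to, e.g. keep and reorder the wanted columns:
>>> df = df[['Longitude', 'Latitude', 'Country', 'Date', 'Time']]

>>> df.to_csv(dest, index=False, sep=',')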
Explosion°爆炸
Post #5 · 2020-07-18 06:17

You could use the csv module with a reader using a ' ' delimiter to read your data in, and a writer from the same module (with a comma delimiter) to produce the output.

In fact, the first example in the csv module documentation uses delimiter=' '.
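A minimal sketch of that reader/writer approach, assuming the sample layout from the question (two lines to skip, Country last and a single word). One catch: with delimiter=' ', every extra space between the aligned columns produces an empty field, so the sketch drops empty strings before reordering. The names spaces_to_commas, ORDER and HEADER are illustrative.

```python
import csv
import io
from itertools import islice

# Column positions in each data row: 0=Algorithm, 1=Date, 2=Time,
# 3=Longitude, 4=Latitude, 5=Country
ORDER = (3, 4, 5, 1, 2)
HEADER = ["Longitude", "Latitude", "Country", "Date", "Time"]

def spaces_to_commas(src, dst):
    """Read space-delimited rows from src and write reordered CSV rows to dst."""
    reader = csv.reader(src, delimiter=" ")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(HEADER)
    for row in islice(reader, 2, None):  # skip "Number of rows" and the header line
        # With delimiter=" ", each extra space yields an empty field; drop those.
        fields = [f for f in row if f]
        if fields:
            writer.writerow([fields[i] for i in ORDER])

src = io.StringIO(
    "Number of rows: 8542\n"
    " Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    \n"
    " 1  2000-01-03  215926.688  -0.262  35.813  Algeria \n"
    " 1  2000-01-03  215926.828  -0.284  35.817  Algeria\n"
)
dst = io.StringIO()
spaces_to_commas(src, dst)
print(dst.getvalue())
```

For real files, the csv documentation recommends opening both files with newline='', e.g. with open('data.txt', newline='') as src, open('out.csv', 'w', newline='') as dst: spaces_to_commas(src, dst).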

You can use a DictReader/DictWriter pair and specify the column order through the fieldnames argument of each constructor (with different lists for the reader and the writer if you want to reorder), so that the entries are written in the order you want.

(You may need to skip/ignore your first two rows when producing the output.)
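The DictReader/DictWriter variant could look roughly like this, as a sketch under the same assumptions. Collapsing whitespace with " ".join(line.split()) before handing lines to DictReader is an added workaround for the aligned columns, not something the answer prescribes, and reorder_with_dicts is a hypothetical name.

```python
import csv
import io
from itertools import islice

READ_FIELDS = ["Algorithm", "Date", "Time", "Longitude", "Latitude", "Country"]
WRITE_FIELDS = ["Longitude", "Latitude", "Country", "Date", "Time"]

def reorder_with_dicts(src, dst):
    """Copy rows from src to dst as CSV, reordering columns via fieldnames."""
    data_lines = islice(src, 2, None)  # skip "Number of rows" and the header line
    # Collapse runs of whitespace so a single " " delimiter separates fields.
    normalized = (" ".join(line.split()) for line in data_lines if line.strip())
    reader = csv.DictReader(normalized, fieldnames=READ_FIELDS, delimiter=" ")
    # extrasaction="ignore" drops keys (like "Algorithm") not in WRITE_FIELDS;
    # without it DictWriter raises ValueError.
    writer = csv.DictWriter(dst, fieldnames=WRITE_FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)

src = io.StringIO(
    "Number of rows: 8542\n"
    " Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    \n"
    " 1  2000-01-03  215926.688  -0.262  35.813  Algeria \n"
    " 1  2000-01-03  215926.828  -0.284  35.817  Algeria\n"
)
dst = io.StringIO()
reorder_with_dicts(src, dst)
print(dst.getvalue())
```

A multi-word country such as Costa Rica would be cut to its first word here, because the extra words go to the restkey entry (None by default) and extrasaction="ignore" drops them.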

EDIT:

Here is an example for dealing with multi-word country names:

import csv
import io

f = io.StringIO("""A B C
1 2 Costa Rica
3 4 Democratic Republic of the Congo
""")

r = csv.DictReader(f, delimiter=' ', restkey='rest')
for row in r:
    if row.get('rest'):
        # append the leftover words to the last column
        row['C'] += " %s" % " ".join(row['rest'])
    print('A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C']))

Pass restkey= and concatenate the dict entry stored under that key, which is a list of the leftover fields (here restkey='rest'). This prints:

A: 1, B: 2, C: Costa Rica
A: 3, B: 4, C: Democratic Republic of the Congo
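If the multi-word column is always the last one (as Country is in the question's data), a swapped-in alternative that does not need the csv module is to cap the number of splits with str.split's maxsplit argument. split_keep_last is a hypothetical helper name.

```python
def split_keep_last(line, ncols):
    """Split on runs of whitespace into at most ncols fields.

    maxsplit=ncols - 1 stops splitting early, so the last field keeps its
    internal spaces (e.g. a multi-word country name).
    """
    return line.strip().split(maxsplit=ncols - 1)

for line in ["1 2 Costa Rica", "3 4 Democratic Republic of the Congo"]:
    a, b, c = split_keep_last(line, 3)
    print("A: %s, B: %s, C: %s" % (a, b, c))
```

For the question's six-column rows, split_keep_last(line, 6) would keep a name like Costa Rica in one field. This only helps when the multi-word column comes last.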