How can I merge CSV rows that have the same value

2019-09-05 21:28发布

问题:

This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing

As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overridden in the other columns. Is there any way I can combine the rows with duplicate values in the first column?

For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".

回答1:

Plain Python:

import csv

reader = csv.Reader(open('combined.csv'))
result = {}

for row in reader:
    idx = row[0]
    values = row[1:]
    if idx in result:
        result[idx] = [result[idx][i] or v for i, v in enumerate(values)]
    else:
        result[idx] = values

How this magic works:

  • iterate over rows in the CSV file
  • for every record, we check if there was a record with the same index before
  • if this is the first time we see this index, just copy the row values
  • if this is a duplicate, assign row values to empty cells.

The last step is done via or trick: None or value will return value. value or anything will return value. So, result[idx][i] or v will return existing value if it is not empty, or row value.

To output this without loosing the duplicated rows, we need to keep index, then iterate and output corresponding result entries:

indices = []
for row in reader:
    # ...
    indices.append(idx)

writer = csv.writer(open('outfile.csv', 'w'))
for idx in indices:
    writer.writerow([idx] + result[idx])