Compare a column between 2 csv files and write dif

2019-07-13 10:39发布

I am trying to print out the differences by comparing a column between 2 csv files.

CSV1:

SERVER,   FQDN,   IP_ADDRESS,  
serverA, device1.com, 10.10.10.1  
serverA,device2.com,10.11.11.1  
serverC,device3.com,10.12.12.1   
and so on..

CSV2:

FQDN, IP_ADDRESS, SERVER,  LOCATION  
device3.com,10.12.12.1,serverC,xx  
device679.com,20.3.67.1,serverA,we  
device1.com,10.10.10.1,serverA,ac  
device345.com,192.168.2.0,serverA,ad  
device2.com,192.168.6.0,serverB,af  
and so on...

What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:

Output.csv:

FQDN, IP_ADDRESS, SERVER, LOCATION  
device679.com,20.3.67.1,serverA,we  
device345.com,192.168.2.0,serverA,ad  
and so on..

I have tried, but not able to get the output.

This is my Code, please tell me where i am going wrong;

import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist:
 reader1 = csv.reader(lookuplist)
 for col in reader1:
    DATA[col[0]] = col[1]

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
 csv_output.writerow(fieldnames)  # prints header to the output file

    for col in reader1:
     if col[1] not in reader2:
        csv_output.writerow(col)  

(EDIT) This is another approach that I have used:

import csv

f1 = (open("CSV1.csv"))
f2 = (open("CSV2.csv"))

csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)

for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
    print(col2[0])

Basically, here I am only trying to find out first whether the unmatched FQDNs are printed or not. But it is printing out the whole CSV1 column instead. Please help guys, lot of research has went into this, but found no luck yet! :(

3条回答
淡お忘
2楼-- · 2019-07-13 11:10

This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.

I use the Differ object for identifying line changes. I assumed that you would not regard line swapping as a difference, that's why I added the sorted() function call.

from difflib import Differ
csv_file1 = sorted(open("file1.csv", 'r').readlines())
csv_file2 = sorted(open("file2.csv", 'r').readlines())
with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1,csv_file2)):
        dmode, line = line[:2], line[2:]
        if dmode.strip() == "":
            continue
        f.write(line + "\n")

Note that if the line differs somehow (not only in the FQDN column) it would appear in diff.csv

查看更多
beautiful°
3楼-- · 2019-07-13 11:15
import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist, open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w') as f_output:
 reader1 = csv.reader(lookuplist)
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'])
 csv_output.writerow(fieldnames)  # prints header to the output file
 _tempFqdn = []
 for i,dt in enumerate(reader1):
     if i==0:
         continue
     _tempFqdn.append(dt[1].strip())
 for i,col in enumerate(reader2):
     if i==0:
         continue
     if col[0].strip() not in _tempFqdn:
         csv_output.writerow(col)
查看更多
甜甜的少女心
4楼-- · 2019-07-13 11:31
import csv

data = {}  # creating dictionary to store the data

with open('CSV1.csv', 'r') as lookuplist:
reader1 = csv.reader(lookuplist)
for col in reader1:
    data[col[1]] = col[1]  # stores the data from column 0 to column 1 in the data list

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
reader2 = csv.reader(csvinput)
csv_output = csv.writer(f_output)
fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
               'MHT_3'])
csv_output.writerow(fieldnames)  # prints header to the output file

for col in reader2:
    if col[0] not in data:  # if the column 1 in CSV1 does not match with column 0 in CSV2 Extract
    col = [col[0]]


        csv_output.writerow(col)  # writes all the data that is matched in CMDB WLC Extract

So basically, I only had to change 'not in' under 'for loop' and change the columns in the data list that will be reading from the CSV1 file that I am creating.

查看更多
登录 后发表回答