Finding matches in two files and outputting them

2019-09-09 11:37发布

问题:

I want to use x[#] from first file and x[#] from second file, I want to see if those two values match, if they do I want to output those, along with several other x[#] values from the second file, which are on the same line.

The format the files are in :(but there is millions, and I want to find the pairs in the two files because they all should match up)

  line 1  data,data,data,data
  line 2  data,data,data,data

data from file 1:

 (N'068D556A1A665123A6DD2073A36C1CAF', N'A76EEAF6D310D4FD2F0BD610FAC02C04DFE6EB67',    
N'D7C970DFE09687F1732C568AE1CFF9235B2CBB3673EA98DAA8E4507CC8B9A881');

data from file 2:

00000040f2213a27ff74019b8bf3cfd1|index.docbook|Redhat 7.3 (32bit)|Linux
00000040f69413a27ff7401b8bf3cfd1|index.docbook|Redhat 8.0 (32bit)|Linux
00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX
0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows
000011b20f3cefd491dbc4eff949cf45|totem.devhelp|Linux Ubuntu Desktop 9.10 (32bit)|Linux

The order it is sorted in is alphanumeric, and I want to use a slider method. By that I mean if file1[x] is < file2[x] move the slider down or up depending on whether one value is greater than the other, until a match is found, when and if so, print the output along with other values that will identify that hash.

What I want as a result would be:

file1[x] and its corresponding match on file2[x] outputted to a file, as well as other file1[x] where x can be any index from the line. values along with other values using an index method.

回答1:

A starting point, add your own salt and pepper it's far from optimal and should use executemany etc...but that's for you to decide.

from StringIO import StringIO
import csv
import sqlite3 as sq3
from operator import methodcaller, itemgetter
from itertools import groupby

data1 = """068D556A1A665123A6DD2073A36C1CAF
A76EEAF6D310D4FD2F0BD610FAC02C04DFE6EB67
D7C970DFE09687F1732C568AE1CFF9235B2CBB3673EA98DAA8E4507CC8B9A881"""

data2 = """00000040f2213a27ff74019b8bf3cfd1|index.docbook|Redhat 7.3 (32bit)|Linux
00000040f69413a27ff7401b8bf3cfd1|index.docbook|Redhat 8.0 (32bit)|Linux
00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX
0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows
000011b20f3cefd491dbc4eff949cf45|totem.devhelp|Linux Ubuntu Desktop 9.10 (32bit)|Linux"""

file1 = StringIO(data1)
file2 = StringIO(data2)

db = sq3.connect(':memory:')
db.execute('create table keys (key)')
db.execute('create table details (key, f1, f2, f3)')

for f1data in file1:
    db.execute('insert into keys values(?)', (f1data.strip(),))

for f2data in file2:
    row = map(methodcaller('strip'), f2data.split('|'))
    db.execute('insert into details values (?,?,?,?)', row)

results = db.execute('select * from keys natural join details')

for key, val in groupby(results, itemgetter(0)):
    print key, list(val)