Diff and intersection reporting between two text f

Disclaimer: I am new to programming and scripting in general, so please excuse the lack of technical terms.

So I have two text files, each containing a list of names:

First File | Second File
bob        | bob
mark       | mark
larry      | bruce
tom        | tom

I would like to run a script (preferably Python) that writes the lines common to both files to one text file and the lines that differ to another text file, e.g.:

matches.txt:

bob 
mark 
tom

differences.txt:

bruce

How would I accomplish this with Python? Or with a Unix command line, if it's easy enough?

Tags: python list shell compare

5 answers

啃猪蹄的小仙女

Reply #2 · 2019-04-11 00:12

Unix shell solution:

# duplicate lines (present in both files, assuming no repeats within a file)
sort text1.txt text2.txt | uniq -d

# unique lines (present in only one of the files)
sort text1.txt text2.txt | uniq -u
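
For comparison, here is a rough Python sketch of what the two pipelines compute. It is only an illustration: the helper name `uniq_report` and the sample lists are made up, and Python's default string ordering may differ from what `sort` does under your locale. It also shows that `uniq -d` reports any line occurring more than once in the combined input, so a name repeated inside a single file counts as a "duplicate" too.

```python
from collections import Counter

def uniq_report(lines1, lines2):
    """Roughly mimic `sort a b | uniq -d` and `sort a b | uniq -u` (hypothetical helper)."""
    counts = Counter(lines1) + Counter(lines2)   # occurrences across both inputs
    duplicated = sorted(k for k, n in counts.items() if n > 1)   # like uniq -d
    unique = sorted(k for k, n in counts.items() if n == 1)      # like uniq -u
    return duplicated, unique

# Sample data taken from the question
first = ["bob", "mark", "larry", "tom"]
second = ["bob", "mark", "bruce", "tom"]

dup, uniq = uniq_report(first, second)
print(dup)    # ['bob', 'mark', 'tom']
print(uniq)   # ['bruce', 'larry']
```

Note that the "unique" list contains both larry and bruce, whereas the question's differences.txt only lists bruce (the name that is in the second file but not the first).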

霸刀☆藐视天下

Reply #3 · 2019-04-11 00:20

>>> with open('first.txt') as f1, open('second.txt') as f2:
...     w1 = set(f1)
...     w2 = set(f2)
...
>>> with open('matches.txt', 'w') as fout1, open('differences.txt', 'w') as fout2:
...     fout1.writelines(sorted(w1 & w2))   # sets are unordered, so sort for stable output
...     fout2.writelines(sorted(w2 - w1))   # lines only in the second file
...
>>> with open('matches.txt') as f:
...     print(f.read(), end='')
...
bob
mark
tom
>>> with open('differences.txt') as f:
...     print(f.read(), end='')
...
bruce
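
One caveat when comparing raw lines: if the last line of a file has no trailing newline, 'tom' and 'tom\n' compare as different. Below is a minimal sketch that strips line endings first and keeps the order of the second file. The function name `split_matches` and the in-memory sample lists (standing in for first.txt and second.txt) are assumptions for illustration.

```python
def split_matches(first_lines, second_lines):
    """Return (matches, differences) for second_lines relative to first_lines."""
    seen = {line.rstrip("\n") for line in first_lines}
    matches, differences = [], []
    for line in second_lines:
        name = line.rstrip("\n")
        # Keep the order in which names appear in the second input
        (matches if name in seen else differences).append(name)
    return matches, differences

# Made-up stand-ins for the two files; the last line of `first`
# deliberately has no trailing newline.
first = ["bob\n", "mark\n", "larry\n", "tom"]
second = ["bob\n", "mark\n", "bruce\n", "tom\n"]

matches, differences = split_matches(first, second)
print(matches)       # ['bob', 'mark', 'tom']
print(differences)   # ['bruce']
```

To write the results out, join each list with "\n" and write it to matches.txt and differences.txt respectively.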

别忘想泡老子

Reply #4 · 2019-04-11 00:21

Python dictionary lookups are O(1) on average, in other words they are very fast (though a dictionary uses a lot of memory if the files you're indexing are large). So first read the first file into a list, something like:

with open('left.txt') as f:
    left = [x.strip() for x in f]

The list comprehension and strip() are needed because iterating over a file hands you each line with its trailing newline intact. This creates a list of all items in the file, assuming one per line (use .split() if they are all on one line).

Now build a dict:

ldi = dict.fromkeys(left)

This builds a dictionary with the items in the list as keys. This also deals with duplicates. Now iterate through the second file and check if the key is in the dict:

# "with" closes all three files automatically, even if an error occurs
with open('right.txt') as right, \
     open('matches.txt', 'w') as matches, \
     open('uniq.txt', 'w') as uniq:
    for line in right:
        if line.strip() in ldi:
            # write to matches
            matches.write(line)
        else:
            # write to uniq
            uniq.write(line)
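
To make the dict.fromkeys() step concrete, here is a small sketch with a made-up list that contains a repeated name. It only illustrates how the keys behave; the sample data is an assumption, not taken from the files above.

```python
left = ["bob", "mark", "bob", "larry", "tom"]   # made-up sample with "bob" repeated

ldi = dict.fromkeys(left)   # keys are the names, every value is None

# Duplicates collapse into one key; insertion order is kept (Python 3.7+)
print(list(ldi))                      # ['bob', 'mark', 'larry', 'tom']

# Membership tests look at the keys
print("bob" in ldi, "bruce" in ldi)   # True False
```

If the values are never used, set(left) gives the same fast membership test and the same handling of duplicates.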

一纸荒年 Trace。

Reply #5 · 2019-04-11 00:31

sort | uniq is good, but comm might be even better. Note that comm expects both input files to be sorted already. Run "man comm" for more information.

From the manual page:

EXAMPLES
       comm -12 file1 file2
              Print only lines present in both file1 and file2.

       comm -3 file1 file2
              Print lines in file1 not in file2, and vice versa.

You can also use the Python set type, but comm is easier.
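
For readers who want to see what comm does with its two sorted inputs, here is a rough Python sketch of the same idea: a single merge pass that sorts each line into "only in the first", "only in the second", or "in both". The function name `comm_like` is hypothetical and the sample data comes from the question; it is not how comm itself is implemented.

```python
def comm_like(lines1, lines2):
    """Classify lines as (only in 1, only in 2, in both).

    Both inputs must already be sorted, as comm itself expects.
    """
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            both.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            only1.append(lines1[i])
            i += 1
        else:
            only2.append(lines2[j])
            j += 1
    only1.extend(lines1[i:])   # leftovers appear only in the first input
    only2.extend(lines2[j:])   # leftovers appear only in the second input
    return only1, only2, both

first = sorted(["bob", "mark", "larry", "tom"])
second = sorted(["bob", "mark", "bruce", "tom"])

only1, only2, both = comm_like(first, second)
print(both)            # ['bob', 'mark', 'tom']   (compare: comm -12)
print(only1, only2)    # ['larry'] ['bruce']      (compare: comm -3)
```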

混吃等死

Reply #6 · 2019-04-11 00:34

with open("some1.txt") as f1, open("some2.txt") as f2:
    words1 = set(f1.read().split())
    words2 = set(f2.read().split())

duplicates = words1.intersection(words2)
uniques = words1.difference(words2).union(words2.difference(words1))

print("Duplicates(%d):%s" % (len(duplicates), duplicates))
print("\nUniques(%d):%s" % (len(uniques), uniques))

Something like that, at least.
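
As a side note, the union of the two differences is what Python calls the symmetric difference, which sets support directly through the ^ operator (or the symmetric_difference() method). A short sketch using the names from the question as in-memory sets instead of files:

```python
words1 = {"bob", "mark", "larry", "tom"}
words2 = {"bob", "mark", "bruce", "tom"}

duplicates = words1 & words2   # same as words1.intersection(words2)
uniques = words1 ^ words2      # same as words1.symmetric_difference(words2)

# The long form used above gives the same result
long_form = words1.difference(words2).union(words2.difference(words1))

print(sorted(duplicates))      # ['bob', 'mark', 'tom']
print(sorted(uniques))         # ['bruce', 'larry']
print(uniques == long_form)    # True
```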
