Disclaimer: I am new to programming and scripting in general, so please excuse the lack of technical terms.
So I have two text files that each contain a list of names:
First File | Second File
bob | bob
mark | mark
larry | bruce
tom | tom
I would like to run a script (preferably Python) that writes the lines the two files have in common to one text file and the lines that differ to another text file, e.g.:
matches.txt:
bob
mark
tom
differences.txt:
bruce
How would I accomplish this with Python? Or with a Unix command line, if it's easy enough?
Unix shell solution:
Python dictionary lookups are O(1) on average, in other words they are very fast (but a dictionary uses a lot of memory if the files you're indexing are large). So first read the first file into a list, something like:
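A minimal sketch of that reading step, assuming one name per line. The function name `read_names` is made up for illustration, and skipping blank lines is an extra assumption that is not part of the original answer:

```python
def read_names(path):
    """Return the lines of a text file with the trailing newline removed."""
    with open(path) as f:
        # readlines() keeps the "\n" at the end of each line, so strip() it.
        # Blank lines are skipped (an assumption, not in the original answer).
        return [line.strip() for line in f.readlines() if line.strip()]
```

You would call it as `names = read_names("first.txt")`, where `first.txt` stands in for whatever your first file is called.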
The list comprehension and strip() are required because readlines() hands you the lines with the trailing newline intact. This creates a list of all items in the file, assuming one per line (use .split() if they are all on one line).
Now build a dict:
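One way to do it, as a sketch: `dict.fromkeys` is only one of several options (a loop or a dict comprehension would work too), and the sample list below is the first file from the question with one duplicate added for illustration.

```python
# Names from the first file; "bob" is repeated on purpose.
names = ["bob", "mark", "larry", "tom", "bob"]

# Every name becomes a key. The value (True here) is never used,
# only the presence of the key matters.
seen = dict.fromkeys(names, True)

print(len(seen))       # 4, the duplicate "bob" collapsed into one key
print("bob" in seen)   # True
print("bruce" in seen) # False
```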
This builds a dictionary with the items in the list as keys. This also deals with duplicates. Now iterate through the second file and check if the key is in the dict:
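Putting the steps together, here is a rough sketch rather than a definitive script. The function name `compare_files` and its parameters are made up for illustration, the output names `matches.txt` and `differences.txt` come from the question, and "different" is taken to mean lines of the second file that are missing from the first (which is what the expected output in the question shows).

```python
def compare_files(first_path, second_path,
                  matches_path="matches.txt",
                  differences_path="differences.txt"):
    """Split the lines of the second file into matches and differences."""
    # Index the first file: one dict key per (stripped) line.
    with open(first_path) as first:
        seen = dict.fromkeys(line.strip() for line in first)

    with open(second_path) as second, \
         open(matches_path, "w") as matches, \
         open(differences_path, "w") as differences:
        for line in second:
            name = line.strip()
            if not name:
                continue  # ignore blank lines (an assumption)
            # "name in seen" is the fast dictionary lookup.
            out = matches if name in seen else differences
            out.write(name + "\n")
```

Calling `compare_files("first.txt", "second.txt")` (both file names are placeholders) would then write the two output files into the current directory.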
sort | uniq is good, but comm might be even better. "man comm" for more information.
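From the manual page: comm compares two sorted files line by line. With no options it prints three columns: lines only in the first file, lines only in the second file, and lines common to both. The options -1, -2 and -3 suppress the corresponding column.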
You can also use the Python set type, but comm is easier.
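For the set version, a small sketch using the names from the question typed in directly instead of read from files. Keep in mind that sets drop duplicates and do not keep the original line order, so the results are sorted before printing:

```python
first = {"bob", "mark", "larry", "tom"}
second = {"bob", "mark", "bruce", "tom"}

matches = first & second         # intersection: names in both files
only_in_second = second - first  # names in the second file only
only_in_first = first - second   # names in the first file only

print(sorted(matches))         # ['bob', 'mark', 'tom']
print(sorted(only_in_second))  # ['bruce']
print(sorted(only_in_first))   # ['larry']
```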
something like that at least