Python: Extracting lines from a file using another

Posted 2019-07-21 03:25

Question:

I have a 'key' file that looks like this (MyKeyFile):

afdasdfa
ghjdfghd
wrtwertwt
asdf

I call these keys, and they are identical to the first word of the lines that I want to extract from a 'source' file. So the source file (MySourceFile) looks something like this (first column = the key, following columns = data):

afdasdfa    (several tab-delimited columns)
.
.
ghjdfghd    (several tab-delimited columns)
.
wrtwertwt
.
.
asdf

Each '.' stands for a line that is of no interest at the moment.
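As a minimal sketch of the layout described above, assuming the columns are separated by single tab characters, the key is everything before the first tab and the data is the rest of the line. The `split_record` helper and the sample line are hypothetical, for illustration only.

```python
def split_record(line):
    """Split one source line into (key, list of data columns).

    Hypothetical helper: assumes the columns are tab-separated.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1:]


# Hypothetical sample line in the MySourceFile layout
key, data = split_record("afdasdfa\t12\t34\tfoo\n")
print(key)
print(data)
```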

I am an absolute novice in Python and this is how far I've come:

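with open('MyKeyFile', 'r') as infile, \
        open('MySourceFile', 'r') as source, \
        open('MyOutFile', 'w') as outfile:
    for line in infile:
        source.seek(0)  # rewind MySourceFile, otherwise it is used up after the first key
        for runner in source:
            # pick up the first word of the line in source
            # if match, print the entire line to MyOutFile
            # here I need help
            pass
# no outfile.close() needed: the with statement closes all three files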

I realize there may be better ways to do this. All feedback is appreciated, whether it builds on my way of solving it or suggests a more sophisticated one.

Thanks, jd

Answer 1:

As I understand it (correct me in the comments if I am wrong), you have three files:

  1. MySourceFile
  2. MyKeyFile
  3. MyOutFile

And you want to:

  1. Read keys from MyKeyFile
  2. Read source from MySourceFile
  3. Iterate over lines in the source
  4. If the line's first word is in keys, append that line to MyOutFile
  5. Close MyOutFile

So here is the code:

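with open('MySourceFile', 'r') as sourcefile:
    source = sourcefile.read().splitlines()

with open('MyKeyFile', 'r') as keyfile:
    keys = keyfile.read().split()  # one key per whitespace-separated word

with open('MyOutFile', 'w') as outfile:
    for line in source:
        words = line.split()
        # skip blank lines; keep lines whose first word is one of the keys
        if words and words[0] in keys:
            outfile.write(line + "\n")
# no outfile.close() needed: leaving the with block has already closed the file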
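If MySourceFile is large, a variant that may scale better keeps the keys in a set (membership checks are constant-time on average) and streams the source file line by line instead of loading it all with `read().splitlines()`. This is a sketch under the same assumptions as the code above (keys are whitespace-separated words, and a line matches when its first whitespace-separated word is a key); the function name `extract_lines` is hypothetical.

```python
def extract_lines(key_path, source_path, out_path):
    """Copy every line of source_path whose first word is a key to out_path.

    Hypothetical helper. Returns the number of lines written.
    """
    # Read all keys into a set; split() handles one key per line or several per line
    with open(key_path, "r") as keyfile:
        keys = set(keyfile.read().split())

    written = 0
    with open(source_path, "r") as sourcefile, open(out_path, "w") as outfile:
        for line in sourcefile:  # stream: only one line is held in memory at a time
            words = line.split()
            if words and words[0] in keys:
                # lines keep their newline; add one if the last line lacks it
                outfile.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

With the file names from the question, the call would be `extract_lines('MyKeyFile', 'MySourceFile', 'MyOutFile')`.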


Answer 2:

I think this would be a cleaner way of doing it, assuming that your "key" file is called "key_file.txt" and your main file is called "main_file.txt":

keys = []
with open("key_file.txt", "r") as my_file:  # r is for reading files, w is for writing to them.
    for line in my_file:
        key = line.strip()  # drop the trailing newline, otherwise a key never matches mid-line
        if key:  # skip blank lines; an empty key would match every line
            keys.append(key)
# now you have a list of strings called keys.
# take each line from the main text file and check whether it contains the text of any key.

with open("main_file.txt", "r") as new_file:
    for line in new_file:
        for key in keys:
            if key in line:
                print("I FOUND A LINE THAT CONTAINS THE TEXT OF SOME KEY", line.rstrip("\n"))
                break  # report each line only once, even if several keys match

You can modify the print call, or replace it, to do whatever you need with each line that contains the text of some key. Let me know if this works.
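One thing to keep in mind with the substring test in this answer: `key in line` (like `line.find(key) > -1`) is also true when the key appears anywhere in the line, for example inside a data column or as part of a longer word. If only the first column should count, as in the question, comparing against the first whitespace-separated word is stricter. A small comparison, using made-up sample lines and hypothetical helper names:

```python
def contains_key(line, keys):
    """Substring test, as in the answer: True if any key appears anywhere in line."""
    return any(key in line for key in keys)


def starts_with_key(line, keys):
    """Stricter test: True only if the first whitespace-separated word is a key."""
    words = line.split()
    return bool(words) and words[0] in keys


keys = {"asdf", "wrtwertwt"}
# Hypothetical sample lines: only the first one really starts with a key
lines = ["asdf\t1\t2", "qwasdfgh\t3", "zzz\twrtwertwt"]
for line in lines:
    print(repr(line), contains_key(line, keys), starts_with_key(line, keys))
```

The second and third sample lines pass the substring test but not the first-word test, which is the difference to weigh when choosing between the two.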