Python nested for-loop not executing beyond the first iteration

Posted 2019-09-11 10:13

Question:

This script is meant to read through a file and take in the number (numA) and the text next to it (sourceA). It then uses this and compares it to every other line in the file. If a match in "nums" is found but not in sources, it writes the num to a file along with the sources it appears in.

with open(sortedNums, "r") as sor:
    for line in sor:
        NumsA, sourceA = line.split('####')
        for line in sor:
            if '####' in line:
                NumsB, sourceB = line.split('####')
                if (NumsA == NumsB) & (sourceA != sourceB):
                    print("Found reused Nums")
                    with open(reusedNums, 'a') as reused:
                        reused.write(NumsA + ' ' + sourceA + ' ' + sourceB)
                print("setA: " + NumsA + ' ' + sourceA)
                print("setB: " + NumsB + ' ' + sourceB)

Most of this works, except that it runs the full inner loop but only the first iteration of the outer loop.

Answer 1:

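You are trying to read twice from the same file. A file object tracks a current position that determines what is read next, and by iterating over the remaining lines in the inner loop, you moved that position all the way to the end. When control returns to the outer loop, there is nothing left to read, so it stops after its first iteration.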
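
The effect can be reproduced without any file on disk. The following is a minimal sketch that uses io.StringIO as a stand-in for the opened file; the sample lines are invented for illustration and are not from the original data.

```python
import io

# In-memory stand-in for the file; the sample lines are made up.
sor = io.StringIO("1####a\n1####b\n2####c\n")

outer_count = 0
inner_count = 0
for line in sor:          # outer loop reads the first line
    outer_count += 1
    for other in sor:     # inner loop reads every remaining line
        inner_count += 1

# Once the inner loop finishes, nothing is left for the outer loop.
print(outer_count, inner_count)
```

Both loops pull lines from the same object, so the outer loop body runs once and the inner loop consumes the other two lines.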

You could 'fix' that by seeking back to the start of the file with:

sor.seek(0)
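
Because both loops would still share the one file object, a variant that sidesteps position handling altogether is to read the lines into a list first and run the nested comparison over that list. This is only a sketch of that idea: the function name find_reused and the sample data are hypothetical, and unlike the question's code it compares each line only against later lines.

```python
import io

def find_reused(lines):
    """Return (num, source_a, source_b) for lines sharing a num but not a source."""
    # Keep only lines containing the separator; split each into [num, source].
    records = [line.rstrip("\n").split("####", 1) for line in lines if "####" in line]
    reused = []
    for i, (num_a, source_a) in enumerate(records):
        # Compare against later lines only; a list can be traversed repeatedly.
        for num_b, source_b in records[i + 1:]:
            if num_a == num_b and source_a != source_b:
                reused.append((num_a, source_a, source_b))
    return reused

# Made-up sample input standing in for the opened file.
sample = io.StringIO("1####a\n1####b\n2####c\n1####a\n")
pairs = find_reused(sample)
print(pairs)  # [('1', 'a', 'b'), ('1', 'b', 'a')]
```

This still does quadratic work, so it is reasonable only for small files.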

However, looping over the whole file for every line in that file is really inefficient, because the work grows quadratically with the number of lines. Use a dictionary to track whether you have seen the same number on a previous line:

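with open(sortedNums, "r") as sor, \
     open(reusedNums, 'a') as reused:
    seen = {}  # maps each number to the most recent source it appeared with
    for line in sor:
        # skip lines that do not contain the separator
        if '####' not in line:
            continue
        nums, source = line.rstrip().split('####')
        # same number seen earlier, but with a different source
        if nums in seen and seen[nums] != source:
            print("Found reused Nums")
            reused.write('{} {} {}\n'.format(nums, source, seen[nums]))
        seen[nums] = source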

By storing data in a dictionary, you only have to loop over the file once.
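
The dictionary above remembers only the most recent source for each number. If every source a number appears in should be reported, one possible extension is to map each number to a set of sources. The sketch below is an assumption about what that could look like, not part of the answer; the function name sources_by_num and the sample lines are hypothetical.

```python
from collections import defaultdict

def sources_by_num(lines):
    """Map each number that appears with more than one source to its sorted sources."""
    seen = defaultdict(set)
    for line in lines:
        if "####" not in line:
            continue  # skip lines without the separator
        nums, source = line.rstrip().split("####", 1)
        seen[nums].add(source)
    # Keep only numbers that were seen with at least two distinct sources.
    return {nums: sorted(sources) for nums, sources in seen.items() if len(sources) > 1}

# Made-up sample lines; any iterable of lines, such as an open file, works the same way.
sample = ["1####a\n", "1####b\n", "2####c\n", "1####a\n", "no separator\n"]
reused = sources_by_num(sample)
print(reused)  # {'1': ['a', 'b']}
```

This still reads the input once, and the result can be written to the output file after the loop.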