Counting jump(no of lines) between first two '

Posted 2020-04-17 06:04

Question:

I have a huge data file in which a specific string is repeated after a fixed number of lines.

I want to count the jump (number of lines) between the first two 'Rank' occurrences. For example, the file looks like this:

  1 5 6 8 Rank                     line-start
  2 4 8 5
  7 5 8 6
  5 4 6 4
  1 5 7 4 Rank                     line-end  
  4 8 6 4
  2 4 8 5
  3 6 8 9
  5 4 6 4 Rank

You can see that the string Rank reappears after every 3 lines, so a block (the Rank line plus the 3 lines that follow it) has 4 lines in the above example. My question is: how do I get this number of lines using Python's readline()?

I currently follow this:

data = open(filename).readlines()
count = 0
for j in range(len(data)):
  if(data[j].find('Rank') != -1): 
    if count == 0: line1 = j
    count = count +1 
  if(count == 2):
    no_of_lines = j - line1
    break

Any improvements or suggestions welcome.

Answer 1:

I assume you want to find the number of lines in each block, where a block starts with a line that contains 'Rank'. For example, there are 3 blocks in your sample: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line:

from itertools import groupby

def block_start(line, start=[None]):
    # The mutable default argument keeps state between calls:
    # the flag flips on every line that contains 'Rank'.
    if 'Rank' in line:
        start[0] = not start[0]
    return start[0]

with open(filename) as file:
    block_sizes = [sum(1 for line in block) # find number of lines in a block
                   for _, block in groupby(file, key=block_start)] # group
print(block_sizes)
# -> [4, 4, 1]
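
The groupby version relies on a mutable default argument to remember state, which can be surprising if the key function is reused. As a rough alternative sketch (not the answerer's code), a small generator can yield the same block sizes. The name block_sizes_iter and the marker parameter are made up for illustration, and io.StringIO stands in for the real file:

```python
import io

def block_sizes_iter(lines, marker="Rank"):
    """Yield the size of each block; a block starts at a line containing marker."""
    size = None  # stays None until the first marker line is seen
    for line in lines:
        if marker in line:
            if size is not None:
                yield size  # the previous block is complete
            size = 1        # the marker line opens a new block
        elif size is not None:
            size += 1
    if size is not None:
        yield size          # last block, possibly incomplete

# In-memory stand-in for the sample file from the question.
sample = io.StringIO(
    "1 5 6 8 Rank\n2 4 8 5\n7 5 8 6\n5 4 6 4\n"
    "1 5 7 4 Rank\n4 8 6 4\n2 4 8 5\n3 6 8 9\n"
    "5 4 6 4 Rank\n"
)
print(list(block_sizes_iter(sample)))  # -> [4, 4, 1]
```

Lines that appear before the first marker are ignored here, which may or may not be what you want for your data.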

If all blocks have the same number of lines, or you just want to find the number of lines in the first block that starts with 'Rank':

count = None
with open(filename) as file:
     for line in file:
         if 'Rank' in line:
             if count is None: # found the start of the 1st block
                count = 1
             else: # found the start of the 2nd block
                break
         elif count is not None: # inside the 1st block
             count += 1
print(count) # -> 4


Answer 2:

Don't use .readlines() when a simple generator expression that counts the lines without Rank is enough:

count = sum(1 for l in open(filename) if 'Rank' not in l)

'Rank' not in l is enough to test if the string 'Rank' is not present in a string. Looping over the open file is looping over all the lines. The sum() function will add up all the 1s, which are generated for each line not containing Rank, giving you a count of lines without Rank in them.
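
To see the counting expression in action without a file on disk, here is a minimal sketch that feeds it an in-memory stand-in for the first block of the sample. The helper name count_without and the marker parameter are assumptions for illustration only:

```python
import io

def count_without(lines, marker="Rank"):
    """Count the lines that do not contain marker."""
    return sum(1 for line in lines if marker not in line)

# First block of the sample plus the line that starts the second block.
sample = io.StringIO(
    "1 5 6 8 Rank\n2 4 8 5\n7 5 8 6\n5 4 6 4\n1 5 7 4 Rank\n"
)
print(count_without(sample))  # -> 3
```

Note that this counts over the whole input, so on the full file it gives the total number of non-Rank lines, not the size of one block.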

If you need to count the lines from Rank to Rank, you need a little itertools.takewhile magic:

import itertools
with open(filename) as f:
    # skip until we reach `Rank` (takewhile is lazy, so it must be consumed):
    for _ in itertools.takewhile(lambda l: 'Rank' not in l, f):
        pass
    # takewhile will have read a line with `Rank` now
    # count the lines *without* `Rank` between them
    count = sum(1 for l in itertools.takewhile(lambda l: 'Rank' not in l, f))
    count += 1  # we skipped at least one `Rank` line.
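
Another way to get the Rank-to-Rank distance, sketched here as an alternative rather than as part of the answer above, is to record the line indices of the first two matches and subtract them. The function name rank_jump is hypothetical. Because the generator is lazy, reading stops right after the second match:

```python
import io
from itertools import islice

def rank_jump(lines, marker="Rank"):
    """Return the line distance between the first two marker lines, or None."""
    hits = (i for i, line in enumerate(lines) if marker in line)
    first_two = list(islice(hits, 2))  # stops reading after the second hit
    if len(first_two) < 2:
        return None  # fewer than two marker lines in the input
    return first_two[1] - first_two[0]

sample = io.StringIO(
    "1 5 6 8 Rank\n2 4 8 5\n7 5 8 6\n5 4 6 4\n"
    "1 5 7 4 Rank\n4 8 6 4\n2 4 8 5\n3 6 8 9\n"
    "5 4 6 4 Rank\n"
)
print(rank_jump(sample))  # -> 4
```

With a real file you would pass the open file object instead of the StringIO.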


Answer 3:

Counting the jump between the first two 'Rank' occurrences:

def find_jumps(filename):
    first = True
    count = 0
    with open(filename) as f:
        for line in f:
            if 'Rank' in line:
                if first:
                    count = 0 
                    #set this to 1 if you want to include one of the 'Rank' lines.
                    first = False                    
                else:
                    return count
            else:
                count += 1 
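
The comment in the function above hints at an off-by-one choice: whether the count includes one of the 'Rank' lines. A small sketch of that choice follows, using a hypothetical include_marker flag and an in-memory sample in place of the file. It also returns None explicitly when there are fewer than two 'Rank' lines:

```python
import io

def find_jump(lines, marker="Rank", include_marker=False):
    """Count lines between the first two marker lines.

    With include_marker=True the first marker line is counted too.
    Returns None if fewer than two marker lines are found.
    """
    count = None  # None until the first marker line is seen
    for line in lines:
        if marker in line:
            if count is not None:
                return count  # reached the second marker line
            count = 1 if include_marker else 0
        elif count is not None:
            count += 1
    return None

text = "1 5 6 8 Rank\n2 4 8 5\n7 5 8 6\n5 4 6 4\n1 5 7 4 Rank\n"
print(find_jump(io.StringIO(text)))                       # -> 3
print(find_jump(io.StringIO(text), include_marker=True))  # -> 4
```

For the sample in the question, the second form matches the block size of 4 that the asker expects.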


Answer 4:

Seven lines of code:

count = 0
for line in open("yourfile.txt"):
    if "Rank" in line:
        if count > 0: break  # second 'Rank' line: stop without counting it
        count = 1            # first 'Rank' line starts the block
    elif count > 0: count += 1
print(count)