Reading input from a file in Python 3.x

Posted 2019-09-08 19:34

Question:

Say you are reading input from a file structured like so:

P3
400 200
255
255 255 255
255 0 0
255 0 0
etc...

But you want to account for formatting mistakes in the input file, such as:

P3 400
200
255
255 255
255
255 0 0
255 0
0
etc...

I want to read the first token 'P3', then the next two, '400' and '200' (width and height), then the '255'. From there on, I want to read every token and treat the tokens as groups of three. I have working code to process this information, but I can't figure out how to read the input token by token rather than line by line, which doesn't account for imperfect input.

Answer 1:

Here is one way to go about it, using the csv module:

import csv

all_of_the_tokens = []

# 'token' is the name of the input file in this example
with open('token', newline='') as token_file:
    csv_reader = csv.reader(token_file, delimiter=' ')
    for row in csv_reader:
        # consecutive spaces produce empty strings, so skip them
        all_of_the_tokens.extend(tok for tok in row if tok)

# the with block closes the file, so no explicit close() is needed
first_four = all_of_the_tokens[:4]          # 'P3', '400', '200', '255'
rest_of_the_tokens = all_of_the_tokens[4:]

# print the remaining tokens in groups of three
for i in range(0, len(rest_of_the_tokens), 3):
    print(rest_of_the_tokens[i:i+3])


Answer 2:

If your file consists of groups of three values (after the first P3 item) and you cannot rely on the line breaks to group them properly, I suggest reading the file as a single string and doing the splitting and grouping yourself. Here's a straightforward way:

with open(filename) as f:
    text = f.read()    # get the file contents as a single string

tokens = text.split()  # splits the big string on any whitespace, returning a list
it = iter(tokens)      # start an iterator over the list
prefix = next(it)      # grab the "P3" token off the front
triples = list(zip(it, it, it))  # make a list of 3-tuples from the rest of the tokens

Using zip on multiple references to the same iterator is the key trick here. If you needed to handle other group sizes with the same code, you could use zip(*[it]*grouplen).
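As a minimal sketch of that generalization, the grouping can be wrapped in a small function. The name group_tokens and the sample data are hypothetical and used only for illustration; they are not part of the original answer.

```python
def group_tokens(tokens, grouplen):
    """Group a flat sequence of tokens into tuples of length grouplen.

    Hypothetical helper for illustration. Leftover tokens that do not
    fill a complete group are dropped, as with plain zip.
    """
    it = iter(tokens)
    # [it] * grouplen repeats the SAME iterator object, so zip pulls
    # grouplen consecutive items for each output tuple
    return list(zip(*[it] * grouplen))


# Pixel tokens from the question's malformed example, after text.split()
sample = "255 255 255 255 0 0 255 0 0".split()
groups = group_tokens(sample, 3)
print(groups)
```

Because the tokens come from split(), it makes no difference how they were spread across lines in the file.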

Note that this will discard any left-over values at the end of the file if they don't form a group of three. If you need to handle that situation differently, I suggest using zip_longest from the itertools module, rather than the regular zip function. (See the grouper recipe in the itertools documentation.)
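A rough sketch of the zip_longest approach, modeled on the grouper recipe, might look like the following. The token string is made-up sample data whose last pixel is deliberately truncated, and the choice of None as the fill value is an assumption made here so that incomplete groups are easy to detect.

```python
from itertools import zip_longest


def grouper(tokens, n, fillvalue=None):
    """Collect tokens into tuples of length n, padding the last one."""
    it = iter(tokens)
    return list(zip_longest(*[it] * n, fillvalue=fillvalue))


# Sample data (assumed): header tokens followed by a truncated last pixel
tokens = "P3 400 200 255 255 255 255 255 0 0 255 0".split()

it = iter(tokens)
magic = next(it)        # 'P3'
width = int(next(it))   # 400
height = int(next(it))  # 200
maxval = int(next(it))  # 255

# Group whatever is left on the iterator into triples
pixels = grouper(it, 3)

# Any group containing the fill value was incomplete in the file
incomplete = [p for p in pixels if None in p]
print(pixels)
print(incomplete)
```

Here the header tokens are consumed one at a time before grouping, so only pixel values end up in the triples; what to do with an incomplete group (reject the file, pad with zeros, and so on) is left to the caller.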