I have a generated file with thousands of lines like the following:
CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,PRICE,999.0000,QUANTITY,100,TSN,1510000001
Some lines have more fields and others have fewer, but all follow the same pattern of key-value pairs and each line has a TSN field.
When doing some analysis on the file, I wrote a loop like the following to read the file into a dictionary:
#!/usr/bin/env python
from sys import argv
records = {}
for line in open(argv[1]):
    fields = line.strip().split(',')
    record = dict(zip(fields[::2], fields[1::2]))
    records[record['TSN']] = record
print 'Found %d records in the file.' % len(records)
...which is fine and does exactly what I want it to (the print is just a trivial example).
However, it doesn't feel particularly "pythonic" to me, especially the line with:
dict(zip(fields[::2], fields[1::2]))
which just feels "clunky" (how many times does it iterate over the fields?).
Is there a better way of doing this in Python 2.6 with just the standard modules to hand?
In Python 2 you could use izip in the itertools module and the magic of generator objects to write your own function to simplify the creation of pairs of values for the dict records. I got the idea for pairwise() from a similarly named (but functionally different) recipe in the Python 2 itertools docs.
To use the approach in Python 3, you can just use plain zip(), since it does what izip() did in Python 2, which resulted in the latter's removal from itertools. The example below addresses this and should work in both versions.
Which can be used like this in your file reading for loop:
But wait, there's more!
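First, though, a minimal sketch of the pairwise() idea described above, since it is easier to follow with code in front of you. The import fallback is one common way to get a lazy zip in both Python 2 and 3. The read_records() helper and the second sample line are hypothetical, made up here so the snippet runs without a file; in the real script you would loop over open(argv[1]) instead.

```python
try:
    from itertools import izip  # Python 2: lazy counterpart of zip()
except ImportError:
    izip = zip  # Python 3: the built-in zip() is already lazy


def pairwise(iterable):
    """s -> (s0, s1), (s2, s3), (s4, s5), ..."""
    a = iter(iterable)
    # The same iterator is passed twice, so each output tuple
    # consumes two consecutive items from the input.
    return izip(a, a)


def read_records(lines):
    """Hypothetical helper: build a dict of records keyed by TSN."""
    records = {}
    for line in lines:
        fields = line.strip().split(',')
        record = dict(pairwise(fields))
        records[record['TSN']] = record
    return records


# Illustrative input only; the first line is the sample from the question.
sample_lines = [
    "CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,"
    "PRICE,999.0000,QUANTITY,100,TSN,1510000001\n",
    "CODE,YYY,DATE,20101201,TSN,1510000002\n",
]
records = read_records(sample_lines)
print('Found %d records in the file.' % len(records))
```

With this approach the fields are walked once per line, rather than being sliced twice as in fields[::2] and fields[1::2].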
It's possible to create a generalized version I'll call grouper(), which again corresponds to a similarly named, but functionally different, itertools recipe (which is listed right below pairwise()):
Which could be used like this in your for loop:
Of course, for specific cases like this, it's easy to use functools.partial() and create a similar pairwise() function with it (which will work in both Python 2 & 3):
Postscript
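For reference, a rough sketch of the grouper() generalization and the functools.partial() shortcut described above. The function bodies are an assumption about how such helpers could be written, and the sample field list is shortened from the question's example line.

```python
from functools import partial

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3


def grouper(n, iterable):
    """s -> (s0, ..., sn-1), (sn, ..., s2n-1), ..."""
    # n references to one shared iterator: every output tuple
    # pulls n consecutive items from the input.
    return izip(*[iter(iterable)] * n)


# A pairwise() built from grouper() with n fixed at 2.
pairwise = partial(grouper, 2)

fields = "CODE,XXX,DATE,20101201,TSN,1510000001".split(',')
record = dict(grouper(2, fields))
same_record = dict(pairwise(fields))
```

Note that a version built on zip() silently drops a trailing incomplete group, so a line with an odd number of fields would lose its last item.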
Unless there's a really huge number of fields, you could instead create an actual sequence out of the pairs of line items (rather than using a generator expression, which has no len()):
You could get by with a simpler grouper() function:
Not so much better as just more efficient...
Full explanation
source
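Assuming the approach meant here is the grouper recipe from the itertools documentation (the later mention of a "recipe version" suggests so, but that is a guess), it could be sketched roughly like this. The argument order follows the older Python 2 form of the recipe, and the sample line is shortened from the question.

```python
try:
    from itertools import izip_longest as zip_longest  # Python 2.6+
except ImportError:
    from itertools import zip_longest  # Python 3


def grouper(n, iterable, fillvalue=None):
    """grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"""
    # One iterator repeated n times, so each tuple takes n items in a row;
    # a short final group is padded with fillvalue instead of being dropped.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


line = "CODE,XXX,DATE,20101201,TSN,1510000001\n"
record = dict(grouper(2, line.strip().split(',')))
```

Unlike a plain zip()-based version, this one keeps a dangling last key and maps it to the fill value.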
If we're going to abstract it into a function anyway, it's not too hard to write "from scratch":
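Something along these lines, as a minimal sketch. The name pairs() is just a placeholder chosen here, and the built-in next() needs Python 2.6 or later.

```python
def pairs(iterable):
    """Yield consecutive, non-overlapping pairs: (s0, s1), (s2, s3), ..."""
    iterator = iter(iterable)
    for first in iterator:
        try:
            second = next(iterator)
        except StopIteration:
            # Odd number of items: the unpaired last item is dropped.
            return
        yield first, second


fields = "CODE,XXX,DATE,20101201,TSN,1510000001".split(',')
record = dict(pairs(fields))
```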
robert's recipe version definitely wins points for flexibility, though.