How to go through blocks of lines separated by an empty line? The file looks like the following:
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
I want to loop through the blocks and collect the fields family name, name and age, in that order, as three columns:
Y X 20
F H 23
Y S 13
Z M 25
A simple solution:
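A minimal sketch of a simple approach, assuming the whole file fits in memory and that records are separated by truly empty lines (no stray spaces). `parse_blocks` and `SAMPLE` are illustrative names, not taken from the original answer:

```python
# Two records in the question's format; stands in for the file's contents
SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""


def parse_blocks(text):
    """Split text on blank lines and return one [FamilyN, Name, Age] row per block."""
    rows = []
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue  # skip empty chunks left by runs of several blank lines
        fields = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        rows.append([fields["FamilyN"], fields["Name"], fields["Age"]])
    return rows


# With a real file (hypothetical name): parse_blocks(open("data.txt").read())
for row in parse_blocks(SAMPLE):
    print(" ".join(row))
```

Reading the file in text mode translates Windows line endings to `\n`, so splitting on `"\n\n"` works for either; separator lines containing only spaces would need a regex split instead.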
Here's another way, using itertools.groupby. The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all the consecutive lines that yielded the same True or False result. This is a very convenient way to collect lines into groups.
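A sketch of this approach. `isa_group_separator` is the name the answer uses, but its body here, along with `get_groups`, `to_row` and `SAMPLE`, are assumptions for illustration; `io.StringIO` stands in for an open file, which iterates over lines the same way:

```python
import io
import itertools

SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""


def isa_group_separator(line):
    """Return True for blank (or whitespace-only) lines, which separate records."""
    return line.strip() == ""


def get_groups(lines):
    """Yield each run of consecutive non-separator lines as a list of stripped lines."""
    for is_separator, group in itertools.groupby(lines, key=isa_group_separator):
        if not is_separator:
            yield [line.strip() for line in group]


def to_row(group):
    """Turn ['ID: 1', 'Name: X', ...] into [FamilyN, Name, Age]."""
    fields = dict(line.split(":", 1) for line in group)
    return [fields[k].strip() for k in ("FamilyN", "Name", "Age")]


# Replace io.StringIO(SAMPLE) with open("data.txt") (hypothetical name) for a real file
for group in get_groups(io.StringIO(SAMPLE)):
    print(*to_row(group))
```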
Along with the half-dozen other solutions I already see here, I'm a bit surprised that no one has been so simple-minded (that is, generator-, regex-, map-, and read-free) as to propose a plain loop over the lines of the file.
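A sketch of what such a plain loop could look like; `simple_rows` and `SAMPLE` are hypothetical names, and a list of lines stands in for the open file:

```python
SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""


def simple_rows(lines):
    """Collect fields line by line; emit 'FamilyN Name Age' at each blank line and at EOF."""
    rows = []
    fields = {}
    for line in lines:
        line = line.strip()
        if line:
            key, value = line.split(":", 1)
            fields[key] = value.strip()
        elif fields:
            # A blank line ends the current record
            rows.append("{FamilyN} {Name} {Age}".format(**fields))
            fields = {}
    if fields:
        # The last record may not be followed by a blank line
        rows.append("{FamilyN} {Name} {Age}".format(**fields))
    return rows


# For a real file, pass open("data.txt") (hypothetical name) instead
for row in simple_rows(SAMPLE.splitlines()):
    print(row)
```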
Re-format to taste.
This answer isn't necessarily better than what's already been posted, but as an illustration of how I approach problems like this it might be useful, especially if you're not used to working with Python's interactive interpreter.
I've started out knowing two things about this problem. First, I'm going to use itertools.groupby to group the input into lists of data lines, one list for each individual data record. Second, I want to represent those records as dictionaries so that I can easily format the output.

One other thing that this shows is how using generators makes it easy to break a problem like this down into small parts.
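A sketch of that groupby-plus-dictionaries pipeline, assuming the `Key: value` format shown in the question. The generator names `stripped`, `records` and `formatted` are hypothetical, not the answer's original code:

```python
import io
import itertools

SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""


def stripped(lines):
    """Yield each line with surrounding whitespace removed (blank lines become '')."""
    for line in lines:
        yield line.strip()


def records(lines):
    """Group consecutive non-blank lines with groupby; yield each group as a dict."""
    for nonblank, group in itertools.groupby(stripped(lines), key=bool):
        if nonblank:
            pairs = (line.split(":", 1) for line in group)
            yield {key.strip(): value.strip() for key, value in pairs}


def formatted(recs, template="{FamilyN} {Name} {Age}"):
    """Render each record dict with str.format; extra keys such as ID are ignored."""
    for rec in recs:
        yield template.format(**rec)


# io.StringIO stands in for an open file
for line in formatted(records(io.StringIO(SAMPLE))):
    print(line)
```

Each stage is a generator, so lines flow through one record at a time and each stage can be tried on its own in the interactive interpreter.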
Use a dict, namedtuple, or custom class to store each attribute as you come across it, then append the object to a list when you reach a blank line or EOF.
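A minimal sketch of this using a namedtuple; `Person`, its field names, `read_people` and `SAMPLE` are hypothetical, and converting Age to `int` is an assumption about how the data will be used:

```python
import io
from collections import namedtuple

SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""

# Hypothetical record type; a dict or a small class would work the same way
Person = namedtuple("Person", ["family_name", "name", "age"])


def read_people(lines):
    """Store attributes as they appear; append a Person at each blank line and at EOF."""
    people = []
    attrs = {}

    def flush():
        if attrs:
            people.append(Person(attrs["FamilyN"], attrs["Name"], int(attrs["Age"])))
            attrs.clear()

    for line in lines:
        line = line.strip()
        if line:
            key, value = line.split(":", 1)
            attrs[key.strip()] = value.strip()
        else:
            flush()  # a blank line ends the current record
    flush()  # EOF ends the last record
    return people


for p in read_people(io.StringIO(SAMPLE)):
    print(p.family_name, p.name, p.age)
```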
If your file is too large to read into memory all at once, you can still use a regular-expression-based solution by using a memory-mapped file via the mmap module:
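A minimal sketch of the mmap approach. The regular expression (which assumes Name, FamilyN and Age sit on consecutive lines in that order), `iter_rows`, and the temporary-file demo are assumptions, not the answer's original code. Because an mmap exposes bytes, the pattern has to be a bytes pattern:

```python
import mmap
import os
import re
import tempfile

SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""

# Bytes pattern; [^\r\n]* keeps a Windows '\r' out of the captured values
RECORD_RE = re.compile(
    rb"^Name: ([^\r\n]*)\r?\nFamilyN: ([^\r\n]*)\r?\nAge: (\d+)", re.MULTILINE
)


def iter_rows(path):
    """Yield (FamilyN, Name, Age) tuples by running finditer over a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (mmap raises ValueError on an empty file)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in RECORD_RE.finditer(mm):  # matches are produced one at a time
                yield tuple(g.decode() for g in match.group(2, 1, 3))


# Write the sample to a temporary file so the demo is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
rows = list(iter_rows(tmp.name))
os.remove(tmp.name)
for row in rows:
    print(*row)
```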
The mmap trick provides a "pretend string" so that regular expressions work on the file without reading it all into one large string. And the finditer() method of the compiled regular expression yields matches one at a time instead of building a list of all matches at once (which findall() does). I do think this solution is overkill for this use case, however (still, it's a nice trick to know...).