I'm working on some Python code that reads a text file. The text file contains the information needed to construct a certain geometry, and it is divided into sections by keywords. For example, the file:
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
describes a square with vertices at (0,0), (10,0), (10,10), and (0,10). The *EDGES section defines the connections between the vertices. The first number in each row is an ID number.
Here is my problem: the information in the text file is not necessarily in order. Sometimes the *VERTICES section appears first, and other times the *EDGES section comes first. I have other keywords as well, so I'm trying to avoid repeating if statements to test whether each line is a new keyword.
What I have been doing is reading the text file multiple times, each time looking for a different keyword:
open file
read line by line
if line == *VERTICES
store all the following lines in a list until a new *command is encountered
close file
open file (again)
read line by line
if line == *EDGES
store all the following lines in a list until a new *command is encountered
close file
open file (again)
...
Can someone point out how I can identify these keywords without such a tedious procedure? Thanks.
You can read the file once and store the contents in a dictionary. Since you have conveniently labeled the "command" lines with a *, you can use every line beginning with a * as a dictionary key and the lines that follow it as the values for that key. You can do this with a for loop:
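A minimal sketch of such a loop (not necessarily the answerer's original code). `read_sections` is a hypothetical name; the sketch assumes blank lines can be skipped and that anything before the first * line can be ignored. `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import io

SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""


def read_sections(f):
    """Read the file once, grouping lines under the most recent '*' line."""
    sections = {}
    current = None  # no header seen yet
    for raw in f:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('*'):
            current = line  # the whole header line, e.g. '*EDGES', is the key
            sections[current] = []
        elif current is not None:  # lines before the first header are ignored
            sections[current].append(line)
    return sections


# With a real file: with open('data.txt') as f: sections = read_sections(f)
sections = read_sections(io.StringIO(SAMPLE))
print(sections['*EDGES'])  # ['1 1 2', '2 1 4', '3 2 3', '4 3 4']
```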
Or you can take advantage of Python's list and dictionary comprehensions to do the same thing in one line:
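A sketch of such a one-liner (wrapped over a few lines for readability). It assumes '*' appears only at the start of keyword lines: a '*' anywhere else, or text before the first keyword, would produce extra bogus sections. The keys come out without the star.

```python
SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""

# With a real file: with open('data.txt') as f: text = f.read()
text = SAMPLE

# Split on '*' into one chunk per section; the first line of each chunk is the
# keyword, and every remaining non-blank line is split on whitespace.
sections = {chunk.splitlines()[0].strip(): [row.split() for row in chunk.splitlines()[1:] if row.strip()]
            for chunk in text.split('*') if chunk.strip()}

print(sections['EDGES'][0])  # ['1', '1', '2']
```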
which, I'll admit, is not very nice looking, but it gets the job done by splitting the entire file into chunks between '*' characters and then using newlines and spaces as delimiters to break the remaining chunks up into dictionary keys and lists of lists (as dictionary values).
Details about splitting, stripping, and slicing strings can be found here
You should just create a dictionary of the sections. You could use a generator to read the file and yield each section in whatever order they arrive and build a dictionary from the results.
Here's some incomplete code that might help you along:
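A sketch of the generator idea, with `iter_sections` as a hypothetical name. It yields each keyword (star removed) together with its raw data lines, in whatever order the sections appear, and `dict()` collects the pairs. `io.StringIO` stands in for the file.

```python
import io
from typing import Iterable, Iterator, List, Tuple

SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""


def iter_sections(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (keyword, data_lines) for each section, in file order."""
    keyword, body = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith('*'):
            if keyword is not None:
                yield keyword, body  # the previous section is complete
            keyword, body = line[1:], []
        elif line and keyword is not None:
            body.append(line)
    if keyword is not None:
        yield keyword, body  # flush the final section


# With a real file: with open('data.txt') as f: sections = dict(iter_sections(f))
sections = dict(iter_sections(io.StringIO(SAMPLE)))
print(sections['VERTICES'][0])  # 1 0 0 0
```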
Assuming the data above is in a file called data.txt, you can build the dictionary and then reference each section, e.g.:
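For example, assuming the dictionary maps each keyword (without the star) to its raw data lines, and that the columns after each vertex ID are x, y, z, you could turn the two sections into lookup tables like this. The dictionary is written out as a literal so the snippet runs on its own. The `vertices`/`edges` names and the int/float conversions are illustrative choices.

```python
sections = {
    'VERTICES': ['1 0 0 0', '2 10 0 0', '3 10 10 0', '4 0 10 0'],
    'EDGES': ['1 1 2', '2 1 4', '3 2 3', '4 3 4'],
}

# Vertex ID -> (x, y, z) coordinates
vertices = {}
for row in sections['VERTICES']:
    vid, x, y, z = row.split()
    vertices[int(vid)] = (float(x), float(y), float(z))

# Edge ID -> (start vertex ID, end vertex ID)
edges = {}
for row in sections['EDGES']:
    eid, start, end = map(int, row.split())
    edges[eid] = (start, end)

print(vertices[3], edges[2])  # (10.0, 10.0, 0.0) (1, 4)
```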
A common strategy with this type of parsing is to build a function that can yield the data a section at a time. Then your top-level calling code can be fairly simple because it doesn't have to worry about the section logic at all. Here's an example with your data:
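A sketch of that pattern, assuming each keyword can be parsed independently. `split_sections`, `parse_vertices`, `parse_edges`, and `HANDLERS` are hypothetical names. The dispatch dictionary is one way to avoid a chain of if statements per keyword, and `upper()` makes *Edges and *EDGES match.

```python
import io
from typing import Iterable, Iterator, List, Tuple

SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""


def split_sections(lines: Iterable[str]) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (KEYWORD, rows) one section at a time; rows are token lists."""
    keyword, rows = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('*'):
            if keyword is not None:
                yield keyword, rows
            keyword, rows = line[1:].upper(), []  # normalize keyword case
        elif keyword is not None:
            rows.append(line.split())
    if keyword is not None:
        yield keyword, rows


# Hypothetical per-keyword parsers: add one entry per keyword you support.
def parse_vertices(rows):
    return {int(r[0]): tuple(float(v) for v in r[1:]) for r in rows}


def parse_edges(rows):
    return {int(r[0]): (int(r[1]), int(r[2])) for r in rows}


HANDLERS = {'VERTICES': parse_vertices, 'EDGES': parse_edges}

# Top-level code: no per-keyword if statements, and section order doesn't matter.
geometry = {}
for keyword, rows in split_sections(io.StringIO(SAMPLE)):
    handler = HANDLERS.get(keyword)
    if handler is not None:  # silently skip keywords without a handler
        geometry[keyword] = handler(rows)

print(geometry['EDGES'])  # {1: (1, 2), 2: (1, 4), 3: (2, 3), 4: (3, 4)}
```

Supporting a new keyword then means writing one parser function and adding one entry to `HANDLERS`; the reading loop stays the same.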
A dictionary is probably the way to go, given that your data isn't ordered. You can access it by section name after reading the file into a list. Note that the with statement closes your file automatically. Here's what it might look like:
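A sketch of what that could look like (not necessarily the answerer's exact code). Each data line is split on whitespace, and `io.StringIO` stands in for `open('data.txt')` so the snippet is self-contained.

```python
import io

SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""

# With a real file: with open('data.txt') as f: lines = f.read().splitlines()
with io.StringIO(SAMPLE) as f:  # stand-in for open('data.txt')
    lines = f.read().splitlines()  # the file is closed once the block exits

section_dict = {}
for line in lines:
    if line.startswith('*'):
        section = line[1:]  # drop the leading '*'
        section_dict[section] = []
    elif line.strip():  # skip blank lines
        # Assumes a '*' header came first; otherwise `section` is undefined here.
        section_dict[section].append(line.split())

print(section_dict['EDGES'][1])  # ['2', '1', '4']
```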
You will get a dictionary that looks like this:
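For the sample data, assuming the star is removed from each section name and each data line is split on whitespace (the values stay strings), the result would be equivalent to this literal:

```python
section_dict = {
    'VERTICES': [['1', '0', '0', '0'], ['2', '10', '0', '0'],
                 ['3', '10', '10', '0'], ['4', '0', '10', '0']],
    'EDGES': [['1', '1', '2'], ['2', '1', '4'], ['3', '2', '3'], ['4', '3', '4']],
}
```

Since Python 3.7, dictionaries preserve insertion order, so the keys appear in the order the sections occur in the file.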
Access each section this way:
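A sketch of accessing the sections by name, assuming the dictionary shape of star-less keys mapping to lists of whitespace-split rows (written out as a literal so the snippet runs on its own). The `coords` lookup, which finds each edge's endpoint coordinates by vertex ID, is an illustrative addition.

```python
section_dict = {
    'VERTICES': [['1', '0', '0', '0'], ['2', '10', '0', '0'],
                 ['3', '10', '10', '0'], ['4', '0', '10', '0']],
    'EDGES': [['1', '1', '2'], ['2', '1', '4'], ['3', '2', '3'], ['4', '3', '4']],
}

vertices = section_dict['VERTICES']
edges = section_dict['EDGES']

# Look up the coordinates of each edge's endpoints by vertex ID.
coords = {row[0]: row[1:] for row in vertices}
for edge_id, start, end in edges:
    print(edge_id, coords[start], coords[end])
```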
Note that the code above assumes each section starts with *, and that no other line starts with *. If the first assumption does not hold, you could make this change:
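The original change isn't shown here. One reading of the first assumption failing is that data lines appear before the first * header. In a loop that only assigns `section` on header lines, that raises a `NameError`. This sketch gives such lines a default key; using `None` for that key (`PREAMBLE`) is a hypothetical choice, and the input text is made up for illustration.

```python
import io

# Hypothetical input where a line appears before the first '*' header.
TEXT = """\
square example
*VERTICES
1 0 0 0
*EDGES
1 1 2
"""

with io.StringIO(TEXT) as f:  # stand-in for open('data.txt')
    lines = f.read().splitlines()

PREAMBLE = None  # hypothetical key for lines seen before any header
section = PREAMBLE  # defined up front, so early data lines don't raise NameError
section_dict = {PREAMBLE: []}
for line in lines:
    if line.startswith('*'):
        section = line[1:]
        section_dict[section] = []
    elif line.strip():
        section_dict[section].append(line.split())

if not section_dict[PREAMBLE]:
    del section_dict[PREAMBLE]  # nothing appeared before the first header
```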
Also note that the part of the section_dict code that builds each section name gets rid of the star at the beginning of the name. If this is not desired, you can change that to:
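A sketch of that variant: the whole header line is used as the key instead of slicing off its first character. The short list of lines stands in for the file contents.

```python
lines = ['*VERTICES', '1 0 0 0', '*EDGES', '1 1 2']

section_dict = {}
for line in lines:
    if line.startswith('*'):
        section = line  # keep the '*' as part of the name, e.g. '*VERTICES'
        section_dict[section] = []
    elif line.strip():
        section_dict[section].append(line.split())

print(list(section_dict))  # ['*VERTICES', '*EDGES']
```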
If it is possible there will be undesired white space in your section name lines, you can do this to get rid of it:
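A sketch that strips each line before testing for the star, then strips the name again so a space between * and the keyword is also removed. The sample header lines with stray whitespace are made up for illustration.

```python
# Header lines with stray whitespace around the star and the name.
lines = ['  *VERTICES   ', '1 0 0 0', '* EDGES', '1 1 2']

section_dict = {}
for line in lines:
    stripped = line.strip()  # trim both ends before testing for the star
    if stripped.startswith('*'):
        section = stripped[1:].strip()  # also drop spaces between '*' and the name
        section_dict[section] = []
    elif stripped:
        section_dict[section].append(stripped.split())

print(list(section_dict))  # ['VERTICES', 'EDGES']
```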
I haven't tested all of this yet, but this is the general idea.
I think the fact that the sections are unordered lends itself well to parsing into a dictionary, from which you can access values later. I wrote a function that you may find useful for this task:
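A sketch of such a function built on `collections.defaultdict` (the answerer's original isn't shown; `parse_sections` is a hypothetical name). Because the dictionary creates missing lists on demand, a keyword that appears twice keeps appending to the same list instead of being overwritten.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List

SAMPLE = """\
*VERTICES
1 0 0 0
2 10 0 0
3 10 10 0
4 0 10 0
*EDGES
1 1 2
2 1 4
3 2 3
4 3 4
"""


def parse_sections(lines: Iterable[str]) -> DefaultDict[str, List[List[str]]]:
    """Group whitespace-split data rows under the keyword that precedes them."""
    data = defaultdict(list)
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('*'):
            key = line[1:]
            data.setdefault(key, [])  # so a section with no rows still appears
        elif key is not None:
            data[key].append(line.split())  # a repeated keyword appends to the same list
    return data


# With a real file: with open('data.txt') as f: sections = parse_sections(f)
sections = parse_sections(SAMPLE.splitlines())
```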
Assuming your file is named 'data.txt', the returned defaultdict looks like this:
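For the sample data, assuming each data line is split on whitespace and the star is dropped from the keys, the result would be equivalent to the following. Printing it shows `defaultdict(<class 'list'>, {...})`.

```python
from collections import defaultdict

sections = defaultdict(list, {
    'VERTICES': [['1', '0', '0', '0'], ['2', '10', '0', '0'],
                 ['3', '10', '10', '0'], ['4', '0', '10', '0']],
    'EDGES': [['1', '1', '2'], ['2', '1', '4'], ['3', '2', '3'], ['4', '3', '4']],
})
```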
You access the data just like a normal dictionary:
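A sketch of that access, with the dictionary written out as a literal. One difference from a plain dict is worth knowing: indexing a missing key returns an empty list and inserts that key, while `.get()` does not insert anything. 'FACES' and 'CURVES' are hypothetical keywords.

```python
from collections import defaultdict

sections = defaultdict(list, {
    'VERTICES': [['1', '0', '0', '0'], ['2', '10', '0', '0'],
                 ['3', '10', '10', '0'], ['4', '0', '10', '0']],
    'EDGES': [['1', '1', '2'], ['2', '1', '4'], ['3', '2', '3'], ['4', '3', '4']],
})

print(sections['VERTICES'][0])  # ['1', '0', '0', '0']
for edge_id, start, end in sections['EDGES']:
    print(f'edge {edge_id}: {start} -> {end}')

# A missing keyword gives [] instead of KeyError (and the key is inserted).
faces = sections['FACES']
print(faces, 'FACES' in sections)  # [] True

# .get() checks without inserting anything.
print(sections.get('CURVES'))  # None
```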