I have a text file that looks like this:
# Comments
PARAMETER 0 0
1045 54
1705 0 time 1
1 10 100 0.000e+00 9999 A
2 20 200 0.2717072 9999 B
3 30 300 0.0282928 9999 C
1 174 92 2999.4514 9999 APEW-1
2 174 92 54.952499 9999 ART-3A
1 174 97 5352.1299 9999 APEW-2
1 173 128 40.455467 9999 APEW-3
2 173 128 1291.1320 9999 APEW-3
3 173 128 86.562599 9999 ART-7B
...
I want to create a dictionary that looks like the one below (basically skipping the header and certain columns and getting to the data I need):
my_dict = {'A':(1,10,100),'B':(2,20,200), 'C':(3,30,300), 'APEW-1':(1,174,92), ...}
These data points are observation points, and their respective values are depth, y, x. Therefore one observation point can have multiple entries for different depths (the first column). I am trying to avoid renaming the labels by adding a suffix for duplicates, and I wonder if there is any way around it. What I want to do with them is to call an observation point name and extract its coordinates. I am not sure if a dictionary is the right tool for this purpose.
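For example, APEW-3 appears at two depths, so a flat dict like my example above would silently keep only one of its entries; mapping each label to a list of (depth, y, x) tuples would keep both (a quick sketch of what I mean):
>>> my_dict = {}
>>> my_dict['APEW-3'] = (1, 173, 128)
>>> my_dict['APEW-3'] = (2, 173, 128)  # overwrites the first entry
>>> my_dict = {'APEW-3': [(1, 173, 128), (2, 173, 128)]}  # a list keeps both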
It is a small dataset and doesn't need to be fast. I am using NumPy on Python 2.7.
loadtxt can do it:
>>> import numpy as np
>>> fn = 'data.txt'  # placeholder path to the file shown above
>>> dtype = np.rec.fromrecords([[0, 0, 0, b'APEW-1']]).dtype
>>> x = np.loadtxt(fn, skiprows=4, usecols=(0, 1, 2, 5), dtype=dtype)
>>>
>>> result = {}
>>> for x0, x1, x2, key in x:
...     try:
...         result[key.decode()].append((x0, x1, x2))
...     except KeyError:
...         result[key.decode()] = [(x0, x1, x2)]
...
>>> result
{'A': [(1, 10, 100)], 'B': [(2, 20, 200)], 'C': [(3, 30, 300)], 'APEW-1': [(1, 174, 92)], 'ART-3A': [(2, 174, 92)], 'APEW-2': [(1, 174, 97)], 'APEW-3': [(1, 173, 128), (2, 173, 128)], 'ART-7B': [(3, 173, 128)]}
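Looking up an observation point is then a plain dict lookup; for example, for a point with two depth entries:
>>> result['APEW-3']
[(1, 173, 128), (2, 173, 128)]
>>> result['APEW-3'][0]  # (depth, y, x) at the first depth
(1, 173, 128)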
Notes:
- we abuse rec.fromrecords to create a compound dtype describing the columns; be sure to use a template string as long as the longest label you expect
- there is probably an official way of creating compound dtypes that doesn't involve creating a throw-away array, but this is easy and works (see the sketch after this list)
- the loadtxt parameters are self-explanatory; because of the compound dtype it generates a 1d record array
- if there were no duplicate keys, we could use a dict comprehension to translate the record array to a dict (also sketched below); f0-f3 are the auto-generated field names
- to accommodate duplicates we pack the values, which are tuples, in lists
- most lists contain just one tuple, but some will have more
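For reference, a sketch of those two points, assuming the same file and column layout: the compound dtype built explicitly with np.dtype instead of the throw-away record array, and the dict comprehension that would only be safe if every label occurred once:
>>> dtype = np.dtype([('f0', int), ('f1', int), ('f2', int), ('f3', 'S6')])  # 'S6' fits the longest label here
>>> x = np.loadtxt(fn, skiprows=4, usecols=(0, 1, 2, 5), dtype=dtype)
>>> # without duplicates this would do; with them, later rows overwrite earlier ones
>>> unique = {rec['f3'].decode(): (int(rec['f0']), int(rec['f1']), int(rec['f2'])) for rec in x}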
py2 version: the main differences are that there is no need to use byte strings / decode, and that the dictionary forgets the order of the items:
>>> import numpy as np
>>> fn = 'data.txt'  # placeholder path, as above
>>> dtype = np.rec.fromrecords([[0, 0, 0, 'APEW-1']]).dtype
>>> x = np.loadtxt(fn, skiprows=4, usecols=(0, 1, 2, 5), dtype=dtype)
>>>
>>> result = {}
>>> for x0, x1, x2, key in x:
...     try:
...         result[key].append((x0, x1, x2))
...     except KeyError:
...         result[key] = [(x0, x1, x2)]
...
>>> result
{'A': [(1, 10, 100)], 'B': [(2, 20, 200)], 'C': [(3, 30, 300)], 'APEW-1': [(1, 174, 92)], 'ART-3A': [(2, 174, 92)], 'APEW-2': [(1, 174, 97)], 'APEW-3': [(1, 173, 128), (2, 173, 128)], 'ART-7B': [(3, 173, 128)]}
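If the order of the points matters on Python 2, collections.OrderedDict can stand in for the plain dict; a minimal sketch, using setdefault instead of the try/except:
>>> from collections import OrderedDict
>>> result = OrderedDict()
>>> for x0, x1, x2, key in x:
...     result.setdefault(key, []).append((x0, x1, x2))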