My data file looks like this:
3.6-band
6238
Over
0.5678
Over
0.6874
Over
0.7680
Over
0.7834
What I want to do is to pick out the smallest float and the word directly above it and print those two values. I have no idea what I'm doing. I've tried
df=open('filepath')
for line in df:
    df1=line.split()
    df2=min(df1)
Which is my attempt at at least trying to isolate the smallest float. Problem is it's just giving me the last value. I think that's a problem with Python not knowing to start over with the iteration, but again...no idea what I'm doing. I tried df2=min(df1.seek(0)) with no success, got an error saying no attribute seek. So that's what I've tried so far, I still have no idea how to print the row that would come before the smallest float. Suggestions/help/advice would be appreciated, thanks.
As a side note: this data file is an example of a larger one with similar characteristics, but the word 'Over' could also be 'Under', that's why I need to have it printed as well.
You can't use:
You can only use:
If your file looks like this:
You can use this code:
If you have the "Over"s, then you can skip every second line.

I see some interesting solutions above. I would go for this straightforward solution. There is one problem left, which is that integers might be picked up this way as well. Does anyone have a solution for this?
Store the items in a list of lists of [word, num] pairs and then apply min on that list of lists. Use the key parameter of min to specify which item must be used for comparison. Here, lis is the list of [word, num] pairs.

You need to read all lines of the file, perhaps with File.readlines(), or a loop like you already have, and then for each line read the number (if it is a number) and compare it to the "best so far" value.
It looks like you don't really need split(). What you do need to do is check whether each line starts with a digit. If so, you can get the number with float(line), or float(line.strip()) if whitespace is causing trouble. If the line doesn't start with a digit, keep it in a temporary variable. If the next line turns out to offer a lower number than the best-so-far value, you can copy that temporary value into a variable for the tentative output.
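The "best so far" idea above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name smallest_with_label is made up, and instead of checking for a leading digit it tries float() and catches ValueError, because the sample header line "3.6-band" starts with a digit but is not a number.

```python
def smallest_with_label(lines):
    """Return (label, value) for the smallest numeric line, or None if there is none."""
    best = None    # (label, value) of the smallest number seen so far
    label = None   # most recent non-numeric line (hypothetical "temporary variable")
    for raw in lines:
        text = raw.strip()
        if not text:
            continue                 # ignore blank lines
        try:
            value = float(text)
        except ValueError:
            label = text             # not a number: remember it for the next line
            continue
        if best is None or value < best[1]:
            best = (label, value)
    return best


# The sample data from the question, as a file would yield it line by line.
sample = ["3.6-band\n", "6238\n", "Over\n", "0.5678\n", "Over\n", "0.6874\n",
          "Over\n", "0.7680\n", "Over\n", "0.7834\n"]
result = smallest_with_label(sample)
print(result)
```

Since a file handle yields its lines when iterated, something like `with open('filepath') as df: print(smallest_with_label(df))` should work the same way. Note that the integer 6238 is parsed as a number too; it just never wins because it is larger than the floats.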
You could use the grouper recipe, izip(*[iterator]*2), to cluster the lines in df into groups of 2. Then, to find the minimum pair of lines, use min and its key parameter to specify the proxy to be used for comparison. In this case, for every pair of lines, (p, l), we want to use the float of the second line, float(l), as the proxy, and then print the resulting pair.
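A minimal sketch of this approach, written for Python 3, where the built-in zip is already lazy and takes the place of izip. It assumes the file strictly alternates label line and number line, as the sample does; io.StringIO is used here only as a stand-in for open('filepath').

```python
import io

# Stand-in for open('filepath'); it iterates line by line like a file handle.
df = io.StringIO(
    "3.6-band\n6238\nOver\n0.5678\nOver\n0.6874\nOver\n0.7680\nOver\n0.7834\n"
)

# Grouper recipe: both zip arguments are the same iterator, so each tuple
# holds two consecutive lines (label_line, number_line).
pairs = zip(*[df] * 2)

# Compare pairs by the float value of their second line.
label, number = min(pairs, key=lambda pair: float(pair[1]))
smallest = (label.strip(), float(number))
print(smallest)
```

If the file had an odd number of lines, zip would silently drop the last unpaired line, so this only fits data that really comes in pairs.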
An explanation of the grouper recipe:
To understand the grouper recipe, first look at what happens if df were a list.

In Python, when you multiply a list by a positive integer n, you get n (shallow) copies of the items in the list. Thus, [df]*2 makes a list with two copies of df inside.

Now consider zip(*[df]*2). The * used in zip(*...) has a special meaning. It tells Python to unpack the list following the * into arguments to be passed to zip. Thus, zip(*[df]*2) is exactly equivalent to zip(df, df).

A more complete explanation of argument unpacking is given by SaltyCrane here.
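To make the list case concrete, here is a small sketch. The list [1, 2, 3, 4] is an assumed example, and list(...) is wrapped around zip because in Python 3 zip returns an iterator rather than a list.

```python
df = [1, 2, 3, 4]           # assumed example list

doubled = [df] * 2          # a list holding the same list object twice
same_object = doubled[0] is doubled[1]

via_unpacking = list(zip(*[df] * 2))   # * unpacks the two entries as arguments
direct = list(zip(df, df))             # the equivalent explicit call

print(same_object)
print(via_unpacking)
```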
Take note of what zip is doing. zip(*[df]*2) peels off the first element of both copies (both 1's in this case) and forms the tuple (1, 1). Then it peels off the second element of both copies (both 2's) and forms the tuple (2, 2). It returns a list with these tuples inside.

Now consider what happens when df is an iterator. An iterator is sort of like a list, except that an iterator is good for only a single pass. Once items have been pulled out of the iterator, it can never be rewound.

For example, a file handle is an iterator. Suppose we have a file f with the four lines 1, 2, 3 and 4. You can pull items out of the iterator f by calling next(f). Each time we call next(f), we get the next line from the file handle, f. Once all four lines have been consumed, calling next(f) again raises a StopIteration exception, indicating the iterator is empty.

Now let's see how the grouper recipe behaves on f. [f]*2 gives us a list with two identical copies of the same iterator f. zip(*[f]*2) peels off the first item from the first iterator, f, and then peels off the first item from the second iterator, f. But the iterator is the same f both times! And since iterators are good for a single pass (you can never go back), you get different items each time you peel off an item. zip is calling next(f) each time to peel off an item. So the first tuple is ('1\n', '2\n'). Likewise, zip then peels off the next item from the first iterator f, and the next item from the second iterator f, and forms the tuple ('3\n', '4\n'). Thus, zip(*[f]*2) returns [('1\n', '2\n'), ('3\n', '4\n')].

That's really all there is to the grouper recipe. Above, I chose to use IT.izip instead of zip so that Python would return an iterator instead of a list of tuples. This would save a lot of memory if the file had a lot of lines in it. The difference between zip and IT.izip is explained more fully here.