I was wondering how I can find the minimum and maximum values in a dataset stored as a text file. It has 50 rows and 50 columns.
I know I can set up a loop (a for loop, to be specific) that reads each row and column and determines the min/max values, but I'm not sure how to do that.
I think the rows and columns need to be converted to lists first, and then I need to use the split() function. I tried setting something up as follows, but it doesn't seem to work:
for x in range(4,50): # using that range as an example
    x.split()
    max(4,50)
    print x
New to Python. Please excuse my mistakes.
Try something like this:
data = []
with open('data.txt') as f:
    for line in f:                    # loop over the rows
        fields = line.split()         # parse the columns
        rowdata = map(float, fields)  # convert text to numbers
        data.extend(rowdata)          # accumulate the results
print 'Minimum:', min(data)
print 'Maximum:', max(data)
Note that split() takes an optional argument if you want to split on something other than whitespace (commas for example).
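As a minimal sketch of that optional argument, assuming a comma-separated line (the sample string is made up for illustration, and print is written as a function so it also runs on Python 3):

```python
# Hypothetical sample line; a real one would come from the file.
line = "3.5,1.25,8,-2\n"

# strip() removes the trailing newline; split(',') splits on commas
# instead of whitespace.
fields = line.strip().split(',')
values = [float(field) for field in fields]

print(min(values))
print(max(values))
```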
If the file contains a regular (rectangular) matrix, and you know how many lines of header info it contains, then you can skip over the header info and use NumPy to do this particularly easily:
import numpy as np
f = open("file.txt")
# skip over header info
X = np.loadtxt(f)
max_per_col = X.max(axis=0)
max_per_row = X.max(axis=1)
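A sketch of the header-skipping step, assuming the header is two lines long and using an in-memory string in place of file.txt (both are made up for illustration). np.loadtxt accepts a skiprows argument, so the header lines do not have to be consumed by hand:

```python
import io
import numpy as np

# Hypothetical file contents: two header lines, then a 3x3 matrix.
text = (
    "sample data\n"
    "rows: 3 cols: 3\n"
    "1 2 3\n"
    "4 5 6\n"
    "7 8 9\n"
)

# skiprows=2 tells loadtxt to ignore the first two lines.
X = np.loadtxt(io.StringIO(text), skiprows=2)

overall_min = X.min()        # smallest value in the whole matrix
overall_max = X.max()        # largest value in the whole matrix
max_per_col = X.max(axis=0)  # one maximum per column
max_per_row = X.max(axis=1)  # one maximum per row

print(overall_min, overall_max)
```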
Hmmm...are you sure that homework doesn't apply here? ;) Regardless:
You need to not only split the input lines, you need to convert the text values into numbers.
So assuming you've read the input line into in_line, you'd do something like this:
...
row = [float(each) for each in in_line.split()]
rows.append(row) # assuming you have a list called rows
...
Once you have a list of rows, you need to get columns:
...
columns = zip(*rows)
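To see what zip(*rows) does, here is a small sketch with a made-up list of two rows. The list() call is only needed on Python 3, where zip returns an iterator rather than a list:

```python
# Hypothetical data: two rows of three values each.
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# Unpacking the rows into zip() pairs up the first items, then the
# second items, and so on, which yields one tuple per column.
columns = list(zip(*rows))

print(columns)
```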
Then you can just iterate through each row and each column calling max():
...
for each in rows:
    print max(each)
for each in columns:
    print max(each)
Edit: Here's more complete code showing how to open a file, iterate through the lines of the file, close the file, and use the above hints:
in_file = open('thefile.txt', 'r')
rows = []
for in_line in in_file:
    row = [float(each) for each in in_line.split()]
    rows.append(row)
in_file.close() # this'll happen at the end of the script / function / method anyhow
columns = zip(*rows)
for index, row in enumerate(rows):
    print "In row %s, Max = %s, Min = %s" % (index, max(row), min(row))
for index, column in enumerate(columns):
    print "In column %s, Max = %s, Min = %s" % (index, max(column), min(column))
Edit: For new-school goodness, don't use my old, risky file handling. Use the new, safe version:
rows = []
with open('thefile.txt', 'r') as in_file:
    for in_line in in_file:
        row = ....
Now you have the assurance that the file won't accidentally be left open, even if an exception is thrown while reading it. Plus, you can skip in_file.close() entirely without feeling even a little guilty.
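A small sketch of that guarantee, using a temporary file and a deliberately raised exception (both are just for illustration):

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out_file:
    out_file.write('1 2 3\n4 5 6\n')

try:
    with open(path, 'r') as in_file:
        for in_line in in_file:
            # Simulate something going wrong while reading.
            raise ValueError('simulated failure while reading')
except ValueError:
    pass

# The with block closed the file even though an exception escaped it.
print(in_file.closed)

os.remove(path)  # clean up the temporary file
```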
Will this work for you?
infile = open('my_file.txt', 'r')
file_lines = infile.readlines()
infile.close()
for line in file_lines[6:]:  # skips the first six lines
    items = [int(x) for x in line.split()]
    max_item = max(items)
    min_item = min(items)
    print max_item, min_item  # max and min of this line