I have a .csv file that when opened in Excel looks like this:
My code:
myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")
countries = []
for item in myfile:
a = item.split(",")
countries.append(a)
hdi_list = []
for acountry in countries:
hdi = acountry[3]
try:
hdi_list.append(float(hdi))
except:
pass
average = round(sum(hdi_list)/len(hdi_list), 2)
maxNumber = round(max(hdi_list), 2)
minNumber = round(min(hdi_list), 2)
This code works well, however, when I find the max,min, or avg I need to grab the corresponding name of the country and print that as well.
How can I change my code to grab the country name of the min,max, avg as well?
The following approach is close enough to your implementation that I think it might be useful. However, if you start working with larger or more complicated csv files, you should look into packages like "csv.reader" or "Pandas" (as previously mentioned). They are more robust and efficient in working with complex .csv data. You could also work through Excel with the "xlrd" package.
In my opinion, the simplest solution to reference country names with their respective values is to combine your 'for loops'. Instead of looping through your data twice (in two separate 'for loops') and creating two separate lists, use a single 'for loop' and create a dictionary with relevant data (ie. "country name", "hdi"). You could also create a tuple (as previously mentioned) but I think dictionaries are more explicit.
myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")
countries = []
for line in myfile:
country_name = line.split(",")[1]
value_of_interest = float(line.split(",")[3])
countries.append(
{"Country Name": country_name,
"Value of Interest": value_of_interest})
ave_value = sum([country["Value of Interest"] for country in countries])/len(countries)
max_value = max([country["Value of Interest"] for country in countries])
min_value = min([country["Value of Interest"] for country in countries])
print "Country Average == ", ave_value
for country in countries:
if country["Value of Interest"] == max_value:
print "Max == {country}:{value}".format(country["Country Name"], country["Value of Interest"])
if country["Value of Interest"] == min_value:
print "Min == {country}:{value}".format(country["Country Name"], country["Value of Interest"])
Note that this method returns multiple countries if they have equal min/max values.
If you are dead-set on creating separate lists (like your current implementation), you might consider zip() to connect your lists (by index), where
zip(countries, hdi_list) = [(countries[1], hdi_list[1]), ...]
For example:
for country in zip(countries, hdi_list):
if country[1] == max_value:
print country[0], country[1]
with similar logic applied to the min and average. This method works but is less explicit and more difficult to maintain.
Instead of putting the values straight in the list, use tuples instead, like this:
hdi_list.append((float(hdi), acountry[1]))
Then you can use this instead:
maxTuple = max(hdi_list)
maxNumber = round(maxTuple[0], 2)
maxCountry = maxTuple[1]
Using the pandas
module, [4]
, [5]
, and [6]
below should show the max, min, and average respectively. Note that the data below doesn't match yours save for country.
In [1]: import pandas as pd
In [2]: df = pd.read_csv("hdi.csv")
In [3]: df
Out[3]:
Country HDI
0 Norway 83.27
1 Australia 80.77
2 Netherlands 87.00
3 United States 87.43
4 New Zealand 87.43
5 Canada 87.66
6 Ireland 75.47
7 Liechtenstein 88.97
8 Germany 86.31
9 Sweden 80.54
In [4]: df.ix[df["HDI"].idxmax()]
Out[4]:
Country Liechtenstein
HDI 88.97
Name: 7, dtype: object
In [5]: df.ix[df["HDI"].idxmin()]
Out[5]:
Country Ireland
HDI 75.47
Name: 6, dtype: object
In [6]: df["HDI"].mean()
Out[6]: 84.484999999999985
Assuming both Liechtenstein
and Germany
have max values:
In [15]: df
Out[15]:
Country HDI
0 Norway 83.27
1 Australia 80.77
2 Netherlands 87.00
3 United States 87.43
4 New Zealand 87.43
5 Canada 87.66
6 Ireland 75.47
7 Liechtenstein 88.97
8 Germany 88.97
9 Sweden 80.54
In [16]: df[df["HDI"] == df["HDI"].max()]
Out[16]:
Country HDI
7 Liechtenstein 88.97
8 Germany 88.97
The same logic can be applied for the minimum value.