Find max number in .CSV file in Python

2019-06-09 17:33发布

问题:

I have a .csv file that when opened in Excel looks like this:

My code:

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")

    countries = []
    for item in myfile:
        a = item.split(",")
        countries.append(a)

    hdi_list = []
    for acountry in countries:
        hdi = acountry[3]

        try:
            hdi_list.append(float(hdi))
        except:
            pass

    average = round(sum(hdi_list)/len(hdi_list), 2)
    maxNumber = round(max(hdi_list), 2)
    minNumber = round(min(hdi_list), 2)

This code works well, however, when I find the max,min, or avg I need to grab the corresponding name of the country and print that as well.

How can I change my code to grab the country name of the min,max, avg as well?

回答1:

The following approach is close enough to your implementation that I think it might be useful. However, if you start working with larger or more complicated csv files, you should look into packages like "csv.reader" or "Pandas" (as previously mentioned). They are more robust and efficient in working with complex .csv data. You could also work through Excel with the "xlrd" package.

In my opinion, the simplest solution to reference country names with their respective values is to combine your 'for loops'. Instead of looping through your data twice (in two separate 'for loops') and creating two separate lists, use a single 'for loop' and create a dictionary with relevant data (ie. "country name", "hdi"). You could also create a tuple (as previously mentioned) but I think dictionaries are more explicit.

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")

countries = []
for line in myfile:
    country_name = line.split(",")[1]
    value_of_interest = float(line.split(",")[3])
    countries.append(
        {"Country Name": country_name, 
         "Value of Interest": value_of_interest})

ave_value = sum([country["Value of Interest"] for country in countries])/len(countries)
max_value = max([country["Value of Interest"] for country in countries])
min_value = min([country["Value of Interest"] for country in countries])

print "Country Average == ", ave_value
for country in countries:
    if country["Value of Interest"] == max_value:
        print "Max == {country}:{value}".format(country["Country Name"], country["Value of Interest"])
    if country["Value of Interest"] == min_value:
        print "Min == {country}:{value}".format(country["Country Name"], country["Value of Interest"])

Note that this method returns multiple countries if they have equal min/max values.

If you are dead-set on creating separate lists (like your current implementation), you might consider zip() to connect your lists (by index), where

zip(countries, hdi_list) = [(countries[1], hdi_list[1]), ...]

For example:

for country in zip(countries, hdi_list):
    if country[1] == max_value:
        print country[0], country[1]

with similar logic applied to the min and average. This method works but is less explicit and more difficult to maintain.



回答2:

Instead of putting the values straight in the list, use tuples instead, like this:

hdi_list.append((float(hdi), acountry[1]))

Then you can use this instead:

maxTuple = max(hdi_list)
maxNumber = round(maxTuple[0], 2)
maxCountry = maxTuple[1]


回答3:

Using the pandas module, [4], [5], and [6] below should show the max, min, and average respectively. Note that the data below doesn't match yours save for country.

In [1]: import pandas as pd

In [2]: df = pd.read_csv("hdi.csv")

In [3]: df
Out[3]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  86.31
9         Sweden  80.54

In [4]: df.ix[df["HDI"].idxmax()]
Out[4]: 
Country    Liechtenstein
HDI                88.97
Name: 7, dtype: object

In [5]: df.ix[df["HDI"].idxmin()]
Out[5]: 
Country    Ireland
HDI          75.47
Name: 6, dtype: object

In [6]: df["HDI"].mean()
Out[6]: 84.484999999999985

Assuming both Liechtenstein and Germany have max values:

In [15]: df
Out[15]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  88.97
9         Sweden  80.54

In [16]: df[df["HDI"] == df["HDI"].max()]
Out[16]: 
         Country    HDI
7  Liechtenstein  88.97
8        Germany  88.97

The same logic can be applied for the minimum value.