How can I iterate through multiple dataframes to s

Posted 2019-07-22 11:54

Question:

For my project I'm reading in a CSV file with data from every State in the US. My function converts each of these into a separate DataFrame, as I need to perform operations on each State's information.

import pandas as pd

def RanktoDF(csvFile):
    df = pd.read_csv(csvFile)
    df = df[pd.notnull(df['Index'])]  # drop rows where Index is null
    df = df[df.Index != 'Index']  # drop repeated header rows
    df = df.set_index('State')  # use State as the index
    return df

I apply this function to every one of my files and store each resulting DataFrame under a name taken from my array varNames:

for name, s in zip(glob.glob('*.csv'), varNames):
    vars()["Crime" + s] = RanktoDF(name)

All of that works perfectly. My problem is that I also want to create a DataFrame that's made up of one column from each of those State DataFrames.

I have tried iterating through a list of my DataFrames, selecting the column I want (Population), and appending it to a new DataFrame:

dfList

dfNewIndex = pd.DataFrame(index=CrimeRank_1980_df.index) # Create new DF with Index


for name in dfList:  #dfList is my list of dataframes. See image
    newIndex = name['Population']
    dfNewIndex.append(newIndex)

    #dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)

The error is always the same, and it tells me that name is being treated as a string rather than an actual DataFrame:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-5aa85b0174df> in <module>()
      3 
      4 for name in dfList:
----> 5     newIndex = name['Index']
      6     dfNewIndex.append(newIndex)
      7 #     dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)

TypeError: string indices must be integers

I understand that my list is a list of strings rather than variables/DataFrames, so my question is: how can I correct my code to do what I want, or is there an easier way of doing this?

Any solutions I've looked up explicitly type out the DataFrames to be concatenated, but I have 50, so that's a little unfeasible. Any help would be appreciated.

Answer 1:

One way would be to index into vars(), e.g.

for name in dfList:
    newIndex = vars()[name]["Population"]

Alternatively, I think it would be neater to store your DataFrames in a container such as a dict and iterate through that, e.g.

frames = {}

for name, s in zip(glob.glob('*.csv'), varNames):
    frames["Crime" + s] = RanktoDF(name)

for name in frames:
    newIndex = frames[name]["Population"]
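
To go from the individual columns to the single combined DataFrame the question asks for, one option is to hand them all to pd.concat with axis=1. The following is only a minimal sketch: it assumes every frame shares the State index and has a Population column, and the two tiny frames and their keys are made-up stand-ins for what RanktoDF would return.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by RanktoDF
state_index = pd.Index(["Alabama", "Alaska"], name="State")
frames = {
    "Crime1980": pd.DataFrame(
        {"Population": [100, 200], "Index": [1, 2]}, index=state_index
    ),
    "Crime1981": pd.DataFrame(
        {"Population": [110, 210], "Index": [3, 4]}, index=state_index
    ),
}

# Passing a dict to pd.concat uses its keys as the new column labels;
# axis=1 lines the Series up side by side on the shared State index.
population = pd.concat(
    {name: df["Population"] for name, df in frames.items()},
    axis=1,
)
print(population)
```

As a side note on the loop in the question: DataFrame.append returned a new object rather than modifying dfNewIndex in place, so its result had to be assigned back to a variable, and the method was removed in pandas 2.0. Collecting the columns first and concatenating once avoids both issues.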