I have a pandas DataFrame:
import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print(df)
Output:
   c1   c2
0  10  100
1  11  110
2  12  120
Now I want to iterate over the rows of this frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. For example:
for row in df.rows:
    print(row['c1'], row['c2'])
Is it possible to do that in pandas?
I found a similar question, but it does not give me the answer I need. For example, it suggests using:
for date, row in df.T.iteritems():
or
for row in df.iterrows():
But I do not understand what the row object is and how I can work with it.
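To make the row object concrete, here is a minimal sketch using the frame from the question. iterrows() yields (index, row) pairs, and each row is a pandas Series labelled by the column names, so cells can be read by name. The list `pairs` is only an illustrative helper.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

# iterrows() yields (index, row) pairs; each row is a pandas Series
# whose labels are the column names, so cells are accessible by name.
pairs = []
for index, row in df.iterrows():
    pairs.append((row['c1'], row['c2']))
    print(index, row['c1'], row['c2'])
```

Unpacking the pair in the for statement (index, row) avoids having to index into a tuple on every iteration.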
I was looking for how to iterate over rows AND columns and ended up here, so:
You can write your own iterator that yields namedtuple objects. This is directly comparable to pd.DataFrame.itertuples. I'm aiming to perform the same task more efficiently.
For the given dataframe with my function:
Or with pd.DataFrame.itertuples:
A comprehensive test
We test making all columns available and subsetting the columns.
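The function from this answer is not shown in the text, so the following is only a sketch of what a namedtuple-yielding iterator with optional column subsetting might look like. The name `iter_namedtuples`, its `cols` parameter, and the tuple name `Row` are hypothetical, and no performance claim is made for this version.

```python
from collections import namedtuple

import pandas as pd


def iter_namedtuples(frame, cols=None):
    """Yield one namedtuple per row (hypothetical helper, not the answer's code)."""
    if cols is None:
        cols = list(frame.columns)
    # Column labels must be valid Python identifiers for namedtuple to accept them.
    Row = namedtuple('Row', cols)
    # Convert the selected columns to plain Python lists once, up front.
    for values in frame[cols].values.tolist():
        yield Row(*values)


df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

rows_all = list(iter_namedtuples(df))              # all columns
rows_c1 = list(iter_namedtuples(df, cols=['c1']))  # subset of columns
builtin = list(df.itertuples(index=False))         # pandas' own equivalent
```

Both `rows_all` and `builtin` hold one tuple per row with fields c1 and c2, so either can be read as `row.c1` and `row.c2`.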
To iterate over a DataFrame's rows in pandas, one can use:
DataFrame.iterrows()
DataFrame.itertuples()
itertuples() is supposed to be faster than iterrows().
But be aware, according to the docs (pandas 0.21.1 at the moment):
iterrows: dtype might not match from row to row
iterrows: Do not modify rows
Use DataFrame.apply() instead:
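As a rough illustration of the iterrows caveats and the apply() alternative, the sketch below assumes a small hypothetical frame named `mixed` with one integer and one float column. Because iterrows() packs each row into a single Series, the integer value comes back as a float. Writing to such a row is not guaranteed to change the frame, so a column-wise apply() is used to build a new frame instead.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes, used only for this illustration.
mixed = pd.DataFrame({'ints': [1, 2], 'floats': [1.5, 2.5]})

# Each row becomes a single Series, so the integer column is upcast to float64.
_, first_row = next(mixed.iterrows())
print(mixed['ints'].dtype, first_row.dtype)

# Column-wise operation that returns a new frame instead of mutating rows in a loop.
doubled = mixed.apply(lambda col: col * 2)
print(doubled)
```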
itertuples:
While iterrows() is a good option, sometimes itertuples() can be much faster:
Use itertuples(). It is faster than iterrows():
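A minimal sketch of itertuples() on the frame from the question. Each row is a namedtuple whose first field is the index (named Index), followed by one field per column. The lists `seen` and `without_index` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Each row is a namedtuple; columns are read as attributes.
seen = []
for row in df.itertuples():
    seen.append((row.Index, row.c1, row.c2))
    print(row.Index, row.c1, row.c2)

# Pass index=False to leave the index out of each tuple.
without_index = [tuple(r) for r in df.itertuples(index=False)]
```

Attribute access only works for column names that are valid Python identifiers, which is the case for c1 and c2 here.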
You can also use NumPy indexing for even greater speedups. It's not really iterating, but it works much better than iteration for certain applications.
You may also want to cast the selection to an array. These indexes/selections are supposed to act like NumPy arrays already, but I ran into issues and needed to cast.
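A short sketch of the NumPy approach on the frame from the question, assuming the goal is to slice a column and combine columns without a Python-level loop. The variable names `c1`, `subset`, and `total` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Explicitly cast the column selection to a NumPy array.
c1 = np.asarray(df['c1'])

# Plain NumPy slicing: the first two values of the column.
subset = c1[0:2]

# Vectorized arithmetic over whole columns, with no explicit row loop.
total = c1 + np.asarray(df['c2'])
print(subset, total)
```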