I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one.
What is the best way to do this?
I successfully created an empty DataFrame with:
import pandas as pd
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row and fill a field with:
res = res.set_value(len(res), 'qty1', 10.0)  # DataFrame.set_value was removed in pandas 1.0
It works, but it seems very odd :-/ (and it fails when adding a string value).
How can I add a new row to my DataFrame (with columns of different types)?
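Since DataFrame.set_value no longer exists in current pandas, here is a minimal sketch of the same idea using .loc, assuming pandas 1.0 or later and made-up row values. Assigning to a row label that does not exist yet appends that row ("setting with enlargement"), and a row mixing a string with floats is accepted.

```python
import pandas as pd

res = pd.DataFrame(columns=["lib", "qty1", "qty2"])

# Label len(res) does not exist yet, so this appends a whole new row.
res.loc[len(res)] = ["abc", 10.0, 20.0]

# Setting a single cell on a missing row label also creates the row;
# the other cells in that row are filled with NaN.
res.loc[len(res), "qty1"] = 30.0
print(res)
```

Each enlargement copies the existing data, so this is convenient for a handful of rows but slow inside a long loop.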
You can also build up a list of lists and convert it to a DataFrame in a single step once all the rows are collected.
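A minimal sketch of the list-of-lists approach, with hypothetical row values and the column names from the question: rows are appended to an ordinary Python list inside the loop, and the DataFrame is constructed only once at the end.

```python
import pandas as pd

# Appending to a plain Python list is cheap, so collect the rows here first.
rows = []
for i in range(5):
    # Hypothetical row: a text label plus two numeric quantities.
    rows.append([f"name{i}", i * 2, i * 3.0])

# Build the DataFrame once, naming the columns explicitly.
df = pd.DataFrame(rows, columns=["lib", "qty1", "qty2"])
print(df)
```

Because the frame is built once, pandas infers a dtype per column (a text column, an integer column and a float column here) instead of copying a growing DataFrame on every iteration.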
It's been a long time, but I faced the same problem too and found a lot of interesting answers here, so I was unsure which method to use.
Since I was adding a lot of rows to a DataFrame, I cared about speed, so I tried the three most popular methods and timed them.
SPEED PERFORMANCE
Results (in secs):
So I use the dictionary approach myself.
Code:
P.S. I believe my implementation isn't perfect, and there may be room for optimization.
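As a rough sketch of this kind of comparison (not the author's benchmark; the row count, the random data and the function names build_with_loc and build_with_dicts are assumptions), the following times growing a DataFrame with .loc against collecting one dict per row and building the frame once. DataFrame.append is left out because it was removed in pandas 2.0.

```python
import time

import numpy as np
import pandas as pd

COLUMNS = ["A", "B", "C", "D", "E"]
rng = np.random.default_rng(0)  # seeded so repeated runs use the same values


def build_with_loc(n):
    """Grow the frame one row at a time via .loc enlargement."""
    df = pd.DataFrame(columns=COLUMNS)
    for i in range(n):
        df.loc[i] = rng.integers(0, 100, size=len(COLUMNS))
    return df


def build_with_dicts(n):
    """Collect one dict per row in a list, then build the frame once."""
    rows = []
    for _ in range(n):
        values = rng.integers(0, 100, size=len(COLUMNS))
        rows.append(dict(zip(COLUMNS, values)))
    return pd.DataFrame(rows, columns=COLUMNS)


num_rows = 1000  # assumed size; adjust to match your workload
for name, build in [(".loc", build_with_loc), ("list of dicts", build_with_dicts)]:
    start = time.perf_counter()
    result = build(num_rows)
    elapsed = time.perf_counter() - start
    print(f"{name:>13}: {elapsed:.3f} s for {len(result)} rows")
```

Absolute numbers depend on the machine and pandas version. The .loc version tends to slow down as the frame grows, because each enlargement copies the existing data.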
I figured out a simple and nice way: create a new record (a one-row DataFrame) and add it to old_data_frame.
Pass a list of values and the corresponding column names to create new_record (a DataFrame).
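A minimal sketch of this, assuming pd.concat is what does the adding (old_data_frame and all of its sample values are hypothetical). Note the extra outer list around the values, which makes pandas read them as one row rather than one column.

```python
import pandas as pd

COLUMNS = ["a", "b", "c", "d", "e"]

# Hypothetical existing frame that already holds one row.
old_data_frame = pd.DataFrame([[1, "xyz", 2, 3, 456]], columns=COLUMNS)

# The outer list makes this a single row with five columns.
new_record = pd.DataFrame([[0, "abcd", 0, 1, 123]], columns=COLUMNS)

# pd.concat returns a new frame; ignore_index=True renumbers rows 0..n-1.
old_data_frame = pd.concat([old_data_frame, new_record], ignore_index=True)
print(old_data_frame)
```

pd.concat does not modify old_data_frame in place, so inside a loop every call copies all existing rows. For many rows, collecting them first and concatenating once is cheaper.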
If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):
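A minimal sketch of preallocation, using hypothetical data and the column names from the question, and assuming num_rows is known before filling starts. The index argument creates every row up front, so the loop only overwrites existing rows.

```python
import numpy as np
import pandas as pd

num_rows = 5  # assumed to be known in advance
columns = ["lib", "qty1", "qty2"]

# Passing index allocates all rows at once (every cell starts as NaN);
# dtype=object lets each column accept mixed values while it is being filled.
df = pd.DataFrame(index=np.arange(num_rows), columns=columns, dtype=object)

for i in range(num_rows):
    # Label i already exists, so this overwrites a row instead of appending one.
    df.loc[i] = [f"name{i}", i * 2, i * 3.0]

# Let pandas pick specific dtypes (e.g. int64/float64) now that the data is in.
df = df.infer_objects()
print(df)
```

The final infer_objects() call is optional; it only tightens the object columns to more specific dtypes after the values are in place.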
Speed comparison
And, as noted in the comments, with a size of 6000 the speed difference becomes even larger.
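To try this comparison yourself, here is a hedged sketch (not the original benchmark; the function names fill_growing and fill_preallocated and the row contents are hypothetical) that times filling an empty DataFrame through .loc enlargement against filling a preallocated one. Setting num_rows to 6000 reproduces the size mentioned in the comments.

```python
import time

import numpy as np
import pandas as pd

COLUMNS = ["lib", "qty1", "qty2"]


def fill_growing(n):
    # Start empty; each .loc assignment appends a new row.
    df = pd.DataFrame(columns=COLUMNS)
    for i in range(n):
        df.loc[i] = [f"name{i}", i, i * 0.5]
    return df


def fill_preallocated(n):
    # All n rows exist up front; each .loc assignment only overwrites.
    df = pd.DataFrame(index=np.arange(n), columns=COLUMNS, dtype=object)
    for i in range(n):
        df.loc[i] = [f"name{i}", i, i * 0.5]
    return df


num_rows = 1000  # the comments above mention 6000
for name, fill in [("growing", fill_growing), ("preallocated", fill_preallocated)]:
    start = time.perf_counter()
    result = fill(num_rows)
    elapsed = time.perf_counter() - start
    print(f"{name:>12}: {elapsed:.3f} s for {len(result)} rows")
```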
This is not an answer to the OP's question but a toy example illustrating the answer by @ShikharDua above, which I found very useful.
While this fragment is trivial, my actual data had thousands of rows and many columns, and I wanted to group by different columns and then compute the statistics for more than one target column. So having a reliable method for building the DataFrame one row at a time was a great convenience. Thank you @ShikharDua!
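A minimal sketch of that pattern, assuming it means collecting one dict per output row in a list and creating the DataFrame once at the end. The fruit data, the statistics chosen and the column names are all hypothetical. Each (group, target column) pair produces one row of summary statistics.

```python
import pandas as pd

# Hypothetical input data.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "pear", "pear", "pear"],
    "weight": [150, 170, 120, 130, 110],
    "price": [0.5, 0.6, 0.4, 0.45, 0.35],
})

rows_list = []
for fruit, group in df.groupby("fruit"):
    for target in ["weight", "price"]:
        # One dict per (group, target column) pair becomes one output row.
        rows_list.append({
            "fruit": fruit,
            "column": target,
            "count": len(group),
            "mean": group[target].mean(),
            "max": group[target].max(),
        })

# Build the summary DataFrame once, after all rows are collected.
stats = pd.DataFrame(rows_list)
print(stats)
```

Adding another target column only means extending the inner list; the construction at the end stays the same.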