I'm creating a Pandas DataFrame to store data. Unfortunately, I can't know the number of rows of data that I'll have ahead of time. So my approach has been the following.
First, I declare an empty DataFrame.
df = DataFrame(columns=['col1', 'col2'])
Then, I append a row of missing values.
df = df.append([None] * 2, ignore_index=True)
Finally, I can insert values into this DataFrame one cell at a time. (Why I have to do this one cell at a time is a long story.)
df['col1'][0] = 3.28
This approach works perfectly fine, except that the append statement inserts an additional column into my DataFrame. At the end of the process, the output I see when I type df looks like this (with 100 rows of data).
<class 'pandas.core.frame.DataFrame'>
Data columns (total 2 columns):
0 0 non-null values
col1 100 non-null values
col2 100 non-null values
df.head() looks like this.
      0  col1  col2
0  None  3.28     1
1  None     1     0
2  None     1     0
3  None     1     0
4  None     1     1
Any thoughts on what is causing this 0 column to appear in my DataFrame?
The append call is trying to add a column to your DataFrame. The list you pass in has no name and holds two None/NaN elements, so pandas gives it the default column name 0.
For the append to work as intended, the column names of the incoming data must match the DataFrame's current column names; otherwise new columns are created (by default).
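To illustrate the point about column names, here is a small sketch. It assumes a recent pandas release, where DataFrame.append is no longer available, so pd.concat stands in for it. The names unnamed and result are made up for this example and are not from the question.

```python
import pandas as pd

# Same starting point as in the question: two named columns, no rows.
df = pd.DataFrame(columns=["col1", "col2"])

# A bare list has no column labels, so wrapping it in a DataFrame
# yields a single column with the default label 0 (and two rows).
unnamed = pd.DataFrame([None, None])

# Assumption: pd.concat is used here in place of the older df.append.
# Labels that do not match are added as new columns (outer join).
result = pd.concat([df, unnamed], ignore_index=True)

print(list(result.columns))  # the label 0 shows up next to col1 and col2
```

The extra column comes from the label mismatch alone, so giving the incoming data the labels col1 and col2 avoids it.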
You could use a Series for row insertion:
df looks like:
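A minimal sketch of Series-based row insertion, under a few assumptions: a recent pandas release, label-based assignment with df.loc in place of df.append, and a made-up variable name, placeholder. The key point is that the Series index matches the DataFrame's column names, so no extra column is created.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["col1", "col2"])

# A row of missing values whose index matches the frame's column names.
placeholder = pd.Series([np.nan, np.nan], index=df.columns)

# Assigning to a row label that does not exist yet adds the row.
df.loc[len(df)] = placeholder

# Fill in one cell at a time using a single label-based lookup.
df.loc[0, "col1"] = 3.28

print(df)
```

Writing df.loc[0, "col1"] rather than df['col1'][0] sets the cell in one indexing step, which avoids the ambiguity of chained assignment.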