I have the following indexed DataFrame with named columns and non-continuous row numbers:
          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493
I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).
0 -0.335485
1 -1.166658
2 -0.385571
dtype: float64
I tried different versions of join, append, merge, but I did not get the result I wanted, only errors at most. How can I add column e to the above example?
Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to.
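The code for this answer is not shown, so the following is only a minimal sketch of the idea: sort the index first, then assign the new column. The frame and values here are made up for illustration. Note that a plain column assignment does not in general require a sorted index; sorting tends to matter for label-based slicing, so whether you need this step depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not in sorted order.
df = pd.DataFrame({"a": [0.614758, 0.671399, 0.446172]}, index=[5, 2, 3])

# Sort the rows by index label before adding the column.
df = df.sort_index()

# A NumPy array is assigned positionally, in the (now sorted) row order.
df["e"] = np.array([-0.335485, -1.166658, -0.385571])
```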
If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.
If the column you are trying to add is a Series variable, then just assign it directly to the new column name.
This works well even if you are replacing an existing column. Just make new_columns_name the same as the name of the column you want to replace, and the existing column data will be overwritten with the new Series data.
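A minimal sketch of the points above, assuming a small made-up frame: take an explicit copy of a filtered DataFrame, assign a Series as a new column, then overwrite that column by reusing its name. The name subset and the values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0.671399, 0.446172, 0.614758], "b": [0.101208, -0.243316, 0.075793]},
    index=[2, 3, 5],
)

# Filtering can return an object that triggers SettingWithCopyWarning on
# assignment; taking an explicit copy makes the intent unambiguous.
subset = df[df["a"] > 0.5].copy()

# Assign a Series as a new column (its index matches subset's index).
series_variable_name = pd.Series([1.0, 2.0], index=subset.index)
subset["new_columns_name"] = series_variable_name

# Reusing the same column name overwrites the existing column.
subset["new_columns_name"] = series_variable_name * 10
```

Because subset is a copy, the original df is left without the new column.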
One thing to note, though, is that if you assign a Series whose index differs from that of the DataFrame, this will effectively be a left join on the df1.index. So if you want to have an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then do the assignment on that.
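A sketch of both behaviours, assuming a made-up Series s whose index only partly overlaps df1.index. Using reindex on the union of the two indexes is one way to build a frame "covering the universe" of labels; it is my choice for illustration and not necessarily what the answer originally used.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [0.671399, 0.446172, 0.614758]}, index=[2, 3, 5])
s = pd.Series([1.0, 2.0, 3.0], index=[3, 5, 7])

# Plain assignment aligns on df1.index (left-join effect):
# label 2 has no match and becomes NaN, label 7 is dropped.
left = df1.copy()
left["e"] = s

# Outer-join effect: extend the frame to every label first, then assign.
all_index = df1.index.union(s.index)
outer = df1.reindex(all_index)
outer["e"] = s
```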
Doing this directly via NumPy will be the most efficient:
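As a sketch of the NumPy route, here the values from the question are assigned as a plain array, which bypasses index alignment and fills the rows by position. Reusing the question's data and calling to_numpy() are my assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
        "c": [-0.181532, 0.051767, -0.451460],
        "d": [0.241273, 1.577318, -0.012493],
    },
    index=[2, 3, 5],
)
e = pd.Series([-0.335485, -1.166658, -0.385571])  # default index 0, 1, 2

# Assigning the underlying NumPy array ignores e's index, so the values
# are placed row by row regardless of the labels on df1.
df1["e"] = e.to_numpy()
```

Assigning e itself instead would align on labels, so only the row labelled 2 would receive a value and the others would be NaN.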
Note my original (very old) suggestion was to use map (which is much slower).
The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.