Adding new column to existing DataFrame in Python

2018-12-31 17:07发布

I have the following indexed DataFrame with named columns and rows not- continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

I tried different versions of join, append, merge, but I did not get the result I wanted, only errors at most. How can I add column e to the above example?

21条回答
泪湿衣
2楼-- · 2018-12-31 17:41

Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to:

data.set_index(['index_column'], inplace=True)
"if index is unsorted, assignment of a new column will fail"        
data.sort_index(inplace = True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])
查看更多
梦寄多情
3楼-- · 2018-12-31 17:41

If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.

df = df.copy()
df['col_name'] = values
查看更多
几人难应
4楼-- · 2018-12-31 17:44

If the column you are trying to add is a series variable then just :

df["new_columns_name"]=series_variable_name #this will do it for you

This works well even if you are replacing an existing column.just type the new_columns_name same as the column you want to replace.It will just overwrite the existing column data with the new series data.

查看更多
美炸的是我
5楼-- · 2018-12-31 17:44

One thing to note, though, is that if you do

df1['e'] = Series(np.random.randn(sLength), index=df1.index)

this will effectively be a left join on the df1.index. So if you want to have an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then use the code above. For example,

data = pd.DataFrame(index=all_possible_values)
df1['e'] = Series(np.random.randn(sLength), index=df1.index)
查看更多
十年一品温如言
6楼-- · 2018-12-31 17:45

Doing this directly via NumPy will be the most efficient:

df1['e'] = np.random.randn(sLength)

Note my original (very old) suggestion was to use map (which is much slower):

df1['e'] = df1['a'].map(lambda x: np.random.random())
查看更多
冷夜・残月
7楼-- · 2018-12-31 17:47

The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.

df = pd.DataFrame([[1, 2], [3, 4], [5,6]], columns=list('AB'))

newCol = [3,5,7]
newName = 'C'

values = np.insert(df.values,df.shape[1],newCol,axis=1)
header = df.columns.values.tolist()
header.append(newName)

df = pd.DataFrame(values,columns=header)
查看更多
登录 后发表回答