Adding new column to existing DataFrame in Python-第3页回答

I have the following indexed DataFrame with named columns and rows not- continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

I tried different versions of join, append, merge, but I did not get the result I wanted, only errors at most. How can I add column e to the above example?

标签： python pandas dataframe chained-assignment

21条回答

泪湿衣

2楼-- · 2018-12-31 17:41

Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to:

data.set_index(['index_column'], inplace=True)
"if index is unsorted, assignment of a new column will fail"        
data.sort_index(inplace = True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])

0人赞添加讨论(0) 举报

梦寄多情

3楼-- · 2018-12-31 17:41

If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.

df = df.copy()
df['col_name'] = values

0人赞添加讨论(0) 举报

几人难应

4楼-- · 2018-12-31 17:44

If the column you are trying to add is a series variable then just :

df["new_columns_name"]=series_variable_name #this will do it for you

This works well even if you are replacing an existing column.just type the new_columns_name same as the column you want to replace.It will just overwrite the existing column data with the new series data.

0人赞添加讨论(0) 举报

美炸的是我

5楼-- · 2018-12-31 17:44

One thing to note, though, is that if you do

df1['e'] = Series(np.random.randn(sLength), index=df1.index)

this will effectively be a left join on the df1.index. So if you want to have an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then use the code above. For example,

data = pd.DataFrame(index=all_possible_values)
df1['e'] = Series(np.random.randn(sLength), index=df1.index)

0人赞添加讨论(0) 举报

十年一品温如言

6楼-- · 2018-12-31 17:45

Doing this directly via NumPy will be the most efficient:

df1['e'] = np.random.randn(sLength)

Note my original (very old) suggestion was to use map (which is much slower):

df1['e'] = df1['a'].map(lambda x: np.random.random())

0人赞添加讨论(0) 举报

冷夜・残月

7楼-- · 2018-12-31 17:47

The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.

df = pd.DataFrame([[1, 2], [3, 4], [5,6]], columns=list('AB'))

newCol = [3,5,7]
newName = 'C'

values = np.insert(df.values,df.shape[1],newCol,axis=1)
header = df.columns.values.tolist()
header.append(newName)

df = pd.DataFrame(values,columns=header)

0人赞添加讨论(0) 举报

Adding new column to existing DataFrame in Python

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间