Adding new column to existing DataFrame in Python

2018-12-31 17:07发布

I have the following indexed DataFrame with named columns and rows not- continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

I tried different versions of join, append, merge, but I did not get the result I wanted, only errors at most. How can I add column e to the above example?

21条回答
余生请多指教
2楼-- · 2018-12-31 17:29

I was looking for a general way of adding a column of numpy.nans to a dataframe without getting the dumb SettingWithCopyWarning.

From the following:

  • the answers here
  • this question about passing a variable as a keyword argument
  • this method for generating a numpy array of NaNs in-line

I came up with this:

col = 'column_name'
df = df.assign(**{col:numpy.full(len(df), numpy.nan)})
查看更多
墨雨无痕
3楼-- · 2018-12-31 17:30

This is the simple way of adding a new column: df['e'] = e

查看更多
唯独是你
4楼-- · 2018-12-31 17:33

To add a new column, 'e', to the existing data frame

 df1.loc[:,'e'] = Series(np.random.randn(sLength))
查看更多
路过你的时光
5楼-- · 2018-12-31 17:35

It seems that in recent Pandas versions the way to go is to use df.assign:

df1 = df1.assign(e=np.random.randn(sLength))

It doesn't produce SettingWithCopyWarning.

查看更多
长期被迫恋爱
6楼-- · 2018-12-31 17:35
  1. First create a python's list_of_e that has relevant data.
  2. Use this: df['e'] = list_of_e
查看更多
墨雨无痕
7楼-- · 2018-12-31 17:39

For the sake of completeness - yet another solution using DataFrame.eval() method:

Data:

In [44]: e
Out[44]:
0    1.225506
1   -1.033944
2   -0.498953
3   -0.373332
4    0.615030
5   -0.622436
dtype: float64

In [45]: df1
Out[45]:
          a         b         c         d
0 -0.634222 -0.103264  0.745069  0.801288
4  0.782387 -0.090279  0.757662 -0.602408
5 -0.117456  2.124496  1.057301  0.765466
7  0.767532  0.104304 -0.586850  1.051297
8 -0.103272  0.958334  1.163092  1.182315
9 -0.616254  0.296678 -0.112027  0.679112

Solution:

In [46]: df1.eval("e = @e.values", inplace=True)

In [47]: df1
Out[47]:
          a         b         c         d         e
0 -0.634222 -0.103264  0.745069  0.801288  1.225506
4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
5 -0.117456  2.124496  1.057301  0.765466 -0.498953
7  0.767532  0.104304 -0.586850  1.051297 -0.373332
8 -0.103272  0.958334  1.163092  1.182315  0.615030
9 -0.616254  0.296678 -0.112027  0.679112 -0.622436
查看更多
登录 后发表回答