Set value for particular cell in pandas DataFrame

Posted 2019-01-01 04:20

Question:


I've created a pandas DataFrame

from pandas import DataFrame

df = DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

and got this

    x    y
A  NaN  NaN
B  NaN  NaN
C  NaN  NaN


Then I want to assign a value to a particular cell, for example for row 'C' and column 'x'. I expected to get this result:

    x    y
A  NaN  NaN
B  NaN  NaN
C  10  NaN

with this code:

df.xs('C')['x'] = 10

but the contents of df haven't changed; it still contains only NaNs.

Any suggestions?

Answer 1:

RukTech's answer, df.set_value('C', 'x', 10), is far and away faster than the options I've suggested below. However, it has been slated for deprecation.

Going forward, the recommended method is .iat/.at.


Why df.xs('C')['x'] = 10 does not work:

df.xs('C'), by default, returns a new dataframe with a copy of the data, so

df.xs('C')['x'] = 10

modifies this new dataframe only.

df['x'] returns a view of the df dataframe, so

df['x']['C'] = 10

modifies df itself.

Warning: It is sometimes difficult to predict if an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with "chained indexing".


So the recommended alternative is

df.at['C', 'x'] = 10

which does modify df.
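
For example, a minimal runnable sketch that rebuilds the question's frame:

import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])
df.at['C', 'x'] = 10   # label-based scalar assignment, modifies df in place
print(df)              # row 'C', column 'x' now holds 10; every other cell stays NaN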


In [18]: %timeit df.set_value(\'C\', \'x\', 10)
100000 loops, best of 3: 2.9 µs per loop

In [20]: %timeit df[\'x\'][\'C\'] = 10
100000 loops, best of 3: 6.31 µs per loop

In [81]: %timeit df.at[\'C\', \'x\'] = 10
100000 loops, best of 3: 9.2 µs per loop


Answer 2:

Update: The .set_value method is going to be deprecated. .iat/.at are good replacements; unfortunately, pandas provides little documentation for them.


The fastest way to do this is using set_value. This method is ~100 times faster than the .ix method. For example:

df.set_value('C', 'x', 10)
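
A minimal sketch of the .at/.iat replacements (assuming the question's frame, where row 'C' is at position 2 and column 'x' at position 0):

df.at['C', 'x'] = 10   # label-based scalar setter
df.iat[2, 0] = 10      # position-based scalar setter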



Answer 3:

You can also use a conditional lookup using .loc as seen here:

df.loc[df[<some_column_name>] == <condition>, [<another_column_name>]] = <value_to_add>

where <some_column_name> is the column you want to check the <condition> variable against, and <another_column_name> is the column you want to update (it can be a new column or one that already exists). <value_to_add> is the value you want to write to that column/row.

This example doesn't work precisely with the question at hand, but it might be useful for someone who wants to add a specific value based on a condition.
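
For instance, a small sketch with made-up column names ('score' and 'flag' are illustrative, not from the question):

import pandas as pd

df2 = pd.DataFrame({'score': [3, 7, 5]})
df2.loc[df2['score'] > 4, 'flag'] = 'high'   # writes 'high' only where the condition holds; other rows get NaN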



Answer 4:

The recommended way (according to the maintainers) to set a value is:

df.ix['C', 'x'] = 10

Using 'chained indexing' (df['x']['C']) may lead to problems.
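
Note: .ix has since been deprecated (and removed in pandas 1.0); the equivalent label-based assignment today is

df.loc['C', 'x'] = 10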

See:

  • https://stackoverflow.com/a/21287235/1579844
  • http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy
  • https://github.com/pydata/pandas/pull/6031


Answer 5:

Try using df.loc[row_index,col_indexer] = value
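
For the question's frame that would be df.loc['C', 'x'] = 10; .loc also accepts lists of labels, so several cells can be set in one statement, e.g.:

df.loc[['A', 'B'], 'y'] = 0   # set column 'y' for rows 'A' and 'B' at once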



Answer 6:

This is the only thing that worked for me!

df.loc['C', 'x'] = 10

Learn more about .loc in the pandas indexing documentation.



Answer 7:

You can also use .iloc:

df.iloc[[2], [0]] = 10
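
Note that the double-bracket form above assigns through a one-row, one-column positional slice; plain integer positions (or the scalar-optimized .iat) work too:

df.iloc[2, 0] = 10   # row position 2 ('C'), column position 0 ('x')
df.iat[2, 0] = 10    # same cell via the faster scalar accessor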


Answer 8:

In my example, I just change the value in the selected cell:

    for index, row in result.iterrows():
        if np.isnan(row['weight']):           # requires numpy imported as np
            result.at[index, 'weight'] = 0.0  # overwrite the NaN in place

'result' is a DataFrame with a 'weight' column.
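
As a side note, the same NaN replacement can be done without the loop; a vectorized sketch, assuming the same 'weight' column:

result['weight'] = result['weight'].fillna(0.0)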



Answer 9:

df.loc['C', 'x'] = 10 will change the value in row 'C' and column 'x'.



Answer 10:

If you want to change values not for a whole row, but only for some columns:

x = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x.iloc[1] = dict(A=10, B=-10)
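
If only a subset of the columns should change, a .loc sketch with an explicit column list also works (row label 1 comes from the default RangeIndex here):

x.loc[1, ['A']] = 10   # updates column 'A' only; 'B' keeps its value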


Answer 11:

From version 0.21.1 you can also use the .at method. There are some differences compared to .loc, as mentioned here: pandas .at versus .loc. But .at is faster for single-value replacement.
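
A short sketch of the practical difference, on the question's frame: .at addresses exactly one cell by label, while .loc also accepts slices and lists.

df.at['C', 'x'] = 10       # single scalar cell only
df.loc['A':'C', 'y'] = 0   # .loc can set a whole label range, which .at cannot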



Answer 12:

In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an already existing dataframe. It shows that using at or set_value is the most efficient way for large dataframes (at least for these test conditions).

  • Create new dataframe for each row and...
    • ... append it (13.0 s)
    • ... concatenate it (13.1 s)
  • Store all new rows in another container first, convert to new dataframe once and append...
    • container = lists of lists (2.0 s)
    • container = dictionary of lists (1.9 s)
  • Preallocate whole dataframe, iterate over new rows and all columns and fill using
    • ... at (0.6 s)
    • ... set_value (0.4 s)

For the test, an existing dataframe comprising 100,000 rows and 1,000 columns and random numpy values was used. To this dataframe, 100 new rows were added.

See the code below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
\"\"\"
Created on Wed Nov 21 16:38:46 2018

@author: gebbissimo
\"\"\"

import pandas as pd
import numpy as np
import time

NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)

NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)

DATA_NEW = np.random.rand(1,NUM_COLS)


#%% FUNCTIONS

# create and append
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df

# create and concatenate
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df


# store all new rows in a list of lists, convert to a dataframe once, and append
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot

# store all new rows in a dict of lists, convert to a dataframe once, and append
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0,j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot




# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print(\"i,j={},{}\".format(i,j))
            df.at[NUM_ROWS+i,j] = DATA_NEW[0,j]
    return df


# preallocate and fill using .set_value
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print(\"i,j={},{}\".format(i,j))
            df.set_value(NUM_ROWS+i,j,DATA_NEW[0,j])
    return df


#%% TESTS
t0 = time.time()
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))


Answer 13:

I too was searching for this topic and I put together a way to iterate through a DataFrame and update it with lookup values from a second DataFrame. Here is my code.

src_df = pd.read_sql_query(src_sql, src_connection)
for index1, row1 in src_df.iterrows():
    for index, row in vertical_df.iterrows():
        src_df.set_value(index=index1, col=u'etl_load_key', value=etl_load_key)
        if row1[u'src_id'] == row['SRC_ID']:
            src_df.set_value(index=index1, col=u'vertical', value=row['VERTICAL'])
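
Since set_value is deprecated and the nested iterrows loops scale poorly, a vectorized sketch using a merge may be preferable (the column names mirror the hypothetical ones above; src_df is assumed not to already have a 'vertical' column, and SRC_ID values are assumed unique in vertical_df):

src_df['etl_load_key'] = etl_load_key
lookup = vertical_df[['SRC_ID', 'VERTICAL']].rename(columns={'SRC_ID': 'src_id', 'VERTICAL': 'vertical'})
src_df = src_df.merge(lookup, on='src_id', how='left')   # fills 'vertical' wherever src_id matches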