Question:
I've created a pandas DataFrame
df = DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])
and got this:
x y
A NaN NaN
B NaN NaN
C NaN NaN
Then I want to assign a value to a particular cell, for example row 'C' and column 'x'.
I expected to get this result:
x y
A NaN NaN
B NaN NaN
C 10 NaN
with this code:
df.xs('C')['x'] = 10
but the contents of df haven't changed. The dataframe still contains only NaNs.
Any suggestions?
Answer 1:
RukTech's answer, df.set_value('C', 'x', 10), is far and away faster than the options I've suggested below. However, it has been slated for deprecation (it was deprecated in pandas 0.21 and removed in 1.0).
Going forward, the recommended method is .iat/.at.
Why df.xs('C')['x'] = 10 does not work:
df.xs('C') by default returns a new object holding a copy of the data, so df.xs('C')['x'] = 10 modifies only that new object.
df['x'] returns a view of the df dataframe, so df['x']['C'] = 10 modifies df itself.
Warning: It is sometimes difficult to predict whether an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with "chained indexing".
So the recommended alternative is
df.at['C', 'x'] = 10
which does modify df.
In [18]: %timeit df.set_value('C', 'x', 10)
100000 loops, best of 3: 2.9 µs per loop
In [20]: %timeit df['x']['C'] = 10
100000 loops, best of 3: 6.31 µs per loop
In [81]: %timeit df.at['C', 'x'] = 10
100000 loops, best of 3: 9.2 µs per loop
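As a minimal sketch of the recommended accessors, assuming a reasonably recent pandas, the snippet below rebuilds the frame from the question and writes one cell by label with .at and one by integer position with .iat. The second write (row 'A', column 'y', value 20) is added here purely for illustration and is not part of the original question.

```python
import pandas as pd

# Same empty frame as in the question (every cell starts as NaN).
df = pd.DataFrame(index=['A', 'B', 'C'], columns=['x', 'y'])

# Label-based scalar write: one row label, one column label.
df.at['C', 'x'] = 10

# Position-based scalar write: row 0 is 'A', column 1 is 'y'.
df.iat[0, 1] = 20

print(df)
```

Both accessors take exactly one row and one column, which is what distinguishes them from the more general .loc and .iloc.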
Answer 2:
Update: The .set_value method is going to be deprecated. .iat/.at are good replacements; unfortunately, pandas provides little documentation for them.
The fastest way to do this is using set_value. This method is ~100 times faster than the .ix method. For example:
df.set_value('C', 'x', 10)
Answer 3:
You can also use a conditional lookup using .loc, as seen here:
df.loc[df[<some_column_name>] == <condition>, [<another_column_name>]] = <value_to_add>
where <some_column_name> is the column you want to check the <condition> variable against and <another_column_name> is the column you want to add to (it can be a new column or one that already exists). <value_to_add> is the value you want to add to that column/row.
This example doesn't apply precisely to the question at hand, but it might be useful for someone who wants to add a specific value based on a condition.
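A concrete instance of that pattern might look like the sketch below. The frame and its column names ('city', 'temp', 'label') are made up for illustration; a single column label is used instead of a one-element list.

```python
import pandas as pd

# Hypothetical data; the column names are invented for this example.
df = pd.DataFrame({'city': ['Oslo', 'Rome', 'Oslo'], 'temp': [3, 18, 5]})

# Overwrite an existing column wherever the condition holds.
df.loc[df['city'] == 'Oslo', 'temp'] = 0

# Write to a column that does not exist yet; rows that do not match get NaN.
df.loc[df['city'] == 'Rome', 'label'] = 'warm'

print(df)
```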
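Answer 4:
The recommended way (according to the maintainers) to set a value is:
df.ix['C', 'x'] = 10
Note that .ix was later deprecated (pandas 0.20) and removed (pandas 1.0); df.loc['C', 'x'] = 10 is the label-based equivalent.
Using 'chained indexing' (df['x']['C']) may lead to problems.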
See:
- https://stackoverflow.com/a/21287235/1579844
- http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy
- https://github.com/pydata/pandas/pull/6031
Answer 5:
Try using df.loc[row_indexer, col_indexer] = value
Answer 6:
This is the only thing that worked for me!
df.loc['C', 'x'] = 10
Learn more about .loc in the pandas documentation.
Answer 7:
You can use .iloc.
df.iloc[[2], [0]] = 10
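Since .iloc works on integer positions, the 2 and 0 above stand for row 'C' and column 'x'. The sketch below looks those positions up from the labels with Index.get_loc rather than hard-coding them. It assumes unique labels, and the names row_pos and col_pos are illustrative.

```python
import pandas as pd

# Float frame filled with NaN, with the labels used in the question.
df = pd.DataFrame(float('nan'), index=['A', 'B', 'C'], columns=['x', 'y'])

# Translate labels into integer positions (labels assumed unique).
row_pos = df.index.get_loc('C')
col_pos = df.columns.get_loc('x')

# Scalar positions are enough for a single cell; lists also work.
df.iloc[row_pos, col_pos] = 10

print(df)
```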
Answer 8:
In my example I just change the value in the selected cell:
for index, row in result.iterrows():
    if np.isnan(row['weight']):
        result.at[index, 'weight'] = 0.0
'result' is a DataFrame with a column 'weight'.
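The same replacement can probably be done without an explicit loop. The sketch below uses a made-up 'result' frame and a boolean mask with .loc; it is an alternative to the loop in this answer, not the author's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'result' frame described above.
result = pd.DataFrame({'weight': [1.5, np.nan, 2.0, np.nan]})

# Build a mask of the missing cells, then assign to all of them at once.
result.loc[result['weight'].isna(), 'weight'] = 0.0

print(result)
```

result['weight'] = result['weight'].fillna(0.0) should give the same outcome; the row-by-row .at loop is mainly worth keeping when each cell needs its own logic.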
Answer 9:
df.loc['C', 'x'] = 10
This will change the value at row 'C' and column 'x'. (Labels are case-sensitive: a lowercase 'c' would not match the existing row.)
Answer 10:
If you want to change values not for the whole row, but only for some columns:
x = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
x.iloc[1] = dict(A=10, B=-10)
Answer 11:
From version 0.21.1 you can also use the .at method. There are some differences compared to .loc, as discussed in "pandas .at versus .loc", but .at is faster for single-value replacement.
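To check the speed claim on your own setup, a rough timing sketch like the one below could be used. The helper names and the iteration count n are arbitrary choices made for this example, and the numbers will vary with the pandas version and hardware, so no expected output is shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame(0.0, index=['A', 'B', 'C'], columns=['x', 'y'])


def write_with_at():
    # Scalar-only accessor.
    df.at['C', 'x'] = 10.0


def write_with_loc():
    # General label-based accessor, used here for a single cell.
    df.loc['C', 'x'] = 10.0


n = 1000  # arbitrary number of repetitions
t_at = timeit.timeit(write_with_at, number=n)
t_loc = timeit.timeit(write_with_loc, number=n)

print(f'.at:  {t_at / n * 1e6:.1f} µs per write')
print(f'.loc: {t_loc / n * 1e6:.1f} µs per write')
```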
Answer 12:
In addition to the answers above, here is a benchmark comparing different ways to add rows of data to an already existing dataframe. It shows that using at or set_value is the most efficient way for large dataframes (at least for these test conditions).
- Create a new dataframe for each row and...
  - ... append it (13.0 s)
  - ... concatenate it (13.1 s)
- Store all new rows in another container first, convert to a new dataframe once and append...
  - container = list of lists (2.0 s)
  - container = dictionary of lists (1.9 s)
- Preallocate the whole dataframe, iterate over new rows and all columns and fill using
  - ... at (0.6 s)
  - ... set_value (0.4 s)
For the test, an existing dataframe comprising 100,000 rows and 1,000 columns of random numpy values was used. To this dataframe, 100 new rows were added.
The code is below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 16:38:46 2018
@author: gebbissimo
"""
import pandas as pd
import numpy as np
import time
NUM_ROWS = 100000
NUM_COLS = 1000
data = np.random.rand(NUM_ROWS,NUM_COLS)
df = pd.DataFrame(data)
NUM_ROWS_NEW = 100
data_tot = np.random.rand(NUM_ROWS + NUM_ROWS_NEW,NUM_COLS)
df_tot = pd.DataFrame(data_tot)
DATA_NEW = np.random.rand(1,NUM_COLS)
#%% FUNCTIONS
# NOTE: DataFrame.append and DataFrame.set_value are used as in the original
# benchmark; both were removed in later pandas releases (2.0 and 1.0).

# create a one-row dataframe per new row and append it
def create_and_append(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = df.append(df_new)
    return df

# create a one-row dataframe per new row and concatenate it
def create_and_concat(df):
    for i in range(NUM_ROWS_NEW):
        df_new = pd.DataFrame(DATA_NEW)
        df = pd.concat((df, df_new))
    return df

# store new rows as a list of lists, then append once
def store_as_list(df):
    lst = [[] for i in range(NUM_ROWS_NEW)]
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            lst[i].append(DATA_NEW[0, j])
    df_new = pd.DataFrame(lst)
    df_tot = df.append(df_new)
    return df_tot

# store new rows as a dictionary of lists, then append once
def store_as_dict(df):
    dct = {}
    for j in range(NUM_COLS):
        dct[j] = []
        for i in range(NUM_ROWS_NEW):
            dct[j].append(DATA_NEW[0, j])
    df_new = pd.DataFrame(dct)
    df_tot = df.append(df_new)
    return df_tot

# preallocate and fill using .at
def fill_using_at(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.at[NUM_ROWS + i, j] = DATA_NEW[0, j]
    return df

# preallocate and fill using .set_value
def fill_using_set(df):
    for i in range(NUM_ROWS_NEW):
        for j in range(NUM_COLS):
            #print("i,j={},{}".format(i,j))
            df.set_value(NUM_ROWS + i, j, DATA_NEW[0, j])
    return df
#%% TESTS
t0 = time.time()
create_and_append(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
create_and_concat(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
store_as_list(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
store_as_dict(df)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
fill_using_at(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))

t0 = time.time()
fill_using_set(df_tot)
t1 = time.time()
print('Needed {} seconds'.format(t1 - t0))
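On pandas versions where DataFrame.append is no longer available, the "collect rows first, convert once" strategy from the list above can be sketched with pd.concat instead. The sizes here are deliberately tiny and the variable names (new_rows, combined) are illustrative; this is not a rerun of the benchmark.

```python
import numpy as np
import pandas as pd

# Small stand-in for the existing dataframe.
df = pd.DataFrame(np.random.rand(5, 3))

# Collect the new rows in a plain Python list first.
new_rows = [np.random.rand(3) for _ in range(2)]

# Convert the collected rows once and concatenate in a single step.
combined = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(combined.shape)
```

ignore_index=True renumbers the rows from 0, which avoids duplicate index labels in the combined frame.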
Answer 13:
I too was searching for this topic, and I put together a way to iterate through a DataFrame and update it with lookup values from a second DataFrame. Here is my code.
src_df = pd.read_sql_query(src_sql, src_connection)
for index1, row1 in src_df.iterrows():
    for index, row in vertical_df.iterrows():
        src_df.set_value(index=index1, col=u'etl_load_key', value=etl_load_key)
        # plain truth test; "is True" can fail for NumPy boolean results
        if row1[u'src_id'] == row['SRC_ID']:
            src_df.set_value(index=index1, col=u'vertical', value=row['VERTICAL'])
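The nested loop above can likely be replaced by a vectorized lookup. The sketch below uses small made-up frames in place of the SQL query result, assumes SRC_ID is unique in vertical_df, and uses Series.map; it is an alternative technique, not the author's method.

```python
import pandas as pd

# Hypothetical stand-ins for the frames and the key used in the answer.
src_df = pd.DataFrame({'src_id': [1, 2, 3]})
vertical_df = pd.DataFrame({'SRC_ID': [1, 3], 'VERTICAL': ['retail', 'health']})
etl_load_key = 42

# A scalar assignment fills the whole column at once.
src_df['etl_load_key'] = etl_load_key

# Build a SRC_ID -> VERTICAL lookup, then map every src_id through it.
lookup = vertical_df.set_index('SRC_ID')['VERTICAL']
src_df['vertical'] = src_df['src_id'].map(lookup)

print(src_df)
```

Rows whose src_id has no match in the lookup end up with NaN in 'vertical', which matches the loop leaving those cells unset.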