I have the following dataframe df:
print(df)
Food Taste
0 Apple NaN
1 Banana NaN
2 Candy NaN
3 Milk NaN
4 Bread NaN
5 Strawberry NaN
I am trying to replace values in a range of rows using iloc:
df.Taste.iloc[0:2] = 'good'
df.Taste.iloc[2:6] = 'bad'
But it returned the following SettingWithCopyWarning message:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
So, I found this Stack Overflow page and tried this:
df.iloc[0:2, 'Taste'] = 'good'
df.iloc[2:6, 'Taste'] = 'bad'
Unfortunately, it returned the following error:
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]
What would be the proper way to use iloc in this situation? Also, is there a way to combine these two lines above?
You can use Index.get_loc to get the position of the column Taste, because DataFrame.iloc selects by position:
# returns the position of the column (Python counts from 0, so the second column is 1)
print (df.columns.get_loc('Taste'))
1
df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'
df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'
print (df)
Food Taste
0 Apple good
1 Banana good
2 Candy bad
3 Milk bad
4 Bread bad
5 Strawberry bad
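As for combining the two lines, one option is a single assignment that passes a list with one value per selected row. This is only a minimal sketch: the DataFrame is rebuilt here from the values in the question, and the Taste column is created with object dtype (my assumption, not part of the question) so that assigning strings does not clash with a float NaN column in newer pandas versions.

```python
import pandas as pd

foods = ['Apple', 'Banana', 'Candy', 'Milk', 'Bread', 'Strawberry']
# Assumption: Taste starts as an empty object column rather than float NaN.
df = pd.DataFrame({'Food': foods,
                   'Taste': pd.Series([None] * 6, dtype=object)})

# Integer position of the column, as required by iloc.
taste_pos = df.columns.get_loc('Taste')

# One assignment: the list length must match the number of selected rows.
df.iloc[0:6, taste_pos] = ['good'] * 2 + ['bad'] * 4
print(df)
```

Whether this reads better than two separate lines is a matter of taste; the two-line version makes the row ranges more explicit.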
A solution with ix is also possible, but it is not recommended: ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so this only works on older versions:
df.ix[0:2, 'Taste'] = 'good'
df.ix[2:6, 'Taste'] = 'bad'
print (df)
Food Taste
0 Apple good
1 Banana good
2 Candy bad
3 Milk bad
4 Bread bad
5 Strawberry bad
.iloc uses integer positions, whereas .loc uses labels. Both take row AND column identifiers (for DataFrames). Your initial code raised the warning because the column was selected outside the .iloc call (df.Taste.iloc[...]) instead of being passed to it as the column identifier. The second attempt failed because it mixed integer positions with a column name, and .iloc only accepts integer positions. If you don't know the column's integer position, you can use Index.get_loc in place, as suggested above. Otherwise, use the integer position directly, in this case 1.
df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'
df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'
is equal to:
df.iloc[0:2, 1] = 'good'
df.iloc[2:6, 1] = 'bad'
in this particular situation.
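To illustrate the difference between position-based and label-based selection, here is a small sketch using the data from the question. One detail worth knowing: an .iloc slice excludes its end point, while a .loc slice includes its end label. The variable names by_position and by_label are only illustrative.

```python
import pandas as pd

foods = ['Apple', 'Banana', 'Candy', 'Milk', 'Bread', 'Strawberry']
df = pd.DataFrame({'Food': foods, 'Taste': [float('nan')] * 6})

# Position-based: rows at positions 0 and 1, column at position 1.
by_position = df.iloc[0:2, 1]

# Label-based: rows labelled 0, 1 and 2 (end label included), column 'Taste'.
by_label = df.loc[0:2, 'Taste']

print(len(by_position), len(by_label))  # 2 3
```

With the default integer index the labels happen to equal the positions, which is why the two are easy to confuse.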
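DataFrame.iloc is purely integer-location based indexing for selection by position. For example:
# train is assumed to be an existing DataFrame with a 'lang' column
lang_sets = {}
# for each language, keep the matching rows and drop the last column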
lang_sets['en'] = train[train.lang == 'en'].iloc[:,:-1]
lang_sets['ja'] = train[train.lang == 'ja'].iloc[:,:-1]
lang_sets['de'] = train[train.lang == 'de'].iloc[:,:-1]
I prefer to use .loc in such cases, and explicitly use the index of the DataFrame if you want to select by position:
df.loc[df.index[0:2], 'Taste'] = 'good'
df.loc[df.index[2:6], 'Taste'] = 'bad'
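A sketch of why going through df.index can be handy: it keeps working when the index is not the default 0..n-1 range. Here the food names are used as the index, which is my own variation on the question's data, and the column is created with object dtype as an assumption so that strings can be assigned cleanly. Note that this approach relies on the index labels being unique.

```python
import pandas as pd

foods = ['Apple', 'Banana', 'Candy', 'Milk', 'Bread', 'Strawberry']
# Assumption: food names as (unique) index labels, empty object column.
df = pd.DataFrame({'Taste': [None] * 6}, index=foods, dtype=object)

# df.index[0:2] turns positions into labels, which .loc then looks up.
df.loc[df.index[0:2], 'Taste'] = 'good'
df.loc[df.index[2:6], 'Taste'] = 'bad'
print(df)
```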