I have a DataFrame with a MultiIndex and would like to modify one particular level of it. For instance, the first level might contain strings, and I may want to remove the whitespace from that index level:
df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]
However, the code above results in an error:
TypeError: 'FrozenList' does not support mutable operations.
I know I can call reset_index, modify the column, and then re-create the MultiIndex, but I wonder whether there is a more elegant way to modify one particular level of the MultiIndex directly.
As mentioned in the comments, indexes are immutable and must be rebuilt when you want to modify them, but you do not have to use reset_index for that. You can create a new MultiIndex directly:
df.index = pd.MultiIndex.from_tuples([(x[0], x[1].replace(' ', ''), x[2]) for x in df.index])
This example is for a 3-level index where you want to modify the middle level. For an index with a different number of levels, change the size of the tuple accordingly.
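To avoid hard-coding the tuple size, the same idea can be written with tuple slicing. This is only a sketch: the sample frame, the level names, and the `level` variable are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical 3-level example data (not from the original question).
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", "x 1", 10), ("a", "y 2", 20), ("b", "x 1", 30)],
        names=["first", "second", "third"],
    ),
)

level = 1  # position of the level to clean up

# Rebuild every tuple: keep the parts before and after `level`,
# and replace only the element at `level`.
df.index = pd.MultiIndex.from_tuples(
    [t[:level] + (t[level].replace(" ", ""),) + t[level + 1:] for t in df.index],
    names=df.index.names,  # from_tuples does not carry names over by itself
)

print(df.index.tolist())
```

Passing `names=` keeps the level names, which would otherwise be lost when the index is rebuilt from plain tuples.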
Thanks to @cxrodgers's comment, I think the fastest way to do this is:
df.index = df.index.set_levels(df.index.levels[0].str.replace(' ', ''), level=0)
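For reference, here is a small self-contained sketch of that one-liner. The sample data and level names are invented for illustration; `regex=False` is added so the space is treated as a literal string.

```python
import pandas as pd

# Hypothetical 2-level example data (not from the original question).
idx = pd.MultiIndex.from_tuples(
    [("a 1", "x"), ("a 1", "y"), ("b 2", "x")],
    names=["key", "sub"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# levels[0] holds only the unique labels of level 0, so the string
# operation runs once per distinct label instead of once per row.
new_level = df.index.levels[0].str.replace(" ", "", regex=False)

# set_levels returns a new MultiIndex; the existing one is not mutated.
df.index = df.index.set_levels(new_level, level=0)

print(df.index.tolist())
```

One caveat to keep in mind: because this works on the unique labels, a transformation that maps two different labels to the same string may be rejected by pandas, since level values are expected to be unique. In that case the from_tuples approach is the safer fallback.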
Old, longer answer:
I found that the list comprehension suggested by @Shovalt works but felt slow on my machine (using a dataframe with >10,000 rows).
Instead, I was able to use the .set_levels method, which was quite a bit faster for me.
%timeit pd.MultiIndex.from_tuples([(x[0].replace(' ',''), x[1]) for x in df.index])
1 loop, best of 3: 394 ms per loop
%timeit df.index.set_levels(df.index.get_level_values(0).str.replace(' ',''), level=0)
10 loops, best of 3: 134 ms per loop
In actuality, I just needed to prepend some text. This was even faster with .set_levels:
%timeit pd.MultiIndex.from_tuples([('00'+x[0], x[1]) for x in df.index])
100 loops, best of 3: 5.18 ms per loop
%timeit df.index.set_levels('00'+df.index.get_level_values(0), level=0)
1000 loops, best of 3: 1.38 ms per loop
%timeit df.index.set_levels('00'+df.index.levels[0], level=0)
1000 loops, best of 3: 331 µs per loop
This solution is based on the answer in the link from the comment by @denfromufa ...
python - Multiindex and timezone - Frozen list error - Stack Overflow