I have a pandas DataFrame whose column labels I need to replace with edited ones.
I'd like to change the column names in a DataFrame A
where the original column names are:
['$a', '$b', '$c', '$d', '$e']
to
['a', 'b', 'c', 'd', 'e'].
I have the edited column names stored in a list, but I don't know how to replace the column names.
Column names vs Names of Series
I would like to explain a bit of what happens behind the scenes.
A DataFrame is a set of Series.
A Series in turn builds on a numpy.array and adds labels to it.
A Series has a .name attribute (a plain numpy.array does not).
This is the name of the series. pandas seldom respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.
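As a minimal sketch of where the .name attribute shows up (the frame, the Series and their values are made up for illustration), a column pulled out of a frame is named after its label, and a standalone Series passes its name on when it is wrapped in a DataFrame:

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A column pulled out of a frame is a Series named after its label.
col_name = df["a"].name

# A standalone Series carries its own name, and pandas reuses that name
# as the column label when the Series is wrapped in a DataFrame.
s = pd.Series([1, 2], name="one")
s.name = "three"
wrapped_columns = list(pd.DataFrame(s).columns)

print(col_name)         # name of the extracted column
print(wrapped_columns)  # labels of the frame built from the Series
```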
Naming the list of columns
A lot of answers here talk about the df.columns attribute being a list, when in fact it is an Index. This means it has a .name attribute.
You can fill in the name of the columns Index just as you can name the row index. Note that, when the frame is printed, the name of the index always comes one row lower than the name of the columns.
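A small sketch of naming both axes, assuming a throwaway frame and the illustrative names column_name and index_name:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4]})

# df.columns is an Index, so it has a .name just like the row index does.
df.columns.name = "column_name"
df.index.name = "index_name"

# In the printed frame, the columns name shares the header row with the
# column labels; the index name is printed on the row below it.
header_lines = df.to_string().splitlines()[:2]
print(df)
```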
Artifacts that linger
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'], then df.one.name will be 'one'.
If you set df.one.name = 'three', then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'.
BUT
pd.DataFrame(df.one) will return a frame whose single column is labeled 'three', because pandas reuses the .name of the already defined Series.
Multi level column names
Pandas has ways of doing multi layered column names. There is not so much magic involved but I wanted to cover this in my answer too since I don't see anyone picking up on this here.
This is easily achievable by setting columns to lists, like this:
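For instance, a sketch with a made-up two-column frame, where each inner list supplies the labels for one level:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# Assigning a list of lists (one inner list per level) builds a MultiIndex.
df.columns = [["one", "one"], ["one", "two"]]

level_count = df.columns.nlevels
labels = list(df.columns)
print(df)
```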
One line or Pipeline solutions
I'll focus on two things:
1. OP clearly states: "I have the edited column names stored in a list, but I don't know how to replace the column names." I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
2. df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute and it isn't done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.
Solution 1
pd.DataFrame.rename
It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.
However, you can easily create that dictionary and include it in the call to rename. Zipping df with new takes advantage of the fact that when iterating over df, we iterate over each column name.
This works great if your original column names are unique. But if they are not, then this breaks down.
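A sketch of Setup 1 and Solution 1 together. The sample labels and values are assumptions made for this example; only y765 is taken from the discussion further down.

```python
import pandas as pd

# Setup 1 (sample labels and values are assumptions for this sketch).
df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [3, 4], "Xin": [5, 6]})
new = ["x098", "y765", "z432"]

# Iterating over a DataFrame yields its column labels, so zip(df, new)
# pairs each old label with its replacement.
renamed = df.rename(columns=dict(zip(df, new)))

print(renamed)
```

Because rename returns a new frame, df itself keeps its original labels, which is what makes this usable inside a method chain.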
Setup 2
non-unique columns
Solution 2
pd.concat using the keys argument
First, notice what happens when we attempt to use solution 1:
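A sketch of that failure on a frame with duplicate labels, followed by the pd.concat alternative that the next paragraph describes. The sample data is an assumption, and DataFrame.items is used to walk the columns one Series at a time.

```python
import pandas as pd

# Setup 2: two columns share the label "Mahesh" (values are made up).
df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=["Mahesh", "Mahesh", "Xin"])
new = ["x098", "y765", "z432"]

# Solution 1 on duplicate labels: the dict keeps one entry per distinct
# old label, so both "Mahesh" columns receive the same new name.
broken = df.rename(columns=dict(zip(df, new)))

# pd.concat with keys: one Series per column, relabelled by position.
fixed = pd.concat([col for _, col in df.items()], axis=1, keys=new)

print(list(broken.columns))
print(list(fixed.columns))
```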
We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns and converting them back requires more dictionary work.
Single dtype
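A sketch of the single dtype case, assuming an all-integer sample frame:

```python
import pandas as pd

df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [3, 4], "Xin": [5, 6]})
new = ["x098", "y765", "z432"]

# Rebuild the frame from the raw values, the old index and the new labels.
rebuilt = pd.DataFrame(df.values, df.index, new)

print(rebuilt.dtypes)
```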
Mixed dtype
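A sketch of the mixed dtype case. The sample frame (integer, float and boolean columns) is an assumption; the extra dictionary work is an astype call that maps each new label back to the original column's dtype.

```python
import pandas as pd

df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [1.5, 2.5], "Xin": [True, False]})
new = ["x098", "y765", "z432"]

# With mixed dtypes, df.values is an object array, so every rebuilt
# column comes back as object.
raw = pd.DataFrame(df.values, df.index, new)

# Map each new label to the dtype of the column it replaces and cast back.
restored = raw.astype(dict(zip(new, df.dtypes)))

print(raw.dtypes)
print(restored.dtypes)
```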
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
Single dtype
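A sketch of the single dtype case, assuming an all-integer sample frame. The new labels are passed as a NumPy array because a plain list of strings would be read by set_index as existing column keys.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [3, 4], "Xin": [5, 6]})
new = ["x098", "y765", "z432"]

# Transpose so the columns become the index, replace that index inline,
# then transpose back.
swapped = df.T.set_index(np.asarray(new)).T

print(swapped)
```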
Mixed dtype
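A sketch of the mixed dtype case, again with an assumed sample frame. Transposing mixed columns produces object data, so the original dtypes are restored with astype afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [1.5, 2.5], "Xin": [True, False]})
new = ["x098", "y765", "z432"]

# The round trip through transpose leaves every column as object.
swapped = df.T.set_index(np.asarray(new)).T

# Cast each new column back to the dtype of the column it replaces.
restored = swapped.astype(dict(zip(new, df.dtypes)))

print(restored.dtypes)
```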
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new
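A sketch of the idea with an assumed sample frame, including the keyword-only variant that uses a bare * between the two parameters:

```python
import pandas as pd

df = pd.DataFrame({"Jack": [1, 2], "Mahesh": [3, 4], "Xin": [5, 6]})
new = ["x098", "y765", "z432"]

# The default value iter(new) is created once, when the lambda is defined.
# rename calls the lambda once per column label; x is ignored and the
# next replacement name is pulled from the iterator instead.
renamed = df.rename(columns=lambda x, y=iter(new): next(y))

# Same trick, but y is keyword-only, so it cannot be overwritten by an
# accidental second positional argument.
renamed_kw = df.rename(columns=lambda x, *, y=iter(new): next(y))

print(list(renamed.columns))
```

Each lambda owns its own iterator, so it is good for a single pass over the columns; build a fresh lambda for every call to rename.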
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn't expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don't believe it needs protecting. It is still worth mentioning.
I think this method is useful:
This method allows you to change column names individually.
DataFrame -- df.rename() will work.
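In case you don't want the row names:
df.columns = ['a', 'b']
Note that index=False cannot go inside the list of names. It is an argument of output methods such as df.to_csv(index=False), which is where the row names get dropped.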
If you've got the dataframe, df.columns gives you all the column labels (as an Index, which list(df.columns) turns into a list) that you can manipulate and then reassign into your dataframe as the names of columns...
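A minimal sketch of that round trip, using the column names from the question and made-up values:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["$a", "$b", "$c", "$d", "$e"])

# Pull the labels out, edit them as ordinary strings, and assign them back.
columns = [label.replace("$", "") for label in df.columns]
df.columns = columns

print(list(df.columns))
```

Unlike rename, this assignment changes df in place.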
Best way? IDK. A way - yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge memory & execution time. @kadee, @kaitlyn, & @eumiro had the functions with the fastest execution times, though these functions are so fast we're comparing the rounding of .000 and .001 seconds for all the answers. Moral: my answer above likely isn't the 'Best' way.
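The original profiling code is not shown here, so the following is only a rough sketch of how two of the techniques could be compared with cProfile. The helper names, the sample frame and the repeat count are all assumptions. Note that cProfile reports call counts and timings only; measuring memory would need a separate tool.

```python
import cProfile
import io
import pstats

import pandas as pd

new = ["a", "b", "c", "d", "e"]


def make_frame():
    # Hypothetical sample frame using the column names from the question.
    return pd.DataFrame([[1, 2, 3, 4, 5]], columns=["$a", "$b", "$c", "$d", "$e"])


def assign_columns():
    df = make_frame()
    df.columns = new
    return df


def rename_with_dict():
    df = make_frame()
    return df.rename(columns=dict(zip(df, new)))


def profile(func, repeats=200):
    # Run func repeatedly under the profiler and return a short text report.
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        func()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return stream.getvalue()


report_assign = profile(assign_columns)
report_rename = profile(rename_with_dict)
print(report_assign)
print(report_rename)
```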