I am just starting to use user-defined functions, so please forgive me if this is a simple question.
I have a few dataframes, which all have a column named 'interval_time' (for example) and I would like to rename this column 'Timestamp'.
I know that I can do this manually with:
df = df.rename(index=str, columns={'interval_time': 'Timestamp'})
but now I would like to define a function called rename that does this for me. I have seen that this works:
def rename(data):
    print(data.rename(index=str, columns={'interval_time': 'Timestamp'}))
but I can't figure out how to save the renamed dataframe. I have tried this:
def rename(data):
    data = data.rename(index=str, columns={'interval_time': 'Timestamp'})
The dataframes that I am using have the following form:
df_scada
interval_time A ... X Y
0 2010-11-01 00:00:00 0.0 ... 396.36710 381.68860
1 2010-11-01 00:05:00 0.0 ... 392.97974 381.40634
2 2010-11-01 00:10:00 0.0 ... 390.15695 379.99493
3 2010-11-01 00:15:00 0.0 ... 389.02786 379.14810
There are a few points to note:
- You need to use return in your function.
- It's good practice to make such functions generic. For example, your input and output column names can be arguments with default values set.
- Pandas offers pd.DataFrame.pipe to facilitate method chaining.
- You should not name your function the same as the Pandas method. This will only lead to confusion.
Putting these elements together:
def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})

df = df.pipe(rename_col)
This is a trivial example, which probably doesn't require a user-defined function. However, the above advice may help when you write more complex functions.
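Since the question mentions several dataframes, here is a minimal sketch of how such a generic function might be applied to all of them at once. The frames dictionary and its contents are hypothetical stand-ins for the real data, and index=str is left out on the assumption that converting the index labels to strings is not actually wanted.

```python
import pandas as pd


def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    # Returns a new DataFrame; the input frame is left unchanged.
    return data.rename(columns={col_in: col_out})


# Hypothetical frames standing in for the real data.
frames = {
    'scada': pd.DataFrame({'interval_time': ['2010-11-01 00:00:00'], 'A': [0.0]}),
    'other': pd.DataFrame({'interval_time': ['2010-11-01 00:05:00'], 'B': [1.0]}),
}

# Apply the same function to every frame via pipe.
renamed = {name: frame.pipe(rename_col) for name, frame in frames.items()}
```

Keeping the frames in a dictionary avoids repeating the call for each variable, and the originals in frames keep their old column name.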
Without inplace=True, the function creates a new object, which needs to be returned:
import pandas as pd
def rename(data):
    return data.rename(index=str, columns={'interval_time': 'Timestamp'})
data = pd.DataFrame([1,2,3,4], columns=['interval_time'])
renamed_data = rename(data)
If you do not want to create a new DataFrame, pass inplace=True to data.rename inside the function.
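If the function modifies the DataFrame in place, you do not need to re-assign it after calling rename: pandas.DataFrame is a mutable object, and Python passes the function a reference to that same object rather than a copy. Note that assigning to data inside the function, as in your attempt, only rebinds the local name and leaves the caller's DataFrame unchanged. Have a look at this link on how Python objects are passed: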
https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/
Also, you should use the inplace parameter so that you do not create a new object inside the function. Your code in the rename function will then look like this:
def rename(data):
    data.rename(index=str, columns={'interval_time': 'Timestamp'}, inplace=True)
After you call rename(df), your DataFrame df has its columns renamed.
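The difference between rebinding a name and modifying the object in place can be checked with a small experiment. This is only a sketch: the function names rename_rebind and rename_inplace and the sample data are made up for illustration, and index=str is omitted for brevity.

```python
import pandas as pd


def rename_rebind(data):
    # Rebinds the local name only; the caller's DataFrame is untouched.
    data = data.rename(columns={'interval_time': 'Timestamp'})


def rename_inplace(data):
    # Modifies the very object the caller also refers to.
    data.rename(columns={'interval_time': 'Timestamp'}, inplace=True)


df = pd.DataFrame({'interval_time': [1, 2, 3]})

rename_rebind(df)
after_rebind = list(df.columns)   # still the old column name

rename_inplace(df)
after_inplace = list(df.columns)  # now the new column name
```

The first call has no visible effect because the renamed copy is discarded when the function ends; the second changes df itself.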