How to store a new dataframe after using a self de

Posted 2019-08-13 23:20

Question:

I am just starting to use user-defined functions, so this is probably not a very complex question; forgive me.

I have a few dataframes, which all have a column named 'interval_time' (for example) and I would like to rename this column 'Timestamp'.

I know that I can do this manually with:

df = df.rename(index=str, columns={'interval_time': 'Timestamp'})

but now I would like to define a function called rename that does this for me. I have seen that this works:

def rename(data):
    print(data.rename(index=str, columns={'interval_time': 'Timestamp'}))

but I can't seem to figure out how to save the renamed dataframe. I have tried this:

def rename(data):
    data = data.rename(index=str, columns={'interval_time': 'Timestamp'})

The dataframes that I am using have the following form:

df_scada
         interval_time    A  ...          X          Y
0  2010-11-01 00:00:00  0.0  ...  396.36710  381.68860
1  2010-11-01 00:05:00  0.0  ...  392.97974  381.40634
2  2010-11-01 00:10:00  0.0  ...  390.15695  379.99493
3  2010-11-01 00:15:00  0.0  ...  389.02786  379.14810

Answer 1:

There are a few points to note:

  • You need to use return in your function.
  • It's good practice to make such functions generic. For example, your input and output column names can be arguments with default values set.
  • Pandas offers pd.DataFrame.pipe to facilitate method chaining.
  • You should not name your function the same as the Pandas method. This will only lead to confusion.

Putting these elements together:

def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})

df = df.pipe(rename_col)

This is a trivial example, which probably doesn't require a user-defined function. However, the above advice may help when you write more complex functions.
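Since the question mentions several dataframes, here is a minimal sketch of how the generic function might be applied to all of them and how non-default column names can be passed through pipe. The frames dictionary, its keys, and the 'Power' column name are hypothetical stand-ins, not data from the question.

```python
import pandas as pd

def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    # rename returns a new DataFrame and leaves the input unchanged.
    # index=str is carried over from the answer; it converts index labels to strings.
    return data.rename(index=str, columns={col_in: col_out})

# Hypothetical stand-ins for the asker's dataframes
frames = {
    'scada': pd.DataFrame({'interval_time': ['2010-11-01 00:00:00'], 'A': [0.0]}),
    'other': pd.DataFrame({'interval_time': ['2010-11-01 00:05:00'], 'B': [1.5]}),
}

# Apply the same function to every dataframe and keep the results
renamed = {name: df.pipe(rename_col) for name, df in frames.items()}

# Extra keyword arguments given to pipe are forwarded to rename_col
custom = frames['scada'].pipe(rename_col, col_in='A', col_out='Power')
```

Keeping the results in a new dictionary leaves the original dataframes untouched, which makes it easy to compare before and after.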



Answer 2:

Without inplace=True, DataFrame.rename returns a new object, which your function needs to return:

import pandas as pd

def rename(data):
    return data.rename(index=str, columns={'interval_time': 'Timestamp'})

data = pd.DataFrame([1,2,3,4], columns=['interval_time'])
renamed_data = rename(data)

If you do not want a new DataFrame, pass inplace=True to data.rename inside the function; the DataFrame is then modified in place and nothing needs to be returned.



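Answer 3:

You do not need to re-assign the dataframe after calling the rename function if the function modifies it in place. A pandas.DataFrame is a mutable object, and Python passes the function a reference to the same object rather than a copy, so in-place changes made inside the function are visible to the caller. Strictly speaking, Python is neither call-by-value nor call-by-reference: re-assigning the parameter name inside the function, as in your attempt, only rebinds the local name. See this article on how Python objects are passed: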

https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/

Also, you should use the inplace parameter so that rename modifies the existing DataFrame instead of returning a new one. The rename function then looks like this:

def rename(data):
    data.rename(index=str, columns={'interval_time': 'Timestamp'}, inplace=True)

After you call rename(df), the 'interval_time' column of your DataFrame df is renamed to 'Timestamp'.
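The difference between the attempt in the question and the in-place version can be sketched as follows. The function names rename_rebind and rename_inplace and the one-row sample dataframe are made up for illustration.

```python
import pandas as pd

def rename_rebind(data):
    # Rebinds only the local name `data`; the caller's DataFrame is untouched
    data = data.rename(index=str, columns={'interval_time': 'Timestamp'})

def rename_inplace(data):
    # Modifies the very object the caller passed in
    data.rename(index=str, columns={'interval_time': 'Timestamp'}, inplace=True)

# Hypothetical one-row sample with the same column name as in the question
df = pd.DataFrame({'interval_time': ['2010-11-01 00:00:00'], 'A': [0.0]})

rename_rebind(df)
after_rebind = list(df.columns)   # still ['interval_time', 'A']

rename_inplace(df)
after_inplace = list(df.columns)  # now ['Timestamp', 'A']
```

The in-place version changes the caller's dataframe as a side effect, so whether it or the return-based version from the other answers is preferable depends on whether the original dataframe is still needed.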