Helper function code python

Posted 2019-07-01 23:31

Question:

I need to write a helper function that can be applied elsewhere in my program to reformat a string.

My first function process_DrugCount(dataframe) returns three data frames that look like this:

    MemberID          DSFS  DrugCount
2   61221204   2- 3 months          1
8   30786520   1- 2 months          1
11  28420460  10-11 months          1

My second function, replaceMonth(string), is a helper function that reformats DSFS values (for example, "2- 3 months" becomes "2_3"). The following code does this, but only inside process_DrugCount(), not inside replaceMonth(): DrugCount_Y1.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True). How would I rewrite this inside replaceMonth()? Here's all my code:

def process_DrugCount(drugcount):
    dc = pd.read_csv("DrugCount.csv")
    sub_map = {'1' : 1, '2':2, '3':3, '4':4, '5':5, '6':6, '7+' : 7}
    dc['DrugCount'] = dc.DrugCount.map(sub_map)
    dc['DrugCount'] = dc.DrugCount.astype(int)
    dc_grouped = dc.groupby(dc.Year, as_index=False)
    DrugCount_Y1 = dc_grouped.get_group('Y1')
    DrugCount_Y2 = dc_grouped.get_group('Y2')
    DrugCount_Y3 = dc_grouped.get_group('Y3')
    DrugCount_Y1.drop('Year', axis=1, inplace=True)
    DrugCount_Y2.drop('Year', axis=1, inplace=True)
    DrugCount_Y3.drop('Year', axis=1, inplace=True)
    print DrugCount_Y1
    a = DrugCount_Y1.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True) #WORKS HERE!
    return (DrugCount_Y1,DrugCount_Y2,DrugCount_Y3)

# this function converts strings such as "1- 2 month" to "1_2"
def replaceMonth(string):
    string.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True) #Doesn't change dash to underscore. 
    return a_new_string

Answer 1:

Actually, you don't need a special function for that, because one already exists: replace().

In [32]: replacements = {
   ....:     'DSFS': {
   ....:         r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'
   ....:     },
   ....:     'DrugCount': {
   ....:         r'\+': ''
   ....:     }
   ....: }

In [33]: dc
Out[33]:
   MemberID Year         DSFS DrugCount
0  48925661   Y2  9-10 months        7+
1  90764620   Y3  8- 9 months         3
2  61221204   Y1  2- 3 months         1

In [34]: dc.replace(replacements, regex=True, inplace=True)

In [35]: dc['DrugCount'] = dc.DrugCount.astype(int)

In [36]: dc
Out[36]:
   MemberID Year  DSFS  DrugCount
0  48925661   Y2  9_10          7
1  90764620   Y3   8_9          3
2  61221204   Y1   2_3          1

In [37]: dc.dtypes
Out[37]:
MemberID      int64
Year         object
DSFS         object
DrugCount     int32
dtype: object
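
If a standalone helper that works on a single string is still wanted, as the question asks, the same regular expression can be used with the standard-library re module. This is only a sketch: the module-level name _DSFS_PATTERN is an assumption made for illustration, and the function keeps the replaceMonth name from the question.

```python
import re

# Same pattern as in the replacements dict above: two numbers separated by a
# dash (with optional spaces), followed by any trailing text such as " months".
# The name _DSFS_PATTERN is hypothetical, chosen for this sketch.
_DSFS_PATTERN = re.compile(r'(\d+)\s*-\s*(\d+).*')


def replaceMonth(string):
    """Convert e.g. "2- 3 months" to "2_3"; non-matching text is returned unchanged."""
    return _DSFS_PATTERN.sub(r'\1_\2', string)


print(replaceMonth("2- 3 months"))   # 2_3
print(replaceMonth("10-11 months"))  # 10_11
print(replaceMonth("0- 1 month"))    # 0_1
```

Because this version takes a plain str, it could be applied to a whole column with something like dc['DSFS'].map(replaceMonth), whereas the dict form shown above is meant for DataFrame.replace.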


Answer 2:

It was easier than that. Maybe I didn't ask the question right. All I needed to do was this:

def replaceMonth(string):
    replace_map = {'0- 1 month': "0_1", "1- 2 months": "1_2", "2- 3 months": "2_3",
                   "3- 4 months": "3_4", "4- 5 months": "4_5", "5- 6 months": "5_6",
                   "6- 7 months": "6_7", "7- 8 months": "7_8", "8- 9 months": "8_9",
                   "9-10 months": "9_10", "10-11 months": "10_11", "11-12 months": "11_12"}
    a_new_string = string.map(replace_map)
    return a_new_string

It just relabels the values in the DSFS column.
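
The lookup table in this answer follows a regular pattern, so it could also be generated rather than typed out. The sketch below assumes the labels always look like the twelve listed above (first number unpadded, second number right-aligned to two characters, and "month" singular only for the first bucket); the helper name month_label is hypothetical.

```python
def month_label(i):
    """Build the raw DSFS label for the bucket starting at month i (assumed format)."""
    unit = "month" if i == 0 else "months"
    # The second number is right-aligned to width 2, e.g. "0- 1" and "9-10".
    return f"{i}-{i + 1:>2} {unit}"


# Same twelve entries as the hand-written replace_map in the answer above.
replace_map = {month_label(i): f"{i}_{i + 1}" for i in range(12)}

print(replace_map["0- 1 month"])    # 0_1
print(replace_map["9-10 months"])   # 9_10
print(replace_map["11-12 months"])  # 11_12
```

One thing to keep in mind with the dictionary approach: when a pandas Series is mapped with a dict, any value that is not a key in the dict becomes NaN, so an unexpected label would be lost silently rather than left as-is.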