In a pandas DataFrame, I want to use applymap(somefunction) together with groupby (grouping on the values of an index level or column).
mcve_01.txt
pos index M1 M2 F1_x
16230484 141 G/G G/G G
16230491 141 C/C C/C C
16230503 141 T/T T/T T
16230524 141 T/T T/T T
16230535 141 . . T
16232072 211 A/A A/A A
16232072 211 A/A A/A A
16229783 211 C/C C/C G
16229992 211 A/A A/A G
16230007 211 T/T T/T A
16230011 263 G/G G/G C
16230049 263 A/A A/A T
16230174 263 . . T
16230190 263 A/A A/A T
16230260 263 A/A A/A G
I have written a function that does some analyses on columns A, B, C and D, where every value in those columns is a list. To get the values into that shape, I read the data and convert each cell:
import pandas as pd

mcve_data = pd.read_csv('mcve_01.txt', sep='\t')
mcve_data.set_index(['pos', 'index'], append=True, inplace=True)
# wrap each cell's characters in a nested list, e.g. 'G/G' -> [['G', '/', 'G']]
mcve_list = mcve_data.applymap(lambda c: [list(c)])
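Say the function is:
from functools import partial
from itertools import product

def mapfun(c):
    # a '.' in any element makes the whole result '.'
    if any(['.' in l for l in c]):
        return '.'
    # if every element contains '|', pair items by position; otherwise take all combinations
    if all(['|' in l for l in c]):
        fun = zip
    else:
        fun = product
    filt_set = set(['|', '/'])
    # drop the separator characters from each element
    filt = partial(filter, lambda l: not (l in filt_set))
    return ','.join('g'.join(t) for t in fun(*map(filt, c)))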
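To make the behaviour of mapfun concrete, here is a small sketch that calls it on hand-written inputs. The function body is copied from above; the inputs are assumptions for illustration, and the '|' example in particular is made up, since the sample data contains no '|' values.

```python
from functools import partial
from itertools import product


def mapfun(c):
    # Same logic as the function above.
    if any('.' in l for l in c):
        return '.'
    fun = zip if all('|' in l for l in c) else product
    filt = partial(filter, lambda l: l not in {'|', '/'})
    return ','.join('g'.join(t) for t in fun(*map(filt, c)))


# Current row 'C/C' combined with previous row 'G/G': all combinations.
slash_case = mapfun([['C', '/', 'C'], ['G', '/', 'G']])

# Any '.' short-circuits to '.'.
missing_case = mapfun([['.'], ['T', '/', 'T']])

# Made-up '|' input: items are paired by position with zip.
pipe_case = mapfun([['A', '|', 'T'], ['C', '|', 'G']])

print(slash_case)    # CgG,CgG,CgG,CgG
print(missing_case)  # .
print(pipe_case)     # AgC,TgG
```

The first result is the same string that appears in the M1 column of the final output below.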
Finally:
mcve_mm = (mcve_list + mcve_list.shift(1)).dropna(how='all').\
    applymap(mapfun)
which gives me (final output):
pos index M1 M2 F1_x
16230484 141 CgG,CgG,CgG,CgG CgG,CgG,CgG,CgG CgG
16230491 141 TgC,TgC,TgC,TgC TgC,TgC,TgC,TgC TgC
..... ... TgT,TgT,TgT,TgT TgT,TgT,TgT,TgT TgT
. . TgT
. . AgT
AgA,AgA,AgA,AgA AgA,AgA,AgA,AgA AgA
CgA,CgA,CgA,CgA CgA,CgA,CgA,CgA GgA
AgC,AgC,AgC,AgC AgC,AgC,AgC,AgC GgG
TgA,TgA,TgA,TgA TgA,TgA,TgA,TgA AgG
GgT,GgT,GgT,GgT GgT,GgT,GgT,GgT CgA
AgG,AgG,AgG,AgG AgG,AgG,AgG,AgG TgC
So this code works when I run the function (mapfun) over the whole DataFrame without grouping. But I want to run it group by group, grouping the rows by their index values.
Unfortunately, I can't find any example that uses groupby and applymap together.
I then tried resetting the index, grouping on the index column, and wrapping the function (mapfun) in apply, which didn't work:
mcve_mm = (mcve_list+mcve_list.shift(1)).dropna(how='all').groupby(['f1_index'], group_keys = False).apply(lambda x: [mapfun])
I didn't get any error, but the function was never applied to the grouped data; the result just contains the function object itself.
Output I am getting:
f1_index
141.0 [<function mapfun at 0x7fee93550f28>]
211.0 [<function mapfun at 0x7fee93550f28>]
263.0 [<function mapfun at 0x7fee93550f28>]
dtype: object
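The output above is consistent with how the lambda is written: `lambda x: [mapfun]` ignores the group `x` and returns a list holding the function object, so nothing is ever called. A minimal sketch of the difference, using a made-up frame and a hypothetical stand-in function `shout` (neither is part of the original code):

```python
import pandas as pd


def shout(cell):
    # Hypothetical stand-in for mapfun: any function of a single cell.
    return cell.upper()


# Made-up frame with a 'key' index level to group on.
df = pd.DataFrame({'key': [1, 1, 2], 'val': ['a', 'b', 'c']}).set_index('key')

# The lambda ignores the group x and returns a list holding the function itself.
not_called = df.groupby(level='key').apply(lambda x: [shout])

# Here the function is actually called on every cell of each group.
called = df.groupby(level='key', group_keys=False).apply(
    lambda g: g.apply(lambda col: col.map(shout))
)

print(not_called)
print(called)
```

In the second call each group `g` is itself a DataFrame, so any cell-by-cell operation can be used inside the lambda.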
Expected output:
The same as the final output above, but with the function applied separately within each group of rows that share an index value.
To summarise: I want to take this function and applymap it over these columns, with the DataFrame grouped by the values in one of its columns or index levels. In pseudocode:
data_groupby = (df + df.shift(1)).dropna(how='all').\
    applymap(fnc)  # ...but run within each group
I tried resetting the index and then grouping by the index name. But fnc() is specific to the data in columns A, B, C and D. Also, I can't find any examples or tutorials that use applymap together with groupby on a pandas DataFrame.
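One possible way to get the grouped behaviour, sketched under a few assumptions: the frame is indexed by pos and index only (without the extra default level that append=True keeps), only a handful of rows from the sample data are used, and the helper names `elementwise` and `combine_within_group` are made up for this example. The cell-by-cell step is written with Series.map rather than applymap, because applymap is deprecated in newer pandas releases; on older versions `group.applymap(mapfun)` should play the same role.

```python
from functools import partial
from itertools import product

import pandas as pd


def mapfun(c):
    # Same logic as the function in the question.
    if any('.' in l for l in c):
        return '.'
    fun = zip if all('|' in l for l in c) else product
    filt = partial(filter, lambda l: l not in {'|', '/'})
    return ','.join('g'.join(t) for t in fun(*map(filt, c)))


def elementwise(frame, func):
    # Hypothetical helper: apply func to every cell, column by column.
    return frame.apply(lambda col: col.map(func))


# A few rows from the sample data, indexed by pos and index.
data = pd.DataFrame({
    'pos': [16230484, 16230491, 16230503, 16232072, 16229783, 16229992],
    'index': [141, 141, 141, 211, 211, 211],
    'M1': ['G/G', 'C/C', 'T/T', 'A/A', 'C/C', 'A/A'],
    'M2': ['G/G', 'C/C', 'T/T', 'A/A', 'C/C', 'A/A'],
    'F1_x': ['G', 'C', 'T', 'A', 'G', 'G'],
}).set_index(['pos', 'index'])

# 'G/G' -> [['G', '/', 'G']], as in the question.
as_lists = elementwise(data, lambda c: [list(c)])


def combine_within_group(group):
    # shift(1) only sees rows of this group, so the first row of each
    # group has no predecessor and is dropped by dropna.
    paired = (group + group.shift(1)).dropna(how='all')
    return elementwise(paired, mapfun)


result = (
    as_lists
    .groupby(level='index', group_keys=False)
    .apply(combine_within_group)
)
print(result)
```

The difference from the ungrouped version is that the first row of group 211 is no longer combined with the last row of group 141; each group loses its own first row instead.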