How to replace comma with dash using python pandas

Posted 2019-09-17 00:23

I have a file like this:

name|count_dic
name1 |{'x1':123,'x2,bv.':435,'x3':4}
name2|{'x2,bv.':435,'x5':98}
etc.

I am trying to load the data into a dataframe and count the number of keys in count_dic. The problem is that the dict items are separated by commas, and some of the keys also contain commas. I am looking for a way to replace the commas inside the keys with '-' so that I can then separate the different key/value pairs in count_dic, something like this:

name|count_dic
name1 |{'x1':123,'x2-bv.':435,'x3':4}
name2|{'x2-bv.':435,'x5':98}
etc.
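At the string level, one way to get the output above is to replace commas only inside the single-quoted keys, leaving the commas between key/value pairs alone. A minimal sketch, assuming keys never contain an embedded or escaped single quote (the function name `dash_keys` is hypothetical):

```python
import re

# Matches a single-quoted string such as 'x2,bv.' (assumes no escaped quotes inside)
QUOTED = re.compile(r"'[^']*'")

def dash_keys(text: str) -> str:
    """Replace commas with dashes inside quoted strings only."""
    return QUOTED.sub(lambda m: m.group(0).replace(",", "-"), text)

print(dash_keys("{'x1':123,'x2,bv.':435,'x3':4}"))
# {'x1':123,'x2-bv.':435,'x3':4}
```

Applied to a loaded column this would be something like `df['count_dic'].map(dash_keys)`. Working on the raw text is fragile, though; parsing the values into real dicts first avoids the comma problem entirely.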

This is what I have tried:

import json
import pandas as pd

df = pd.read_csv('file', names=['name', 'count_dic'], delimiter='|')
data = json.loads(df.count_dic)

and I get the following error:

TypeError: the JSON object must be str, not 'Series'

Does anybody have any suggestions?
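The error occurs because json.loads expects a single string, but df.count_dic is a whole pandas Series. Calling it element by element would still fail, because the values use single quotes and JSON requires double-quoted strings. A small sketch illustrating this with the standard library only (the sample value is copied from the file above):

```python
import ast
import json

value = "{'x1':123,'x2,bv.':435,'x3':4}"  # one cell from the count_dic column

# JSON requires double-quoted strings, so json.loads raises JSONDecodeError here
try:
    json.loads(value)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# The same text is a valid Python dict literal
as_dict = ast.literal_eval(value)
print(parsed_as_json, len(as_dict))  # False 3
```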

2 answers
乱世女痞
Reply #2 · 2019-09-17 00:57

You can use ast.literal_eval as a converter when loading the dataframe, since your data looks like Python dict literals rather than JSON (JSON requires double-quoted strings), e.g.:

import pandas as pd
import ast

df = pd.read_csv('file', delimiter='|', converters={'count_dic': ast.literal_eval})

This gives you a DataFrame of:

    name                            count_dic
0  name1  {'x2,bv.': 435, 'x3': 4, 'x1': 123}
1  name2            {'x5': 98, 'x2,bv.': 435}

Since each count_dic value is now an actual dict, you can apply len to get the number of keys, e.g.:

df.count_dic.apply(len)

Results in:

0    3
1    2
Name: count_dic, dtype: int64
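If you also want the dashed keys the question asks for, the converter can rename the keys right after parsing. A minimal sketch, assuming the same pipe-delimited layout; io.StringIO stands in for the real file, and `parse_count_dic` is a hypothetical helper name:

```python
import ast
import io

import pandas as pd

# Stand-in for the real file (assumption: same layout as in the question)
raw = """name|count_dic
name1 |{'x1':123,'x2,bv.':435,'x3':4}
name2|{'x2,bv.':435,'x5':98}
"""

def parse_count_dic(text):
    """Parse a Python dict literal and replace commas in its keys with dashes."""
    d = ast.literal_eval(text)
    return {k.replace(",", "-"): v for k, v in d.items()}

df = pd.read_csv(io.StringIO(raw), delimiter="|",
                 converters={"count_dic": parse_count_dic})
df["n_keys"] = df["count_dic"].apply(len)  # number of keys per row
print(df)
```

Because the renaming happens after literal_eval has parsed each value, the commas inside keys never interfere with splitting the key/value pairs.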
劳资没心,怎么记你
Reply #3 · 2019-09-17 01:18

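Once df is defined as above (adding header=0, so that the "name|count_dic" line is used as the header instead of being read as a data row):

import pandas as pd

df = pd.read_csv('file', names=['name', 'count_dic'], delimiter='|', header=0)

# get a value to play around with
td = df.iloc[0].count_dic
td  # in a REPL this shows the raw string
# that looks like a dict definition... evaluate it?
eval(td)
eval(td).keys()  # yup!
# apply to the whole df (Series.apply keeps a column; bare map() is a lazy iterator in Python 3)
df['count_dic'] = df['count_dic'].apply(eval)

# and a hint towards your key-counting
df['count_dic'].apply(lambda d: list(d.keys()))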
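Note that eval will execute any Python expression found in the file, which is risky if the data is not fully trusted. ast.literal_eval, used in the other answer, accepts only literals such as dicts, strings, and numbers, and rejects anything else. A short sketch of the difference (the "unsafe" string is an illustrative example, not from the question's data):

```python
import ast

safe = "{'x1':123,'x2,bv.':435}"
unsafe = "__import__('os').getcwd()"  # a function call, not a literal

print(ast.literal_eval(safe))  # {'x1': 123, 'x2,bv.': 435}

try:
    ast.literal_eval(unsafe)
    rejected = False
except ValueError:
    rejected = True  # literal_eval refuses calls instead of running them
print(rejected)  # True
```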