I'm looking for a fast way to convert each row of a pandas DataFrame into an OrderedDict without building a list. Lists are fine, but with large data sets they take too long. I am using the Fiona GIS reader, where the rows are OrderedDicts and the schema gives the data types. I use pandas to join the data. In many cases the rows will have mixed types, so I was thinking that converting to a NumPy array of type string might do the trick.
Answer 1:
Unfortunately, you can't just use apply, since it coerces the result back into a DataFrame:
In [1]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
In [2]: df
Out[2]:
   a  b
0  1  2
1  3  4
In [3]: from collections import OrderedDict
In [4]: df.apply(OrderedDict)
Out[4]:
   a  b
0  1  2
1  3  4
But you can use a list comprehension with iterrows:
In [5]: [OrderedDict(row) for i, row in df.iterrows()]
Out[5]: [OrderedDict([('a', 1), ('b', 2)]), OrderedDict([('a', 3), ('b', 4)])]
If whatever consumes the rows can accept a generator rather than a list, that will usually be more memory-efficient, since the rows are produced one at a time:
In [6]: (OrderedDict(row) for i, row in df.iterrows())
Out[6]: <generator object <genexpr> at 0x10466da50>
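Since the question is about speed on large frames, here is a minimal sketch of the same generator idea built on itertuples instead of iterrows. The pandas documentation notes that itertuples is generally faster and, unlike iterrows, does not upcast mixed-type rows to a common dtype. The helper name iter_ordered_rows and the sample frame are illustrative assumptions, not part of the original answer.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical sample frame with mixed column types.
df = pd.DataFrame({'a': [1, 3], 'b': ['x', 'y']})


def iter_ordered_rows(frame):
    """Lazily yield one OrderedDict per row (illustrative helper)."""
    columns = list(frame.columns)
    # name=None yields plain tuples; index=False leaves the index out.
    for values in frame.itertuples(index=False, name=None):
        yield OrderedDict(zip(columns, values))


rows = iter_ordered_rows(df)
first = next(rows)  # only the first row has been converted so far
```

Because rows is a generator, it can be passed straight to any consumer that iterates over records, without materializing every row first.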
Answer 2:
This is implemented in pandas 0.21.0+ via the into parameter of to_dict:
from collections import OrderedDict
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
print(df)
   a  b
0  1  2
1  3  4
# orient='index' keys the outer mapping by the row index
d = df.to_dict(into=OrderedDict, orient='index')
print(d)
OrderedDict([(0, OrderedDict([('a', 1), ('b', 2)])), (1, OrderedDict([('a', 3), ('b', 4)]))])
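If a flat sequence of row dicts is closer to what Fiona expects than a mapping keyed by index, the same into parameter can be combined with orient='records'. This is a small sketch of that variant; note that it does build a list in memory, so it trades the laziness of a generator for a single vectorized call.

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# One OrderedDict per row, in column order; the row index is dropped.
records = df.to_dict(orient='records', into=OrderedDict)
```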