Can I use Python asyncio to slice and save DataFra

Posted 2019-09-16 04:30

Question:

As the title says: is it possible to write an asyncio event loop that slices a DataFrame by the unique values in a certain column and saves each slice to my drive? And, maybe more importantly, is it faster?

What I've tried is something like this:

import asyncio

async def a_split(dist, df):
    # Select the rows for one district and write them to their own CSV file
    temp_df = df[df.district == dist]
    await temp_df.to_csv('{}.csv'.format(dist))

async def m_lp(df):
    # Process each unique district in turn
    for dist in df.district.unique().tolist():
        await a_split(dist, df)

loop = asyncio.get_event_loop()

loop.run_until_complete(m_lp(dfTotal))
loop.close()

But I'm getting the following error:

TypeError: object NoneType can't be used in 'await' expression

If it's not obvious from my attempt, I'm very new to asyncio and I'm not sure how it works. Apologies if this is a stupid question.

If asyncio is not a good tool for the job - is there a better one?

Edit:

Full traceback below:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-2bc2373d2920> in <module>()
      2 loop = asyncio.get_event_loop()
      3 
----> 4 loop.run_until_complete(m_lp(dfTotal))
      5 loop.close()

C:\Users\5157213\AppData\Local\Continuum\Anaconda3\envs\python36\lib\asyncio\base_events.py in run_until_complete(self, future)
    464             raise RuntimeError('Event loop stopped before Future completed.')
    465 
--> 466         return future.result()
    467 
    468     def stop(self):

<ipython-input-20-9e91c0b1b06f> in m_lp(df)
      1 async def m_lp(df):
      2     for dist in df.district.unique().tolist():
----> 3         await a_split(dist,df)

<ipython-input-18-200b08417159> in a_split(dist, df)
      1 async def a_split(dist,df):
      2     temp = df[df.district == dist]
----> 3     await temp.to_csv('C:/Users/5157213/Desktop/Portfolio/{}.csv'.format(dist))

TypeError: object NoneType can't be used in 'await' expression

Answer 1:

As far as I know, there is no asyncio support as such in pandas. I don't think a single-threaded, event-based architecture is the best tool here, given that there are a dozen other options for working with large data; for a large dataset, for example, take a look at dask.

The error you get is because you tried to await DataFrame.to_csv, which does not return a Future (or any other awaitable object) but None when it is given a file path.
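
If you still want to drive the writes from an event loop, one option is to hand the blocking to_csv call to a thread pool with loop.run_in_executor, which does return an awaitable Future. The following is only a minimal sketch under some assumptions: save_district and save_all are hypothetical helper names, the sample DataFrame stands in for dfTotal, and the files go to a temporary directory. It is not guaranteed to be faster, since much of the CSV serialization is CPU-bound work and only the file I/O portion can overlap across threads.

```python
import asyncio
import os
import tempfile

import pandas as pd


def save_district(df, dist, out_dir):
    # Ordinary blocking function (hypothetical helper): to_csv writes the
    # file and returns None, so we return the path ourselves.
    path = os.path.join(out_dir, "{}.csv".format(dist))
    df[df.district == dist].to_csv(path, index=False)
    return path


async def save_all(df, out_dir):
    # Inside a coroutine this returns the loop that is currently running.
    loop = asyncio.get_event_loop()
    # run_in_executor returns an awaitable Future; passing None selects the
    # loop's default thread pool executor.
    futures = [
        loop.run_in_executor(None, save_district, df, dist, out_dir)
        for dist in df.district.unique().tolist()
    ]
    # Wait for every write to finish; results keep the submission order.
    return await asyncio.gather(*futures)


# Hypothetical sample data standing in for dfTotal.
df_total = pd.DataFrame(
    {"district": ["north", "south", "north"], "value": [1, 2, 3]}
)
out_dir = tempfile.mkdtemp()

loop = asyncio.new_event_loop()
try:
    paths = loop.run_until_complete(save_all(df_total, out_dir))
finally:
    loop.close()
```

Here paths holds one file path per unique district. If timing matters, it is worth measuring this against a plain for loop on your own data before adopting it.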