I have a list, dataframe_chunk, which contains chunks of a very large pandas dataframe. I would like to write each chunk to a different CSV file, and to do so in parallel. However, I see the files being written sequentially and I'm not sure why. Here's the code:
import concurrent.futures as cfu
import time

def write_chunk_to_file(chunk, fpath):
    chunk.to_csv(fpath, sep=',', header=False, index=False)

pool = cfu.ThreadPoolExecutor(N_CORES)

futures = []
for i in range(N_CORES):
    fpath = '/path_to_files_' + str(i) + '.csv'
    futures.append(pool.submit(write_chunk_to_file(dataframe_chunk[i], fpath)))

for f in cfu.as_completed(futures):
    print("finished at ", time.time())
Any clues?
One thing that is stated in the Python 2.7.x threading docs but not in the 3.x docs is that, because of the Global Interpreter Lock (GIL), CPython cannot achieve true parallelism with the threading library: only one thread executes Python bytecode at a time.
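To see this for yourself, here is a minimal, self-contained benchmark (my own illustration, not from the original post): a pure-Python CPU-bound loop takes roughly as long split across four threads as it does on one thread, because the GIL lets only one thread run bytecode at a time.

import threading
import time

def count(n):
    # Pure-Python CPU-bound work; the GIL prevents threads from
    # running this in parallel.
    while n > 0:
        n -= 1

N = 10_000_000

# All the work on one thread.
start = time.time()
count(4 * N)
print("one thread:  ", time.time() - start)

# The same total work split across four threads: in CPython the wall
# time is roughly the same (often worse, due to lock contention).
start = time.time()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("four threads:", time.time() - start)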
You should try using concurrent.futures with the ProcessPoolExecutor, which uses a separate process for each job and can therefore achieve true parallelism on a multi-core CPU.
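For instance, here is a minimal sketch of your loop rewritten for ProcessPoolExecutor; it assumes dataframe_chunk and N_CORES are defined as in your code. Note that submit takes the callable and its arguments separately; calling write_chunk_to_file(...) inside submit(...) would execute it in the main thread instead of in a worker.

import concurrent.futures as cfu
import time

def write_chunk_to_file(chunk, fpath):
    chunk.to_csv(fpath, sep=',', header=False, index=False)

if __name__ == '__main__':
    # dataframe_chunk and N_CORES are assumed to be defined, as in the question.
    with cfu.ProcessPoolExecutor(N_CORES) as pool:
        futures = []
        for i in range(N_CORES):
            fpath = '/path_to_files_' + str(i) + '.csv'
            # Pass the function and its arguments separately so the call
            # happens in a worker process.
            futures.append(pool.submit(write_chunk_to_file,
                                       dataframe_chunk[i], fpath))
        for f in cfu.as_completed(futures):
            print("finished at ", time.time())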
Update

Here is your program adapted to use the multiprocessing library instead:
#!/usr/bin/env python3
from multiprocessing import Process
import os
import time

N_CORES = 8

def write_chunk_to_file(chunk, fpath):
    # Stand-in workload: `chunk` is ignored here; each process just writes
    # a large stream of numbers so the parallel file growth is easy to watch.
    with open(fpath, "w") as f:
        for x in range(10000000):
            f.write(str(x))

futures = []
print("my pid:", os.getpid())
input("Hit return to start:")
start = time.time()
print("Started at:", start)

# Create one Process per output file.
for i in range(N_CORES):
    fpath = './tmp/file-' + str(i) + '.csv'
    p = Process(target=write_chunk_to_file, args=(i, fpath))
    futures.append(p)

# Start them all, then wait for them all to finish.
for p in futures:
    p.start()
print("All jobs started.")

for p in futures:
    p.join()
print("All jobs finished at", time.time())
You can monitor the jobs with this shell command in another window:
while true; do clear; pstree 12345; ls -l tmp; sleep 1; done
(Replace 12345 with the pid emitted by the script.)