How do we choose `--nthreads` and `--nprocs` per worker in Dask distributed? I have 3 workers: two with 4 cores and one thread per core, and one with 8 cores (according to the output of the `lscpu` Linux command on each worker).
It depends on your workload.

By default Dask creates a single process with as many threads as you have logical cores on your machine (as determined by `multiprocessing.cpu_count()`). Using few processes and many threads per process is good if you are doing mostly numeric workloads, such as are common in NumPy, Pandas, and Scikit-Learn code, which are not affected by Python's Global Interpreter Lock (GIL).
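For a numeric workload on your cluster, a reasonable starting point is one process per worker with one thread per core. A minimal sketch, assuming a hypothetical scheduler address of `tcp://scheduler-host:8786` and the classic `dask-worker` CLI (newer Dask releases rename `--nprocs` to `--nworkers`):

```sh
# On each of the two 4-core workers: one process, four threads
dask-worker tcp://scheduler-host:8786 --nprocs 1 --nthreads 4

# On the 8-core worker: one process, eight threads
dask-worker tcp://scheduler-host:8786 --nprocs 1 --nthreads 8
```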
However, if you are spending most of your compute time manipulating pure Python objects like strings or dictionaries, then you may want to avoid GIL issues by having more processes with fewer threads each.
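For GIL-bound work, the same machines could instead run one single-threaded process per core. Again a sketch against the same assumed scheduler address:

```sh
# On the two 4-core workers: four processes, one thread each
dask-worker tcp://scheduler-host:8786 --nprocs 4 --nthreads 1

# On the 8-core worker: eight processes, one thread each
dask-worker tcp://scheduler-host:8786 --nprocs 8 --nthreads 1
```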
Based on benchmarking you may find that a more balanced split is better.

Using more processes avoids GIL issues, but adds costs due to inter-process communication. You would want to avoid many processes if your computations require a lot of inter-worker communication.
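A balanced middle ground keeps processes × threads equal to the core count on each machine. One possible sketch, not a recommendation for any specific workload:

```sh
# On the two 4-core workers: two processes with two threads each
dask-worker tcp://scheduler-host:8786 --nprocs 2 --nthreads 2

# On the 8-core worker: two processes with four threads each
dask-worker tcp://scheduler-host:8786 --nprocs 2 --nthreads 4
```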