My Dataflow pipeline is running extremely slowly. It's processing approximately 4 elements/s with 30 worker threads, while a single local machine running the same operations (but outside the Dataflow framework) can process 7 elements/s. The pipeline is written in Python and reads its data from BigQuery.
The workers are n1-standard machines, and all appear to be at 100% CPU utilization.
The operations contained within the Combine are:
- tokenize the record and apply stop-word filtering (nltk)
- stem each word (nltk)
- look up the word in a dictionary
- increment the count of that word in the dictionary
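For reference, the per-record work is roughly the following. This is a minimal, self-contained sketch: the stop-word set and the `crude_stem` helper are simple stand-ins for the nltk stop-word corpus and stemmer, not my actual code.

```python
import re
from collections import Counter

# Stand-in for nltk's English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def crude_stem(word):
    # Naive suffix stripping as a stand-in for nltk's PorterStemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def count_words(record, counts):
    # Tokenize, filter stop words, stem, then increment per-word counts.
    for token in re.findall(r"[a-z']+", record.lower()):
        if token in STOP_WORDS:
            continue
        counts[crude_stem(token)] += 1
    return counts

counts = count_words("the workers are processing records slowly", Counter())
```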
Each record is approximately 30-70 KB. There are ~9,000,000 records in total from BigQuery (the log shows all records were exported successfully).
With 30 worker threads, I would expect throughput to be much higher than my single local machine, and certainly not half as fast.
What could the problem be?