Refusing to split GroupedShuffleRangeTracker propo

Posted 2019-07-14 07:20

I am sporadically getting the following errors:

W Refusing to split at '\x00\x00\x00\x15\xbc\x19)b\x00\x01': proposed split position is out of range ['\x00\x00\x00\x15\x00\xff\x00\xff\x00\xff\x00\xff\x00\x01', '\x00\x00\x00\x15\xbc\x19)b\x00\x01'). Position of last group processed was '\x00\x00\x00\x15\xbc\x19)a\x00\x01'.

When it happens, the error is logged periodically and the job never seems to finish, although otherwise it appears to have actually completed its work.

In the most recent instance, I used 10 workers with autoscaling disabled. I am using the Python SDK of Apache Beam.

1 answer
Bombasti
#2 · 2019-07-14 07:22

This is not an error; it's part of the normal operation of a pipeline. We should probably lower its logging level to INFO and rephrase it, because it very often confuses people.

This message (rather obscurely) signals that Dataflow is trying to apply dynamic work rebalancing, but there is no remaining work that can be subdivided further.

In other words, your job is stuck doing something non-parallelizable on a small number of workers while the other workers sit idle. To investigate further, one would need to look at your job's code and its Dataflow job ID.
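To make the message concrete, here is a minimal sketch of the kind of bounds check a grouped-shuffle range tracker performs before accepting a dynamic split. It is a hypothetical illustration, not Beam's or Dataflow's actual code: the class SketchGroupedRangeTracker, its methods and the reason strings are made up for this example. It uses the keys from the log in the question. The range there is half-open, [start, stop), and the proposed split position is exactly stop, so the piece that would be handed to an idle worker, [stop, stop), is empty and the split must be refused.

```python
# Hypothetical sketch: NOT the real Beam/Dataflow GroupedShuffleRangeTracker,
# only an illustration of the same style of bounds check on byte keys.
from typing import Optional, Tuple


class SketchGroupedRangeTracker:
    """Tracks progress through a half-open key range [start, stop)."""

    def __init__(self, start: bytes, stop: bytes) -> None:
        self.start = start
        self.stop = stop
        # Key of the last group handed to user code (None = not started yet).
        self.last_group_start: Optional[bytes] = None

    def record_group(self, key: bytes) -> None:
        self.last_group_start = key

    def try_split(self, split_position: bytes) -> Tuple[bool, str]:
        """Return (accepted, reason) for a proposed dynamic split."""
        if self.last_group_start is None:
            return False, "unstarted"
        if split_position <= self.last_group_start:
            # That part of the range has already been processed.
            return False, "already past proposed split position"
        if split_position >= self.stop:
            # The remainder [split_position, stop) would be empty:
            # there is no work left to give to another worker.
            return False, "proposed split position is out of range"
        # Accept: keep [start, split_position) here; [split_position, old stop)
        # could be handed to an idle worker.
        self.stop = split_position
        return True, "split accepted"


# Keys copied from the log message in the question.
START = b"\x00\x00\x00\x15\x00\xff\x00\xff\x00\xff\x00\xff\x00\x01"
STOP = b"\x00\x00\x00\x15\xbc\x19)b\x00\x01"
LAST = b"\x00\x00\x00\x15\xbc\x19)a\x00\x01"

tracker = SketchGroupedRangeTracker(START, STOP)
tracker.record_group(LAST)
accepted, reason = tracker.try_split(STOP)
print(accepted, reason)  # False proposed split position is out of range
```

Because stop is exclusive, a proposal equal to stop can only produce an empty residual, so refusing it is the correct response and the message is harmless by itself. It only becomes interesting when it keeps repeating, because that means the remaining work cannot be spread across more workers.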
