I'm seeing this message in a job which indeed runs more slowly than similar jobs (with slightly different inputs).
What does it mean that there will be reiteration? Does it only affect performance, or does it mean that my code could run twice on the same inputs? (My code does occasionally have side effects.)
Thanks! G
This means that the joined PCollection is too large to keep in memory, so fetching elements from it is less efficient than it would be if the entire collection fit in memory. We reiterate over the materialized input to the CoGroupByKey, but your code is not re-run, so this affects only performance.
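To illustrate the distinction between "reiterating the input" and "re-running your code", here is a toy model in plain Python. It is not the Dataflow or Beam API. `CachedIterable`, `cache_limit`, and `process` are hypothetical names, and the "storage" is just a Python list standing in for materialized data. The model keeps the first few grouped values in memory and re-reads the rest on every pass. Your processing function is still called once per key, even when it iterates over the grouped values twice.

```python
class CachedIterable:
    """Hypothetical model of a grouped value list: the first `cache_limit`
    values stay in memory, the rest sit in "storage" (here just a list)
    and are re-read on every traversal."""

    def __init__(self, values, cache_limit):
        self._cached = list(values[:cache_limit])
        self._storage = list(values[cache_limit:])
        self.storage_reads = 0  # number of values fetched from "storage"

    def __iter__(self):
        # In-memory values are served directly.
        yield from self._cached
        # Remaining values must be fetched again on each pass (the slow part).
        for v in self._storage:
            self.storage_reads += 1
            yield v


def process(key, grouped, calls):
    """Stand-in for user code: invoked once per key."""
    calls.append(key)                 # records each invocation (a "side effect")
    total = sum(grouped)              # first pass over the grouped values
    count = sum(1 for _ in grouped)   # second pass triggers reiteration
    return total, count


calls = []
grouped = CachedIterable(list(range(10)), cache_limit=4)
print(process("k", grouped, calls))  # (45, 10)
print(calls)                         # ['k']
print(grouped.storage_reads)         # 12: 6 uncached values read on each of 2 passes
```

In this model the extra cost shows up only as repeated reads of the uncached values; the side effect in `process` happens once.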
It's worth noting that code with side effects may be run more than once in the case of worker failure.