We are using GCS as the data sink of a Dataflow pipeline, and for some reason the output directory "shows" a different list of files every time I run "gsutil ls" on it. Specifically, there should be exactly 4,000 files (the pipeline was configured to shard its output into 4,000 files), but the listing contains only some of those 4,000 files ($prefix-?????-of-04000) along with some of the temp files ($prefix-temp-*). It's been 10+ hours since the Dataflow job (2016-12-18_19_30_32-7274262445792076535) completed, and I am still seeing different file lists; the count isn't just increasing, it sometimes decreases, meaning some files disappear and then appear again. This is affecting other Dataflow pipelines we run that read from this directory.
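For reference, this is roughly how I'm checking (the bucket and prefix below are placeholders, not our actual paths):

```sh
# Count the final output shards; this should report exactly 4000.
gsutil ls gs://my-bucket/output/myprefix-?????-of-04000 | wc -l

# Check whether any temp files are still present under the same prefix.
gsutil ls gs://my-bucket/output/myprefix-temp-* | wc -l
```

Running these repeatedly gives different counts each time, and the temp-file listing is not empty.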
Is this a Dataflow issue or a GCS issue, and how can we resolve it? I've seen this GCS behavior before, but it usually lasted only for the first few minutes after a Dataflow pipeline completed; this time it seems to be ongoing.