I have a GCS folder laid out as below:
gs://<bucket-name>/<folder-name>/dt=2017-12-01/part-0000.tsv
                                /dt=2017-12-02/part-0000.tsv
                                /dt=2017-12-03/part-0000.tsv
                                /dt=2017-12-04/part-0000.tsv
                                ...
I want to match only the files under dt=2017-12-02 and dt=2017-12-03 using sc.textFile() in Scio, which, as far as I know, uses TextIO.Read.from() underneath.
I've tried
gs://<bucket-name>/<folder-name>/dt={2017-12-02,2017-12-03}/*.tsv
and
gs://<bucket-name>/<folder-name>/dt=2017-12-(02|03)/*.tsv
Both match zero files:
INFO org.apache.beam.sdk.io.FileBasedSource - Filepattern gs://<bucket-name>/<folder-name>/dt={2017-12-02,2017-12-03}/*.tsv matched 0 files with total size 0
INFO org.apache.beam.sdk.io.FileBasedSource - Filepattern gs://<bucket-name>/<folder-name>/dt=2017-12-(02|03)/*.tsv matched 0 files with total size 0
What is a valid filepattern for doing this?
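To make the intent concrete, here is a minimal sketch of the match I am after, checked locally with java.nio's glob matcher rather than with Beam. The `folder/` prefix, the `ExpectedMatch` object, and the `matched` helper are placeholders made up for illustration, and java.nio's glob syntax is not necessarily what Beam's GCS matcher accepts.

```scala
import java.nio.file.{FileSystems, Paths}

object ExpectedMatch {
  // Relative stand-ins (hypothetical) for the objects under
  // gs://<bucket-name>/<folder-name>/.
  val files: Seq[String] = Seq(
    "folder/dt=2017-12-01/part-0000.tsv",
    "folder/dt=2017-12-02/part-0000.tsv",
    "folder/dt=2017-12-03/part-0000.tsv",
    "folder/dt=2017-12-04/part-0000.tsv"
  )

  // Returns the entries of `files` that the given java.nio glob matches.
  // Note: this is java.nio glob syntax, not Beam's filepattern matcher.
  def matched(glob: String): Seq[String] = {
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
    files.filter(f => matcher.matches(Paths.get(f)))
  }

  def main(args: Array[String]): Unit =
    // Prints the paths selected by the brace pattern.
    matched("folder/dt={2017-12-02,2017-12-03}/*.tsv").foreach(println)
}
```

With this matcher, only the dt=2017-12-02 and dt=2017-12-03 files are selected, which is the result I would like sc.textFile() to produce from a single filepattern.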