I have read through the Beam documentation and also looked through Python documentation but haven't found a good explanation of the syntax being used in most of the example Apache Beam code.
Can anyone explain what the _
, |
, and >>
are doing in the below code? Also is the text in quotes ie 'ReadTrainingData' meaningful or could it be exchanged with any other label? In other words how is that label being used?
train_data = pipeline | 'ReadTrainingData' >> _ReadData(training_data)
evaluate_data = pipeline | 'ReadEvalData' >> _ReadData(eval_data)
input_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)
_ = (input_metadata
| 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
os.path.join(output_dir, path_constants.RAW_METADATA_DIR),
pipeline=pipeline))
preprocessing_fn = reddit.make_preprocessing_fn(frequency_threshold)
(train_dataset, train_metadata), transform_fn = (
(train_data, input_metadata)
| 'AnalyzeAndTransform' >> tft.AnalyzeAndTransformDataset(
preprocessing_fn))