This error appeared suddenly; nothing was changed in the Dataflow pipeline. We are seeing the error `NameError: global name 'firestore' is not defined [while running 'generatedPtransform-12478']`, which looks like a problem with installing the packages on the worker nodes.
I tried the same pipeline locally with the DirectRunner and it worked fine. We went through the guidance on `NameError`s in the Dataflow FAQ (https://cloud.google.com/dataflow/docs/resources/faq#how-can-i-tell-what-version-of-the-cloud-dataflow-sdk-is-installedrunning-in-my-environment) and tried a couple of things:
1. Set the `save_main_session: True` pipeline option.
2. Moved all the package `import` statements from global scope into function scope (a sketch of what this looks like is shown below).
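
For reference, this is roughly what the function-scoped import looks like in our `WriteToFS` DoFn. The write itself is simplified here; the collection name and the `id` field are placeholders, not our actual logic:

```python
import apache_beam as beam


class WriteToFS(beam.DoFn):
    def process(self, element):
        # Import inside the method so the worker resolves the module at
        # execution time instead of relying on the pickled main session.
        from google.cloud import firestore

        client = firestore.Client()
        # Placeholder write: collection name and document id are illustrative only.
        client.collection('my-collection').document(element['id']).set(element)
```

In practice the client could be created once in `setup()`, but the point here is the placement of the import.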
We have the following packages in `requirements.txt`:
```
apache-beam[gcp]
google-cloud-firestore
python-dateutil
```
This is the (trimmed) pipeline code:

```python
import datetime
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import firestore
import yaml
from functools import reduce
from dateutil.parser import parse

class PubSubToDict(beam.DoFn):
    # ... processes elements ...
    pass


class WriteToFS(beam.DoFn):
    # ... writes data to Firestore ...
    pass

pipeline_options = {
    'project': PROJECT,
    'staging_location': 'gs://' + BUCKET + '/staging',
    'temp_location': 'gs://' + BUCKET + '/temp',
    'runner': 'DataflowRunner',
    'job_name': JOB_NAME,
    'disk_size_gb': 100,
    'save_main_session': True,
    'region': 'europe-west1',
    'requirements_file': 'requirements.txt',
    'streaming': True
}

# Convert the options dict into a PipelineOptions object.
options = PipelineOptions.from_dictionary(pipeline_options)

with beam.Pipeline(options=options) as p:
    lines = (p
             | "Read from PubSub" >> beam.io.ReadFromPubSub(topic=TOPIC).with_output_types(bytes)
             | "Transformation" >> beam.ParDo(PubSubToDict()))

    FSWrite = (lines | 'Write To Firestore' >> beam.ParDo(WriteToFS()))
```