When running a PySpark job on a Dataproc cluster like this
gcloud --project <project_name> dataproc jobs submit pyspark --cluster <cluster_name> <python_script>
my print statements don't show up in my terminal.
Is there any way to output data onto the terminal in PySpark when running jobs on the cloud?
Edit: I would like to print/log info from within my transformation. For example:
def print_funct(l):
    print(l)
    return l

rddData.map(lambda l: print_funct(l)).collect()
This should print every line of data in the RDD rddData.
Doing some digging, I found this answer for logging; however, testing it gave me the results of this question, whose answer states that logging isn't possible within the transformation.
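For reference, I know a driver-side workaround does work, since gcloud dataproc jobs submit streams the driver's output back to the terminal, while stdout from code inside a transformation is written on the executors and only ends up in the worker/YARN container logs. A minimal sketch of that workaround (using a stand-in RDD in place of my real data):

from pyspark import SparkContext

sc = SparkContext()
rddData = sc.parallelize(["line 1", "line 2", "line 3"])  # stand-in for my RDD

def transform(l):
    # This runs on the executors; a print() here would go to the worker
    # logs, not the terminal that ran `gcloud dataproc jobs submit`.
    return l

# collect() brings the results back to the driver, and driver stdout *is*
# streamed to the submitting terminal, so these prints do show up.
for line in rddData.map(transform).collect():
    print(line)

But I'd like to print/log from within the transformation itself, as it runs, rather than collecting everything to the driver first.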