Python API to launch a Dataflow template: "Unknown name ... Cannot find field"

Posted 2019-08-20 05:07

Question:

I've created and run a DataPrep job, and am trying to use its template from Python on App Engine. I can successfully start a job with:

gcloud dataflow jobs run JOB_NAME \
    --parameters "inputLocations={\"location1\":\"gs://bucket/folder/*\"},outputLocations={\"location1\":\"project:dataset.table\"},customGcsTempLocation=gs://bucket/DataPrep-beta/temp" \
    --gcs-location gs://bucket/DataPrep-beta/temp/cloud-dataprep-templatename_template

However, trying the same from Python on App Engine:

from googleapiclient.discovery import build

service = build('dataflow', 'v1b3', credentials=credentials)
input1  = {"location1": input}
output1 = {"location1": output}

print('input location: {}'.format(input1))

GCSPATH="gs://{bucket}/{template}".format(bucket=BUCKET, template=template)
BODY = {
    "jobName": "{jobname}".format(jobname=JOBNAME),
    "parameters": {
        "inputLocations":  input1,
        "outputLocations": output1,
        "customGcsTempLocation": "gs://{}/DataPrep-beta/temp".format(BUCKET)
     }
}

print("dataflow request body: {}".format(BODY))
request = service.projects().templates().launch(projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
response = request.execute()

I get back:

"Invalid JSON payload received. Unknown name "location1" at 
  'launch_parameters.parameters[1].value': Cannot find field.
Invalid JSON payload received. Unknown name "location1" at 
  'launch_parameters.parameters[2].value': Cannot find field."

Nothing I've tried seems to work for passing a dict, a json.dumps(), or a str() as "inputLocations" or "outputLocations"; a sketch of those variants follows.
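Concretely, the variants I tried look like this (a sketch of the three forms mentioned above, using the same input variable as in the snippet):

import json

location = {"location1": input}

as_dict = location              # a dict -> "Unknown name ... Cannot find field"
as_json = json.dumps(location)  # '{"location1": "..."}'
as_str = str(location)          # "{'location1': '...'}" (single quotes, not valid JSON)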

Answer 1:

The issue is with the format in which you are passing input1 and output1. The templates.launch API expects every value under "parameters" to be a string, so the nested objects need to be passed as JSON strings, like this:

input1 = '{"location1":"' + input + '" }'
output1 = '{"location1":"' + output + '" }'

I have tried sending the request with the same approach as yours and it fails. It also fails if I build the value as a dict and later convert it back to a string or to JSON, because the quotes don't get escaped correctly.
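Applied to the question's code, a minimal sketch of the full corrected call (assuming credentials, input, output, BUCKET, template, JOBNAME, and PROJECT are defined as in the question):

from googleapiclient.discovery import build

service = build('dataflow', 'v1b3', credentials=credentials)

# templates.launch takes "parameters" as a map of string -> string,
# so the nested locations are passed as JSON-encoded strings.
input1 = '{"location1":"' + input + '"}'
output1 = '{"location1":"' + output + '"}'

GCSPATH = "gs://{bucket}/{template}".format(bucket=BUCKET, template=template)
BODY = {
    "jobName": JOBNAME,
    "parameters": {
        "inputLocations": input1,
        "outputLocations": output1,
        "customGcsTempLocation": "gs://{}/DataPrep-beta/temp".format(BUCKET),
    },
}

response = service.projects().templates().launch(
    projectId=PROJECT, gcsPath=GCSPATH, body=BODY).execute()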



Answer 2:

The format certainly has something to do with your problem. I had the same use case to solve, except that my output was files instead of a Google BigQuery dataset. For me, the code with the following BODY parameter starts the Google Dataflow pipeline:

BODY = {
    "jobName": "{jobname}".format(jobname=JOBNAME),
    "parameters": {
        "inputLocations": "{{\"location1\":\"gs://{bucket}/employee/input/patient.json\"}}".format(bucket=BUCKET),
        "outputLocations": "{{\"location1\":\"gs://{bucket}/employee/employees.json/file\",\"location2\":\"gs://{bucket}/jobrun/employees_314804/.profiler/profilerTypeCheckHistograms.json/file\",\"location3\":\"gs://{bucket}/jobrun/employees_314804/.profiler/profilerValidValueHistograms.json/file\"}}".format(bucket=BUCKET)
    },
    "environment": {
        "tempLocation": "gs://{bucket}/employee/temp".format(bucket=BUCKET),
        "zone": "us-central1-f"
    }
}
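For completeness, launching with this body uses the same call as in the question (a sketch assuming service, PROJECT, and GCSPATH are set up as shown there):

# Launch the templated job; the response contains the created job's metadata.
response = service.projects().templates().launch(
    projectId=PROJECT, gcsPath=GCSPATH, body=BODY).execute()
print(response)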