Image for Google Cloud Dataflow instances

Published 2019-09-19 08:37

When I run a Dataflow job, it takes my small package (setup.py or requirements.txt) and uploads it to run on the Dataflow instances.
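For context, the staging is driven by pipeline options along these lines. This is a minimal sketch assuming the Apache Beam Python SDK; the project and bucket names are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Dataflow uploads whatever --requirements_file / --setup_file point at
    # and installs it on each worker before the job starts.
    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',                  # placeholder project id
        '--temp_location=gs://my-bucket/tmp',    # placeholder bucket
        '--requirements_file=requirements.txt',  # or --setup_file=./setup.py
    ])

    with beam.Pipeline(options=options) as p:
        p | beam.Create([u'hello']) | beam.Map(lambda x: x)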

But what is actually running on the Dataflow instance? I got a stacktrace recently:

File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
   self.endheaders(body) 
File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
  self._send_output(message_body) 
File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
  msg += message_body
TypeError: must be str, not unicode
[while running 'write to datastore/Convert to Mutation']

But in theory, if str += unicode is what's failing, doesn't that imply the workers might not be running that Python patch? Can you point me to the Docker image these jobs run in, so I can know which version of Python I'm working with and make sure I'm not barking up the wrong tree here?
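In the meantime, the workaround I'm considering is to force everything to UTF-8 byte strings before the Datastore write. A rough, untested sketch, where to_utf8 is a hypothetical helper of my own:

    def to_utf8(value):
        # Python 2's httplib concatenates the body onto a str of headers,
        # so hand it byte strings: encode any unicode values as UTF-8.
        if isinstance(value, unicode):
            return value.encode('utf-8')
        return value

    # e.g. applied to entity properties before 'write to datastore/Convert to Mutation':
    # properties = {k: to_utf8(v) for k, v in properties.items()}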

The cloud console shows me the instance template, which seems to point to dataflow-dataflow-owned-resource-20170308-rc02, but it seems I don't have permission to look at it. Is the source for it online anywhere?
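For what it's worth, the obvious way to inspect it would be something like the command below, but presumably that fails with the same permission error, since the template seems to live in a Google-owned project rather than mine:

    gcloud compute instance-templates describe dataflow-dataflow-owned-resource-20170308-rc02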

1 Answer

太酷不给撩
Answered 2019-09-19 08:53

Haven't tested (and maybe there is an easier way), but something like this might do the trick:

  1. SSH into one of the Dataflow workers from the console
  2. run docker ps to get the container ID
  3. run docker inspect <container_id>
  4. grab the image ID from the Image field
  5. run docker history --no-trunc <image_id>

Then you should find what you are after.
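Putting those steps together (again untested; <container_id> and <image_id> come from the earlier commands, and the docker exec at the end just asks the running container for its Python version directly):

    # on the worker VM, after SSHing in from the console
    docker ps                                    # note the container ID
    docker inspect <container_id> | grep Image   # grab the image ID
    docker history --no-trunc <image_id>         # walk the image layers

    # or skip straight to the original question:
    docker exec <container_id> python --version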
