I'm following the Serving Inception Model with TensorFlow Serving and Kubernetes workflow, and everything works well up to the final step: serving the Inception model via k8s and running inference against it from a local host.
The pods are running, and the output of $ kubectl describe service inception-service is consistent with what the Serving Inception Model with TensorFlow Serving and Kubernetes workflow suggests.
However, when running inference, things don't work. Here is the trace:
$bazel-bin/tensorflow_serving/example/inception_client --server=104.155.175.138:9000 --image=cat.jpg
Traceback (most recent call last):
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 56, in <module>
    tf.app.run()
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 51, in main
    result = stub.Predict(request, 60.0)  # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 324, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 210, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.UNAVAILABLE, details="Connect Failed")
I am running everything on Google Cloud. The setup is done from a GCE instance, and Kubernetes runs inside Google Container Engine. The k8s setup follows the instructions from the workflow linked above and uses the inception_k8s.yaml file.
The service is set as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    run: inception-service
  name: inception-service
spec:
  ports:
  - port: 9000
    targetPort: 9000
  selector:
    run: inception-service
  type: LoadBalancer
Any advice on how to troubleshoot this would be greatly appreciated!
I figured it out with the help of several TensorFlow experts. Things started to work after I introduced the following changes:
First, I changed the inception_k8s.yaml file in the following way:
Source:
Modification:
Second, I exposed the deployment, and I used the IP generated from exposing the deployment, not the inception-service IP.
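For reference, exposing a deployment and finding the resulting IP is typically done with kubectl expose and kubectl get; a minimal sketch, with placeholder names (inception-deployment and inception-lb are illustrative, not necessarily the names from my yaml):

# Expose the existing deployment through a new LoadBalancer service
# (deployment and service names below are placeholders)
kubectl expose deployment inception-deployment \
  --type=LoadBalancer --port=9000 --target-port=9000 \
  --name=inception-lb

# Watch for the external IP assigned to the new service
kubectl get services inception-lb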
From this point on, I am able to run inference from an external host where the client is installed, using the command from the Serving Inception Model with TensorFlow Serving and Kubernetes workflow.
The error message seems to indicate that your client cannot connect to the server. Without additional information it is hard to troubleshoot. If you post your deployment and service configuration, as well as some information about the environment (is it running on a cloud? which one? what are your security rules? load balancers?), we may be able to help better.
But here are some things you can check right away:
If you are running in some kind of cloud environment (Amazon, Google, Azure, etc.), they all have security rules where you need to explicitly open the ports on the nodes running your Kubernetes cluster. So every port that your TensorFlow deployment/service is using should be opened on the controller and worker nodes.
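For example, on Google Cloud a firewall rule along these lines would open the serving port (the rule name and target tag are placeholders you would replace with your own):

# Allow inbound TCP traffic on port 9000 to the GKE nodes
# (rule name and node tag below are illustrative)
gcloud compute firewall-rules create allow-tf-serving \
  --allow tcp:9000 \
  --target-tags your-gke-node-tag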
Did you deploy only a Deployment for the app, or also a Service? If you run a Service, how is it exposed? Did you forget to enable a NodePort?
Update: Your service type is LoadBalancer, so a separate load balancer should be created in GCE. You need to get the IP of that load balancer and access the service through it. Please see the section 'Finding Your IP' in this link: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/
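A minimal sketch of finding that IP with kubectl (the service name matches the one in your yaml):

# The EXTERNAL-IP column shows the load balancer address once it is provisioned
kubectl get service inception-service

# Or, as the linked docs describe, look for "LoadBalancer Ingress" in the output of:
kubectl describe services inception-service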