I know how to load a model into a container, and I know that we can create a static config file, pass it to a TensorFlow Serving container when we run it, and later use one of the models listed in that config file. But I want to know whether there is any way to hot-load a completely new model (not a newer version of an existing model) into a running TensorFlow Serving container. What I mean is: we run the container with model-A, and later we load model-B into the container and use it. Can we do this? If yes, how?
Answer 1:
You can.
First you need to copy the new model's files into the model_base_path you specified when launching TensorFlow Serving, so that the server can see the new model. The directory layout is usually this: $MODEL_BASE_PATH/$model_a/$version_a/* and $MODEL_BASE_PATH/$model_b/$version_b/*
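For illustration, here is a minimal sketch of staging a new model's files into that layout; all paths, model names, and version numbers are hypothetical and should be adjusted to your setup:

```python
import shutil
from pathlib import Path

# Hypothetical paths; adjust to match your own setup.
MODEL_BASE_PATH = Path("/models")            # directory the serving container watches
NEW_MODEL_EXPORT = Path("/tmp/model_b/1")    # exported SavedModel (saved_model.pb + variables/)

# Place the export under $MODEL_BASE_PATH/model_b/1 so the server can discover it.
destination = MODEL_BASE_PATH / "model_b" / "1"
shutil.copytree(NEW_MODEL_EXPORT, destination)
print(f"Copied new model to {destination}")
```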
Then you need to refresh TensorFlow Serving with a new model_config_file that includes an entry for the new model (an example config file is sketched after the list below). See here for how to add entries to the model config file. To make the server take in the new config, there are two ways to do it:
- Save the new config file and restart TensorFlow Serving.
- Reload the new model config on the fly without restarting TensorFlow Serving. This service is defined in model_service.proto as HandleReloadConfigRequest, but the server's REST API does not seem to support it, so you need to rely on the gRPC API. Sadly, the Python client for this gRPC service seems unimplemented. I managed to generate Java client code from the protobuf files, but it is quite complex. An example here explains how to generate Java client code for doing gRPC inferencing, and calling handleReloadConfigRequest() is very similar. (A sketch of the reload call is shown below.)
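For reference, a model config file that lists both models could look like the following; the model names and base paths here are assumptions and should match your own directory layout:

```
model_config_list {
  config {
    name: "model_a"
    base_path: "/models/model_a"
    model_platform: "tensorflow"
  }
  config {
    name: "model_b"
    base_path: "/models/model_b"
    model_platform: "tensorflow"
  }
}
```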
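Purely as an illustration of the on-the-fly reload, here is a sketch of the gRPC call in Python, assuming your installed tensorflow-serving-api package ships the generated ModelService stubs (availability may vary by version; the answer above fell back to Java-generated stubs). The host, port, model names, and base paths are assumptions:

```python
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import model_management_pb2

# gRPC port of the running container (8500 by default); adjust as needed.
channel = grpc.insecure_channel("localhost:8500")
stub = model_service_pb2_grpc.ModelServiceStub(channel)

# Build a ReloadConfigRequest listing every model the server should serve,
# including the newly added model_b.
request = model_management_pb2.ReloadConfigRequest()
config_list = request.config.model_config_list

model_a = config_list.config.add()
model_a.name = "model_a"
model_a.base_path = "/models/model_a"
model_a.model_platform = "tensorflow"

model_b = config_list.config.add()
model_b.name = "model_b"
model_b.base_path = "/models/model_b"
model_b.model_platform = "tensorflow"

response = stub.HandleReloadConfigRequest(request, timeout=10.0)
if response.status.error_code == 0:
    print("Model config reloaded")
else:
    print("Reload failed:", response.status.error_message)
```

Note that the request must list the full set of models to serve, not just the new one, because the server replaces its current config with the one you send.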