As machine learning engineers or data scientists, we have all reached the point where we built a beautiful model with wonderful test results, only to end up using it in a PowerPoint presentation. The most common way of interacting with a model is in an offline setup, where we have some kind of dataset to play with. This is fine for experimentation and for building your initial model, but the next step is to put our precious model out into the wild so people can use it. This is what model serving is all about: it is the mechanism of deploying the model so other people can interact with it. With model serving, you move from experimentation to production.
The most common ways of serving a model are:
- Model-as-Dependency;
- Model-as-Service.
In a Model-as-Dependency setup, the model is used directly by the application. For example, in a Python project, it is installed as a package through pip or integrated into the code directly from a git repository. Conceptually, I think this is the easiest way to serve a model, but it comes with some downsides. Wherever the application runs, you need the hardware the model requires, which is not always possible. Also, the client of the model always has to manage the model's dependencies, which in some cases is a real pain.
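To make this concrete, here is a minimal sketch of what such a packaged model might look like. The package name `churn_model`, the file layout, and the `predict` API are all hypothetical, just to illustrate the idea of shipping a trained model as an installable library:

```python
# churn_model/__init__.py -- a hypothetical packaged model; the package name,
# file layout, and API are illustrative, not a real library.
from importlib import resources

import joblib

_model = None


def predict(features):
    """Run inference with the scikit-learn model bundled as package data."""
    global _model
    if _model is None:
        # model.joblib is shipped inside the wheel alongside the code,
        # so the client never has to fetch the weights separately.
        with resources.as_file(
            resources.files("churn_model") / "model.joblib"
        ) as path:
            _model = joblib.load(path)
    return _model.predict(features)
```

The client application would then install it with something like `pip install git+https://github.com/example/churn-model.git` (a hypothetical URL) and call `churn_model.predict(...)` like any other library. Notice that the client also inherits scikit-learn, joblib, and whatever hardware the model needs, which is exactly the downside described above.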