Really simple way to deploy your machine learning model online

One of the simplest ways of deploying your model online in minutes and inferring from anywhere in the world

VISWADEEP SARANGI
11 min read · Jan 11, 2021
Photo by Alex Knight on Unsplash

Imagine this: you’ve just trained your machine learning (ML) model and it is showing impressive accuracy, something to tell your colleagues about. The model works really well in your Jupyter notebook locally, but now it is time to let others use it from wherever they are, perhaps as a web-service.

This is exactly the situation I faced after I had painstakingly trained my ML model to identify conditions from a person’s walking style. I wanted to show it to the world and have everyone use it in their work, which is usually the academic dream. I knew that the solution was to leverage cloud services such as Amazon Web Services (AWS) or Microsoft Azure, but I wasn’t sure how exactly. This post is for all the academics, machine learning tinkerers and professionals starting out who want to bring their ML model to life, online.

The Tech Stack: Python, Flask, Docker and AWS EC2

The process of deploying the model online can be listed as follows:
1. Wrapping the trained ML model into a Flask application
2. Using Docker to containerize the Flask application
3. Hosting the container on AWS EC2 and inferring from the model as a web-service

CAUTION: This post describes a quick and easy way to make the model available online as a web-service. This is by no means production ready and should not be used in a commercial setting. The crux of this post is to provide a quick and easy way of deploying the ML model for consumption online.

We are going ahead with the assumption that there exists a pre-trained ML model. Let’s call it model.pkl, a pickle file, which is a serialized format for storing ML models. We will also assume that the model is an SVM classifier, trained using the popular ML library scikit-learn. There are two important things to know about the model:

1. The model requires 4 features per test sample in the form of a numpy array to perform the inference
2. The model inference can be executed by using the .predict() function call on the model, passing the test sample as input
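
For context, a purely local inference with such a model might look roughly like the sketch below (the sample values are the same ones used in the curl examples later in this post):

import pickle
import numpy as np

# Load the pre-trained scikit-learn SVM classifier
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# One test sample with 4 features, shaped (1, 4)
sample = np.array([[5.2, 3.2, 5.2, 1.4]])
print(model.predict(sample)[0])  # prints the predicted class label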

Wrapping the model as a Flask Web Service

Performing an inference with our ML model is as simple as calling the .predict() function on the model itself. We could do this locally from our favorite Python IDE (e.g. a Jupyter notebook), but we want to be able to call it from anywhere on the internet. Thus, we wrap this functionality in Flask.

Flask is a simple yet powerful Python micro web framework that lets us quickly and easily build REST-API-based web-services. In order to do this:

  1. First, we define a simple function to load our trained model:
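
A minimal sketch of such a function, assuming the trained model sits in a file called model.pkl next to the script:

import pickle

model = None  # global variable that will hold the trained model

def load_model():
    global model
    # Deserialize the pickled scikit-learn classifier
    with open('model.pkl', 'rb') as f:
        model = pickle.load(f)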

Here, we create a variable called model and load the actual ML model into it using the load_model() function. We’ve made it a global variable so that it can be accessed from any part of the code.

2. Now we set up the Flask object and define a home endpoint (something that will be run by default), which, when hit, returns a “Home Endpoint Hit!” message:
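
Something along these lines:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home_endpoint():
    # Simple acknowledgement that the service is reachable
    return 'Home Endpoint Hit!'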

The decorator @app.route helps direct API requests to the appropriate functions.

3. We now define the endpoint that will be used for inference (or prediction) of the class of a test sample. To keep things simple, we’ll refer to this as the /predict endpoint. It accepts a POST request carrying the test data to classify. For simplicity, this function works with one test sample at a time.
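
One possible implementation, assuming the request body is a plain JSON array of four numbers (matching the curl command shown later); the function name get_prediction is just illustrative:

from flask import request
import numpy as np

@app.route('/predict', methods=['POST'])
def get_prediction():
    # Expect a JSON body such as [5.2, 3.2, 5.2, 1.4]
    data = request.get_json()
    # Reshape into a (1, 4) array: one sample, four features
    sample = np.array(data).reshape(1, -1)
    prediction = model.predict(sample)
    return str(prediction[0])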

4. Finally, we create the __main__ block that ties everything together:
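
A sketch of that block, with the model loaded once at start-up and the service listening on port 80 (as assumed throughout this post):

if __name__ == '__main__':
    load_model()                      # populate the global model variable
    app.run(host='0.0.0.0', port=80)  # serve the Flask app on port 80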

To sum it up:

  1. We load the ML model and populate the global variable model with it
  2. A Flask object called app is created, along with a home endpoint that executes the function home_endpoint() whenever an API request hits it. We simply acknowledge that the endpoint has been hit by returning a text message.
  3. We create an additional endpoint /predict to perform inference using the test sample provided along with the API POST request to this endpoint. We return the prediction to this request.
  4. We tie it all together in the __main__ block, loading the model as soon as the Python script gets executed

The complete procedure can be executed from a single script, app.py, which simply combines the snippets above (the full version is available in the GitHub repository linked at the end of this post).

The micro web-service is now ready to run. We can test it locally by running the script with the command python app.py from the terminal.

While it is running, opening the URL localhost:80 or 127.0.0.1:80 in a browser should show the message “Home Endpoint Hit!” in the browser window.

The /predict endpoint can be tested using the following curl command from the terminal:

curl -X POST \
0.0.0.0:80/predict \
-H 'Content-Type: application/json' \
-d '[5.2,3.2,5.2,1.4]'

The above curl command POSTs the test sample [5.2, 3.2, 5.2, 1.4] to the Flask web-server and returns a single class label.

Using Docker to containerize the Flask service

One of the most frequent statements heard during testing or deployment is, “…but it worked fine on my local machine!”. Fortunately, Docker comes to the rescue by letting us containerize the application so that it deploys in the same consistent environment in which it was developed and tested locally. Ideally, the code should be independent of the machine and operating system used in development. The problem becomes even more pronounced when running a web-service on a cloud VM: the VM may be running a different OS, or a different version of the same OS, and installing the dependent libraries using pip may pull in more recent versions than the ones we built against, resulting in bugs and incompatibilities. There is a myriad of ways in which a mismatch in execution environments can wreak havoc on the web-service. Thus, the need for containerization. A quick tutorial on containerizing an application using Docker can be found here.

In order to containerize our own application, we need to create a Dockerfile, which consists of a set of instructions for the Docker daemon to read, understand and build our Docker image from.
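
A Dockerfile along these lines should do the job (a sketch, assuming the Flask script is saved as app.py; the version in the accompanying repository may differ slightly):

FROM python:3.6-slim
COPY ./app.py ./requirements.txt ./model.pkl /deploy/
WORKDIR /deploy/
RUN pip install -r requirements.txt
EXPOSE 80
ENTRYPOINT ["python", "app.py"]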

Taking a short deep dive into the Dockerfile: we pull the image python:3.6-slim from the official Python Docker Hub repository to use as the base on which the rest of the image is built. We then copy the Python file app.py and the model file model.pkl, along with the requirements.txt file, into the deploy/ folder of the image. The WORKDIR instruction changes the working directory of the image to deploy/. The RUN instruction installs the specific Python packages listed in requirements.txt onto the image, using pip. The EXPOSE instruction exposes port 80 to the outside world; since our Flask service runs on port 80, this makes the service reachable through that port. Finally, the ENTRYPOINT instruction tells Docker to start the Flask script when the container is run.

Now, issuing the following command instructs Docker to start building the image: docker build -t app-ml-model . The -t flag ‘tags’ the image with the name app-ml-model, and the trailing period tells Docker to look for the Dockerfile in the current directory. Once built, the image can be seen in the list produced by the docker images command in the terminal. There should now be an image named app-ml-model, and another image named python, which is the base image our own image was built on.

We are now ready to run the built image. The following command runs the app-ml-model image: docker run -p 80:80 app-ml-model

The -p flag maps port 80 of the local machine to port 80 of the container, so that traffic arriving on the local HTTP port 80 is redirected to port 80 of the container. If you want to use a different local port (e.g. 5000), adjust the mapping accordingly (e.g. -p 5000:80).

If this step has been executed properly, typing localhost:80 or http://0.0.0.0:80 into the browser should now show the message “Home Endpoint Hit!”

Hosting the Docker container we just created on an AWS EC2 instance

Seeing the message “Home Endpoint Hit!” confirms that our Docker container is working. However, it is only working on our local system, and the eventual goal is to make it available to everyone as a web-service. In addition, it needs to be scalable and automated. Although we could host the container locally using our own server, that is far from ideal. Fortunately, there are numerous cloud hosting services provided by a multitude of companies, including Amazon (Amazon Web Services, a.k.a. AWS), Microsoft (Azure) and Google (Google Cloud Platform, a.k.a. GCP). For this article, we’ll choose AWS, specifically AWS’s Elastic Compute Cloud (EC2) service.

We do need an AWS account to be able to use the EC2 service. If you’re new to AWS, you can use some AWS services for free for a period of a year, within certain limits of course. For this article, we’ll choose the t2.micro EC2 instance, which happens to be free-tier eligible, although it is capped in terms of computational power. Beyond the free tier, this instance costs about a cent (in USD), or less than a penny (in GBP), per hour at the time of writing. Thus, if you can afford it, I highly recommend evaluating AWS’s full potential, even if only for a short period of time.

The first step is to log in to the AWS Management Console. A simple Google search for the phrase should direct you to a page that looks similar to this:

Home page for the AWS Management Console. All AWS services can be accessed here.

Either clicking on the ‘EC2’ link in the services area of the webpage or searching for “ec2” and clicking on the link would direct you to the EC2 Dashboard, which shows an overview of all running instances and options for creating and launching one:

The EC2 Dashboard: An overview of all running instances and options of creating and launching more

The rest of the process of creating and launching an AWS EC2 instance is quite straightforward and has been described in numerous tutorials; one of the closest to an official tutorial is here. Although this article lists the steps for creating an EC2 instance, I highly recommend reading through the official guide anyway, to get a better understanding of why certain steps are taken. The whole process shouldn’t take more than 10 minutes, depending on your internet connection. As a quick recap, the images listed below show the process of creating an EC2 instance in snapshots.

There are 2 key steps to note:
(1) Step 7: Setting the additional incoming rule of Type: HTTP, Protocol: TCP, Port: 80 and Source: 0.0.0.0/0, ::/0
(2) Step 8: Downloading and storing the private key, i.e. the .pem file

Step 1: Clicking on Launch Instance button on the EC2 Dashboard
Step 2: Selecting the 64-bit (x86) Amazon Linux 2 AMI (HVM)
Step 3: Choosing the t2.micro instance, which is free-tier eligible and clicking on “Review and Launch”
Step 4: In the next page, click on “Edit security groups”
Step 5: Click on “Add Rule” and add the new rule as demonstrated in the next step
Step 6: After the rule is added, click on “Review and Launch”
Step 7: Click on Launch after confirming the change in the Security Groups
Step 8: Create a new key pair or use an existing key pair and click on “Download Key Pair”
Step 9: Once launched, this page should appear. Click on “View Instances” to get an overview of running instances

We can now ‘ssh’ into the EC2 instance from our local terminal using the following command. Ensure that public-dns-name is replaced with your EC2 instance’s address, which appears under “Public IPv4 DNS” when you click on the running instance. ec2-user happens to be the default username created when a new EC2 instance is launched, and path refers to the local path where the .pem file is stored.

ssh -i /path/my-key-pair.pem ec2-user@public-dns-name

This gives us access to the EC2 instance. There are a few prerequisites to getting our container up and running inside the instance; the commands below take a simple approach (more sophisticated methods exist, but they are beyond the scope of this article).

sudo amazon-linux-extras install docker
sudo yum install docker
sudo service docker start
sudo usermod -a -G docker ec2-user

This documentation provides a detailed explanation of the commands mentioned above.

Next, exit the instance by typing exit into the terminal, then log back in again. Ensure Docker works by typing docker info in the terminal and confirming its output on screen.

Now we’ll copy our local files over to the EC2 instance. Open a new terminal window and issue the following command from the local terminal. The command needs to be issued 4 times, each time replacing file-to-copy with one of these 4 files: requirements.txt, app.py, model.pkl and Dockerfile. Ensure that the terminal is pointing to the directory that contains all of these files.

scp -i /path/my-key-pair.pem file-to-copy ec2-user@public-dns-name:/home/ec2-user

Log back into the EC2 instance and issue the ls command to ensure that all the files copied over.

Now we’re ready to build the Docker image inside the EC2 instance. Issue the same commands we used to build and run the image locally (shown again below), and ensure that port 80 is used everywhere in the code and commands.
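
That is, from the home directory of the EC2 instance (where the files were copied to):

docker build -t app-ml-model .
docker run -p 80:80 app-ml-model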

If everything has been executed correctly up to this point, open a browser and paste in the public-dns-name. The message “Home Endpoint Hit!” should appear.

Sending the same curl request we sent to the local container should return the same result as it did locally, only this time with localhost or 0.0.0.0 replaced by the public-dns-name:

curl -X POST \
public-dns-name:80/predict \
-H 'Content-Type: application/json' \
-d '[5.2,3.2,5.2,1.4]'

We now have an up and running web-service that is hosting a trained ML model of our choice!

Further thoughts

This server is extremely basic and only intended for passionate hobbyists. It’s nowhere near as robust or secure as a ‘real’ production-level server should be. A few suggestions for moving it towards production level:

  1. Use a Web Server Gateway Interface (WSGI) server such as Gunicorn to serve the Flask app (see the sketch after this list). This can be further supplemented with an NGINX reverse proxy and async workers.
  2. The current server is open to the entire world. The security of a production-level server needs to be tighter than this, which can be achieved by restricting access to a custom set of IPs only.
  3. There are currently no test cases written for this server. Any production-level software needs layers of testing before being deployed, so these need to be implemented.
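
As a small taste of the first suggestion, replacing the Flask development server with Gunicorn can be as simple as the following (assuming the Flask object is called app inside app.py; this is just a starting point, not a hardened setup):

pip install gunicorn
gunicorn --bind 0.0.0.0:80 --workers 2 app:app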

As always, there’s more to learn than can be written in a single article. I’ll try my best to supplement the information in this article in my future drafts.

Github Repository: https://github.com/viswadeep-sarangi/DeployMLModel

Hope you’ve found this article useful. Would love to hear your thoughts in the comments below!


VISWADEEP SARANGI

Postdoctoral ML Researcher at University of York | PhD, ML for understanding Human Body Biomechanics