Shivani Sharma — September 8, 2021
Advanced AWS Cloud Computing Deep Learning MLops Technique

This article was published as a part of the Data Science Blogathon

Deep learning and neural networks are becoming indispensable in a growing number of industries, and one of the main problems teams face is deploying these complex applications. I want to show you a practical and convenient way to do such a deployment, one that does not require you to be an expert in cloud technologies or clusters. For this, we will use serverless infrastructure. Let’s get started!


Recently, many product problems have been solved using models built with deep learning or neural networks. Often these are problems that were solved for many years with conventional deterministic methods and are now easier and cheaper to solve with ML.

With modern frameworks like Keras or TensorFlow and catalogs of ready-made solutions, it is becoming easier to create models that reach the accuracy a product needs.

Most importantly, it is easy today to find, download, or train a model, and I want to be able to deploy it just as easily.

In addition, when you work in a startup or a small company, you often need to test assumptions quickly, not just technical ones but market ones as well. For that, you have to deploy the model quickly, easily, and sensibly.

To solve this deployment problem, I have come to like working with cloud microservices.

FaaS (functions as a service) offerings are relatively cheap, easy to deploy (no Docker required), and can run an almost unlimited number of instances in parallel.

Now I’ll show you how you can deploy TensorFlow / Keras models on AWS Lambda, Amazon’s FaaS. The result is an API for recognizing content in images that costs $1 per 20,000 recognitions. Can it be cheaper? Perhaps. Can it be easier? Unlikely.


Consider a diagram of different types of application deployments:

Image 1: Function as a service

Next, we see IaaS (Infrastructure-as-a-Service): here the server is located in a data center, and we work with a specific virtual machine. The next step is Platform-as-a-Service, where we no longer have access to the machine itself but manage the container in which the application is executed. Finally, there is FaaS (Function-as-a-Service), where everything is hidden from us except the code itself. This is good news because, as we will see later, it gives us some really useful functionality.

AWS Lambda is the FaaS implementation on the AWS platform. Briefly, this is how it works: the deployment package is a zip archive (code + libraries), and AWS deploys this code to containers based on the number of external requests (triggers). There is essentially no upper bound: the current limit is 1000 concurrently running containers, but it can be raised to 10000 and higher through support.

The main advantages of AWS Lambda:

  • Easy to deploy (no Docker) – only code and libraries

  • Good scaling – in production we ran over 40 thousand invocations at the same time. More is possible.

  • Low call cost. It also matters to my colleagues in business development that microservices support a pay-as-you-go model. This keeps the unit economics of the model clear as usage scales.

Why port neural networks to serverless?

First, a word about TensorFlow, which I am going to use for the application. It is an open-source framework that lets developers create, train, and deploy deep learning models. It is one of the most popular deep learning libraries and is used by experts and novices alike.

At the moment, the main method for deploying machine learning models is a cluster. A REST API for deep learning on a cluster looks like this:

Image 2

Seems cumbersome? Look at everything you have to take care of:

  • write the logic for distributing traffic across the cluster machines

  • write the scaling logic, trying to find a balance between idle capacity and slowdowns

  • define the behavior of the container: logging and managing incoming requests

On AWS Lambda, the architecture will look much simpler:

Image 3: AWS Lambda

First, this approach is highly scalable. It can handle up to 10 thousand concurrent requests without any additional logic on your side. This makes the architecture well suited to peak loads, because no extra time is spent on scaling up.

Secondly, you don’t have to pay for server downtime. In a serverless architecture, you pay per request. This means that if you have 25 thousand requests, you pay only for 25 thousand requests, regardless of how they are distributed over time. For the TensorFlow example that I will show later, the cost is 20-25 thousand requests per $1. A cluster with exactly the same functionality costs considerably more and becomes more cost-effective only at a very large number of requests (> 1 million).
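To make the pay-per-request arithmetic above concrete, here is a minimal sketch. The figure of 20-25 thousand requests per $1 comes from the article; the cluster bill of $50 is a purely hypothetical number chosen for illustration, and both function names are my own.

```python
def cost_per_request(requests_per_dollar):
    """Price of a single request in dollars."""
    return 1.0 / requests_per_dollar


def break_even_requests(cluster_cost, requests_per_dollar):
    """Request volume at which a fixed cluster bill equals the pay-per-request bill."""
    return cluster_cost * requests_per_dollar


# Figures from the article: 20-25 thousand requests per $1.
for rpd in (20000, 25000):
    print("%d requests per $1 -> $%.5f per request" % (rpd, cost_per_request(rpd)))

# HYPOTHETICAL cluster bill for one billing period, used only for illustration.
ASSUMED_CLUSTER_COST = 50.0
print("break-even at %d requests" % break_even_requests(ASSUMED_CLUSTER_COST, 20000))
```

Below the break-even volume the serverless bill is smaller; above it the fixed-price cluster wins. Where that point actually sits depends entirely on your real cluster cost.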

Third, the infrastructure becomes much simpler. You do not need to work with Docker or write scaling and load-balancing logic. In short, the company does not have to hire an additional person to support the infrastructure, and if you are a data scientist, you can do everything yourself.

As you will see below, deploying the entire infrastructure for the above application requires no more than 4 lines of code.

It would be wrong not to mention the shortcomings of serverless infrastructure and the cases where it will not work. AWS Lambda has hard limits on processing time and memory that you need to keep in mind.

First, if you have a steady stream of many requests with no peak loads, a cluster will be more cost-effective.

Secondly, AWS Lambda has a small but noticeable start time (100-200 ms). Deep learning applications take a little longer, because the model has to be downloaded and loaded. For the example I will show below, a cold start takes 4.5 seconds and a warm start takes 3 seconds. For some applications this is not critical, but if your application is focused on processing a single request as quickly as possible, a cluster is a better option.
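One common way to keep warm starts cheaper than cold starts is to cache the loaded model at module level, since module state survives between invocations while the container stays warm. The sketch below illustrates that pattern only; it is not taken from the article's repository, and `load_model` is a hypothetical stand-in for the real download-and-import step.

```python
import time

_MODEL = None  # kept between invocations while the container stays warm


def load_model():
    """Hypothetical stand-in for the slow step (download and import the graph)."""
    time.sleep(0.05)  # placeholder delay, not a measured value
    return {"name": "hypothetical-model"}


def get_model():
    """Load the model on the first call only and reuse it afterwards."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_model()
    return _MODEL


def handler(event, context):
    started = time.time()
    model = get_model()
    return {"model": model["name"], "seconds": time.time() - started}
```

The first call pays the loading cost, and later calls in the same container skip it. The inference itself still runs on every request, so a warm start is faster but not free.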

Application of Serverless Tensorflow on AWS Lambda

For this example, I’m using a fairly popular application of neural networks: image recognition. Our application takes a picture as input and returns a description of the object in it. Applications of this kind are widely used to filter images and to classify large numbers of images into groups. Our application will try to recognize a picture of a panda.


Image 4

We will be using the following stack:

  • API Gateway for managing requests

  • AWS Lambda for Processing

  • Serverless deployment framework

Implementation Code for Serverless Tensorflow on AWS Lambda

First, you need to install and configure the Serverless framework, which we will use to orchestrate and deploy the application. See the framework’s installation guide.

Then create a new empty folder and run the following commands:

serverless install -u -n tensorflow
cd tensorflow
serverless deploy
serverless invoke --function main --log

You will receive the following response:

giant panda, panda bear, panda, Ailuropoda melanoleuca (score = 0.89107)
indri, Indri brevicaudatus (score = 0.00779)
custard apple (score = 0.00147)
earthstar (score = 0.00117)

As you can see, our model successfully recognized the panda in the picture (score 0.89).

We have deployed a TensorFlow neural network for image recognition on AWS Lambda with just 4 commands.

Let’s deep dive into the code for Serverless Tensorflow on AWS Lambda

Let’s start with a configuration file. Nothing fancy – we’re using the basic AWS Lambda configuration.

service: tensorflow

frameworkVersion: ">=1.2.0 <2.0.0"

provider:
  name: aws
  runtime: python2.7
  memorySize: 1536
  timeout: 300

functions:
  main:
    handler: index.handler

If we look at the handler file itself, we will see that it first downloads the model (‘.pb’ file) to the ‘/tmp/’ folder on AWS Lambda and then imports it in the standard way via TensorFlow.

Below are the pieces of code from GitHub that you should keep in mind if you want to embed your own model:


Downloading a model from S3

    strBucket = 'ryfeuslambda'
    strKey = 'tensorflow/imagenet/classify_image_graph_def.pb'
    strFile = '/tmp/imagenet/classify_image_graph_def.pb'
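The snippet above only names the bucket, the key, and the local path. As a rough sketch of how such a download could be done with boto3, here is a small helper; the name `download_from_s3` and the optional `client` argument are my own assumptions, not code shown in the article. It skips the download when the file is already present, which can happen in a warm container because ‘/tmp/’ is kept between invocations.

```python
import os


def download_from_s3(bucket, key, dest, client=None):
    """Download s3://bucket/key to dest unless dest already exists."""
    if os.path.isfile(dest):
        return dest  # already fetched by an earlier invocation
    if client is None:
        import boto3  # imported lazily; shipped with the AWS Lambda Python runtimes
        client = boto3.client("s3")
    folder = os.path.dirname(dest)
    if folder and not os.path.isdir(folder):
        os.makedirs(folder)
    client.download_file(bucket, key, dest)
    return dest


# Usage with the values from the article (needs AWS credentials and network):
# download_from_s3('ryfeuslambda',
#                  'tensorflow/imagenet/classify_image_graph_def.pb',
#                  '/tmp/imagenet/classify_image_graph_def.pb')
```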

Importing a model

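def create_graph():
    # Read the serialized graph definition from /tmp and import it into the default graph
    with tf.gfile.FastGFile(os.path.join('/tmp/imagenet/', 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')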

Downloading the image

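    strFile = '/tmp/imagenet/inputimage.jpg'
    if ('imagelink' in event):
        # Use the image passed in the request
        urllib.urlretrieve(event['imagelink'], strFile)
    else:
        # Otherwise fall back to the sample panda image stored in S3
        strBucket = 'ryfeuslambda'
        strKey = 'tensorflow/imagenet/cropped_panda.jpg'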

Getting predictions from the model

        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        # Feed the raw JPEG bytes into the graph and read the class probabilities
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
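Once `predictions` is a flat vector of scores, the remaining step is to pick the highest ones and format them like the sample response. The sketch below shows one way to do that in plain Python (it also accepts a 1-D NumPy array). The function names and the toy label list are hypothetical; the scores echo the sample output shown earlier.

```python
def top_predictions(predictions, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(enumerate(predictions), key=lambda pair: pair[1], reverse=True)
    return [(labels[index], float(score)) for index, score in ranked[:k]]


def format_predictions(pairs):
    """Render pairs in the 'label (score = 0.12345)' style of the sample output."""
    return ["%s (score = %.5f)" % (label, score) for label, score in pairs]


# Toy inputs: a hypothetical 4-class label list, not the real ImageNet mapping.
labels = ["custard apple", "giant panda", "earthstar", "indri"]
scores = [0.00147, 0.89107, 0.00117, 0.00779]
for line in format_predictions(top_predictions(scores, labels, k=2)):
    print(line)
# giant panda (score = 0.89107)
# indri (score = 0.00779)
```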

Now let’s add an API to the Lambda function.


API example

The easiest way to add an API is to modify the YAML configuration file.

service: tensorflow

frameworkVersion: ">=1.2.0 <2.0.0"

provider:
  name: aws
  runtime: python2.7
  memorySize: 1536
  timeout: 300

functions:
  main:
    handler: index.handler
    events:
      - http: GET handler

Now let's redeploy the stack:

serverless deploy

We get the following:

Service Information
service: tensorflow
stage: dev
region: us-east-1
stack: tensorflow-dev
api keys:
  GET -
  main: tensorflow-dev-main

To test the API, you can open the endpoint URL in a browser or call it with curl.

We’ll get:

{"return": "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)"}
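For reference, here is a minimal sketch of a handler that could produce a JSON response of this shape behind API Gateway. It assumes the Lambda proxy integration, where query parameters arrive under `queryStringParameters` and the function returns `statusCode` and `body`. The `classify` function is a hypothetical stand-in for the TensorFlow inference code, and none of this is copied from the article's repository.

```python
import json


def classify(image_url):
    """Hypothetical stand-in for the TensorFlow inference code."""
    return "label for %s" % image_url


def handler(event, context, classify_fn=classify):
    # Under the proxy integration this key is None when there is no query string.
    params = event.get("queryStringParameters") or {}
    image_url = params.get("imagelink")  # None means "use the default image"
    result = classify_fn(image_url)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"return": result}),
    }
```

Passing `classify_fn` as a parameter keeps the request handling testable without loading a model.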


We created an API for a TensorFlow model on AWS Lambda using the Serverless framework. Everything was quite simple to set up, and this approach saved us a lot of time compared to the traditional one.

By modifying the configuration file, you can connect many other AWS services, such as SQS for stream processing tasks, or create a chatbot using Amazon Lex.


  1. Image 1 –*b6MXaZWwYJATdF6vw2Z8Hw.png
  2. Image 2-
  3. Image 3-
  4. Image 4 –
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
