A Comprehensive Guide on Using Docker Compose for Machine Learning Containers

Ajay Kumar Reddy 12 May, 2023 • 17 min read

Introduction

Docker Compose is a tool that lets you define and run multi-container applications, even establishing a communication layer between these services. It lets you declare your application’s services, networks, volumes, and Configurations in a single file called docker-compose.yml. It follows a declarative syntax to define Services and their Configurations, which can include details like the Docker image to use (or a custom Dockerfile to build it from), the PORTS to expose, the volumes to mount, and environment variables. This guide will walk through a machine learning application where we work with Docker Compose to create and manage the application’s Containers.


Learning Objectives

  • Learn about the Docker Compose tool and its inner workings.
  • Learn to write docker-compose.yml files from scratch.
  • Understand the keywords/elements present in docker-compose.yml files.
  • Work with Docker Compose and its files to speed up the Image building and Containerization process.
  • Establish communication networks between Containers with Docker Compose.
  • Fit Docker Compose into the machine learning application building pipeline.

This article was published as a part of the Data Science Blogathon.

The Need for Docker Compose

In this section, we will go through the need for Docker Compose. It helps in building Images and running Containers quickly. The docker-compose.yml file configures the Containers for the build process. It is also shareable, so everyone with the same file can run the Containers with the same settings. Let’s look at each of the reasons why Docker Compose files are helpful.

Simplified Container Builds: When building multiple Containers, it takes a lot of effort to type in docker commands to start and stop each of them. The more Containers there are, the more commands are needed. Also, if a Container mounts files from the host, or needs one or more PORT numbers, we even have to mention the PORT numbers and the volume option while typing in the docker commands. All of this is reduced when using a Docker Compose file.

In the docker-compose.yml file, we define the volumes and PORTS only for the services that need them. After that, we never mention them on the command line. With a single command, we can build and start all the Containers; with another single command, we can stop and delete all of them at once.
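To make the contrast concrete, here is a hedged sketch (the image names and options below are illustrative, not from this guide’s example):

```shell
# Without Compose: one docker run per Container, repeating
# ports and volumes every time (illustrative names):
docker run -d -p 8000:8000 -v ./app.py:/code/app.py my-web-app
docker run -d -p 6379:6379 redis:alpine

# With Compose: the same Configuration lives in docker-compose.yml,
# so one command starts everything:
docker-compose up -d
```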

Scaling of Application: Scaling is something we need to manage when building applications that are expected to receive high traffic. That is, we increase the number of Containers running the application so that it does not slow down. So if we want to scale our app by 10, we need to run every Container of that application 10 times. With Compose, all of this can be achieved in a single line.

The Docker Compose command line tool provides many options, and one of them is scaling. After scaling the application, if we want to set up a load balancer to handle the traffic, we would normally need to configure an Nginx Container separately. But again, with Compose, we can declare all of this in the docker-compose.yml file itself.
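For instance, scaling a service named web-app to 10 Containers can be sketched with the --scale option of docker-compose up (note that a scaled service must not set a fixed container_name, since each replica needs a distinct name):

```shell
# Start (or adjust to) 10 replicas of the web-app service
docker-compose up -d --scale web-app=10
```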

Sharing Compose Files: Each Container needs a separate Configuration when running multiple Containers. Without Compose, we provide these Configurations through the docker run command for each Container. So when we want to share these Configurations with someone, so they can run with the same Configuration we are running, we would need to share all the commands we typed on the command line. This is simplified by writing a docker-compose.yml file for our application and sharing that single file with anyone who wants to reproduce the same Configuration.

Communication Between Services: The docker-compose.yml file makes it simple for the Containers defined in it to communicate with each other. When the docker-compose command is run, a common Network is created for all the Services/Containers present in the docker-compose.yml file. This is useful, for example, when you are working with a machine learning application and want to store the outputs in a MySQL or Redis Database. You can create a MySQL/Redis Container, add it to the Docker Compose file, and the machine learning application will be able to communicate with the database seamlessly.

Optimal Resource Allocation: Allocating resources for the Containers is necessary so that spawning too many Containers does not eat up all our resources and finally crash the computer. This can be tackled with the docker-compose.yml file, where we can Configure the resources individually for every Container/Service. The Configurations include both CPU and memory limits.
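As a sketch, per-service limits might look like the following (the deploy.resources syntax is honored by recent Docker Compose releases; older Compose file formats used cpus and mem_limit instead):

```yaml
services:
  web-app:
    build: .
    deploy:
      resources:
        limits:
          cpus: "0.50"    # at most half of one CPU core
          memory: 256M    # at most 256 MB of RAM
```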

Docker Compose: Getting Started

In this section, we will look at a simple example of how Docker Compose works. Firstly, to work with Docker Compose, one needs to have Docker installed on their computer. When we install Docker Desktop, the docker-compose command line tool is installed along with it (on a plain Linux Docker Engine install, it may need to be installed separately). We will create a simple FastAPI application and then write the Dockerfile and docker-compose file for it.

app.py

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return "Hello World"

This is the application we will be Containerizing. When the application is run, “Hello World” is displayed at localhost:8000. The requirements file will contain fastapi and uvicorn, which are the libraries this application needs. Now the Dockerfile for this application will be:

Dockerfile

FROM python:3.10-alpine

WORKDIR /code

COPY . .

RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8000

ENTRYPOINT [ "uvicorn", "app:app", "--host", "0.0.0.0", "--reload" ]
  • We are building this Image with the python:3.10-alpine Base Image, which results in a smaller Image size.
  • The working directory is set to /code, and all our files, including app.py, requirements.txt, and the Dockerfile, are copied there.
  • Then we install the dependencies with pip from requirements.txt.
  • As the application runs on PORT 8000, we expose this PORT of the Container.
  • Finally, we declare the ENTRYPOINT/CMD to run our application. The host is set to 0.0.0.0 so that when the Container is run, the application is reachable from the localhost of the host machine.
  • Now we can use the docker build command to create an Image out of this, and then run the application using the docker run command. That would look something like this:
$ docker build -t fastapi_app .

$ docker run -p 8000:8000 fastapi_app

So we see that with two docker commands, we are able to Containerize and run the application, and it is now visible at localhost. Now let’s write a docker-compose.yml file for it so that we can run the application through the docker-compose command.

version: "3.9"
services:
  web-app:
    build: .
    container_name: fast-api-app
    ports:
      - 8000:8000

This is what a docker-compose.yml file looks like. It follows the YAML (YAML Ain’t Markup Language) syntax, which consists of key-value pairs. We will cover each line and what it does later in this guide. For now, let’s run the command to build our Image from this docker-compose.yml file.

$ docker-compose up

This command will search for the docker-compose.yml file and create the Images that are written in the services section of the file. At the same time, it even starts running the Images, thus creating Containers out of them. So now, if we go to localhost:8000, we can see our application running.


We see that with only one command, we are able to both build the Image and run the Container. We even notice that we did not write the PORT number in the command, and still the website is functioning. This is because we have written the PORT for the Container in the docker-compose.yml file. Also note that the above command only builds the Image if it does not already exist; if the Image exists, it just runs it. Now, to view the running Containers, we can use the following command:

$ docker-compose ps
"
"
  • Name: The Container name that we have written in the docker-compose.yml file.
  • Image: The Image generated from the docker-compose.yml file. compose_basics is the folder where app.py, the Dockerfile, and docker-compose.yml reside.
  • Command: The command run when the Container is created. This was written in the ENTRYPOINT of the Dockerfile.
  • Service: The name of the service created from the docker-compose.yml file. We have only one service in the docker-compose.yml file, and the name we have given it is web-app.
  • Status: Tells how long the Container has been running. If the Container has exited, the Status will be Exited.
  • PORTS: Specifies the PORTS we have exposed to the host Network.

Now, to stop the Container, the command is:

$ docker-compose down

This command will stop the Containers that were just created by docker-compose up. It even deletes the Containers and the networks associated with them (named volumes are only removed if the -v flag is passed). If multiple Containers were spun up by the docker-compose up command, then all of them are cleaned up when docker-compose down is run. To start our application again, we run docker-compose up, which creates a new Container. This time it will not build the Image, because the Image was already built when we ran the command the first time.
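The lifecycle commands seen so far can be summarized as follows (all standard docker-compose subcommands):

```shell
docker-compose up        # build Images if needed, then start all Containers
docker-compose up -d     # the same, detached in the background
docker-compose ps        # list the Containers managed by this Compose file
docker-compose logs -f   # follow the logs of all services
docker-compose down      # stop and remove Containers and networks
docker-compose down -v   # additionally remove named volumes
```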

Services: What are They?

In this section, we will have a more in-depth look into the format of how we have written the docker-compose.yml file. Mostly we will be looking into the services part of it.

version: "3.9"
services:
  web-app:
    build: .
    container_name: fast-api-app
    ports:
      - 8000:8000

The first line tells the version of the docker-compose we are working with. Here the version we are working on is 3.9.

  • services: Every docker-compose.yml starts with the list of Services/Containers that we want to create for our application. All the Containers related to a specific application are Configured under services. Each service has its own Configuration that includes the path to the build file, PORTS, environment variables, etc. For the FastAPI application, we need only one Container, hence we defined only one service named web-app under services.
  • build: The build keyword tells where the Dockerfile is located. It provides the path to the Dockerfile. If the Dockerfile and the docker-compose.yml file exist in the same directory, then a dot (.) can be used to represent its path.
  • image: If we do not want to build an Image but rather pull one from the DockerHub registry, then we use the image keyword. One example could be when we want to integrate our application with a database. Instead of installing the database in the existing Image, we pull a database Image and then connect it to the main application. For example, if we want our FastAPI app to integrate with Redis, the docker-compose.yml file would look something like this:
version: "3.9"
services:
  web-app:
    build: .
    container_name: fast-api-app
    ports:
      - 8000:8000

  database:
    image: redis:latest
    ports:
      - 6379:6379

In this example, we see that we create two services, i.e., two Containers: one for the FastAPI app, and another for the database we want to connect to. Here, in the database service Configuration, we have written the image keyword instead of build, because we want to pull the latest Redis Image from DockerHub and integrate it into our application.

  • container_name: This keyword provides the name of our Container when the Container gets created. We have seen our Container name with the docker-compose ps command earlier.
  • ports: This keyword provides the PORTS that our Container exposes so that we can reach them from the host machine. For the web-app service, we have defined 8000 because the FastAPI app runs on PORT 8000. Thus, by adding this PORT, we are able to see the website running in the Container from our host Network. Similarly, for the database service in the above .yml file, Redis works on PORT 6379, hence we have exposed this PORT using this keyword. We can even expose multiple PORTS for a single service.
  • depends_on: This keyword is written inside a particular service. Here we give the name of another service that the current service depends on. For example, when building a website, we may create two services, one for the backend and one for the frontend. If the backend service depends on the frontend, the frontend Container is started first, and only then is the backend started, so the backend can connect to it.

The docker-compose.yml file for this application will be

version: "3.9"
services:
  frontend:
    build: ./frontend_dir
    ports:
      - 8000:8000

  backend:
    build: ./backend_dir
    depends_on:
      - frontend
    ports:
      - 5000:5000

Let’s say we have two folders: frontend_dir, which contains the Dockerfile for the frontend, and backend_dir, which contains the Dockerfile for the backend. We have also written the PORTS that they use. Now the backend service depends on the frontend service. This means the backend Container is only started after the frontend Container is up (note that depends_on controls the start order, not the build order).
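Note that depends_on by itself only controls start order; it does not wait for the dependency to be ready to accept connections. A hedged sketch of waiting for readiness, using a healthcheck (supported by recent Compose versions; the curl-based check assumes curl exists in the frontend Image):

```yaml
services:
  frontend:
    build: ./frontend_dir
    ports:
      - 8000:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/"]
      interval: 5s
      retries: 5

  backend:
    build: ./backend_dir
    depends_on:
      frontend:
        condition: service_healthy
    ports:
      - 5000:5000
```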

Sharing Files Between Container and Host

In this section, we will look at how to use volumes in the Docker Compose file. We know that to share data between a Container and the host, we use volumes, which we normally define when running the Container. Instead of writing this every time we run a Container, we can add the volumes keyword under a particular service to share files between the service and the host. Let’s try this with our FastAPI example.

In the FastAPI code we have written, we see that the message “Hello World” is displayed when we run the Container. Now let’s add volumes to our docker-compose.yml, try changing this message, and see what happens.

version: "3.9"
services:
  web-app:
    build: .
    container_name: fast-api-app
    ports:
      - 8000:8000
    volumes:
      - ./app.py:/code/app.py

Under the volumes keyword, we can provide a list of volumes that we want to link between the host and the Container. As we have only one file, app.py, we are mapping the app.py present on the host to the app.py present in the /code folder of the Container. Now let’s use the below command to create our app and see the output.

$ docker-compose up --force-recreate

The --force-recreate option recreates the Containers even if they already exist. Because we have changed the docker-compose.yml file, we provide this option. We see in the browser that “Hello World” is displayed. Now let’s try making the following change in app.py.

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return "Hello Analytics Vidhya"  # changed this

We can see that after changing the code and saving app.py, the change is reflected in the browser, confirming that the app.py on the host and the app.py in the Container are linked. This can also be seen on the command line where we ran the docker-compose up --force-recreate command.


In the output, we see that uvicorn watches for changes in the /code folder, where app.py is located. When we made the change to app.py on the host machine and saved the file, it generated a WARNING stating that changes were detected in the app.py file, and the --reload flag in the ENTRYPOINT restarted the server. If we have more than one service in our docker-compose.yml file, then multiple volumes can be written, one per service. This makes things much easier than adding the volumes option to the docker run command every time we run each Container that needs it.

Networking: Communication Between Services

In this section, we will look at how Containers/Services can talk to one another, and how a Network is established using the docker-compose.yml file. For this, we will use the Redis Database: we will store a key-value pair in Redis, then retrieve it and show it in the browser.

from fastapi import FastAPI
import redis

conn = redis.Redis(host='redis-database', port=6379, charset="utf-8", decode_responses=True)
conn.set("Docker","Compose") # Docker:Compose
ans = conn.get("Docker")

app = FastAPI()

@app.get("/")
async def root():
    return "Docker: " + ans

Here, conn establishes a connection. The host is assigned the value “redis-database”, which is the name of the service that creates the Redis Container. Hence, the service names declared in docker-compose.yml act like hostnames that can be used in application code. Then we add a key-value pair to the Redis Server (key=Docker, value=Compose), retrieve the value of that key back from the Redis Server, and display it in the browser. The Dockerfile will be the same, and requirements.txt will contain:

  • redis
  • fastapi
  • uvicorn

The docker-compose.yml will now have two services.

version: "3.9"
services:
  redis-database:
    image: "redis:alpine"
    restart: always
    ports:
      - '6379:6379'
    networks:
      my-network:

  web-app:
    build: .
    container_name: fast-api-app
    ports:
      - 8000:8000
    volumes:
      - ./app.py:/code/app.py
    depends_on:
      - redis-database
    networks:
      my-network:

networks:
  my-network:
  • Here we created a new service called redis-database.
  • We assigned the redis:alpine Image to it, which will be pulled from DockerHub when the command is run.
  • The restart keyword makes sure the Redis Container is restarted every time it stops or fails.
  • The web-app service now depends on the redis-database service.
  • Also, note that we added a networks keyword.

networks: This keyword provides communication between a set of Containers. In the above docker-compose.yml file, we created a new Network called my-network. This my-network is provided to both the web-app service and the redis-database service. Now both these services can communicate with each other because they are part of the same Network group.


Now let’s run the docker-compose up --force-recreate command to create our new Image and run our new Container, and look at the output in the browser.


We see that the code runs perfectly fine: we are able to send information to the Redis Server and retrieve it, thus establishing a connection between the FastAPI app and the Redis Server. Now, if we use docker-compose ps, we can look at the Containers we have created.

" CODE OUTPUT

The Network we created can be seen with the docker network ls command:

$ docker network ls

We see that the Network compose_basics_my-network has been created; my-network is the name we defined in the docker-compose.yml file, prefixed with the project (folder) name compose_basics.

Machine Learning Example for Docker Compose

We will create a website application that takes two input sentences from the user and gives the similarity between them on a scale of 0 to 1, where 1 means almost identical. We will then store the input sentences and their similarity in a Redis Database. The user will also be able to view the inputs provided by past users, along with their similarities, retrieved from the Redis Database.

For this application, the UI library we are working with is Streamlit. And for the model, we are using the all-MiniLM-L6-v2 model from the sentence-transformers family on Hugging Face, which gives out the similarity (cosine similarity) for given sentences.

app.py

import pandas as pd
import streamlit as st
from models import predict_similarity
from data import add_sentences_to_hash, get_past_inputs

st.set_page_config(page_title="Sentence Similarity")

txt1 = st.text_input("Enter Sentence 1")
txt2 = st.text_input("Enter Sentence 2")

predict_btn = st.button("Predict Similarity")

if predict_btn:
    similarity = predict_similarity(txt1,txt2)
    st.write("Similarity: ",str(round(similarity,2)))
    add_sentences_to_hash(txt1,txt2,similarity)

show_prev_queries = st.checkbox("Previous Queries")

if show_prev_queries:
    query_list = get_past_inputs()
    query_df = pd.DataFrame(query_list)
   
    st.write("Previous Queries and their Similarities")
    st.write(query_df)

In app.py, we work with the Streamlit library to create the UI for the website application. Users can enter two sentences in the text input fields (txt1 and txt2) and click the “Predict Similarity” button (predict_btn) to trigger the prediction. This runs the predict_similarity() function from models.py, which takes in the sentences and gives out the similarity. The predicted similarity score is then displayed using st.write(). The add_sentences_to_hash() function from data.py is called to store the entered sentences and their similarity in the Redis Server, using a timestamp for the key.

We have also created a checkbox (show_prev_queries) to display the sentences that past users have entered, along with their similarities. If the checkbox is selected, the get_past_inputs() function from data.py retrieves the past sentences and their similarities from the Redis Server. The retrieved data is then converted to a Pandas DataFrame and displayed in the UI through st.write().

models.py

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2"
API_TOKEN = "API_TOKEN"  # replace with your Hugging Face access token
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def predict_similarity(txt1,txt2):
    payload = {
        "inputs": {
            "source_sentence": txt1,
            "sentences":[txt2]
        }
    }
    response = requests.post(API_URL, headers=headers, json=payload)

    return response.json()[0]

In models.py, we use the Hugging Face Inference API. Through it, we can send the input sentences as a payload to the pre-trained sentence-transformers model using the post() function of the requests library. The API_TOKEN placeholder in the Authorization header should be replaced by the token you get by signing in to Hugging Face.

We then create a function called predict_similarity(), which takes two inputs, txt1 and txt2 (the sentences the user provides), constructs a payload in the pre-defined format, and sends a POST request to the Inference API endpoint (API_URL) with the auth headers and the payload JSON. The response contains the similarity score, which is then returned. Note that the actual auth token is obscured with a placeholder (“API_TOKEN”) in the code provided.
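Under the hood, the score the API returns is a cosine similarity between sentence embeddings. As a rough, self-contained illustration of the metric itself (not the model, which runs remotely), cosine similarity over two plain vectors can be computed as:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); ranges from -1 to 1,
    # and is close to 1 for near-identical directions
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```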

data.py

import redis
import time

cache = redis.Redis(host='redis', port=6379,charset="utf-8",decode_responses=True)

def add_sentences_to_hash(txt1, txt2, similarity):
    entry = {"sentence1": txt1,
             "sentence2": txt2,
             "similarity": similarity}

    key = "timestamp_" + str(int(time.time()))
    cache.hset(key, mapping=entry)  # hmset is deprecated in redis-py

def get_past_inputs():
    query_list = []
    keys = cache.keys("timestamp_*")

    for key in keys:
        query = cache.hgetall(key)
        query_list.append(query)
   
    return query_list

In data.py, we work with the redis library to interact with the Redis Server. First, we make a connection to the server by providing the host (which in our case is the name of the service in the docker-compose.yml file), the PORT, etc. We then define an add_sentences_to_hash() function that takes two sentences (txt1, txt2) and their similarity (similarity) as inputs and stores them in the Redis Database as a dictionary, under a timestamp-based key created using the time library.

We then define another function, get_past_inputs(), which retrieves all the keys in the Redis Server that match the pattern “timestamp_*”, fetches the corresponding values (dictionaries containing the sentences and their similarities), and appends them to the list query_list, which is then returned.
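One caveat in this scheme: int(time.time()) has one-second resolution, so two submissions within the same second would overwrite each other’s key. A hedged sketch of a more collision-resistant key (make_key is a hypothetical helper, not part of this article’s data.py):

```python
import time
import uuid

def make_key():
    # Keep the timestamp_ prefix so get_past_inputs() still matches
    # the "timestamp_*" pattern, but append a random suffix so two
    # submissions in the same second get distinct keys.
    return "timestamp_{}_{}".format(int(time.time()), uuid.uuid4().hex[:8])

print(make_key())
print(make_key())  # a different key, even within the same second
```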

Let’s run the app.py locally and test the application by running the below command.

Note: For running locally replace host “redis” in data.py with “localhost”.

$ streamlit run app.py

We see that the application is working perfectly. We are able to get the similarity score and even add it to the Redis Database. We even successfully retrieved the data from the Redis Server.

Containerizing and Running the ML App with Docker Compose

In this section, we will create the Dockerfile and the docker-compose.yml file. Now the Dockerfile for this application will be:

FROM python:3.10-slim
WORKDIR /code
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8501
COPY . .
CMD streamlit run app.py
  • FROM python:3.10-slim: Tells which Base Image the Docker Image is built from, here the Python 3.10 slim variant.
  • WORKDIR /code: Creates a directory named code and sets it as the working directory, where all the application files are copied.
  • COPY requirements.txt / COPY . .: The requirements.txt is copied first and installed separately to make better use of the build cache; the second COPY then brings in app.py, models.py, data.py, the Dockerfile, and docker-compose.yml.
  • RUN pip install --no-cache-dir -r requirements.txt: Installs all the dependencies in requirements.txt. The --no-cache-dir flag avoids caching the downloaded packages, which helps reduce the size of the Docker Image.
  • EXPOSE 8501: Exposes PORT 8501 of the Container to the outside, i.e., to the host machine.
  • CMD streamlit run app.py: Runs the command to start the application.

The requirements.txt file contains:

  • streamlit
  • pandas
  • redis

The docker-compose.yml file will contain two services. One for the application code and the other for the database. The file will be something similar to the one below:

version: "3.9"
services:
  redis:
    image: "redis:alpine"
    container_name: redis-database
    ports:
      - '6379:6379'

  ml-app:
    build: .
    container_name: sentence-similarity-app
    ports:
      - '8501:8501'
    depends_on:
      - redis
    volumes:
      - ./app.py:/code/app.py
      - ./data.py:/code/data.py
      - ./models.py:/code/models.py
  • We created two services: redis (for the Redis Container) and ml-app (for our application code Container), and we even name these Containers.
  • We provide the build path for the ml-app service and the Image for the redis service, where the alpine Image is chosen for its smaller size.
  • For both services, their respective PORTS are exposed.
  • The ml-app service depends on the redis service because the application needs the Redis Database to be up before it starts.
  • Finally, volumes are created to map the files in our current directory to the corresponding files in the Container’s code directory.

Now we will run the docker-compose up command to build and run our application. If that doesn’t pick up the changes, try the docker-compose up --force-recreate command. After running the command, let’s open localhost:8501 in the browser.


We see that docker-compose has successfully built our Image and created working Containers for it. This way, docker-compose can be really helpful when we are dealing with multiple Containers, for example when creating Machine Learning applications where the frontend, the backend, and the database act as separate Containers.

Conclusion

In this comprehensive guide, we have taken a complete look at how to create docker-compose.yml files and how to build Images and run Containers with them. We even learned about different keywords that go in docker-compose.yml files. We have seen examples for all these keywords. Finally, through a project, we have seen how docker-compose.yml files can be really useful when creating Machine Learning applications.

Some key takeaways from this guide include:

  • Docker Compose is a tool for creating and managing multiple Containers.
  • All the Configurations of the Containers are defined in the docker-compose.yml file.
  • Docker Compose allows different Containers to communicate with one another.
  • Compose files make it easy to share Container Configurations with others.
  • With docker-compose.yml files, we can restrict the resources used by each Container.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 

