Mastering AI Optimization and Deployment with Intel’s OpenVINO Toolkit

Tarun R Jain 30 Oct, 2023

Introduction

AI comes up almost daily because of its growing role in automating manual work, and AI-enabled software has multiplied in a short time. Enterprises and businesses want to integrate reliable and responsible AI into their applications to generate more revenue. The most challenging part of that integration is the cost of model inference, on top of the computational resources already spent on training. Many techniques exist to improve inference performance by optimizing the model so that it needs fewer computational resources. To address this problem, Intel introduced the OpenVINO Toolkit, an open-source toolkit for optimizing and deploying AI inference.

Learning Objectives

In this article, we will:

Understand what the OpenVINO Toolkit is and its purpose in optimizing and deploying AI inference models.
Explore the practical use cases of OpenVINO, especially its importance in the future of AI at the edge.
Learn how to detect text in an image using OpenVINO in Google Colab.
Discover the key features and advantages of using OpenVINO, including its model compatibility and support for hardware accelerators and how it can impact various industries and applications.

This article was published as a part of the Data Science Blogathon.

What is OpenVINO?

OpenVINO, which stands for Open Visual Inference and Neural Network Optimization, is an open-source toolkit developed by the Intel team to facilitate the optimization of deep learning models. The toolkit aims to speed up your deep-learning models and to deploy them on-premise, on-device, or in the cloud more efficiently.

The OpenVINO Toolkit is particularly valuable because it supports many deep learning frameworks, including popular ones like TensorFlow, PyTorch, ONNX, and Caffe. You can train your models using your preferred framework and then use OpenVINO to convert and optimize them for deployment on Intel hardware such as CPUs, GPUs, FPGAs, and VPUs.

For inference, the OpenVINO Toolkit offers various tools for model quantization and compression, which can significantly reduce the size of deep learning models with little loss of inference accuracy.
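To make the idea of quantization concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It only illustrates the arithmetic (one shared scale, weights rounded to integers in [-127, 127]); it is not the algorithm OpenVINO or NNCF actually applies, and the function names and sample weights are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.423, -1.27, 0.051, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each integer fits in one byte instead of the four bytes of a 32-bit float, which is where the size reduction comes from; the price is the small rounding error measured by max_error.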

Why Use OpenVINO?

The enthusiasm for AI shows no sign of slowing down. With this popularity, more and more applications will be developed to run AI on-premise and on-device. OpenVINO excels in several of the challenging areas involved, which makes it a strong choice:

OpenVINO Model Zoo

OpenVINO provides a model zoo with pre-trained deep-learning models for tasks like Stable Diffusion, Speech, Object detection, and more. These models can serve as a starting point for your projects, saving you time and resources.

Model Compatibility

OpenVINO supports many deep learning frameworks, including TensorFlow, PyTorch, ONNX, and Caffe. This means you can use your preferred framework to train your models and then convert and optimize them for deployment using the OpenVINO Toolkit.

High Performance

OpenVINO is optimized for fast inference, making it suitable for real-time applications like computer vision, robotics, and IoT devices. It leverages hardware acceleration on Intel devices such as CPUs, GPUs, FPGAs, and VPUs to achieve high throughput and low latency.

AI in Edge Future Using Intel OpenVINO

AI at the edge is one of the most challenging areas to tackle. With the help of OpenVINO, building an optimized solution that works within hardware constraints becomes practical. The future of AI at the edge with this Toolkit has the potential to revolutionize various industries and applications.

Let’s find out how OpenVINO works and what makes it suitable for AI at the edge.

The first step is to build a model using your favorite deep-learning framework and convert it into an OpenVINO model. Alternatively, you can start from a pre-trained model in the OpenVINO model zoo.
Once the model has been trained, the next step is compression. The OpenVINO Toolkit provides the Neural Network Compression Framework (NNCF) for this.
Model Optimizer converts the trained model into the Intermediate Representation (IR), a format that is already optimized and transformed for deployment with OpenVINO. An IR consists of two files: an .xml file that describes the network topology and a .bin file that holds the weights.
At model deployment, the OpenVINO Inference Engine loads the IR on the target hardware and runs it, enabling fast and efficient inference for various applications.
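Because an IR model is a pair of files, it can help to check that both are present before loading. The sketch below assumes the .bin file sits next to the .xml file and shares its base name; ir_pair and missing_ir_files are hypothetical helpers written for this article, not part of OpenVINO.

```python
from pathlib import Path


def ir_pair(xml_path):
    """Return the (.xml, .bin) paths of an IR model.

    Assumes both files share a base name and directory.
    """
    xml_path = Path(xml_path)
    if xml_path.suffix != ".xml":
        raise ValueError(f"expected an .xml file, got {xml_path.name}")
    return xml_path, xml_path.with_suffix(".bin")


def missing_ir_files(xml_path):
    """List whichever of the two IR files does not exist on disk."""
    return [path for path in ir_pair(xml_path) if not path.is_file()]


xml_file, bin_file = ir_pair("model/horizontal-text-detection-0001.xml")
print(xml_file.name, bin_file.name)
```

Calling missing_ir_files before loading gives a clear message about which file is absent, rather than a failure later in the pipeline.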

With this approach, OpenVINO can play a vital role in AI at the edge. Let’s get our hands dirty with a code project that implements text detection in an image using the OpenVINO Toolkit.

Text Detection in an Image Using OpenVINO Toolkit

In this project, we will use Google Colab to run the application and the horizontal-text-detection-0001 model from the OpenVINO model zoo. This pre-trained model detects horizontal text in input images and returns a blob of data of shape (100, 5), where each row has the format (x_min, y_min, x_max, y_max, conf).
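To show how rows in the (x_min, y_min, x_max, y_max, conf) format can be read, here is a small sketch in plain Python. It assumes, as the filtering step later in this article suggests, that unused rows are filled with zeros; keep_detections and the sample rows are made up for illustration.

```python
def keep_detections(rows, threshold=0.3):
    """Drop all-zero padding rows and rows at or below the threshold."""
    kept = []
    for x_min, y_min, x_max, y_max, conf in rows:
        if not any((x_min, y_min, x_max, y_max, conf)):
            continue  # an all-zero row carries no detection
        if conf > threshold:
            kept.append((x_min, y_min, x_max, y_max, conf))
    return kept


# Hypothetical model output: one confident box, one weak box, one padding row.
rows = [
    [10, 20, 110, 60, 0.92],
    [5, 5, 40, 15, 0.12],
    [0, 0, 0, 0, 0.0],
]
detections = keep_detections(rows)
print(detections)
```

Only the first row survives: the second falls below the 0.3 threshold, and the third is padding.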

Step-by-Step Code Implementation

Installation

!pip install openvino

Import Required Libraries

Let’s import the required modules to run this application. The openvino_notebooks repository provides a notebook_utils.py helper module; we download it first so that we can use its download_file function to fetch the pre-trained weights.

import urllib.request

base = "https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks"
utils_file = "/main/notebooks/utils/notebook_utils.py"

urllib.request.urlretrieve(
    url= base + utils_file,
    filename='notebook_utils.py'
)

from notebook_utils import download_file

Now that notebook_utils has been downloaded, let’s quickly import the remaining modules.

import openvino

import cv2
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

Download Weights

Set up the local paths for the IR files (.xml and .bin) of the horizontal text detection model.

base_model_dir = Path("./model").expanduser()

model_name = "horizontal-text-detection-0001"
model_xml_name = f'{model_name}.xml'
model_bin_name = f'{model_name}.bin'

model_xml_path = base_model_dir / model_xml_name
model_bin_path = base_model_dir / model_bin_name

In the following code snippet, we split the download location into a base URL, a model directory, and the two file names to keep the paths to the pre-trained model weights readable.

model_zoo = "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.3/models_bin/1/"
algo = "horizontal-text-detection-0001/FP32/"
xml_url = "horizontal-text-detection-0001.xml"
bin_url = "horizontal-text-detection-0001.bin"

model_xml_url = model_zoo + algo + xml_url
model_bin_url = model_zoo + algo + bin_url

download_file(model_xml_url, model_xml_name, base_model_dir)
download_file(model_bin_url, model_bin_name, base_model_dir)
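The URL assembly above can also be wrapped in a small helper. This is only a sketch: ir_urls is a hypothetical function, and it assumes other models and precisions follow the same base/model/precision/file layout as the URLs used in this article.

```python
MODEL_ZOO = "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.3/models_bin/1/"


def ir_urls(model_name, precision="FP32", base=MODEL_ZOO):
    """Build the .xml and .bin download URLs for a model.

    Assumes the <base>/<model>/<precision>/<model>.<ext> layout seen above.
    """
    folder = f"{base.rstrip('/')}/{model_name}/{precision}/"
    return folder + f"{model_name}.xml", folder + f"{model_name}.bin"


xml_link, bin_link = ir_urls("horizontal-text-detection-0001")
print(xml_link)
print(bin_link)
```

With the default arguments, the helper reproduces the two URLs built by hand in the snippet above.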

Load Model

OpenVINO provides a Core class to interact with the OpenVINO toolkit. The Core class provides various methods and functions for working with models and performing inference. Use read_model and pass the model_xml_path. After reading the model, compile the model for a specific target device.

# Core is exposed at the top level of the openvino package (2023.1 and later).
core = openvino.Core()

model = core.read_model(model=model_xml_path)
compiled_model = core.compile_model(model=model, device_name="CPU")

input_layer_ir = compiled_model.input(0)
output_layer_ir = compiled_model.output("boxes")

In the above code snippet, the compiled model expects a 704 x 704 three-channel image in NCHW layout, that is, shape (1, 3, 704, 704), where 1 is the batch size, 3 is the number of channels, and 704 is both the height and the width. The output contains rows of (x_min, y_min, x_max, y_max, conf). Let’s load an input image now.

Load Image

The model’s input shape is [1, 3, 704, 704], so the input image must be resized to match it. In Google Colab or your code editor, you can upload your input image; in our case, the image file is named sample_image.jpg.

image = cv2.imread("sample_image.jpg")

# N,C,H,W = batch size, number of channels, height, width.
N, C, H, W = input_layer_ir.shape

# Resize the image to meet network expected input sizes.
resized_image = cv2.resize(image, (W, H))

# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

print("Model input shape:")
print(input_layer_ir.shape)
print("Image after resize:")
print(input_image.shape)
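The transpose(2, 0, 1) and expand_dims calls above reorder the image from height-width-channel (HWC), as OpenCV loads it, to batch-channel-height-width (NCHW). The following sketch reproduces that reordering on a tiny nested list in plain Python, purely to show which value ends up where; hwc_to_nchw is a hypothetical helper, not an OpenVINO or NumPy function.

```python
def hwc_to_nchw(image):
    """Reorder a nested list indexed [row][col][channel]
    into one indexed [batch][channel][row][col] with a batch of 1."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # wrap in a leading batch axis of size 1


# A 2 x 2 image with 3 channels per pixel.
tiny = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
batch = hwc_to_nchw(tiny)
print(batch[0][0])  # the first channel as a 2 x 2 grid
```

After the conversion, each channel is a complete height-by-width grid, which is the layout the network input expects.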

Display the input image.

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.axis("off")

Inference Engine

Previously, we compiled the model from its weights. Now call the compiled model on the input image to run inference.

# Run inference on the input image and read the "boxes" output.
boxes = compiled_model([input_image])[output_layer_ir]

# Remove zero only boxes.
boxes = boxes[~np.all(boxes == 0, axis=1)]

Prediction

The compiled_model returns boxes with the bounding box coordinates. We use cv2.rectangle to draw each box and cv2.putText to add the confidence score above the detected text.

def detect_text(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Fetch the image shapes to calculate a ratio.
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert image from BGR to RGB format.
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes.
    for box in boxes:
        # Pick a confidence factor from the last place in an array.
        conf = box[-1]
        if conf > threshold:
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y, 10)) if idx % 2
                else int(corner_position * ratio_x)
                for idx, corner_position in enumerate(box[:-1])
            ]

            # Draw a box based on the position, parameters in rectangle function are: 
            # image, start_point, end_point, color, thickness.
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 10)

            # Add text to the image based on position and confidence.
            if conf_labels:
                rgb_image = cv2.putText(
                    rgb_image,
                    f"{conf:.2f}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    4,
                    (255, 0, 0),
                    8,
                    cv2.LINE_AA,
                )

    return rgb_image

Display the output image

plt.imshow(detect_text(image, resized_image, boxes));
plt.axis("off")
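The coordinate handling inside detect_text can be isolated as follows. This sketch mirrors the list comprehension in the function: x values are scaled by the width ratio, y values by the height ratio, and y values are kept at 10 or above, presumably so that the label drawn 10 pixels above the box stays inside the image. scale_box is a hypothetical helper and the sizes are made-up examples.

```python
def scale_box(box, original_size, resized_size, min_y=10):
    """Map a box from resized-image coordinates back to the original image.

    Sizes are (width, height) tuples; box is (x_min, y_min, x_max, y_max).
    """
    x_min, y_min, x_max, y_max = box
    (orig_w, orig_h), (res_w, res_h) = original_size, resized_size
    ratio_x, ratio_y = orig_w / res_w, orig_h / res_h
    return (
        int(x_min * ratio_x),
        int(max(y_min * ratio_y, min_y)),
        int(x_max * ratio_x),
        int(max(y_max * ratio_y, min_y)),
    )


# A box found on the 704 x 704 network input, mapped to a 1408 x 352 photo.
scaled = scale_box((352, 176, 704, 352), (1408, 352), (704, 704))
print(scaled)
```

Here the width ratio is 2 and the height ratio is 0.5, so the x coordinates double and the y coordinates halve.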

Conclusion

To conclude, we successfully built a project that detects text in an image using the OpenVINO Toolkit. The Intel team continuously improves the Toolkit. OpenVINO also supports pre-trained Generative AI models such as Stable Diffusion, ControlNet, Speech-to-text, and more.

Key Takeaways

OpenVINO is an open-source toolkit that boosts your AI deep-learning models and deploys the application on-premise, on-device, or in the cloud.
The primary goal of OpenVINO is to optimize deep learning models through quantization and compression, which can significantly reduce their size with little loss of inference accuracy.
This Toolkit also supports deploying AI applications on Intel hardware such as CPUs, GPUs, FPGAs, and VPUs.
Various industries can adopt OpenVINO and leverage its potential to make an impact on AI at the edge.
Using a pre-trained model from the model zoo is simple: we implemented text detection in images with just a few lines of code.

Frequently Asked Questions

Q1. What is Intel OpenVINO used for?

A. Intel OpenVINO provides a model zoo with pre-trained deep-learning models for tasks like Stable Diffusion, Speech, and more. OpenVINO runs model zoo pre-trained models on-premise, on-device, and in the cloud more efficiently and effectively.

Q2. What is the difference between OpenVINO and TensorFlow?

A. Both OpenVINO and TensorFlow are free and open-source. Developers use TensorFlow, a deep-learning framework, for model development, while OpenVINO, a Toolkit, optimizes deep-learning models and deploys them on Intel hardware accelerators.

Q3. Where is OpenVINO used?

A. OpenVINO’s versatility and ability to optimize deep learning models for Intel hardware make it a valuable tool for AI and computer vision applications across various industries such as Military defense, Healthcare, Smart cities, and many more.

Q4. Is Intel’s OpenVINO Toolkit free to use?

A. Yes, Intel’s OpenVINO toolkit is free to use. The Intel team developed this open-source toolkit to facilitate the optimization of deep learning models.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.