LOGOS: A Brand Independent logo detection model

Maitri Last Updated : 08 Aug, 2022

8 min read

This article was published as a part of the Data Science Blogathon.

Introduction

Logos, also known as trademarks, are critical to a firm. Each firm has its unique logo that contributes to the company’s public perception. From the toothpaste we use to the slippers we wear, logos surround us in our day-to-day life.

“It’s a habit of mine now, noticing labels, logos, shoes.”

~ Michael Jordan

They are everywhere in such an insane amount that we cannot help but notice them. Moreover, over the years, Logo Detection began gaining widespread attention because of its various applications. Its applications range from marketing to security. With its attention, many open source models and blogs are available for logo detection. So why another post on logo detection?

To understand the why, we must understand the existing approaches toward logo detection.

Logo detection is a part of a broader family of object detection; in other words, its an application of object detection. Furthermore, generally in object detection tasks, we are to predict the bounding box for objects in an image.

Source: https://github.com/eriklindernoren/PyTorch-YOLOv3

For an object detection model, the training data we need to provide consists of the image we want to detect, the bounding box coordinates, and the label of the bounding box. It is the same for logo detection.

Most of the existing open source logo detection models use brand names as the label; for example, A apple logo will have “Apple” as the label. That is where we found a problem that we would like to solve.

intel — Source:- https://github.com/satojkovic/DeepLogo

You might ask, what’s the problem with using brand names as the label for logo detection?

Despite gaining widespread attention, there is still a lack of research in logo detection. Not many logo detection models are available that will detect whether the image has a logo or not. This blog proposes a model that, if a logo is present in an image, can identify and predict the bounding box around it regardless of which brand it represents.

Source: Unsplash

With new businesses being registered daily, keeping track of each brand’s logo is challenging. Class Imbalance problems aside, it’s also hard to find sufficient data for each brand to keep our model up to date. This might create difficulties in identifying new brands in the image during inference. For example: suppose a new brand, “XYZ,” with its new logo, comes into the market, and our model is not trained on that. Obviously, during inference, we won’t be able to detect its logo.

And what if we simply want to detect the logos in the image and do not care which brand it belongs to. For example, suppose an image is given, and we just want to correctly identify the bounding boxes around the logos in the image irrespective of whether that logo belongs to “Apple” or “Google.” That’s the problem we want to address in this blog.

Source: Unsplash

Approach

A Machine learning pipeline is used to construct and automate machine learning workflows. It contains steps like preparing data, training and validating the model, etc. We also follow a similar workflow for the logo detection model development.

The changes we make are in the preparation of the data. We created an end to -end pipeline to detect whether a given image has a logo or not, and if the logo is present, predict the bounding box. It’s a classic object detection pipeline; the difference we purpose is in the annotation process. A standard object detection model requires at least two object classes for it to work. But presently, the logo detection datasets have the brand name for the object class; using brand names for object class has the following disadvantages:

Source: https://arxiv.org/pdf/2008.05359v1.pdf

Firstly the datasets will not consist of all the brands. Therefore we will never have sufficient training data. Curating such large-scale data will take lots of time. Apart from that, we also have to keep track of all new brands registering daily, which will create a problem in identifying new brands in the image during inference.
The datasets we create might be prone to imbalance which again creates that problem during inference. For example, We might have more images from one brand than the other.
We have to keep training each day to keep up to date.

To solve this problem, we annotated the data; differently, we divided logos into two classes:

Text logos: Logos with text in it
No Text logos: Logos with no text in them.

We annotate the data concerning these two classes; this mitigates our previous problem of brand dependency.

Why use this approach?

Rather than creating our own data set and custom model, this approach lets us use already available datasets and object detection models. Conveniently looking at all the logos, it makes sense to divide logos into two groups logos with text in them and logos without the text. Also, we have to give at least two classes to train on for any object detection model.

We identified two datasets for our problem LogoDet-3K: A Large-Scale Image Dataset for Logo Detection and Visually29K: a large-scale curated infographics dataset.

Annotation

We used the following steps to annotate the data:

Extracted the bounding box coordinates from the already available annotated logo datasets.
Cropped the image for that bounding box to get that particular region
Gave cropped logo image as input to the text detection model. We used CRAFT: Character-Region Awareness For Text detection as the text detector.
If text is detected, we label that particular bounding box as “Text”. Else we label the bounding box as “NoText”.

Datasets

LogoDet-3K: A Large-Scale Image Dataset for Logo Detection

source: https://github.com/Wangjing1551/LogoDet-3K-Dataset

LogoDet-3k is a large-scale high-quality logo detection dataset consisting of 3000 logo categories, 158,652 images, and 194,261 bounding boxes. In this dataset, according to our approach, we observe three types of images:

Images that contain only text logos.

Images that contain logos of label text only

2. Images that contain only no text logos.

only text — Images that contain logos of label no text only

3. Images that contain both text and no text logos.

Images that contain logos of both the “Text” label and “NoText” label

After we finish the process, we observe a severe class imbalance between text bounding boxes and no text bounding boxes, with no text bounding being the limiting class. To compensate for this imbalance, we next use the Visually29K dataset.

Visually29K: a large-scale curated infographics dataset

Visually29K consists of 28,973 infographics to cover a fixed set of 391 tags (filtered down from free-form text). They provide a bounding box for icons in the infographics from the dataset. As icons are very similar to no text logos, we use that to compensate for the imbalance of no text logos; in simple words, use the icons as no text logos. Hence we proceed to annotate the Visually29K dataset similar to how we annotated the LogoDet-3K dataset.

After the annotation step, we now proceed to train our model.

Model

We use the icevision library for our model training and inference purposes. We use mmdetection’s retinanet architecture for the object detection model with a pre-trained resnet50_fpn_1x as its backbone. The Metric used was COCOMetric.

Training

Data after applying some augmentations

For the training process, first, a custom data parser is used that will parse our data and present it to the model for training and testing purposes. We first train our data on 30,000 images. Later we increase the number of images to 53000. The 30,000 images only include images from LogoDet-3K, and 53,000 contain images from LogoDet-3K and Visually29K. We train them for 30 epochs and 50 epochs, respectively.

Later it is found that the validation loss or Metric is not increasing significantly, so training is stopped.

Inference

We saved the checkpoints from the previous step, letting the 30k images checkpoint be checkpoint 1 and the 53k images be checkpoint 2. So we find that checkpoint 1 is more inclined toward text logos; it might be because we did not check for imbalance at that moment, but the 30K images dataset was imbalanced towards text logos. And checkpoint 2 detected more no text logos. We later created an ensemble of checkpoint 1 and checkpoint 2 for our final model.

Results

Some good cases

Some bad cases

Conclusion

An enormous number of companies are in the market, each with its logo and brand name. It isn’t easy to develop a logo detection model with brand names as the classes. In this case, we have to reduce the number of classes. In this blog, I describe an approach of reducing the number of classes to train a generic logo detection model that detects a logo if it is present in the image.

Some of the key takeaways from the article are:

1. Logo detection with brand names as classes creates lots of problems. We need to reduce the number of classes to develop a model for general use to resolve these issues.

2. Logos can be divided into two classes logos with text and logos without text. We can annotate some existing datasets according to those two classes and train an object detection model.

3. Some of the approach’s shortcomings might be that it won’t tell which brand the logos belong to. Second, there might be false positives because we are using the VisuallyData dataset, which is a dataset for icons. Subsequently, our logo detection model might recognize icons as logos.

This marks the end of this blog!

A big thanks to Divyam Shah and Shashank Pathak!

Check out the GitHub repository for the code: https://github.com/LaotechLabs/LOGOS. This project is open-sourced, if you wish, you are welcome to collaborate with us on the project :)!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Maitri

Computer Vision Image Image Analysis Intermediate Libraries

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Model Deployment

Introduction to Computer Vision

Getting Started with Image Data

Introduction to CNN and Implementation

Introduction to Transfer Learning

CNN Visualization

Overview of Pretrained Models

Inception

ResNets

DenseNets

CSRNet

Introduction to Object Detection

Region Based Convolutional Neural Network

Single Stage Networks

Transformed Based Object Detection Models

Face Detection

Object Tracking

Pose Estimation

Introduction to Image Segmentation

Understanding Deep Learning Architectures for Image Segmentation

Video Classification

Introduction to Image Generation

Zero and Few Shot Learning

LOGOS: A Brand Independent logo detection model

Introduction

To understand the why, we must understand the existing approaches toward logo detection.

You might ask, what’s the problem with using brand names as the label for logo detection?

Approach

Why use this approach?

Annotation

Datasets

Model

Training

Inference

Results

Some good cases

Some bad cases

Conclusion

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory