Pulkit Sharma — Updated On December 23rd, 2020


Are you working with image data? There are so many things we can do using computer vision algorithms:

  • Object detection
  • Image segmentation
  • Image translation
  • Object tracking (in real-time), and a whole lot more.

This got me thinking – what can we do if there are multiple object categories in an image? Making an image classification model was a good start, but I wanted to expand my horizons to take on a more challenging task – building a multi-label image classification model!

I didn’t want to use toy datasets to build my model – that is too generic. And then it struck me – movie/TV series posters contain a variety of people. Could I build my own multi-label image classification model to predict the different genres just by looking at the poster?

The short answer – yes! And in this article, I have explained the idea behind multi-label image classification. We will then build our very own model using movie posters. You will be amazed by the impressive results our model generates. And if you’re an Avengers or Game of Thrones fan, there’s an awesome (spoiler-free) surprise for you in the implementation section.

Excited? Good, let’s dive in!


Table of Contents

  1. What is Multi-Label Image Classification?
  2. How is Multi-Label Image Classification different from Multi-Class Image Classification?
  3. Understanding the Multi-Label Image Classification Model Architecture
  4. Steps to Build your Multi-Label Image Classification Model
  5. Case Study: Solve a Multi-Label Image Classification Problem in Python


What is Multi-Label Image Classification?

Let’s understand the concept of multi-label image classification with an intuitive example. Check out the below image:

[Image: binary classification]

The object in image 1 is a car. That was a no-brainer. Image 2, on the other hand, has no car, only a group of buildings. Can you see where we are going with this? We have classified the images into two classes, i.e., car or non-car.

When we have only two classes in which the images can be classified, this is known as a binary image classification problem.

Let’s look at one more image:

[Image: multi-label image classification (scenery)]

How many objects did you identify? There are way too many – a house, a pond with a fountain, trees, rocks, etc. So,

When we can classify an image into more than one class (as in the image above), it is known as a multi-label image classification problem.

Now, here’s a catch – most of us get confused between multi-label and multi-class image classification. Even I was bamboozled the first time I came across these terms. Now that I have a better understanding of the two topics, let me clear up the difference for you.


How is Multi-Label Image Classification different from Multi-Class Image Classification?

Suppose we are given images of animals to be classified into their corresponding categories. For ease of understanding, let’s assume there are a total of 4 categories (cat, dog, rabbit and parrot) in which a given image can be classified. Now, there can be two scenarios:

  1. Each image contains only a single object (from one of the above 4 categories) and hence, it can only be classified in one of the 4 categories
  2. The image might contain more than one object (from the above 4 categories) and hence the image will belong to more than one category

Let’s understand each scenario through examples, starting with the first one:

[Image: multi-class image classification]

Here, we have images which contain only a single object. The keen-eyed among you will have noticed there are 4 different types of objects (animals) in this collection.

Each image here can only be classified either as a cat, dog, parrot or rabbit. There are no instances where a single image will belong to more than one category.

1. When there are more than two categories in which the images can be classified, and

2. An image does not belong to more than one category


If both of the above conditions are satisfied, it is referred to as a multi-class image classification problem.

Now, let’s consider the second scenario – check out the below images:

[Image: multi-label image classification]

  • First image (top left) contains a dog and a cat
  • Second image (top right) contains a dog, a cat and a parrot
  • Third image (bottom left) contains a rabbit and a parrot, and
  • The last image (bottom right) contains a dog and a parrot

These are all labels of the given images. Each image here belongs to more than one class and hence it is a multi-label image classification problem.
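To make the distinction concrete, here is a minimal sketch of how the labels for the two scenarios could be encoded as 0/1 vectors. The category order and the helper name `encode_labels` are assumptions made for this illustration and are not part of the original article.

```python
# Fixed category order (an assumption made for this illustration).
CATEGORIES = ["cat", "dog", "rabbit", "parrot"]


def encode_labels(labels):
    """Return a 0/1 vector with a 1 for every category present in `labels`."""
    present = set(labels)
    unknown = present - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if category in present else 0 for category in CATEGORIES]


# Multi-class: exactly one category per image, so the vector is one-hot.
multi_class_target = encode_labels(["dog"])

# Multi-label: any number of categories per image, so several entries can be 1.
multi_label_target = encode_labels(["dog", "cat", "parrot"])

print(multi_class_target)  # [0, 1, 0, 0]
print(multi_label_target)  # [1, 1, 0, 1]
```

The only difference between the two targets is how many entries are allowed to be 1 at the same time.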

These two scenarios should help you understand the difference between multi-class and multi-label image classification. Connect with me in the comments section below this article if you need any further clarification.

Before we jump into the next section, I recommend going through this article – Build your First Image Classification Model in just 10 Minutes! It will help you understand how to solve a multi-class image classification problem.


Steps to Build your Multi-Label Image Classification Model

Now that we have an intuition about multi-label image classification, let’s dive into the steps you should follow to solve such a problem.

The first step is to get our data in a structured format. This applies to binary as well as multi-class image classification.

You should have a folder containing all the images on which you want to train your model. Now, for training this model, we also require the true labels of images. So, you should also have a .csv file which contains the names of all the training images and their corresponding true labels.
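As a rough sketch of what such a .csv file could look like, the snippet below builds one from a small dictionary of annotations using only the standard library. The image ids, the three genres and the `Id`/`Genre` column layout are hypothetical stand-ins; the right code for a real dataset depends entirely on how its labels are stored.

```python
import csv
import io

# Hypothetical annotations: image id -> list of genres. Real ids and genres
# would come from whatever metadata accompanies your images.
annotations = {
    "poster_001": ["Comedy", "Drama"],
    "poster_002": ["Action"],
    "poster_003": ["Action", "Comedy"],
}

# One column per label, in a fixed (here alphabetical) order.
genres = sorted({g for labels in annotations.values() for g in labels})


def build_rows(annotations, genres):
    """Yield one CSV row per image: id, raw label list, then 0/1 per genre."""
    for image_id, labels in annotations.items():
        flags = [1 if g in labels else 0 for g in genres]
        yield [image_id, str(labels)] + flags


buffer = io.StringIO()  # swap for open("train.csv", "w", newline="") to write a file
writer = csv.writer(buffer)
writer.writerow(["Id", "Genre"] + genres)
writer.writerows(build_rows(annotations, genres))

print(buffer.getvalue())
```

Each row then carries the image name plus one 0/1 flag per label, which is the shape of data a multi-label model needs.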

We will learn how to create this .csv file later in this article. For now, just keep in mind that the data should be in a particular format. Once the data is ready, we can divide the further steps as follows:


Load and pre-process the data

First, load all the images and then pre-process them as per your project’s requirement. To check how our model will perform on unseen data (test data), we create a validation set. We train our model on the training set and validate it using the validation set (standard machine learning practice).


Define the model’s architecture

The next step is to define the architecture of the model. This includes deciding the number of hidden layers, number of neurons in each layer, activation function, and so on.


Train the model

Time to train our model on the training set! We pass the training images and their corresponding true labels to train the model. We also pass the validation images here which help us validate how well the model will perform on unseen data.


Make predictions

Finally, we use the trained model to get predictions on new images.


Understanding the Multi-Label Image Classification Model Architecture

Now, the pre-processing steps for a multi-label image classification task will be similar to that of a multi-class problem. The key difference is in the step where we define the model architecture.

We use a softmax activation function in the output layer for a multi-class image classification model. For each image, we want to maximize the probability for a single class. Because the softmax outputs sum to 1, as the probability of one class increases, the probabilities of the other classes must decrease. So, we can say that the probability of each class is dependent on the other classes.

But in the case of multi-label image classification, we can have more than one label for a single image. We want the probabilities to be independent of each other, so the softmax activation function is not appropriate. Instead, we can use the sigmoid activation function, which predicts the probability for each class independently. In effect, the output layer behaves like n binary classifiers (n here is the total number of classes) that share the same hidden layers, one for each class.
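The difference between the two activations is easy to see on a small example. The sketch below applies both to the same made-up scores for three classes; the `logits` values are hypothetical and the functions are plain-Python illustrations, not the Keras implementations.

```python
import math


def softmax(logits):
    """Normalise scores into probabilities that compete and sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Squash each score into (0, 1) independently of the others."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical raw scores from the last layer for three classes.
logits = [2.0, 2.0, -1.0]

print([round(p, 3) for p in softmax(logits)])  # [0.488, 0.488, 0.024]
print([round(p, 3) for p in sigmoid(logits)])  # [0.881, 0.881, 0.269]
```

With softmax, two equally strong classes have to share the probability mass and neither gets above 0.5. With sigmoid, both can be confidently "on" at once, which is what a multi-label problem needs.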

Using the sigmoid activation function turns the multi-label problem into n binary classification problems. So for each image, we will get probabilities defining whether the image belongs to class 1 or not, class 2 or not, and so on. Since we have converted it into n binary classification problems, we will use the binary_crossentropy loss. Our aim is to minimize this loss in order to improve the performance of the model.
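For intuition, binary cross-entropy for a single image can be computed by hand as the average of one log-loss term per label. The sketch below follows the textbook definition with made-up targets and predictions; treat it as an illustration rather than the exact Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the labels of one image."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical image with 4 possible labels, two of which apply.
y_true = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # confident and correct on every label
poor = [0.4, 0.6, 0.3, 0.7]  # leans the wrong way on every label

print(binary_crossentropy(y_true, good))
print(binary_crossentropy(y_true, poor))
```

The first value is much smaller than the second, so pushing each label's probability toward its true 0 or 1 is exactly what minimizing this loss rewards.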

This is the major change we have to make while defining the model architecture for solving a multi-label image classification problem. The training part will be similar to that of a multi-class problem. We will pass the training images and their corresponding true labels and also the validation set to validate our model’s performance.

Finally, we will take a new image and use the trained model to predict the labels for this image. With me so far?


Case Study: Solving a Multi-Label Image Classification Problem

Congratulations on making it this far! Your reward – solving an awesome multi-label image classification problem in Python. That’s right – time to power up your favorite Python IDE!

Let’s set up the problem statement. Our aim is to predict the genre of a movie using just its poster image. Can you guess why it is a multi-label image classification problem? Think about it for a moment before you look below.

A movie can belong to more than one genre, right? It doesn’t just have to belong to one category, like action or comedy. The movie can be a combination of two or more genres. Hence, multi-label image classification.

The dataset we’ll be using contains the poster images of several multi-genre movies. I have made some changes in the dataset and converted it into a structured format, i.e. a folder containing the images and a .csv file for true labels. You can download the structured dataset from here. Below are a few posters from our dataset:


You can download the original dataset along with the ground truth values here if you wish.

Let’s get coding!

First, import all the required Python libraries:

Now, read the .csv file and look at the first five rows:

[Image: multi-label dataset]

There are 27 columns in this file. Let’s print the names of these columns:

[Image: train columns]




The genre column contains a list for each image that specifies the genres of that movie. So, from the head of the .csv file, the genres of the first image are Comedy and Drama.

The remaining 25 columns are the one-hot encoded genre columns. So, if a movie belongs to the Action genre, the value in the Action column will be 1, otherwise 0. Each image can belong to any of the 25 genres.

We will build a model that will return the genre of a given movie poster. But before that, do you remember the first step for building any image classification model?

That’s right – loading and preprocessing the data. So, let’s read in all the training images:
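A possible loading routine is sketched below, together with a quick estimate of how much memory the resulting array needs. The file layout (`<id>.jpg` inside an image folder) follows the path the author quotes in the comments; the function names, the default `target_size` and the use of `tensorflow.keras.utils` are assumptions. The TensorFlow import is deferred so that the estimator can be run without it installed.

```python
def estimate_array_gib(n_images, shape, bytes_per_value=4):
    """Approximate size in GiB of an array holding n_images of the given shape."""
    values_per_image = 1
    for dim in shape:
        values_per_image *= dim
    return n_images * values_per_image * bytes_per_value / 1024 ** 3


def load_posters(image_ids, image_dir, target_size=(224, 224)):
    """Load posters into one float32 array scaled to [0, 1].

    Assumes files are named <id>.jpg inside image_dir and that TensorFlow's
    bundled Keras is installed (imported lazily here).
    """
    import numpy as np
    from tensorflow.keras.utils import img_to_array, load_img

    images = []
    for image_id in image_ids:
        img = load_img(f"{image_dir}/{image_id}.jpg", target_size=target_size)
        images.append(img_to_array(img) / 255.0)  # scale pixel values to [0, 1]
    return np.array(images, dtype="float32")


# float32 arrays use 4 bytes per value.
print(round(estimate_array_gib(7254, (400, 300, 3)), 2))
print(round(estimate_array_gib(7254, (224, 224, 3)), 2))
```

Under these assumptions the full-size array alone comes to roughly 9.7 GiB, against about 4.1 GiB at 224 x 224, which is why a smaller target size is the usual first remedy for memory errors.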

A quick look at the shape of the array:

[Image: image shape]


There are 7254 poster images and all the images have been converted to a shape of (400, 300, 3). Let’s plot and visualize one of the images:

This is the poster for the movie ‘Trading Places’. Let’s also print the genre of this movie:


This movie has a single genre – Comedy. The next thing our model would require is the true label(s) for all these images. Can you guess what would be the shape of the true labels for 7254 images?

Let’s see. We know there are a total of 25 possible genres. For each image, we will have 25 targets, i.e., whether the movie belongs to that genre or not. So, all these 25 targets will have a value of either 0 or 1.

We will remove the Id and genre columns from the train file and convert the remaining columns to an array which will be the target for our images:
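The idea can be sketched on a tiny stand-in for the training file using only the standard library. The ids, the three genre columns and the exact column names `Id` and `Genre` are assumptions for this example; with pandas, dropping the two columns and converting the rest to an array is the more common route.

```python
import csv
import io

# A tiny stand-in for the training file: Id, Genre, then one 0/1 column per genre.
# The ids and the three genre columns are made up for this example.
sample_csv = """Id,Genre,Action,Comedy,Drama
poster_001,"['Comedy', 'Drama']",0,1,1
poster_002,['Action'],1,0,0
"""

reader = csv.DictReader(io.StringIO(sample_csv))
label_columns = [c for c in reader.fieldnames if c not in ("Id", "Genre")]

# Each target row keeps only the 0/1 genre flags, in column order.
y = [[int(row[c]) for c in label_columns] for row in reader]

print(label_columns)        # ['Action', 'Comedy', 'Drama']
print(y)                    # [[0, 1, 1], [1, 0, 0]]
print((len(y), len(y[0])))  # (2, 3)
```

The target has one row per image and one column per genre, so the same recipe applied to the full file gives a (number of images, number of genres) array.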


The shape of the output array is (7254, 25) as we expected. Now, let’s create a validation set which will help us check the performance of our model on unseen data. We will randomly separate 10% of the images as our validation set:
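A dependency-free sketch of such a random split is shown below. The helper name `split_indices` and the seed are assumptions; in practice scikit-learn's `train_test_split` is the usual tool, and it may round the validation size differently by one sample.

```python
import random


def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Shuffle sample indices and return (train_indices, val_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_val = int(round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(7254, val_fraction=0.1)
print(len(train_idx), len(val_idx))  # 6529 725
```

The same index lists are then used to pick rows from both the image array and the target array, so images and labels stay aligned.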

The next step is to define the architecture of our model. The output layer will have 25 neurons (equal to the number of genres) and we’ll use sigmoid as the activation function.

I will be using a certain architecture (given below) to solve this problem. You can modify this architecture as well by changing the number of hidden layers, activation functions and other hyperparameters.
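One possible architecture along these lines is sketched here. Only the first convolutional layer (16 filters, 5 x 5 kernel, ReLU), quoted by a reader in the comments, and the 25-unit sigmoid output come from the article; every other layer, the dropout rates and the function names are illustrative assumptions rather than the author's exact network. The TensorFlow import is deferred so the small shape helper can be run on its own.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution or pooling step."""
    return (size - kernel) // stride + 1


def build_model(input_shape=(400, 300, 3), n_labels=25):
    """Small CNN with one sigmoid output per label.

    Requires TensorFlow (imported lazily). Layers beyond the first
    convolution and the output layer are illustrative choices.
    """
    from tensorflow.keras import layers, models

    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (5, 5), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(32, (5, 5), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # Sigmoid, not softmax: each genre gets its own independent probability.
        layers.Dense(n_labels, activation="sigmoid"),
    ])


# Spatial size after the first 5x5 convolution on a 400 x 300 poster.
print(conv_output_size(400, 5), conv_output_size(300, 5))  # 396 296
```

If you change the size of the input images, `input_shape` has to change with it, and the sizes reported in the model summary shift accordingly.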

Let’s print our model summary:


Quite a lot of parameters to learn! Now, compile the model. I’ll use binary_crossentropy as the loss function and Adam as the optimizer (again, you can use other optimizers as well):

Finally, we are at the most interesting part – training the model. We will train the model for 10 epochs and also pass the validation data which we created earlier in order to validate the model’s performance:
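The compile and training steps could be wrapped up as in the sketch below. It assumes `model` is a Keras model and that the training and validation arrays already exist; the helper name and `batch_size=64` are assumptions, while the loss, optimizer and 10 epochs follow the text.

```python
def compile_and_train(model, X_train, y_train, X_val, y_val, epochs=10, batch_size=64):
    """Compile a Keras-style model for multi-label training and fit it."""
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",  # one binary decision per genre
        metrics=["accuracy"],
    )
    # The validation data is only evaluated after each epoch, never trained on.
    return model.fit(
        X_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(X_val, y_val),
    )
```

Keep in mind that accuracy can look flattering here because most of the 25 targets are 0 for any given poster, so the loss values are the more informative numbers to watch.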

[Image: model training]

We can see that the training loss has been reduced to 0.24 and the validation loss stays close to it. What’s next? It’s time to make predictions!

All you Game of Thrones (GoT) and Avengers fans – this one’s for you. Let’s take the posters for GoT and Avengers and feed them to our model. Download the posters for GoT and Avengers before proceeding.

Before making predictions, we need to preprocess these images using the same steps we saw earlier.

Now, we will predict the genre for these posters using our trained model. The model will tell us the probability for each genre and we will take the top 3 predictions from that.
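Picking the top 3 genres from the predicted probabilities is a small sorting step. The sketch below uses made-up genre names and probabilities for one poster; the helper name `top_k_labels` is an assumption.

```python
def top_k_labels(probabilities, class_names, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical genre names and sigmoid outputs for one poster.
class_names = ["Action", "Comedy", "Drama", "Romance", "Thriller"]
probabilities = [0.41, 0.08, 0.63, 0.12, 0.47]

for label, p in top_k_labels(probabilities, class_names):
    print(f"{label}: {p:.2f}")
# Drama: 0.63
# Thriller: 0.47
# Action: 0.41
```

Taking the top 3 always returns three genres, even when every probability is low. Because the sigmoid outputs are independent, keeping only the genres above a fixed threshold is a reasonable alternative.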

[Image: Game of Thrones]

Impressive! Our model suggests Drama, Thriller and Action genres for Game of Thrones. That classifies GoT pretty well in my opinion. Let’s try our model on the Avengers poster. Preprocess the image:

And then make the predictions:


The genres our model comes up with are Drama, Action and Thriller. Again, these are pretty accurate results. Can the model perform equally well for Bollywood movies? Let’s find out. We will use this Golmaal 3 poster.

You know what to do at this stage – load and preprocess the image:

And then predict the genre for this poster:


Golmaal 3 was a comedy and our model has predicted Comedy as the topmost genre. The other predicted genres are Drama and Romance, a relatively accurate assessment. We can see that the model is able to predict a movie’s genres just by seeing its poster.


Next Steps and Experimenting on your own

This is how we can solve a multi-label image classification problem. Our model produced sensible genre predictions for the posters we tried, even though we only had around 7000 images for training it.

You can try and collect more posters for training. My suggestion would be to make the dataset in such a way that all the genre categories will have comparatively equal distribution. Why?

Well, if a certain genre is repeating in most of the training images, our model might overfit on that genre. And for every new image, the model might predict the same genre. To overcome this problem, you should try to have an equal distribution of genre categories.
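A quick way to check for this kind of imbalance is to measure how often each genre occurs in the training targets. The sketch below does that on a made-up target matrix; the genre names, the values and the helper name are hypothetical.

```python
def label_frequencies(targets, label_names):
    """Fraction of samples in which each label is set to 1."""
    n_samples = len(targets)
    return {
        name: sum(row[i] for row in targets) / n_samples
        for i, name in enumerate(label_names)
    }


# Hypothetical 0/1 targets for six posters and three genres.
label_names = ["Action", "Comedy", "Drama"]
targets = [
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
]

for name, freq in label_frequencies(targets, label_names).items():
    print(f"{name}: {freq:.2f}")
# Action: 0.33
# Comedy: 0.17
# Drama: 1.00
```

A genre that appears in nearly every row, like Drama in this toy example, is one the model can learn to predict for every poster regardless of what the poster shows.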

These are some of the key points which you can try to improve the performance of your model. Any other you can think of? Let me know!


End Notes

There are multiple applications of multi-label image classification apart from genre prediction. You can use this technique to automatically tag images, for example. Suppose you want to predict the type and color of a clothing item in an image. You can build a multi-label image classification model which will help you to predict both!

I hope this article helped you understand the concept of multi-label image classification. If you have any feedback or suggestions, feel free to share them in the comments section below. Happy experimenting!

About the Author

Pulkit Sharma

My research interests lie in the field of Machine Learning and Deep Learning. I am enthusiastic about learning new skills and technologies.


64 thoughts on "Build your First Multi-Label Image Classification Model in Python"

Vijit says: April 15, 2019 at 11:29 am
Thanks Pulkit for explaining the Multi-Label Image Classification in such an easy way.
Pulkit Sharma says: April 15, 2019 at 11:46 am
Glad you liked it Vijit!
Shital says: April 15, 2019 at 1:44 pm
Great, thanks for sharing.
Ibrahim K says: April 15, 2019 at 3:23 pm
Amazing, thank you so much.
Shrikant says: April 15, 2019 at 9:28 pm
How much memory does it take when we convert the images to an array? I mean, how much memory will the X variable hold? I am running on the Kaggle platform but I get a memory error.
Pulkit Sharma says: April 16, 2019 at 6:23 pm
Hi Shrikant, As there are more than 7000 images, you will require a good amount of memory. You can try to run this code on Google Colab.
Ian says: April 17, 2019 at 10:12 pm
So glad to have found this site.
Mark says: April 18, 2019 at 2:54 am
Hi Pulkit. Thank you so much for this article. You really have a gift of explaining and simplifying these things so that even I can understand them! Have you perhaps done a similar article/tutorial on object detection (multiple objects per image and their bounding boxes)? If so I would be very interested in reading it.
Pulkit Sharma says: April 18, 2019 at 11:09 am
Hi Mark, I have worked on object detection projects as well. Below are some links you can refer to, which explain what object detection is and how to build your own object detection model:
  1. A Step-by-Step Introduction to the Basic Object Detection Algorithms
  2. A Practical Implementation of the Faster R-CNN Algorithm for Object Detection
  3. A Practical Guide to Object Detection using the Popular YOLO Framework
enes polat says: April 18, 2019 at 3:42 pm
Hi Pulkit Sharma, Thank you first of all. When I tried this on Colab, the code caused a memory error. Specifically, the error occurred while I was creating the X variable and then splitting it into train and test sets. How can I solve my problem? Thanks
Shyam Chari says: April 20, 2019 at 12:55 pm
The article is written very well. I have a few questions about train_image = []. I tried the Kaggle kernel with and without a GPU, but I keep running out of memory, so the X array is not created. I also tried a Google Colab notebook and hit the same issue. Is there a way to load all the images without running out of memory, i.e., some kind of batch processing of the images? I thought of reducing the number of images by randomly removing 1500 images from the dataset. It would be helpful if you could help me. Thanks
Pulkit Sharma says: April 22, 2019 at 3:45 pm
Hi Shyam, You can try to resize the images to a smaller shape, let's say (224,224,3) or even less, to reduce the size.
Pulkit Sharma says: April 22, 2019 at 3:55 pm
Hi, As the image size is large (400,400,3) in this case, you can reduce this size, which will reduce the memory consumption. You have to edit the following code:
    img = image.load_img('Multi_Label_dataset/Images/'+train['Id'][i]+'.jpg',target_size=(400,400,3))
Pass a smaller target size of, let's say, (224,224,3) or even smaller. If you change the size of the image, you also have to change the input shape while defining the architecture.
Pankaj J says: April 23, 2019 at 12:41 pm
Hi Pulkit, You have a style of explaining concepts so easily. Thank you so much. I have the following thought: can we have any unsupervised method for this problem?
Pulkit Sharma says: April 23, 2019 at 6:29 pm
Hi Pankaj, I am not sure whether an unsupervised method will work on a problem like genre prediction from posters. I personally believe that a supervised learning approach for such a task will help you achieve a better model. But again, this is my personal opinion. You can try some unsupervised techniques on the same project and see whether they perform any better than the supervised approaches. Do share your findings with the community here as it will be helpful for everyone.
Leo says: April 26, 2019 at 5:26 pm
Hi Pulkit, This is a great tutorial and thank you very much for sharing this! It motivated me to write the same architecture and test it in PyTorch. One thing I do not get is that in the summary you present right after defining your network architecture, the output shapes are not consistent, e.g. after your first convolutional step you get an output size of 396 x 296, which should be 396 x 396. That shouldn't be happening without any padding/stride, right? Maybe you wanted to read your images in 400x300x3 instead of 400x400x3? With this input, your numbers add up perfectly! Plus, I think I have a method to avoid overfitting in the loss function.
Pulkit Sharma says: May 01, 2019 at 5:56 pm
Thank you Leo for pointing it out. Actually, in the beginning, I trained the model on images of shape (400,300,3). Then I changed this shape to (400,400,3) and missed updating the summary. I have updated the summary now. Also, I would be glad if you could share the methods you are talking about to avoid overfitting. That would be helpful for the community as well.
Dinesh Chauhan says: May 03, 2019 at 10:06 pm
Thanks for the detailed explanation Pulkit. I tried to reproduce this code on my laptop and on Google Colab, but in both cases RAM maxed out (20 GB). Any idea on the hardware/cloud side so that I can spin up a new VM? Also, you divided img by 255 ("img = img/255"). Could you explain why?
Pulkit Sharma says: May 04, 2019 at 11:46 am
Hi Dinesh, You can try to reduce the shape of your images, which will reduce the memory needed. I have divided the pixel values of all the images by the max pixel value, which is 255. This brings the pixel values into the range 0 to 1, which helps make training faster. So, it is always suggested to normalize your pixel values.
Aishwariya Gupta says: May 05, 2019 at 10:32 am
Hello! This is a really wonderful explanation. However, while running this, after model.add(Conv2D(filters=16, kernel_size=(5, 5), activation="relu", input_shape=(224,224,3))) it doesn't run and shows an AttributeError: module 'tensorflow' has no attribute 'get_default_graph'. I checked on Stack Overflow and tried implementing changes, but the error still persisted. Could you give me some alternative approach to tackle this?
Pulkit Sharma says: May 05, 2019 at 12:37 pm
Hi Aishwariya, Please check the tensorflow version that you currently have. Updating it might resolve the issue. Or you can look at this discussion thread.
PREM PRAKASH PATTNAIK says: May 18, 2019 at 7:44 am
Getting an error at this line: X = np.array(train_image). Maxed out of memory. How can I load the data in chunks? I mean, if I have millions of images, it would be impossible for RAM to hold all of them at once. How to solve that issue? We can store the data as NumPy arrays in chunks on the local hard disk with the .npy extension and then use it in chunks too. That would solve the memory issue, I guess.
Rahul says: May 22, 2019 at 2:56 pm
Hi Pulkit, Great explanation. Good job bro!!! Could you please help me with an issue? When I am training my model, the loss is showing as 0.
shangeth says: June 02, 2019 at 4:00 am
Hey, nice post. But is accuracy_score a good metric to use for multi-label classification? Most of the labels are 0, so even an untrained model that always returns 0 will give a good accuracy score. For example: label1 = [0,0,0,1,0,0,0,0,0,0,0,1], pred1 = [0,0,0,0,0,0,0,0,0,0,0,0] from a zero-returning model. Here the accuracy will be 83.33%. As the 1s decide the performance of the model, we should use a metric that considers the positive predictions and labels, like precision or recall.
Tom says: June 27, 2019 at 3:59 am
Great tutorial, I like it, and very good explanations. Is there any recommendation on how to run it on lower-memory CPUs? Can I simply create Keras checkpoints and use smaller training sets (e.g. 1000 images with a 90/10 test split) and train in multiple steps by reloading the weights file?
prudviraj says: July 06, 2019 at 11:56 pm
Hi, Help me understand the technical details. How are we learning images with multiple labels? In a nutshell, are we learning {image, [g1, g2, g3]} or {[image1, g1], [image1, g2], [image1, g3]}? If we use the first one, that will be simple image classification (that doesn't make sense!!!). The latter may confuse the model while training if we use some 1000 or 2000 classes. How do we cope with this situation?
Ruchika says: July 12, 2019 at 12:33 pm
Hi Pulkit, Kindly post code for building an image dataset into a .csv file as required in a multi-label image classification problem.
Pulkit Sharma says: July 12, 2019 at 3:20 pm
Hi Ruchika, The link to download the dataset (images along with the csv file) has been provided in the article itself. Here is the link for your reference: https://drive.google.com/file/d/1dNa_lBUh4CNoBnKdf9ddoruWJgABY1br/view
Ruchika says: July 15, 2019 at 5:50 pm
Hi Pulkit, I want to convert my image dataset into a .csv file. I need your help with that. Kindly share some code that will do the needful conversion required by the multi-label problem. Waiting for your reply. Thanks
Judy says: July 18, 2019 at 6:40 am
Pulkit, you did a great job. I am doing some research on industry inventory management. There are over 2000 kinds of components to identify and count using artificial intelligence. Your research really provides me with some great hints. Thank you very much. I will use your article as a reference in my thesis.
Pulkit Sharma says: July 22, 2019 at 3:10 pm
Hi Prem, Yes, you are correct. Instead of loading all the images at once, you can load them in chunks. But the computation power of the system also plays a key role in deep learning. Having more computation power will always be a plus if you are training a deep learning model.
Pulkit Sharma says: July 22, 2019 at 3:12 pm
Glad that it is useful to you! Regarding the loss of the model, which loss function are you using and what arguments are you passing while calculating the loss?
Pulkit Sharma says: July 22, 2019 at 3:18 pm
Hi Shangeth, It is more of an imbalanced problem. If we have an imbalanced class problem, then yes, accuracy is not a good option and you can use precision, recall or F1 score instead.
Pulkit Sharma says: July 22, 2019 at 3:21 pm
Hi Tom! First of all, thank you for your feedback on the article. If you do not have enough memory to run these models, then I would suggest using Google Colab instead of training the model on your local system. It provides a free GPU as well, so the training will be faster.
Pulkit Sharma says: July 22, 2019 at 3:25 pm
Hi, We are following the second approach which you have mentioned. And yes, as the number of classes increases, it will become harder and harder for the models to learn the insights and hence we have to build more complex models. But in today's world, the models are smart enough to understand and learn in case of 1000s of classes as well.
Pulkit Sharma says: July 22, 2019 at 3:27 pm
Glad you found it useful Judy!
Ekanshu says: July 31, 2019 at 6:53 pm
Hi Pulkit, Nice post. The link for the structured dataset is not working. Can you please update the link so that I can download it? Thanks.
Pulkit Sharma says: July 31, 2019 at 6:59 pm
Hi Ekanshu, The link is working fine at my end. You can download the dataset using this link: https://www.cs.ccu.edu.tw/~wtchu/projects/MoviePoster/index.html
Ekanshu says: August 01, 2019 at 1:15 pm
Hi Pulkit, I have downloaded the data from the above link. I got the metadata in .txt format. Can you please share the .csv file or share the code that will convert the TXT file into a CSV file?
Pulkit Sharma says: August 01, 2019 at 2:27 pm
Hi Ekanshu, I have updated the link to download the structured data which contains the images as well as the csv file. Please check it again.
Elisabeth Southgate says: August 01, 2019 at 7:44 pm
I appreciate you helping me learn more about image labeling. It is interesting that it can be used for both binary and multi-class image classification. My nephew is getting into all of this. He will be interested to know that you can do both binary and multi-class images.
Ekanshu says: August 02, 2019 at 7:26 pm
Hi Pulkit, Thank you very much.
Ashish says: August 09, 2019 at 8:36 am
Hi Pulkit, Thanks for such an amazing article. It helped me understand multi-label image classification. I just want to know if you can help with how we can use transfer learning with this type of multi-label classification?
Akhil Jaywant says: September 10, 2019 at 7:19 pm
I want to know what objective is achieved by training on a dataset containing totally different kinds of posters, just images. I mean, what parameters will it be trained on? And then you would pass a different movie poster and ask it to return the genres. Personally I feel there is nothing common in the posters. I mean, there are no similar parameters between the trained images and the testing one. Change my mind....
Pulkit Sharma says: September 12, 2019 at 12:09 pm
Hi Akhil, There might be some similarities between posters of the same genre. For example, posters of horror movies are generally dark, whereas if the genre is comedy, the posters are generally brighter. People in the posters are generally happy when it is a comedy movie, and if it is a horror movie, people might be tense or in fear. So, there can be multiple types of similarity between the posters of the same genre. This is what I tried to find using this model and it seemed to have worked well.
Ram says: September 25, 2019 at 6:19 pm
Hi Pulkit, Thanks for the great article. I just have one doubt in general about multi-label classification. If we also have a few sets of images which don't belong to any of the labels in the training data, do we need a separate "No label" label to differentiate these images, or if the predicted probabilities for all other labels for an image are less than the threshold, can we consider it a no-label image? Please let me know your thoughts and if there are any resources for this kind of problem. Thanks
Pulkit Sharma says: September 25, 2019 at 6:26 pm
Hi Ram, There is no need to introduce a new label at the time of training. As you have mentioned, if all the probabilities are less than the threshold, you can consider that the image does not belong to any of the available tags.
HARIPRIYA says: October 01, 2019 at 2:12 pm
Hi Pulkit, I have downloaded the dataset and tried running the program. When I convert the train_image list to the numpy array X, I get a memory error in Spyder (Anaconda). So, I uploaded the images to Google Drive and tried running in Google Colab. But the loading of images into the train_image list still stops at 89%. I reconnected and tried 3 times. The connection stops at that point. Can you give me any idea on how to solve this?
Pulkit Sharma says: October 01, 2019 at 6:09 pm
Hi Haripriya, Since there are more than 7200 images and each has a size of (400,400,3), you might get a memory error. The memory error occurs because the RAM fills up entirely before all the images are loaded. In this case, you can either try to increase the RAM or reduce the size of the images. To reduce the size, you have to make the following change while reading the images:
    img = image.load_img('Multi_Label_dataset/Images/'+train['Id'][i]+'.jpg',target_size=(224,224,3))
Here I have changed the target_size to (224,224,3). You can increase or decrease this size as well.
orde says: October 10, 2019 at 9:54 pm
I am not able to load the images with the code you provided:
    train_image = []
    for i in tqdm(range(train.shape[0])):
        img = image.load_img('Multi_Label_dataset/Images/'+train['Id'][i],target_size=(224,224,3))
        img = image.img_to_array(img)
        img = img/255
        train_image.append(img)
    X = np.array(train_image)
I keep getting the following error. I've tried different things to no avail.
    No such file or directory: ' Multi_Label_dataset/Images/tt0086425'
Pulkit Sharma says: October 14, 2019 at 5:26 pm
Hi orde, You have to pass the correct path to read the images.
Ahmed maher says: October 27, 2019 at 2:21 pm
Hi, I have another idea: you can reduce the number of images to, let's say, 3000 and adjust the train.csv file as well, keeping in mind that each label (address) must point to an actual image in the Images folder. I am ready for any further clarification.
Ahmed Maher says: October 27, 2019 at 3:35 pm
Hi, You have made my day. Thank you
Pulkit Sharma says: October 30, 2019 at 1:00 pm
Glad you liked it Ahmed!
Estefania says: November 02, 2019 at 3:49 pm
Thank you so much Sharma. This article motivates me to learn more about image processing.
Pulkit Sharma says: November 04, 2019 at 1:45 pm
Glad you liked it!
Khani says: November 20, 2019 at 11:48 pm
Hi, I have a question. Can I use this method to classify an image where multiple objects of the same class are in the picture? Or can this even be solved with multi-class classification? I have pictures of a box with multiple objects of the same class in it. I would like to use those pictures for training. Is that possible, or do I need a set where only one object is in each picture? Thanks
Pulkit Sharma says: November 21, 2019 at 2:03 pm
Hi Khani, This method will only classify whether an object is present or not. It will not be able to tell if an object is present multiple times. That can be done using object detection algorithms, which will detect each object in the image depending on the training set.
Nisarg Mehta says: November 22, 2019 at 12:09 am
Hi Pulkit, Great article. I had a question: can you please tell me how to convert an image dataset into a .csv file? Is there any code for it?
Pulkit Sharma says: November 22, 2019 at 11:37 am
Hi Nisarg, You can create a csv file but the code will entirely depend on the format of the dataset. There is no specific code for this. You have to write the code according to the format of your data.
Bryan says: October 15, 2022 at 7:03 am
Good explanation! By the way, I want to ask something. The dataset is multi-classification, right (the value is 0 or 1, exists or not)? But when you tested it, the result is a probability (is it a regression?), so the result for this model is 25 regression values? If so, I would be so happy, because right now I'm making a multi-label image regression model (predicting the composition of 6 types of algae in a pond image), not a multi-label image classification model.
Bryan Immanuel says: October 15, 2022 at 7:39 am
Also, can I get the dataset? The link above doesn't work. Thanks
Blair says: February 20, 2023 at 2:03 pm
Same here. Thanks for taking the time to contribute this amazing tutorial. I just wonder where we can get the updated link for this file? Thanks :)
ABCD says: May 06, 2023 at 10:46 pm
Pulkit, this drive link for the .csv is not working as of today for me. Can you please provide an updated link for the csv?
