When working with deep learning models, I have often found myself in a peculiar situation: there is not much data to train my model. It was at times like these that I came across the concept of image augmentation.
The image augmentation technique is a great way to expand the size of your dataset. You can come up with new transformed images from your original dataset. But many people use the conservative way of augmenting the images, that is, generating the augmented images up front and storing them in a NumPy array or in a folder. I have got to admit, I used to do this until I stumbled upon the Keras ImageDataGenerator class.
Keras ImageDataGenerator is a gem! It lets you augment your images in real-time while your model is still training! You can apply random transformations to each training image as it is passed to the model. This will not only make your model more robust but will also save on memory overhead! Now let’s dive deeper and check out the different ways in which this class is so great for image augmentation.
I am assuming that you are already familiar with neural networks. If not, I suggest going through the following resources first:
Image augmentation is a technique of applying different transformations to original images which results in multiple transformed copies of the same image. Each copy, however, is different from the other in certain aspects depending on the augmentation techniques you apply like shifting, rotating, flipping, etc.
Applying these small variations to the original image does not change its target class; it only provides a new perspective on how the object might be captured in real life. And so, we use it quite often when building deep learning models.
These image augmentation techniques not only expand the size of your dataset but also incorporate a level of variation in the dataset which allows your model to generalize better on unseen data. Also, the model becomes more robust when it is trained on new, slightly altered images.
So, with just a few lines of code, you can instantly create a large corpus of similar images without having to worry about collecting new images, which is not feasible in a real-world scenario. Now, let’s see how it’s done with the good old Keras library!
Keras ImageDataGenerator class provides a quick and easy way to augment your images. It provides a host of different augmentation techniques like standardization, rotation, shifts, flips, brightness change, and many more. You can find more on its official documentation page.
However, the main benefit of using the Keras ImageDataGenerator class is that it is designed to provide real-time data augmentation. That means it generates augmented images on the fly while your model is still in the training stage. How cool is that!
The ImageDataGenerator class ensures that the model receives new variations of the images at each epoch. But it only returns the transformed images and does not add them to the original corpus of images. If it did, the model would see the original images multiple times, which would encourage overfitting.
Another advantage of ImageDataGenerator is that it requires less memory. Without this class, we would load all the images at once. With it, we load the images in batches, which saves a lot of memory.
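To make the on-the-fly behaviour concrete, here is a minimal sketch using the flow() method on in-memory data. The random arrays stand in for real images and labels, and the image size, batch size, and seed are illustrative assumptions rather than values from this article's experiments.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Synthetic stand-ins for a small dataset: 20 RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 32, 32, 3)).astype("float32")
labels = rng.integers(0, 2, size=20).astype("float32")

# Random horizontal flips plus rescaling of pixel values to [0, 1].
datagen = ImageDataGenerator(horizontal_flip=True, rescale=1.0 / 255)

# flow() returns an iterator; each batch is transformed when it is requested.
iterator = datagen.flow(images, labels, batch_size=8, shuffle=True, seed=42)
batch_x, batch_y = next(iterator)

print(batch_x.shape, batch_y.shape)  # one batch of images and labels
print(len(iterator))                 # number of batches per epoch
print(images.shape)                  # the original array keeps its size
```

Only the requested batch is transformed, and the original array is never extended with augmented copies.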
Now let’s have a look at a few augmentation techniques with Keras ImageDataGenerator class.
Image rotation is one of the widely used augmentation techniques and allows the model to become invariant to the orientation of the object.
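The ImageDataGenerator class allows you to randomly rotate images through any degree between 0 and 360 by providing an integer value in the rotation_range argument. For a value of N, the rotation angle for each image is drawn at random from the range [-N, N] degrees.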
When the image is rotated, some pixels will move outside the image and leave an empty area that needs to be filled in. You can fill this in different ways like a constant value or nearest pixel values, etc. This is specified in the fill_mode argument and the default value is “nearest” which simply replaces the empty area with the nearest pixel values.
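A minimal sketch of random rotation, assuming SciPy is installed (Keras uses it for affine transforms). The random array is a stand-in for a real image, and the 30-degree range and the seed are arbitrary choices for illustration.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in image: 64x64 RGB with pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")

# Rotate by a random angle in [-30, 30] degrees; fill gaps with nearest pixels.
datagen = ImageDataGenerator(rotation_range=30, fill_mode="nearest")

# Inspect one random draw of parameters; "theta" is the angle in degrees.
params = datagen.get_random_transform(image.shape, seed=1)
print(params["theta"])

# Apply a random rotation to a single image (height, width, channels).
rotated = datagen.random_transform(image, seed=1)
print(rotated.shape)
```

Switching fill_mode to "constant" together with the cval argument would fill the empty corners with a fixed value instead.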
The object may not always be in the center of the image. To overcome this problem, we can shift the pixels of the image either horizontally or vertically; this is done by adding a constant offset to the position of every pixel.
The ImageDataGenerator class has the argument height_shift_range for a vertical shift of the image and width_shift_range for a horizontal shift. If the value is a float smaller than 1, it is interpreted as a fraction of the image height or width to shift by. If it is an integer, the image is shifted by up to that many pixels.
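The sketch below shows how fractional shift ranges turn into pixel offsets, assuming SciPy is installed. The image size and ranges are illustrative; the "tx" and "ty" keys are the ones returned by get_random_transform(), where "tx" is the offset along the height axis and "ty" the offset along the width axis.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in image: height 100, width 200, 3 channels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 200, 3)).astype("float32")

# Shift by up to 20% of the width and up to 10% of the height.
datagen = ImageDataGenerator(width_shift_range=0.2, height_shift_range=0.1)

# Fractions are converted to pixel offsets for the given image shape.
params = datagen.get_random_transform(image.shape, seed=3)
print(params["tx"])  # vertical offset in pixels, at most 10 in magnitude
print(params["ty"])  # horizontal offset in pixels, at most 40 in magnitude

shifted = datagen.random_transform(image, seed=3)
print(shifted.shape)
```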
Flipping images is also a great augmentation technique and it makes sense to use it with a lot of different objects.
The ImageDataGenerator class has the parameters horizontal_flip and vertical_flip for flipping along the vertical or the horizontal axis. However, whether to use this technique depends on the object in the image. For example, vertically flipping a car would not be sensible, whereas it is fine for a symmetrical object like a football. Having said that, I am going to flip my image both ways just to demonstrate the effect of the augmentation.
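Because horizontal_flip and vertical_flip are each applied at random, the sketch below uses apply_transform() with explicit parameters so that the effect is deterministic. The tiny 2x3 array is a made-up example that makes the reordering easy to see.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A tiny 2x3 single-channel "image" so the pixel order is easy to follow.
image = np.arange(6, dtype="float32").reshape(2, 3, 1)

datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)

# Force each flip explicitly instead of relying on the random draw.
h_flipped = datagen.apply_transform(image, {"flip_horizontal": True})
v_flipped = datagen.apply_transform(image, {"flip_vertical": True})

print(image[:, :, 0])      # original rows
print(h_flipped[:, :, 0])  # columns reversed (mirrored left to right)
print(v_flipped[:, :, 0])  # rows reversed (mirrored top to bottom)
```

During training you would normally call flow() or random_transform() and let the generator decide at random whether each image is flipped.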
Brightness augmentation randomly changes the brightness of the image. It is also a very useful augmentation technique because most of the time our object will not be under perfect lighting conditions. So, it becomes imperative to train our model on images under different lighting conditions.
Brightness can be controlled in the ImageDataGenerator class through the brightness_range argument. It accepts a list of two float values and picks a brightness shift value from that range. Values less than 1.0 darken the image, whereas values above 1.0 brighten the image.
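A minimal sketch of brightness augmentation, assuming Pillow is installed (Keras relies on it for the brightness shift). The range [0.4, 1.5] and the random stand-in image are illustrative choices.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in image: 64x64 RGB with pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")

# Brightness factor drawn from [0.4, 1.5]: below 1 darkens, above 1 brightens.
datagen = ImageDataGenerator(brightness_range=[0.4, 1.5])

# Inspect the factor that one random draw produces.
params = datagen.get_random_transform(image.shape, seed=2)
print(params["brightness"])

adjusted = datagen.random_transform(image, seed=2)
print(adjusted.shape)
```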
The zoom augmentation either randomly zooms in on the image or zooms out of the image.
The ImageDataGenerator class controls zooming through the zoom_range argument. You can provide a list with two values specifying the lower and the upper limit. Alternatively, if you specify a single float value, the zoom will be drawn from the range [1-zoom_range, 1+zoom_range].
Any value smaller than 1 will zoom in on the image, whereas any value greater than 1 will zoom out of the image.
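The two ways of specifying zoom_range can be sketched as follows, assuming SciPy is installed. The value 0.3, the list [0.5, 1.0], and the stand-in image are illustrative; "zx" and "zy" are the per-axis zoom factors returned by get_random_transform().

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")

# A single float is expanded to [1 - zoom_range, 1 + zoom_range].
datagen = ImageDataGenerator(zoom_range=0.3)
print(datagen.zoom_range)

# A two-element list gives the lower and upper limits directly;
# [0.5, 1.0] only ever zooms in.
zoom_in_only = ImageDataGenerator(zoom_range=[0.5, 1.0])
print(zoom_in_only.zoom_range)

# Each axis gets its own random zoom factor from the range.
params = datagen.get_random_transform(image.shape, seed=4)
print(params["zx"], params["zy"])

zoomed = datagen.random_transform(image, seed=4)
print(zoomed.shape)  # the output keeps the input size
```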
There are many more augmentation techniques that I have not covered in this article but I encourage you to check them out in the official documentation.
Before we explore the methods of the ImageDataGenerator class, let’s first get familiar with the dataset we will be working with.
We have a dataset of emergency (like fire trucks, ambulances, police vehicles, etc.) and non-emergency vehicles. There are a total of 1646 unique images in the dataset. Since these are not a lot of images to create a robust neural network, it will act as a great dataset to test the potential of the ImageDataGenerator class!
You can download the dataset from here.
So far we have seen how to augment images using ImageDataGenerator().flow() method. However, there are a few methods in the same class which are actually quite helpful and which implement augmentation on the fly.
Let’s first come up with the augmentations we would want to apply to the images.
The flow_from_directory() method allows you to read the images directly from the directory and augment them while the neural network model is learning on the training data.
The method expects that images belonging to different classes are present in different folders but are inside the same parent folder. So let’s create that first using the following code:
Now, let’s try to augment the images using the class method.
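Here is a self-contained sketch of that workflow, assuming Pillow is installed. Since the real dataset is not bundled here, it writes a few random images into a temporary folder; the folder names emergency and non_emergency, the image sizes, and the batch size are hypothetical placeholders.

```python
import os
import tempfile

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a parent folder with one sub-folder per class (placeholder data).
root = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for class_name in ("emergency", "non_emergency"):
    class_dir = os.path.join(root, class_name)
    os.makedirs(class_dir)
    for i in range(3):
        pixels = rng.integers(0, 256, size=(48, 48, 3), dtype=np.uint8)
        Image.fromarray(pixels).save(os.path.join(class_dir, f"{i}.png"))

datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)

# Images are read from disk, resized, and augmented one batch at a time.
train_iter = datagen.flow_from_directory(
    root,
    target_size=(64, 64),   # every image is resized to this shape
    class_mode="binary",    # labels come back as a 1D array of 0s and 1s
    batch_size=4,
    seed=0,
)

print(train_iter.class_indices)  # folder name -> label index (alphabetical)
batch_x, batch_y = next(train_iter)
print(batch_x.shape, batch_y.shape)
```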
The following are a few important parameters of this method:
The flow_from_dataframe() method is another great method in the ImageDataGenerator class. It allows you to augment images directly by reading their file names and target values from a dataframe.
This comes in very handy when you have all the images stored within the same folder.
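A minimal sketch of flow_from_dataframe(), assuming pandas and Pillow are installed. The temporary folder, the column names filename and label, and the random images are hypothetical placeholders for your own data. Note that with class_mode="binary" the label column must contain strings.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Write a few placeholder images into a single folder and list them in a dataframe.
img_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
rows = []
for i in range(6):
    name = f"{i}.png"
    pixels = rng.integers(0, 256, size=(48, 48, 3), dtype=np.uint8)
    Image.fromarray(pixels).save(os.path.join(img_dir, name))
    rows.append({"filename": name, "label": str(i % 2)})  # labels as strings
df = pd.DataFrame(rows)

datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)

df_iter = datagen.flow_from_dataframe(
    dataframe=df,
    directory=img_dir,     # folder that the file names are relative to
    x_col="filename",      # column holding the image file names
    y_col="label",         # column holding the target values
    target_size=(64, 64),
    class_mode="binary",
    batch_size=4,
    seed=0,
)

batch_x, batch_y = next(df_iter)
print(df_iter.class_indices)
print(batch_x.shape, batch_y.shape)
```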
This method also has a few parameters that need to be explained in brief:
Right, you have created the iterators for augmenting the images. But how do you feed it to the neural network so that it can augment on the fly?
For that, all you need to do is feed the iterator as an input to the Keras fit_generator() method applied on the neural network model, along with epochs, steps_per_epoch, and other important arguments. The batch size is not passed here; it is set when you create the iterator. We will be using a Convolutional Neural Network (CNN) model. The fit_generator() method fits the model on data that is yielded batch-wise by a Python generator.
You can use either of the iterator methods mentioned above as input to the model.
Let’s take a moment to understand the arguments of the fit_generator() method first before we start building our model.
Note that fit_generator() is deprecated and is no longer available in recent Keras releases. The Keras fit() method supports generators directly, so we will be using it to train our model.
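As a rough sketch of passing an iterator straight to fit(), the example below trains a deliberately tiny placeholder model on random data. The model, the data, and the hyperparameters are assumptions chosen only to keep the example fast; they are not the setup used later in this article.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random stand-in data: 32 RGB images of 32x32 pixels with binary labels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(32, 32, 32, 3)).astype("float32")
y_train = rng.integers(0, 2, size=32).astype("float32")

datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
train_iter = datagen.flow(x_train, y_train, batch_size=8, seed=0)

# Placeholder model, just enough to demonstrate the call.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The iterator supplies the batches, so no batch_size argument is given here.
history = model.fit(
    train_iter,
    steps_per_epoch=len(train_iter),  # batches drawn per epoch
    epochs=2,
    verbose=0,
)
print(history.history["loss"])
```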
Now that we have discussed the various methods of the Keras ImageDataGenerator class, it is time to build our own CNN model and see how well the class performs. We will compare the performance of the model with and without augmentation to get an idea of how helpful augmentation is.
Let’s first import the relevant libraries.
Now let’s prepare the dataset for the model. Here I split the original batch of images into train and validation parts. 90% will be used for training and 10% will be used for validation.
Then we can append them into a list and prepare them to be input to the model.
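One way to sketch the 90/10 split with plain NumPy is shown below. The random arrays are placeholders for the loaded images and labels, and the variable names are illustrative; a library helper such as scikit-learn's train_test_split would work equally well.

```python
import numpy as np

# Placeholder data: 100 images of 32x32 RGB with binary labels.
rng = np.random.default_rng(42)
images = rng.random((100, 32, 32, 3), dtype=np.float32)
labels = rng.integers(0, 2, size=100)

# Shuffle the indices once, then carve off 10% for validation.
indices = rng.permutation(len(images))
n_valid = len(images) // 10
valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

X_train, y_train = images[train_idx], labels[train_idx]
X_valid, y_valid = images[valid_idx], labels[valid_idx]

print(X_train.shape, X_valid.shape)
```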
Let’s create the architecture for our CNN model. The architecture is simple. It has three Convolutional layers and two fully connected layers.
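A sketch of such an architecture is given below. Only the layer counts (three convolutional layers, two fully connected layers) follow the description above; the input size, filter counts, pooling layers, and activations are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),            # assumed input size
    # Three convolutional layers, each followed by max pooling.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Two fully connected layers; the last one outputs a binary probability.
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.summary()
```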
Now that we have created the architecture for our model, we can compile it and start training it.
Here is the result I got after training the model for 25 epochs without augmenting the images.
You can probably notice overfitting happening here. Let’s see if we can alleviate this using augmentation.
I am going to use the flow() method to augment the images on the fly. You can use other methods discussed in the previous section. We will be applying the following augmentation techniques to the training images.
Finally, let’s train our model and see if the augmentations had any positive impact on the result!
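The whole augmented training step can be sketched as follows, assuming SciPy is installed. The random data, the small model, and the augmentation ranges are illustrative assumptions; note that only the training images are augmented, while the validation images are merely rescaled.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random stand-in data for the training and validation splits.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(16, 32, 32, 3)).astype("float32")
y_train = rng.integers(0, 2, size=16).astype("float32")
x_valid = rng.integers(0, 256, size=(8, 32, 32, 3)).astype("float32")
y_valid = rng.integers(0, 2, size=8).astype("float32")

# Augment the training images only (illustrative ranges).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)
train_iter = train_datagen.flow(x_train, y_train, batch_size=8, seed=0)

# A deliberately small CNN so the sketch runs quickly.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Validation images are rescaled the same way but never augmented.
history = model.fit(
    train_iter,
    epochs=1,
    validation_data=(x_valid / 255.0, y_valid),
    verbose=0,
)
print(sorted(history.history.keys()))
```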
After 25 epochs we get the following loss and accuracy for the model on the augmented data.
As you can see, the training and validation loss are both decreasing, with little divergence compared to the outcome from the previous model. Also, notice how the training and validation accuracy increase together. They are closer to each other than they were before the augmentation.
Such is the power of augmentation that our model is able to generalize on the images now!
I urge you to experiment around with this dataset yourself and see if you can improve the accuracy of the model!
You can learn how to build CNN models in detail in this awesome article.
And just like that, you have learned to augment images in the easiest and quickest way possible. To summarize, in this article we learned how to avoid the conventional image augmentation technique by using the Keras ImageDataGenerator. Now forget the old school way of augmenting your images and saving them in a separate folder. You now know how to augment images on the fly!
If you are looking to learn Image augmentation using PyTorch, I recommend going through this in-depth article.
Going further, if you are interested in learning more about deep learning and computer vision, I recommend you check out the following awesome courses curated by our team at Analytics Vidhya:
You can apply many more augmentation techniques than the ones discussed here, depending on what suits your image dataset. Feel free to share your insights in the comments below.
ImageDataGenerator is like a tool that helps us create more examples of images to train our computer model. It takes existing images and applies different changes to them, like rotating or flipping them, making them bigger or smaller, and so on. This helps our model learn better by seeing more diverse examples, so it can recognize objects in new pictures more accurately.
The flow method in ImageDataGenerator takes input data and their corresponding labels directly from memory. On the other hand, the flow_from_directory method reads the input data and labels from a directory structure. The latter is useful when dealing with large datasets where images are organized in folders representing their respective classes, making it easier to load and process the data in batches.
Very nice and helpful. How can I get the total amount of training data, including the generated augmented images?
Helpful article, thanks for sharing :)
Hi, very useful and helpful article. Thanks for sharing.....But what is the difference between traditional data augmentation and ImageDataGenerator?