Why are GPUs necessary for training Deep Learning models?
Most of you would have heard exciting stuff happening using deep learning. You would have also heard that Deep Learning requires a lot of hardware. I have seen people training a simple deep learning model for days on their laptops (typically without GPUs) which leads to an impression that Deep Learning requires big systems to run execute.
However, this is only partly true and this creates a myth around deep learning which creates a roadblock for beginners. Numerous people have asked me as to what kind of hardware would be better for doing deep learning. With this article, I hope to answer them.
Note: I assume that you have a fundamental knowledge of deep learning concepts. If not, you should go through this article.
Table of Contents
- Fact #101: DL requires
a lot ofhardware
- Training a deep learning model
- How to train your model faster?
- CPU vs GPU
- Brief History of GPUs – how did we reach here
- Which GPU to use today?
- The future looks exciting
Fact #101: Deep Learning requires
a lot of hardware
When I first got introduced with deep learning, I thought that deep learning necessarily needs large Datacenter to run on, and “deep learning experts” would sit in their control rooms to operate these systems.
This is because every book that I referred or every talk that I heard, the author or speaker always say that deep learning requires a lot of computational power to run on. But when I built my first deep learning model on my meager machine, I felt relieved! I don’t have to take over Google to be a deep learning expert 😀
This is a common misconception that every beginner faces when diving into deep learning. Although, it is true that deep learning needs considerable hardware to run efficiently, you don’t need it to be infinite to do your task. You can even run deep learning models on your laptop!
Just a small disclaimer; the smaller your system, more is the time you will need to get a trained model which performs good enough. You may basically look like this:
Let’s just ask ourselves a simple question; why do we need more hardware for deep learning?
The answer is simple, deep learning is an algorithm – a software construct. We define an artificial neural network in our favorite programming language which would then be converted into a set of commands that run on the computer.
If you would have to guess which components of neural network do you think would require intense hardware resource, what would be your answer?
A few candidates from top of my mind are:
- Preprocessing input data
- Training the deep learning model
- Storing the trained deep learning model
- Deployment of the model
Among all these, training the deep learning model is the most intensive task. Lets see in detail why is this so.
Training a deep learning model
When you train a deep learning model, two main operations are performed:
- Forward Pass
- Backward Pass
In forward pass, input is passed through the neural network and after processing the input, an output is generated. Whereas in backward pass, we update the weights of neural network on the basis of error we get in forward pass.
Both of these operations are essentially matrix multiplications. A simple matrix multiplication can be represented by the image below
Here, we can see that each element in one row of first array is multiplied with one column of second array. So in a neural network, we can consider first array as input to the neural network, and the second array can be considered as weights of the network.
This seems to be a simple task. Now just to give you a sense of what kind of scale deep learning – VGG16 (a convolutional neural network of 16 hidden layers which is frequently used in deep learning applications) has ~140 million parameters; aka weights and biases. Now think of all the matrix multiplications you would have to do to pass just one input to this network! It would take years to train this kind of systems if we take traditional approaches.
How to train your neural net faster?
We saw that the computationally intensive part of neural network is made up of multiple matrix multiplications. So how can we make it faster?
We can simply do this by doing all the operations at the same time instead of doing it one after the other. This is in a nutshell why we use GPU (graphics processing units) instead of a CPU (central processing unit) for training a neural network.
To give you a bit of an intuition, we go back to history when we proved GPUs were better than CPUs for the task.
Before the boom of Deep learning, Google had a extremely powerful system to do their processing, which they had specially built for training huge nets. This system was monstrous and was of $5 billion total cost, with multiple clusters of CPUs.
Now researchers at Stanford built the same system in terms of computation to train their deep nets using GPU. And guess what; they reduced the costs to just $33K ! This system was built using GPUs, and it gave the same processing power as Google’s system. Pretty impressive right?
|Number of cores||1K CPUs = 16K cores||3GPUs = 18K cores|
We can see that GPUs rule. But what exactly is the difference between a CPU and a GPU?
Difference between CPU and GPU
To understand the difference, we take a classic analogy which explains the difference intuitively.
Suppose you have to transfer goods from one place to the other. You have an option to choose between a Ferrari and a freight truck.
Ferrari would be extremely fast and would help you transfer a batch of goods in no time. But the amount of goods you can carry is small, and usage of fuel would be very high.
A freight truck would be slow and would take a lot of time to transfer goods. But the amount of goods it can carry is larger in comparison to Ferrari. Also, it is more fuel efficient so usage is lower.
So which would you chose for your work?
Obviously, you would first see what the task is; if you have to pick up your girlfriend urgently, you would definitely choose a Ferrari over a freight truck. But if you are moving your home, you would use a freight truck to transfer the furniture.
Here’s how you would technically differentiate the two:
Here’s another video which would make your concept even clearer.
Note: GPU is mostly used for gaming and doing complex simulations. These tasks and mainly graphics computations, and so GPU is graphics processing unit. If GPU is used for non-graphical processing, they are termed as GPGPUs – general purpose graphics processing unit
Brief History of GPUs – how did we reach here
Now, you might be asking this question that why are GPUs so much rage right now. Let us travel through a brief history of development of GPUs
Basically a GPGPU is a parallel programming setup involving GPUs & CPUs which can process & analyze data in a similar way to image or other graphic form. GPGPUs were created for better and more general graphic processing, but were later found to fit scientific computing well. This is because most of the graphic processing involves applying operations on large matrices.
The use of GPGPUs for scientific computing started some time back in 2001 with implementation of Matrix multiplication. One of the first common algorithm to be implemented on GPU in faster manner was LU factorization in 2005. But, at this time researchers had to code every algorithm on a GPU and had to understand low level graphic processing.
In 2006, Nvidia came out with a high level language CUDA, which helps you write programs from graphic processors in a high level language. This was probably one of the most significant change in they way researchers interacted with GPUs
Which GPU to use today?
Here I will quickly give a few know-hows before you go on to buy a GPU for deep learning.
The first thing you should determine is what kind of resource does your tasks require. If your tasks are going to be small or can fit in complex sequential processing, you don’t need a big system to work on. You could even skip the use of GPUs altogether. So, if you are planning to work mainly on “other” ML areas / algorithms, you don’t necessarily need a GPU.
If your task is a bit intensive, and has a handle-able data, a reasonable GPU would be a better choice for you. I generally use my laptop to work on toy problems, which has a slightly out of date GPU (a 2GB Nvidia GT 740M). Having a laptop with GPU helps me run things wherever I go. There are a few high end (and expectedly heavy) laptops with Nvidia GTX 1080 (a 8 GB VRAM) which you can check out at the extreme.
If you are regularly working on complex problems or are a company which leverages deep learning, you would probably be better off building a deep learning system or use a cloud service like AWS or FloydHub. We at Analytics Vidhya built a deep learning system for ourselves, for which we shared our specifications. Here’s the article.
If you are Google, you probably need another datacenter to sustain! Jokes aside, if your task is of a bigger scale than usual, and you have enough pocket money to cover up the costs; you can opt for a GPU cluster and do multi-GPU computing. There are also some options which may be available in the near future – like TPUs and faster FPGAs, which would make your life easier.
The future looks exciting
As mentioned above, there is a lot of research and active work happening to think of ways to accelerate computing. Google is expected to come out with Tensorflow Processing Units (TPUs) later this year, which promises an acceleration over and above current GPUs.
Similarly Intel is working on creating faster FPGAs, which may provide higher flexibility in coming days. In addition, the offerings from Cloud service providers (e.g. AWS) is also increasing. We will see each of them emerge in coming months.
In this article, we covered the motivations of using a GPU for deep learning applications and saw how to choose them for your task. I hope this article was helpful to you. If you have any specific questions regarding the topic, feel free to comment below or ask them on discussion portal.