Beginner's Guide to Threading in Python

This article was published as a part of the Data Science Blogathon.

Threading in Python- Make your code smarter

Python is cool, and we know it. But can we make it smarter about how it processes work? Threading shows the way.

What is Parallelism?

Early machines had only a single core in the CPU, where all the processing took place.

Why does the number of cores matter? Because it tells you how many things the machine can handle at once. If you have 16 cores, you can run 16 different operations at exactly the same time.

Let us say you want to perform 16 different addition operations, and assume each operation takes 1 second. On a single-core machine, you have to perform these operations one by one, so the 16 additions take 16 seconds. On a 16-core machine, you can deploy each of the 16 additions to its own core at the same time and get the job done in 1 second. This is called parallelism.
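The arithmetic above fits a small formula: with N equal, independent operations of t seconds each on C cores, the ideal wall-clock time is ceil(N / C) × t. Below is a minimal Python sketch of that idealized model. The function name ideal_wall_time is our own, and real programs also pay scheduling and coordination overhead that this ignores.

```python
import math
import os

def ideal_wall_time(num_ops, secs_per_op, num_cores):
    """Idealized wall-clock time for independent, equal-length operations.

    In each "round" every core runs one operation at the same time,
    so we need ceil(num_ops / num_cores) rounds. Overheads are ignored.
    """
    rounds = math.ceil(num_ops / num_cores)
    return rounds * secs_per_op

print(ideal_wall_time(16, 1, 1))   # 16: single core, one after another
print(ideal_wall_time(16, 1, 16))  # 1: one operation per core
# os.cpu_count() reports logical CPUs and may return None if unknown.
print("Logical CPUs on this machine:", os.cpu_count())
```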

Threading

A thread is a set of operations that needs to execute. The operating system deploys each thread on one of the cores in the CPU. Note- at any instant a thread runs on only one core; it never runs on two cores at once (although the operating system may move it to a different core between turns).

Suppose we have deployed two threads to one core. Note- a core can do only one thing at a time.


Now the two threads can be processed in any order we choose.

First, we can process half of the first thread.


Next, half of the second thread can be processed.


The remaining halves of the two threads are processed in the same alternating fashion.


This is what threading is: running different things on the same CPU core. TLDR- Threading is about how we handle multiple threads on a core.

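Note- Threading does not require multiple cores. It is about how to sequence a set of programs (threads) on the same core. In the above example, we decide how the two threads take turns on the given core. (In CPython, the standard Python interpreter, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so Python threads behave much like this single-core picture even on a multicore machine.)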
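To make the take-turns idea concrete, here is a toy model of one core switching between two threads, written with Python generators. It is only a simulation under our own assumptions: task and run_round_robin are hypothetical names, each yield marks the point where the "core" switches away, and a real operating system scheduler preempts threads on a timer instead of waiting for them to yield.

```python
from collections import deque

def task(name, steps):
    """A toy 'thread': a generator that pauses after each chunk of work."""
    for i in range(1, steps + 1):
        yield f"{name} part {i}"

def run_round_robin(tasks):
    """Simulate one core: run one chunk of a task, then switch to the next."""
    ready = deque(tasks)
    timeline = []
    while ready:
        current = ready.popleft()
        try:
            timeline.append(next(current))  # run one chunk on the "core"
            ready.append(current)           # not finished: back of the queue
        except StopIteration:
            pass                            # task finished: drop it
    return timeline

timeline = run_round_robin([task("Thread 1", 2), task("Thread 2", 2)])
for step in timeline:
    print(step)
```

It prints Thread 1 part 1, Thread 2 part 1, Thread 1 part 2, Thread 2 part 2: the same half-and-half interleaving described above.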

You can actually see the number of threads currently running on your machine. In Windows, go to Task Manager → Performance → CPU.
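Task Manager counts threads across every process on the machine. From inside a Python program, the threading module can list the threads of the current process only. A minimal sketch; the two sleeping worker threads and their names are placeholders so that there is something to list:

```python
import threading
import time

def idle(secs):
    time.sleep(secs)  # keep the thread alive briefly so we can see it

# Two hypothetical helper threads, named so they are easy to spot.
workers = [threading.Thread(target=idle, args=(0.5,), name=f"worker-{i}") for i in range(2)]
for w in workers:
    w.start()

live = threading.active_count()                  # live threads in this process, main included
names = [t.name for t in threading.enumerate()]  # their names
print(live, names)

for w in workers:
    w.join()
```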

Why do we need threading? Why can’t we actually process one thread at a time and move on to the next?

Sometimes a thread hangs, which means it sits idle for a while, usually because it is waiting for something. The best example is the time.sleep() function, which does nothing but wait for a given time. While one thread is idle, we can move on and process the other thread until the first one becomes active again. TLDR- When one thread is waiting, you can process another thread in the meantime.
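A minimal sketch of this idea, using time.sleep() as a stand-in for any wait (the helper name wait is ours): two 0.3-second waits done one after another take about 0.6 seconds, while the same two waits in separate threads overlap and finish in about 0.3 seconds.

```python
import threading
import time

def wait(secs):
    time.sleep(secs)  # stands in for any idle wait, e.g. a slow disk or network reply

# One after another: the second wait cannot start until the first ends.
start = time.perf_counter()
wait(0.3)
wait(0.3)
sequential = time.perf_counter() - start

# In two threads: while one thread waits, the other is already waiting too.
start = time.perf_counter()
threads = [threading.Thread(target=wait, args=(0.3,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until both threads have finished
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```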

This is exactly what we call concurrent computing.

A small example

Let us explain threading with a small example. Look at the code snippet below.

#Part One of the code
import time
print(1)
time.sleep(10)
print('Done sleeping')
print(2)

#Part Two of the code
print(3)

When you execute the whole code as a single thread, it runs step by step. First, the library is imported. Then ‘1’ is printed. The thread sleeps for 10 seconds. Next, ‘Done sleeping’ is printed, followed by ‘2’, and finally ‘3’ is printed.

Output-
1
Done sleeping
2
3

Now let us say you execute the code as two threads: part one as one thread and part two as another. (Note- By default, Python code runs in a single thread; to run parts of it in separate threads, we need to import the threading module.)

First, the library is imported, and then ‘1’ is printed. Now the thread goes to sleep. This is where threading comes into action.


The core now switches to the other thread.


Now ‘3’ is printed. Since Thread 2 has nothing left to do, the core switches back to Thread 1 (which is still asleep).


Once the sleep is over, ‘Done sleeping’ and then ‘2’ are printed.

So the output will be

Output-
1
3
Done sleeping
2
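Here is one way to write the two-part program with the threading module, as a minimal sketch. Part one and part two become functions (part_one, part_two, and the say helper that records the order are our own names), and the sleep is shortened from 10 seconds to 1 second. The operating system decides exactly when each new thread first gets the core, so in rare runs ‘3’ may appear before ‘1’; ‘Done sleeping’ and ‘2’ come last because of the sleep.

```python
import threading
import time

order = []  # records what was printed, in order

def say(item):
    order.append(item)
    print(item)

def part_one():
    say(1)
    time.sleep(1)  # shortened from 10 s so the demo finishes quickly
    say('Done sleeping')
    say(2)

def part_two():
    say(3)

thread_1 = threading.Thread(target=part_one)
thread_2 = threading.Thread(target=part_two)
thread_1.start()
thread_2.start()
thread_1.join()  # wait for both threads before the program moves on
thread_2.join()
```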


Practical example

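As always, a concept is clearest when explained with a real-world example. I/O-bound tasks, such as network requests and disk reads and writes, are the ones that benefit most from threading, because their threads spend most of their time waiting.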

Let us say you are watching The Shawshank Redemption on Netflix. Two things happen while you watch Andy Dufresne suffer in jail: first, the application fetches data from the server; second, the fetched data is shown on your screen as a movie.


Imagine the situation without threading. You would have to wait for a segment of the video to download, watch that segment, wait for the next segment to download, and so on.

Thanks to threading, we can split the two processes into different threads. While one thread fetches data (that is, it is waiting on the network), the other thread can show you Morgan Freeman's amazing performance.
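The download-while-playing idea can be sketched with the standard library's queue.Queue, which lets one thread hand items to another safely. This is a toy simulation, not how Netflix actually works: fetcher, player, SEGMENTS, and the segment names are hypothetical, and time.sleep() stands in for network time.

```python
import queue
import threading
import time

SEGMENTS = 3  # hypothetical number of video segments

def fetcher(buffer):
    """Simulates downloading segments; sleep stands in for network time."""
    for i in range(SEGMENTS):
        time.sleep(0.2)
        buffer.put(f"segment {i}")
    buffer.put(None)  # sentinel: nothing more to fetch

def player(buffer, played):
    """Plays each segment as soon as it arrives in the buffer."""
    while True:
        segment = buffer.get()  # blocks until the fetcher provides data
        if segment is None:
            break
        played.append(segment)
        print("playing", segment)

buffer = queue.Queue()
played = []
threads = [
    threading.Thread(target=fetcher, args=(buffer,)),
    threading.Thread(target=player, args=(buffer, played)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The player starts on segment 0 while the fetcher is still downloading segment 1; the None sentinel tells the player to stop.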

It is also very useful for you as a data scientist. For example, when you scrape data from multiple web pages, you can fetch the pages in multiple threads and finish faster. Likewise, when you push data to a server, you can do so from multiple threads, so that while one thread is waiting, others can keep working.
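For the scraping case, the standard library's concurrent.futures.ThreadPoolExecutor is a convenient layer over threading: it runs a function over many inputs in a pool of threads. A minimal sketch, assuming a hypothetical fake_fetch function that sleeps instead of making a real HTTP request, so it runs offline:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical page list; a real scraper would use URLs.
PAGES = [f"page-{n}" for n in range(8)]

def fake_fetch(page):
    time.sleep(0.3)  # stands in for network latency
    return f"<html>{page}</html>"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_fetch, PAGES))  # results keep the input order
elapsed = time.perf_counter() - start
print(len(results), f"pages in {elapsed:.2f} s")
```

Fetched one after another, the eight pages would take about 8 × 0.3 = 2.4 seconds; with eight threads the waits overlap. For a real scraper you would replace fake_fetch with an actual request, for example via urllib.request.urlopen.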


A detailed example

As said before, by default Python code runs in a single thread; to create more threads, we need to import the threading module.

Take a look at the code.

import threading
import time

def sleepy_man(secs):
    print('Starting to sleep inside')
    time.sleep(secs)
    print('Woke up inside')

x = threading.Thread(target = sleepy_man, args = (1,))
x.start()
print(threading.active_count())
time.sleep(1.2)
print('Done')

If you execute the above code, you will get the following output.

Output-
Starting to sleep inside
2
Woke up inside
Done

First, let me explain the code step by step. Then we will analyze the output.

  • You import the threading and time modules. threading is the module that lets us create threads, and time contains the function sleep.
  • The function sleepy_man takes one argument, secs. It first prints ‘Starting to sleep inside’, then sleeps for secs seconds, and then prints ‘Woke up inside’.
  • This is the part where we start creating threads. We create a thread by instantiating the class threading.Thread, passing two arguments: target, the function that should run in the new thread, and args, a tuple of the arguments to pass to that function (hence the trailing comma in (1,)). The returned Thread object is stored in x.

x = threading.Thread(target = sleepy_man, args = (1,))

  • After creating the thread object, we call its start() method to launch the thread.

x.start()

  • Note- Now we have two threads: the default main thread that runs the program and the new thread we defined. Thus the active thread count is two, so the statement below prints ‘2’.

print(threading.active_count())
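One detail worth seeing in isolation is the args parameter: it must be a tuple, so a single argument needs a trailing comma, and keyword arguments go in a separate kwargs dictionary. A small sketch; the extra label parameter on sleepy_man is our own addition for illustration:

```python
import threading
import time

def sleepy_man(secs, label='inside'):
    # 'label' is a hypothetical extra parameter, added only to demonstrate kwargs
    print('Starting to sleep', label)
    time.sleep(secs)
    print('Woke up', label)

# args must be a tuple: (0.5,) is a one-element tuple, while (0.5) is just a float.
# Keyword arguments for the target function go in the kwargs dict.
x = threading.Thread(target=sleepy_man, args=(0.5,), kwargs={'label': 'in thread x'})
x.start()
x.join()  # wait for the thread to finish
print('Thread still running?', x.is_alive())  # False once join() has returned
```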

Now let us look at the flow of control. Once you call the start() method, sleepy_man() starts running in a separate thread, while the main program keeps running concurrently in the main thread. The flow is shown in the image below.

[Figure: the flow]
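To see the flow without a diagram, you can log each event with a timestamp and the name of the thread that produced it. A sketch based on the first example; log, events, and the thread name 'sleeper' are our own additions:

```python
import threading
import time

start = time.perf_counter()
events = []  # (seconds since start, thread name, message)

def log(msg):
    # Record each event with a timestamp and the name of the thread that logged it.
    events.append((time.perf_counter() - start, threading.current_thread().name, msg))

def sleepy_man(secs):
    log('Starting to sleep inside')
    time.sleep(secs)
    log('Woke up inside')

x = threading.Thread(target=sleepy_man, args=(1,), name='sleeper')
x.start()
log('active threads: {}'.format(threading.active_count()))
time.sleep(1.2)
log('Done')

for when, who, msg in events:
    print('{:5.2f} s  {:<10} {}'.format(when, who, msg))
```

Expect ‘Woke up inside’ from sleeper at around 1 second and ‘Done’ from the main thread at around 1.2 seconds; the exact timestamps vary from run to run.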

Now let us increase how long the program sleeps inside the function.

import threading
import time

def sleepy_man(secs):
    print('Starting to sleep inside')
    time.sleep(secs)
    print('Woke up inside')

x = threading.Thread(target = sleepy_man, args = (4,))
x.start()
print(threading.active_count())
time.sleep(1.2)
print('Done')

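Since the main thread now sleeps for only 1.2 seconds while sleepy_man() sleeps for 4, ‘Done’ is printed before ‘Woke up inside’. The output is as follows-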

Starting to sleep inside
2
Done
Woke up inside

The flow is given in the diagram below-

[Figure: flow diagram]
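Waiting a fixed time with time.sleep(1.2) only works if you guess the thread's duration correctly, as the output above shows. Thread.join() blocks the calling thread until the target thread finishes, so ‘Done’ is printed after ‘Woke up inside’ no matter how long sleepy_man() sleeps. A sketch of the same program using join():

```python
import threading
import time

def sleepy_man(secs):
    print('Starting to sleep inside')
    time.sleep(secs)
    print('Woke up inside')

x = threading.Thread(target=sleepy_man, args=(4,))
x.start()
print(threading.active_count())
x.join()  # block the main thread until sleepy_man returns, however long it sleeps
print('Done')
```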

Now let’s spice things up a bit. Let us run a for loop that starts multiple threads.

import threading
import time

def sleepy_man(secs):
    print('Starting to sleep inside - Iteration {}'.format(5-secs))
    time.sleep(secs)
    print('Woke up inside - Iteration {}'.format(5-secs))

for i in range(3):
    x = threading.Thread(target = sleepy_man, args = (5-i,))
    x.start()

print('Active threads- ', threading.active_count())

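At every iteration, we start a thread. Note that we pass the arguments 5, 4, and 3 in the 1st, 2nd, and 3rd iterations respectively, so sleepy_man() sleeps for 5 seconds, 4 seconds, and 3 seconds respectively. The active thread count is 4 because the main thread is running alongside the three sleeping threads, and the thread with the shortest sleep (Iteration 2) wakes up first.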

Thus the output is as shown-

Starting to sleep inside - Iteration 0
Starting to sleep inside - Iteration 1
Starting to sleep inside - Iteration 2
Active threads-  4
Woke up inside - Iteration 2
Woke up inside - Iteration 1
Woke up inside - Iteration 0
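In the loop above, x is reassigned on every iteration, so the program keeps a handle only to the last thread. The sketch below keeps every thread in a list and joins them all, with the sleep times scaled down from 5, 4, and 3 seconds to 0.5, 0.4, and 0.3 seconds so it runs quickly. The total time is roughly the longest single sleep, not the sum of all three.

```python
import threading
import time

def sleepy_man(secs):
    time.sleep(secs)

start = time.perf_counter()
threads = []
for i in range(3):
    # Scaled down from 5, 4, 3 seconds to 0.5, 0.4, 0.3 seconds.
    t = threading.Thread(target=sleepy_man, args=((5 - i) / 10,))
    t.start()
    threads.append(t)  # keep a reference to every thread, not just the last one

for t in threads:
    t.join()  # wait for each thread in turn

elapsed = time.perf_counter() - start
print(f"All threads finished after {elapsed:.2f} s")
```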

Thus we have seen how multiple threads can be defined and started. While some threads wait, others keep working, which makes threading especially valuable for I/O-heavy operations.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
