Python Generators and Iterators in 2 Minutes for Data Science Beginners
This article was published as a part of the Data Science Blogathon
We are continuing our Python: Understanding in 2 minutes series where we cover the medium-level topics that are also frequently asked in Python and Data Science interviews. Last time, we talked about an important topic called *args and **kwargs in 2 minutes. This series is dedicated to aspiring data scientists who want to take the “next step” in Python after learning the basics. Today, we’ll continue our discussion with yet another important topic called Generator and Iterator.
Iterators in Python
The dictionary meaning of “iterate” is to “perform a task repeatedly”.
In computer programming, Wikipedia defines iterators as:
An iterator is an object that enables a programmer to traverse a container, particularly lists.
So, we get the idea that iterators have something to do with traversing the elements of a container.
Now, what does it mean when something is iterable? It simply means that its items can be looped over. A list is an example of an iterable because we can loop over its elements.
Let’s try a very simple example based on this logic. We will first create a list and then apply Python’s built-in iter() function to it.
my_list = [1, 2, 3, 5, 8, 13]
# converting to a list_iterator with iter()
final_list = iter(my_list)
final_list
The output would look something like this: <list_iterator object at 0x...> (the exact memory address differs on every run).
Let’s apply the next() function to our final_list. Calling next(final_list) returns 1.
This is the first item on our list.
Again, try doing the same thing. Calling next(final_list) a second time returns 2.
This is the second item on our list.
One more time: next(final_list) now returns 3.
This is the third item on our list.
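The steps above can be put together in one runnable sketch. The variable names follow the article; the try/except at the end is an addition, included to show what happens once the iterator has no items left.

```python
my_list = [1, 2, 3, 5, 8, 13]

# iter() returns a list_iterator that remembers its position in my_list
final_list = iter(my_list)

first = next(final_list)   # 1
second = next(final_list)  # 2
third = next(final_list)   # 3

# Consume whatever is left, then ask for one more item
remaining = list(final_list)  # [5, 8, 13]
try:
    next(final_list)
    exhausted = False
except StopIteration:
    # An exhausted iterator signals the end by raising StopIteration
    exhausted = True

print(first, second, third, remaining, exhausted)
```

Note that the iterator can only move forward. To start again from the first item, call iter(my_list) to get a fresh iterator.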
So, basically, we get the idea that the iter() function converts an iterable (such as a list) into an iterator.
An iterable is an object that can be converted into an iterator (just the way we converted a list into a list_iterator).
An iterator is an object that has a __next__() method, which is what the built-in next() function calls to fetch the following item.
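To make the two definitions concrete, here is a minimal sketch of a hand-written iterator. The class name Countdown is hypothetical and chosen only for illustration. In Python 3, the built-in next() call is backed by a method named __next__(), and iter() is backed by __iter__().

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself here, which makes it usable in a for loop
        return self

    def __next__(self):
        # next(obj) calls obj.__next__() behind the scenes
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first = next(countdown)  # 3
rest = list(countdown)   # [2, 1]
print(first, rest)
```

A for loop does the same thing implicitly: it calls iter() once, then calls next() repeatedly until StopIteration is raised.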
I assume we don’t have any confusion with iterable and iterator now.
Wikipedia defines Generators as:
One way of implementing iterators is to use a restricted form of coroutine, known as a generator. By contrast with a subroutine, a generator coroutine can yield values to its caller multiple times, instead of returning just once.
We shift our focus to Generators now. Python generators are a simple way of creating iterators. A generator is a function that returns an object (an iterator) which we can iterate over, one value at a time. Let’s see a simple example without a generator and then implement the same operation with a generator. We would like to create a function that squares all the elements in a list. Let’s see how we would perform this operation normally.
def square(my_list):
    result = []
    for i in my_list:
        result.append(i**2)
    return result
And now, let’s pass a list and see the result.
final = square([1,2,3,4,5])
final
[1, 4, 9, 16, 25]
The process was pretty straightforward. We implemented a function where we initialized a new empty list called “result”. Then, we looped through the “my_list” argument and appended each squared value to “result” one by one. Pretty straightforward, right? However, the function calculates everything at once: the whole result list is built and held in memory before anything is returned. For a large input, this can consume far more memory than necessary.
What if we try out the same thing with a generator?
def square(my_list):
    for i in my_list:
        yield i**2
And let’s pass a list:
final = square([1,2,3,4,5])
final
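Notice that this time it created a generator object (displayed as something like <generator object square at 0x...>) instead of a list, and therefore we can call next() on our final variable. Let’s try: next(final) returns 1.
Let’s do it again! This time next(final) returns 4.
One more time: next(final) returns 9.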
What did we do differently here? In our second example, we created a function much like the previous one. Then, instead of initializing an empty list, we looped directly through the list that was passed in. In each iteration, we yield the corresponding squared value, and that was it! Finally, we called the function with our intended list and stored the result in a “final” variable. This is our generator object. Each time we applied next() to it, we obtained the next squared value. This means the results were not all calculated at once. This is called lazy evaluation. In short, lazy evaluation is a process in which a value is computed when it is needed, not when the object is created.
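Lazy evaluation can be observed directly. The sketch below reuses the article's square generator and adds a hypothetical events list (not part of the original example) that records when the function body actually runs.

```python
events = []  # hypothetical log of when the generator body executes


def square(my_list):
    for i in my_list:
        events.append(f"squaring {i}")
        yield i ** 2


final = square([1, 2, 3, 4, 5])
events_before = list(events)     # [] because creating the generator runs no code yet

first = next(final)              # 1
events_after_one = list(events)  # ['squaring 1']

rest = list(final)               # [4, 9, 16, 25]
print(events_before, first, events_after_one, rest)
```

Nothing is squared until next() asks for a value, and each call does only enough work to produce one result.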
What is “yield” doing?
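The yield keyword produces a sequence of values, one at a time. We generally use yield when we want to iterate over a sequence without storing the entire sequence in memory: the function pauses at each yield, hands a value back to the caller, and resumes from the same point only when the next value is requested. Note that a function can contain multiple yield statements, and each of them can produce a value during a single call, whereas a return statement ends the function the first time it is reached.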
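A small sketch may help show how several yield statements behave inside one function. The generator name letters and its stop_early parameter are hypothetical and exist only for this illustration.

```python
def letters(stop_early):
    # Hypothetical generator with several yield statements
    yield "a"
    yield "b"
    if stop_early:
        return  # ends the generator; any further next() raises StopIteration
    yield "c"


short = list(letters(stop_early=True))   # ['a', 'b']
full = list(letters(stop_early=False))   # ['a', 'b', 'c']
print(short, full)
```

Each yield hands back one value and pauses the function, while reaching return (or the end of the function body) finishes the generator.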
To sum up, generators do not store all the values in memory. They yield one result at a time. They are best suited to computing large result sets where you don’t want to allocate memory for all the results at the same time.
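The memory difference can be checked with a rough sketch like the one below. The names square_list and square_gen are hypothetical variants of the article's square function, and the input size of 100,000 is an arbitrary choice. Exact byte counts depend on the Python version and platform, so none are shown here.

```python
import sys


def square_list(numbers):
    # Eager version: builds the full result list before returning
    result = []
    for i in numbers:
        result.append(i ** 2)
    return result


def square_gen(numbers):
    # Lazy version: produces one square per next() call
    for i in numbers:
        yield i ** 2


numbers = range(100_000)
as_list = square_list(numbers)
as_gen = square_gen(numbers)

# getsizeof measures the container object itself, not the integers it refers to
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Both versions produce the same values once consumed
totals_match = sum(as_list) == sum(as_gen)
print(totals_match)
```

The generator object stays small no matter how long the input is, because it holds only its current position rather than every result.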
In the end
The concepts of iterators, iterable, yield, and generators are mostly intermediate-level stuff that beginners often aren’t familiar with. Also, from my professional experience, these topics are frequently asked in the interview process as well. Understanding these concepts demands practice.
About the Author:
Hi there! My name is Akash and I’ve been working as a Python developer for over 4 years now. In the course of my career, I began as a Junior Python Developer at Nepal’s biggest Job portal site, Merojob. Later, I was involved in Data Science and research at Nepal’s first ride-sharing company, Tootle. Currently, I’ve been actively involved in Data Science as well as Web Development with Django.
You can find my other projects on:
Connect with me on LinkedIn
Website (Working on The Data Science Blog): https://akashadhikari.github.io/
Thanks for reading!
I hope you enjoyed reading the article. If you found it useful, please share it with your friends on social media too. For any queries, suggestions, constructive criticism, or any other discussion, please ping me here in the comments, or reach me directly through email.
I am also planning to start The Data Science Blog on my GitHub page. I will try to include how real companies have been working in the field of Data Science, how to excel in Data Science and/or tech interviews, and other useful content related to Python and general programming. Feel free to check it out once in a while.