Python – Map, Reduce, Filter in 2 Minutes for Data Science Beginners

This article was published as a part of the Data Science Blogathon

Introduction

We are continuing our Python: Understanding in 2 minutes series, where we cover intermediate topics that are frequently asked about in Python and Data Science interviews. Last time, we talked about an important topic called Generators and Iterators in 2 Minutes. This series is dedicated to aspiring data scientists who want to take the “next step” in Python after learning the basics. Today, we’ll continue our discussion with yet another important topic: Map, Reduce, and Filter.

[Image: illustration of Map and Reduce. Source: GitHub Gist, roycoding]

Map

If we have to explain in one sentence, “Map” applies a function to every item of an input and returns the results, leaving the original input unchanged.

To understand this concept, let’s look into a simple example first.

How do you square all the elements in a given list?

First, let’s see a “normal” approach.

You might do something like this:

items = [1, 2, 3, 4, 5]
squared = []
for i in items:
    squared.append(i**2)
squared
Output:
[1, 4, 9, 16, 25]

Now, let’s try the same example through the “map” approach.

Map approach syntax: map(operation_to_perform, list_of_inputs_for_the_operation)

Here “operation_to_perform” is the function that we want to apply to each element of the input. The second argument is the input itself, which can be a list, a tuple, or any other iterable. map() calls the function once for each element. In Python 3 it returns a lazy map object rather than a list, which is why the example below wraps it in list(). Let’s look into an example to see this in practice.

items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))
squared
Output:
[1, 4, 9, 16, 25]
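
As a small sketch reusing the items list from above, the snippet below suggests two things worth noting: map() does not modify its input, and the same result can be written as a list comprehension. The names squared_map and squared_comp are only illustrative.

```python
items = [1, 2, 3, 4, 5]

# map() builds new values; the original list is left untouched.
squared_map = list(map(lambda x: x**2, items))

# An equivalent list comprehension, often preferred for simple cases.
squared_comp = [x**2 for x in items]

print(items)                        # [1, 2, 3, 4, 5]
print(squared_map)                  # [1, 4, 9, 16, 25]
print(squared_map == squared_comp)  # True
```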

But… wait for a second!

What is lambda doing here?

 

Python’s official docs define lambda as:

Lambda expressions (sometimes called lambda forms) are used to create anonymous functions. The expression lambda parameters: expression yields a function object.

I get it! This may feel a bit complex. But look at our previous example. We wanted to square all the items in our list, right? Well, if that’s the case, wouldn’t it make sense to define a function that tells map() how to square each item? This is where lambdas come in. We use lambda functions when we need a nameless (or anonymous) function for a short period of time.

Think about this! We normally define functions by doing something like this:

def function_name():
    # body
    pass

But lambdas are small, nameless functions, typically written for one-time use. They are handy for a quick operation passed to map() or filter(). And once the job’s done, that’s it! Because no name refers to them, they are discarded.

But when should you use def, and when should you use lambdas?

There are interesting Stack Overflow discussions and answers on the topic. You can check “Which is more preferable to use: lambda functions or nested functions (‘def’)?”.
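
To make the comparison concrete, here is a minimal sketch that defines the same squaring operation both ways. The names square and square_lambda are hypothetical, and binding a lambda to a name is done here only for illustration.

```python
# A named function defined with def ...
def square(x):
    return x**2

# ... and a lambda bound to a name (done here only for comparison).
square_lambda = lambda x: x**2

print(square(4))         # 16
print(square_lambda(4))  # 16

# Both can be passed to map() in exactly the same way.
print(list(map(square, [1, 2, 3])))         # [1, 4, 9]
print(list(map(square_lambda, [1, 2, 3])))  # [1, 4, 9]
```

In practice, a def is usually the clearer choice once a function needs a name, a docstring, or more than one expression.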

Anyway, coming back to our map() operation, let’s try another example with a tuple.

a = (1, 2, 3, 4)
out = map(lambda x: x**2, a)
out
Output:
<map object at 0x...> (a lazy iterator; the exact address varies from run to run)

Now, try:

next(out)
Output:
1

Again, try:

next(out)
Output:
4
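
The next() calls above hint that a map object is a one-shot iterator. The sketch below, which assumes a fresh tuple named values, shows what is likely to happen when the rest of it is consumed.

```python
values = (1, 2, 3, 4)
out = map(lambda x: x**2, values)

print(next(out))  # 1
print(next(out))  # 4

# list() consumes whatever is left in the iterator.
remaining = list(out)
print(remaining)  # [9, 16]

# The iterator is now exhausted; next() with a default avoids StopIteration.
print(next(out, None))  # None
```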

How cool would it be to map over a list of functions instead? In the next example, the lambda receives each function in turn and calls it with i:

def squared(x):
    return x**2
def cubed(x):
    return x**3
functions = [squared, cubed]
for i in range(5):
    li = list(map(lambda x: x(i), functions))
    print(li)

Output:

[0, 0]
[1, 1]
[4, 8]
[9, 27]
[16, 64]

Reduce

The reduce operation takes a two-argument function and applies it cumulatively to the items of the input, from left to right, so that the output is “reduced” to a single value.

Let’s try the “normal” way first. We want to find the product of all the values in our list. In a normal case, we’d do something like this:

product = 1
numbers = [1, 2, 3, 4]  # avoid naming a variable "list", which shadows the built-in
for num in numbers:
    product = product * num
product
Output:
24

Now, let’s try the “reduce” way:

from functools import reduce
product = reduce((lambda x, y: x * y), [1, 2, 3, 4])
product
Output:
24

Notice that we passed reduce() a lambda that defines the operation (x * y), along with the list to apply it to. reduce() first computes 1 * 2, then multiplies that result by 3, and so on, until the whole list is reduced to a single value.
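
To see the intermediate steps, here is a minimal sketch that swaps the lambda for a hypothetical named function, multiply, which prints each step. It also shows the optional third argument of reduce(), a starting value.

```python
from functools import reduce

def multiply(x, y):
    # Print each step so the running result is visible.
    print(f"{x} * {y} = {x * y}")
    return x * y

product = reduce(multiply, [1, 2, 3, 4])
# 1 * 2 = 2
# 2 * 3 = 6
# 6 * 4 = 24
print(product)  # 24

# The optional third argument is a starting value; it is also what
# reduce() returns when the input is empty.
empty_product = reduce(lambda x, y: x * y, [], 1)
print(empty_product)  # 1
```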

By the way, we imported reduce from functools here. functools is a module in Python’s standard library that provides higher-order functions, that is, functions that act on or return other functions. As you might have guessed, it ships with Python, so there is nothing extra to install. Similar to functools, there’s also a module called itertools, whose primary job is to create iterators for efficient looping.
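
As a rough illustration of how the two modules relate, the sketch below compares functools.reduce with itertools.accumulate. Using operator.mul in place of the lambda is a substitution made here for brevity, not something the example above requires.

```python
from functools import reduce
from itertools import accumulate
import operator

numbers = [1, 2, 3, 4]

# reduce() returns only the final value.
total_product = reduce(operator.mul, numbers)
print(total_product)  # 24

# accumulate() yields every intermediate value lazily.
running_products = list(accumulate(numbers, operator.mul))
print(running_products)  # [1, 2, 6, 24]
```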

Filter

[Image: illustration of filter. Image source: Camblab]

As the name suggests, filter keeps only the elements that satisfy a given condition. Its working procedure is fairly straightforward. The function you pass in is called on each element of the input and returns True or False. Elements for which it returns True are kept, and the rest are dropped.

Syntax:

filter(function, input)
Parameters:
function: a function that tests whether each element of the
input satisfies a condition (returns True or False).
input: the input to be filtered; it can be a set, list,
tuple, or any other iterable.
Returns:
an iterator over the elements for which the function returned True.

Suppose we have a list of numbers and we want to keep only the “small” numbers. In this case, small means numbers that are less than 5. How do you apply “filter” to get the result?

li = [1,2,3,4,5,6,7,8,9,10]
small = list(filter(lambda x: x < 5, li))
print(small)

Output:

[1, 2, 3, 4]

Easy, isn’t it? Let’s try another example where we keep only the negative numbers.

number_list = range(-5, 5)
less_than_zero = filter(lambda x: x < 0, number_list)
next(less_than_zero)
Output:
-5
next(less_than_zero)
Output:
-4
next(less_than_zero)
Output:
-3
next(less_than_zero)
Output:
-2
next(less_than_zero)
Output:
-1
next(less_than_zero)
Output:
StopIteration                             Traceback (most recent call last)
----> 1 next(less_than_zero)

Because that’s it: the iterator is exhausted, so next() raises StopIteration. The filtered result only contains -5, -4, -3, -2, and -1.
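
In everyday code you would rarely call next() by hand. A minimal sketch, using the same number_list, shows that a for loop stops cleanly when the iterator runs out, and that a list comprehension gives the same selection. The names negatives and negatives_comp are illustrative.

```python
number_list = range(-5, 5)

# A for loop stops cleanly when the iterator is exhausted.
negatives = []
for n in filter(lambda x: x < 0, number_list):
    negatives.append(n)
print(negatives)  # [-5, -4, -3, -2, -1]

# The same selection written as a list comprehension.
negatives_comp = [x for x in number_list if x < 0]
print(negatives_comp == negatives)  # True
```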

In the end

Map, reduce, and filter are certainly among the most important concepts to understand for any Python developer who seeks to go beyond the basics. For a data scientist who needs to run efficient computations on large data, it’s quite important to understand them in detail. Practice is the key!
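
As a practice exercise, the three functions can be chained. The sketch below is one possible way to do it, assuming a sample list of the numbers 1 to 10: keep the even numbers, square them, and add up the squares.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 1. filter: keep only the even numbers -> 2, 4, 6, 8, 10
evens = filter(lambda x: x % 2 == 0, numbers)

# 2. map: square each remaining number -> 4, 16, 36, 64, 100
squares = map(lambda x: x**2, evens)

# 3. reduce: add everything up into a single value
total = reduce(lambda x, y: x + y, squares)

print(total)  # 220
```

Because filter() and map() are lazy, nothing is computed until reduce() starts pulling values through the chain.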

About the Author:

Hi there! My name is Akash and I’ve been working as a Python developer for over 4 years now. In the course of my career, I began as a Junior Python Developer at Nepal’s biggest Job portal site, Merojob. Later, I was involved in Data Science and research at Nepal’s first ride-sharing company, Tootle. Currently, I’ve been actively involved in Data Science as well as Web Development with Django.

You can find my other projects on:

Connect with me on LinkedIn

https://www.linkedin.com/in/akashadh/

Email: [email protected] | [email protected]

Website (Working on The Data Science Blog): https://akashadhikari.github.io/

End Notes:

Thanks for reading!

I hope you enjoyed reading the article. If you found it useful, please share it with your friends on social media too. For any queries, suggestions, constructive criticism, or any other discussion, please ping me here in the comments, or reach me directly through email.

Previous blog posts in this series:

**args and **kwargs in 2 minutes

Generators and Iterators in 2 minutes

I am also planning to start The Data Science Blog on my Github page. I will try to include how real companies have been working in the field of Data Science, how to excel in Data Science and/or tech interviews, and other useful content related to Python and general programming. Feel free to check them once in a while. Happy coding!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
