Speed up Dataframe Operations using Map, Filter, and Reduce

Vijay Krishna Nimmana
5 min read · Jan 5, 2021

Photo by David Clode on Unsplash

In this tutorial, we will learn how to run single or multiple operations on a dataframe with short execution times. Along the way, we will learn about lambda functions and how to use them inside the map, filter, and reduce functions, which can help cut the execution time of the code.

Lambda Functions

Lambda functions are also called anonymous functions: you can create and use them without a def statement or a name. Most importantly, they are single-line functions whose body is one expression.

Let us write a simple lambda function to get the square of a number.
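A minimal sketch of such a function, assuming the name square used in the discussion that follows:

```python
# Assign a lambda that squares its argument to the name `square`
square = lambda x: x ** 2

# Printing the name shows that it refers to a lambda function object
print(square)     # something like <function <lambda> at 0x...>

# Calling it works like calling any other function
print(square(5))  # 25
```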

Here we create a function called square by assigning a lambda function to that name.

If we print square, we can observe that square now points to the lambda function, which takes a number x and returns the square of x.

When we pass a value of 5, it returns 25. Similarly, we can create one-line functions that eliminate the need for creating standard functions. There are some differences between regular and lambda functions, listed below:

  1. A lambda function always returns the value of its expression, whereas a standard function returns a value only if it has a return statement (otherwise it returns None).
  2. Lambda functions are mostly written for one-time use, whereas standard functions are usually reused many times.
  3. Lambda functions hold a single expression, whereas standard functions can have multi-line code.
  4. Standard functions always have names, whereas lambda functions may be assigned to a name or left anonymous.
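Some of these differences can be checked with a short sketch. The function names below are made up for illustration:

```python
# A standard function: defined with def, has a name, can span several lines
def add_standard(a, b):
    total = a + b
    return total

# The equivalent lambda: a single expression whose value is returned implicitly
add_lambda = lambda a, b: a + b

# A standard function without a return statement returns None
def greet(name):
    message = "Hello, " + name  # built but never returned

print(add_standard(2, 3))  # 5
print(add_lambda(2, 3))    # 5
print(greet("Ana"))        # None

# A def keeps its own name; a lambda is reported as <lambda>
print(add_standard.__name__, add_lambda.__name__)  # add_standard <lambda>
```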

Map

Now we use the map function, which applies a lambda function to every element of an iterable, to perform tasks on a dataframe within a single line of code.

Let us have a glance at the head of the dataframe being used in this tutorial.
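To follow along without the original data, a small stand-in dataframe can be built as sketched below. The column First Name comes from the text; the column names Gender and Salary and all the values are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in data; only the "First Name" column name is taken from the text
df = pd.DataFrame({
    "First Name": ["Maria", "Thomas", "Angela", "Jerry", "Ruby"],
    "Gender": ["F", "M", "F", "M", "F"],
    "Salary": [52000, 61000, 48000, 75000, 56000],
})

# head() returns the first five rows by default
print(df.head())
```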

The syntax of map is as follows: map(function, iterable). An iterable is any Python object that can be looped over, such as a list, a dictionary, or a dataframe column.

Now, we will use map to create a column that holds the length of each First Name.
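One way to write this, sketched on a small stand-in dataframe (the sample names are made up; only the column names First Name and fname_length come from the text):

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({"First Name": ["Maria", "Thomas", "Angela"]})

# map applies the lambda to every value of the column; list() collects the results
df["fname_length"] = list(map(lambda name: len(name), df["First Name"]))

print(df["fname_length"].tolist())  # [5, 6, 6]
```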

We are creating a new column named fname_length that holds the length of First Name. The map function applies the lambda function to each value of the dataframe column “First Name”. You might also have noticed that the map call is wrapped in list(): map always returns a map object, which is a lazy iterator, and list() is needed to run through it and collect the results.

You can observe what happens when a map call is not wrapped in list(): it returns a map object rather than the computed values.
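A quick sketch with a plain list of made-up names shows the difference:

```python
names = ["Maria", "Thomas", "Angela"]  # made-up sample values

# Without list(), map returns a lazy map object rather than the computed values
lengths = map(lambda name: len(name), names)
print(lengths)        # something like <map object at 0x...>

# Wrapping the map object in list() runs the lambda and collects the results
length_list = list(lengths)
print(length_list)    # [5, 6, 6]
```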

Filter

The filter function lets us select only the records of a dataframe that satisfy a particular condition, just like a filter in an Excel sheet, where we select the desired rows based on particular values in a column.

The syntax of filter is the same as that of map, i.e. filter(function, iterable). The function should return True for the elements to keep and False for the rest.

Let us go through an example: we will compute the length of the first names of only the female employees in the dataframe.
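A sketch of one way to do this on a stand-in dataframe. The Gender column name and the sample values are assumptions, and pairing the two columns with zip is only one of several ways to hand whole records to filter:

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    "First Name": ["Maria", "Thomas", "Angela", "Jerry", "Ruby"],
    "Gender": ["F", "M", "F", "M", "F"],
})

# Pair each first name with its gender so that filter sees one record at a time
records = zip(df["First Name"], df["Gender"])

# Inner lambda keeps the records whose gender is 'F';
# outer lambda computes the length of the first name in each kept record
female_name_lengths = list(
    map(lambda rec: len(rec[0]),
        filter(lambda rec: rec[1] == "F", records))
)

print(female_name_lengths)  # [5, 6, 4]
```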

We have applied filter to select the records that have gender ‘F’, and on top of the filter output we have applied map, which calculates the length of each first name.

As you can observe, the code has two lambda functions: one for selecting the records and another for calculating the length of the first names.

Reduce

Like map and filter, reduce applies a function to the elements of an iterable. However, reduce does not return an iterable; it returns a single value.

The syntax of reduce is as follows: reduce(function, iterable). The syntax is the same as map and filter; however, the function passed to reduce must take two inputs: the result accumulated so far and the next element. Let us understand this by an example.

Our target is to calculate the total salary of all the female employees, i.e. the sum accumulated over their records.

To use reduce, we need to import it from the functools module.
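A sketch of the full pipeline on a stand-in dataframe. The column names Gender and Salary and all the values are assumptions made up for illustration:

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    "First Name": ["Maria", "Thomas", "Angela", "Jerry", "Ruby"],
    "Gender": ["F", "M", "F", "M", "F"],
    "Salary": [52000, 61000, 48000, 75000, 56000],
})

# filter keeps the (gender, salary) records of female employees,
# map pulls the salary out of each kept record, list() collects them
female_salaries = list(
    map(lambda rec: rec[1],
        filter(lambda rec: rec[0] == "F", zip(df["Gender"], df["Salary"])))
)

# reduce adds the salaries pairwise until a single total remains
total_female_salary = reduce(lambda x, y: x + y, female_salaries)

print(total_female_salary)  # 156000
```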

Here, we select all the female records using filter, then collect the salaries of all female employees into a list using map, and finally apply reduce to sum all the elements of the list produced by map.

The lambda function passed to reduce takes two inputs, x and y, and adds them, so the salaries in the list are summed one after another. Basically, we use reduce for a cumulative sum, a product, or any other basic task that combines the elements into a single value.
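The same pattern works for other aggregations. A minimal sketch on a plain list of made-up numbers, showing a sum and a product:

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# x is the running result, y is the next element: ((1 + 2) + 3) + 4
total = reduce(lambda x, y: x + y, numbers)
print(total)    # 10

# The same pattern with multiplication: ((1 * 2) * 3) * 4
product = reduce(lambda x, y: x * y, numbers)
print(product)  # 24
```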

In this tutorial, we have learned to perform some complex tasks on dataframes in a single line of code using map, reduce, filter, and lambda expressions. Hope you enjoyed this tutorial. Keep reading and keep learning!
