Iterators, N-Smallest and Largest Values in a Dataframe


In this tutorial, we will cover iterating over the rows and columns of a dataframe, retrieving the index of the maximum and minimum value in each column, retrieving the n largest and smallest values of selected columns, and finally dealing with null values in a dataframe.

1. Iterating a dataframe:

The head of the dataframe used in this tutorial is shown below.
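The tutorial's original dataframe is not reproduced here, so the following is a minimal sketch with a hypothetical stand-in. The column names 'A', 'B' and 'C' and all of the values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data; the tutorial's actual dataframe is not available.
df = pd.DataFrame(
    {
        "A": [10, 45, 23, 67, 31, 8],
        "B": [5.5, 2.1, 9.8, 4.4, 7.3, 1.0],
        "C": [100, 300, 200, 600, 500, 400],
    }
)

# head() returns the first five rows by default.
print(df.head())
```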

Let us go through the methods of performing iterations in a dataframe.

a. iteritems():

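iteritems() returns a generator that yields one (column label, Series) pair per column. You can consume it in two ways: with a loop such as “for”, which visits every column in turn, or with next(), which retrieves one pair at a time. Note that iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; items() is the drop-in replacement.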

With the loop, each column name is printed, followed by the first five values of that column as a Series. Only one column is shown here; the rest of the output is truncated for space.
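As a rough sketch of the loop-based approach, the snippet below iterates over the columns of a small hypothetical dataframe (the data is an assumption, not the tutorial's). It uses items() so that it runs on current pandas; on versions before 2.0, iteritems() can be called the same way.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"A": [10, 45, 23, 67, 31, 8], "B": [5.5, 2.1, 9.8, 4.4, 7.3, 1.0]}
)

labels = []
# Each iteration yields a (column label, Series) pair.
for label, column in df.items():
    labels.append(label)
    print(label)
    print(column.head())  # first five values of this column
```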

With next(), only one element is returned: a tuple containing the column label and a Series holding all values of that column.

Each call to next() advances ‘a’, the variable holding the generator object, by one item.

Repeated calls to next() walk through the generator until every element has been read. Once the generator is exhausted, a further call to next() raises a StopIteration error.
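The behaviour of next() on the generator can be sketched as follows. The dataframe is a hypothetical two-column example, and items() stands in for iteritems() so the code also runs on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample data with two columns.
df = pd.DataFrame({"A": [10, 45, 23], "B": [5.5, 2.1, 9.8]})

a = df.items()  # generator of (column label, Series) pairs

first_label, first_column = next(a)    # first column
second_label, second_column = next(a)  # second column

# The generator is now exhausted, so one more call raises StopIteration.
try:
    next(a)
    exhausted = False
except StopIteration:
    exhausted = True

print(first_label, second_label, exhausted)
```

If you would rather not handle the exception, next(a, None) returns the default value None once the generator is exhausted.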

b. iterrows():

You can observe that each item is a tuple: 4 is the index label, followed by a Series holding the values of all the columns in that row.
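A minimal sketch of row iteration, assuming a hypothetical five-row dataframe so that the last row has index 4:

```python
import pandas as pd

# Hypothetical sample data with index labels 0 to 4.
df = pd.DataFrame(
    {"A": [10, 45, 23, 67, 31], "B": [5.5, 2.1, 9.8, 4.4, 7.3]}
)

last_index = None
# Each iteration yields an (index label, Series) pair for one row.
for index, row in df.iterrows():
    last_index = index
    print(index)
    print(row)  # the Series is labelled by column name
```

Because each row is returned as a single Series, values from columns of different types are converted to a common dtype, so iterrows() does not always preserve the original column dtypes.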

If you want to work with a single column, you can do the following.

Here we print only the values of column ‘A’, selected with the loc indexer.
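One way this could look is sketched below. The exact code used in the tutorial is not available, so the combination of iterrows() with a df.loc lookup, as well as the sample data, is an assumption.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [10, 45, 23], "B": [5.5, 2.1, 9.8]})

values_of_a = []
for index, row in df.iterrows():
    # Label-based lookup: the row labelled `index`, column "A".
    value = df.loc[index, "A"]
    values_of_a.append(value)
    print(index, value)
```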

2. Retrieving index of maximum and minimum values of each column in dataframe

The output of idxmax() contains each column name and the index of the maximum value of that column within the dataframe. We can compare the output with the maximum value of each column.

You can observe that the maximum value of each column matches the value found at the index returned by idxmax().

Similarly, we can retrieve the index of the minimum value of each column using the idxmin() method.
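The comparison can be sketched like this, using hypothetical data. The helper name values_at_max is introduced here purely for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": [10, 45, 23, 67, 31], "B": [5.5, 2.1, 9.8, 4.4, 7.3]}
)

max_index = df.idxmax()  # index label of the maximum in each column
max_values = df.max()    # maximum value of each column

# Look up each column at the label reported by idxmax().
values_at_max = pd.Series(
    {column: df.loc[max_index[column], column] for column in df.columns}
)

print(max_index)
print(max_values)
print(values_at_max)
```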
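A matching sketch for the minimum side, again on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": [10, 45, 23, 67, 31], "B": [5.5, 2.1, 9.8, 4.4, 7.3]}
)

min_index = df.idxmin()  # index label of the minimum in each column
min_values = df.min()    # minimum value of each column

print(min_index)
print(min_values)
```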

3. Retrieve ’N’ Largest and Smallest Values

You can observe that the output contains the 3 rows with the largest values in column ‘A’.

If you specify multiple columns in a list, the rows are ordered by the first column in the list; the remaining columns are used only to break ties.

To get the ’N’ smallest values of a column, we use nsmallest().
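The sketch below calls nlargest() with a single column and then with a list of columns. The data is hypothetical and deliberately contains a repeated value in column 'A' so the role of the second column is visible.

```python
import pandas as pd

# Hypothetical sample data; 67 appears twice in column "A".
df = pd.DataFrame(
    {"A": [10, 67, 23, 67, 31], "B": [5.5, 2.1, 9.8, 4.4, 7.3]}
)

# The three rows with the largest values in column "A".
top_a = df.nlargest(3, "A")
print(top_a)

# With a list, column "A" drives the ordering and "B" only
# decides the order of rows that tie on "A".
top_ab = df.nlargest(3, ["A", "B"])
print(top_ab)
```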
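A short sketch of nsmallest(), assuming a hypothetical dataframe and n = 3:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": [10, 45, 23, 67, 31], "B": [5.5, 2.1, 9.8, 4.4, 7.3]}
)

# The three rows with the smallest values in column "A", smallest first.
bottom_a = df.nsmallest(3, "A")
print(bottom_a)
```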

4. Dealing with Null values

In the output of isna(), False indicates that the value is not NULL or NA, and True indicates the presence of a NULL or NA value.

We can count the number of NA or NULL values in each column by chaining the sum() method.

In contrast to isna(), the notna() method returns True if an element is not null or NaN.
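As an illustration, the sketch below builds a hypothetical dataframe with a few missing values and counts them. By default sum() adds up each column; passing axis=1 gives the count for each row instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame(
    {"A": [10, np.nan, 23, 67], "B": [5.5, 2.1, np.nan, np.nan]}
)

null_mask = df.isna()  # True where a value is missing
print(null_mask)

nulls_per_column = null_mask.sum()     # True counts as 1, False as 0
nulls_per_row = null_mask.sum(axis=1)  # same count, taken across each row

print(nulls_per_column)
print(nulls_per_row)
```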
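A brief sketch of notna() on the same kind of hypothetical data, showing that it is the element-wise inverse of isna():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame(
    {"A": [10, np.nan, 23, 67], "B": [5.5, 2.1, np.nan, np.nan]}
)

not_null_mask = df.notna()  # True where a value is present
print(not_null_mask)

# Inverting isna() gives the same mask.
same_as_inverse = not_null_mask.equals(~df.isna())

# Number of non-missing values in each column.
present_per_column = not_null_mask.sum()
print(same_as_inverse)
print(present_per_column)
```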

In this tutorial, we learned how to iterate over the rows and columns of a dataframe, retrieve the index of the maximum and minimum values, retrieve the n largest and n smallest values of selected columns, and detect NA values and count them in each column.

Hope you enjoyed this tutorial. Keep reading and keep learning!

Data Science and machine learning enthusiast