Iterators, N-Smallest and Largest Values in a Dataframe
In this tutorial, we will cover some interesting topics: iterating over the rows and columns of a dataframe, retrieving the index of the maximum and minimum value in each column, retrieving the n largest and smallest values of selected columns, and finally dealing with null values in a dataframe.
1. Iterating a dataframe:
Iterating over rows and columns is one of the most common requirements, and the pandas library provides convenient methods to handle it.
The head of the dataframe used in this tutorial is as below.
Let us go through the methods of performing iterations in a dataframe.
The iteritems() method iterates over the columns, yielding each column name (the label) together with its content as a series. Note that iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; on current versions use items(), which behaves the same way. Calling the method returns a generator, which produces its elements one at a time.
The iteration can be done in two ways: with a loop such as “for”, which walks through all the elements, or with the next method, which retrieves one element at a time.
You can observe that the column name and its first five values are displayed as a series. Only one column is shown here; the remaining output is truncated because the entire output cannot be shared on this page.
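As a minimal sketch of the loop-based approach, the snippet below iterates over the columns with items(). The dataframe is hypothetical (the tutorial's actual data is not reproduced here), and the names df and labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
    "C": [2.5, 8.0, 1.5, 6.0, 4.0],
})

labels = []
# Each iteration yields a (label, Series) pair, one pair per column.
for label, content in df.items():
    labels.append(label)
    print(label)
    print(content.head())  # first five values of the column
```

On pandas versions older than 2.0, df.iteritems() could be used in place of df.items().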
You can observe that only one element is displayed, i.e. a tuple containing the column label and a series with all the values of that column.
The next method retrieves the next item from ‘a’, which holds the generator object.
Each call to next advances the generator by one element. Once all the elements have been read, the generator is exhausted, and a further call to next raises a StopIteration error.
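The behaviour of next on the generator can be sketched as follows. The dataframe is hypothetical; the variable name a follows the text, while first and exhausted are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
    "C": [2.5, 8.0, 1.5, 6.0, 4.0],
})

a = df.items()      # generator of (label, Series) tuples
first = next(a)     # tuple for the first column
print(first[0])     # column label
print(first[1])     # all values of that column

next(a)             # second column
next(a)             # third (last) column

exhausted = False
try:
    next(a)         # nothing left to read
except StopIteration:
    exhausted = True
    print("The generator has no more columns.")
```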
The iterrows() method iterates over each row in the dataframe and returns the index of the row together with a series holding all of its column values. Each iteration moves on to the next row, until the last row of the dataframe is reached.
You can observe that 4 is the index, followed by a series with the values of all the columns in that row.
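Row-wise iteration that yields an index and a series per row can be sketched with iterrows(), which is assumed here to be the method described above. The dataframe and the names seen, index and row are hypothetical.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
    "C": [2.5, 8.0, 1.5, 6.0, 4.0],
})

seen = []
# Each iteration yields (index, Series); the Series holds that row's values.
for index, row in df.iterrows():
    seen.append(index)
    print(index)
    print(row)

# After the loop, index and row refer to the last row of the dataframe.
```

Because each row is packed into a single series, values of mixed column types may be upcast to a common dtype (here, float).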
If you want to work with a single column, you can do the following.
Here we print only the values of column ‘A’, using the loc method.
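One way this could look is sketched below, assuming loc is used inside a row loop to pick out column ‘A’. The original code is not shown, so the exact form is an assumption, as are the dataframe and the name values.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
})

values = []
for index, row in df.iterrows():
    value = df.loc[index, "A"]  # label-based lookup: row label, column label
    values.append(value)
    print(value)
```

When no per-row logic is needed, df["A"] or df.loc[:, "A"] returns the whole column in one step without a loop.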
2. Retrieving index of maximum and minimum values of each column in dataframe
This is an interesting requirement: sometimes we need the index position of the maximum or minimum value in selected columns, or in all columns, of a dataframe. The max and min methods display the maximum and minimum values of each column in the dataframe, but they don’t return the index position of those values. The pandas library provides the methods idxmax and idxmin to get the index of the maximum and minimum of each column.
The output of idxmax contains each column name and the index of the maximum value of that column within the dataframe. We can compare the output with the maximum values of each column.
You can observe that the maximum value of each column matches the value found at the index returned by idxmax.
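The comparison described above can be sketched as follows. The dataframe and the names idx, maxes and at_idx are hypothetical.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
    "C": [2.5, 8.0, 1.5, 6.0, 4.0],
})

idx = df.idxmax()   # index label of the maximum, per column
maxes = df.max()    # maximum value, per column
print(idx)
print(maxes)

# Look up the value at the reported index and compare it with max().
at_idx = {col: df.loc[idx[col], col] for col in df.columns}
for col in df.columns:
    print(col, at_idx[col] == maxes[col])
```

If a column contains its maximum more than once, idxmax reports the first occurrence.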
Similarly, we can retrieve the index of the minimum value of each column using the idxmin() method.
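A short sketch of idxmin, applied both to a single column and to a selection of columns, is given below. The dataframe and the names min_a and min_ab are hypothetical.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
    "C": [2.5, 8.0, 1.5, 6.0, 4.0],
})

min_a = df["A"].idxmin()          # single column: returns one index label
min_ab = df[["A", "B"]].idxmin()  # selected columns: returns a Series
print(min_a)
print(min_ab)
```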
3. Retrieve ’N’ Largest and Smallest Values
With the help of the nlargest and nsmallest methods, we can retrieve the rows holding the top N values of a column. Both methods take two main parameters, i.e. the number of rows to be retrieved and the column name; an optional keep parameter controls how ties are handled.
You can observe that the output contains the top 3 maximum values for column ‘A’.
If you specify multiple columns in a list, the rows are still ranked by the first column name in the list; the remaining columns are used only to break ties.
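The single-column and multi-column cases can be sketched as below. Both dataframes are hypothetical; the second one contains deliberate ties in ‘A’ to show what the extra column does. The names top3, tied and top2 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
})

# Rows holding the 3 largest values of column 'A', largest first.
top3 = df.nlargest(3, "A")
print(top3)

# A second hypothetical frame in which 'A' contains tied values.
tied = pd.DataFrame({
    "A": [50, 50, 20, 50, 30],
    "B": [1, 9, 4, 5, 7],
})

# Ranked by 'A' first; 'B' decides among the rows that tie on 'A'.
top2 = tied.nlargest(2, ["A", "B"])
print(top2)
```

The result should match tied.sort_values(["A", "B"], ascending=False).head(2), which is how the pandas documentation describes the method.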
To get the ’N’ smallest values of a column, we use nsmallest.
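A minimal sketch of nsmallest follows, on a hypothetical dataframe. The names bottom3 and bottom3_values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "A": [10, 40, 20, 50, 30],
    "B": [5, 3, 9, 1, 7],
})

# Rows holding the 3 smallest values of column 'A', smallest first.
bottom3 = df.nsmallest(3, "A")
print(bottom3)

# The Series version returns only the values of that one column.
bottom3_values = df["A"].nsmallest(3)
print(bottom3_values)
```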
4. Dealing with Null values
The isna() method of a dataframe lets us check for the presence of null or NA values in the dataframe.
‘False’ indicates that the value is not NULL or NA; ‘True’ indicates the presence of a NULL or NA value.
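The True/False mask can be sketched on a small hypothetical dataframe that contains a few missing values. The names df and mask are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries (NaN and None).
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 6.0],
    "C": ["x", None, "z"],
})

# Boolean dataframe of the same shape: True where a value is missing.
mask = df.isna()
print(mask)
```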
We can count the number of NA or NULL values in each column by chaining the sum() method after isna(); passing axis=1 to sum() counts them per row instead.
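Counting missing values with isna() followed by sum() can be sketched as follows, on a hypothetical dataframe. The names per_column and per_row are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries (NaN and None).
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 6.0],
    "C": ["x", None, "z"],
})

# True counts as 1 when summed, so this gives the missing count per column.
per_column = df.isna().sum()
print(per_column)

# axis=1 sums across the columns, giving the missing count per row.
per_row = df.isna().sum(axis=1)
print(per_row)
```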
In contrast to isna(), the notna() method returns True if an element is not null or NaN.
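A short sketch of notna() on a hypothetical dataframe is given below. The names present and non_null_counts are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries (NaN and None).
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 6.0],
    "C": ["x", None, "z"],
})

# True where a value is present; the exact inverse of isna().
present = df.notna()
print(present)

# Summing the mask counts the non-missing values per column.
non_null_counts = present.sum()
print(non_null_counts)
```

The per-column counts obtained this way agree with df.count(), which also counts non-missing values.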
In this tutorial we learned how to iterate over the rows and columns of a dataframe, how to retrieve the index of the maximum and minimum values, how to retrieve the n largest and n smallest values of selected columns, and how to detect NA values and count them in each column.
Hope you enjoyed this tutorial. Keep reading and keep learning!