Inserting, Dropping, Renaming and Setting the Index of a Dataframe


This tutorial covers the basics of dataframes and is an extension of my previous tutorial, dataframe basics. In this tutorial, we will learn how to insert a new column, drop columns, rename columns and, finally, set the index.

To insert a column into a dataframe, we use the insert method. It takes the following parameters:


loc → zero-based position at which we want to insert the column

column → name of the new column

value → value to be inserted; it can be a single value or a list. If a single value is provided, all the records in the column will have the same value.

allow_duplicates → this is set to False by default; if you set it to True, you can insert a new column with an existing name
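The parameters above map onto a call like the following. This is a minimal sketch on a small made-up dataframe; the column names and values are illustrative assumptions, not the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Salary": [50000, 62000]})

# loc=0 places the new column first; loc=len(df.columns) would place it last.
df.insert(loc=0, column="Dept", value="Sales", allow_duplicates=False)

print(df.columns.tolist())
```

Note that insert modifies the dataframe in place and returns None, so there is nothing to assign back.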

Let us have a look at the dataset used in this tutorial, using the head method.

To insert a column in this dataframe, we call insert with a position, a column name and a value.

As you can observe, we have inserted a column named ‘Country’ with every value set to ‘Greece’.
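Inserting a single value can be sketched as follows. The dataframe here is a hypothetical stand-in for the tutorial's dataset (only the ‘Emp id’ and ‘Country’ names come from the text), and the position 2 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample data; the tutorial's real dataset is not reproduced here.
df = pd.DataFrame({
    "Emp id": [101, 102, 103],
    "Name": ["Ana", "Ben", "Cleo"],
})

# A single value is repeated for every row of the new column.
df.insert(2, "Country", "Greece")

print(df.head())
```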

To insert a different value for each row, we pass a list as the value.

Country_list is a list holding the values for the column Country. When passing a list, we need to ensure that its length is equal to the number of rows in the dataframe; otherwise pandas raises a ValueError. You can also observe that the newly inserted column took the second place, as specified in the insert call.
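A short sketch of both cases, a list of the right length and one that is too short. The data and the ‘City’ column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102, 103], "Name": ["Ana", "Ben", "Cleo"]})

country_list = ["Greece", "Italy", "Spain"]  # one value per row
df.insert(1, "Country", country_list)        # loc=1 makes it the second column

# A list whose length differs from the number of rows is rejected.
try:
    df.insert(1, "City", ["Athens", "Rome"])
    length_error = None
except ValueError as err:
    length_error = err

print(df.columns.tolist())
print(length_error)
```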

Also, if you observe carefully, we now have two columns with the same name, Country. This is only possible when you set allow_duplicates to True. Otherwise, pandas raises a ValueError.
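The effect of allow_duplicates can be sketched like this, again on made-up data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Country": ["Greece", "Italy"]})

# With the default allow_duplicates=False, reusing a name raises a ValueError.
try:
    df.insert(0, "Country", "Spain")
    duplicate_error = None
except ValueError as err:
    duplicate_error = err

# With allow_duplicates=True, the second "Country" column is accepted.
df.insert(0, "Country", "Spain", allow_duplicates=True)

print(df.columns.tolist())
```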

Alternatively, we can add a new column directly by assigning to it by name.

You can observe that all the rows have the same value. If you want a different value for each row, you need to pass a list with the same size as the number of rows in the dataframe.
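Direct assignment might look like the sketch below, assuming the square-bracket form is what is meant here. The data and the ‘City’ column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"]})

df["Country"] = "Greece"                     # same value in every row
df["City"] = ["Athens", "Patras", "Volos"]   # one value per row

print(df)
```

Unlike insert, this form always appends the new column at the end.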

There may be situations where we end up with unwanted columns. To get rid of them, we simply drop them with the drop method.

Here we are dropping the column Country. axis=1 indicates that we are deleting a column; axis=0 (the default) indicates rows, in which case we pass index labels rather than column names.
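Dropping a single column can be sketched as follows, on a hypothetical dataframe:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Country": ["Greece", "Italy"],
    "Salary": [50000, 62000],
})

# axis=1 tells drop that "Country" is a column label.
without_country = df.drop("Country", axis=1)

print(without_country.columns.tolist())
```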

We can drop multiple columns by passing all their names in a list.
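For several columns at once, a sketch with made-up column names:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Country": ["Greece", "Italy"],
    "Salary": [50000, 62000],
})

# Every label in the list is removed in one call.
names_only = df.drop(["Country", "Salary"], axis=1)

print(names_only.columns.tolist())
```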

To delete rows, we pass the index label of the row and set axis=0.

We can observe that the row with index 0 is missing.
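Dropping the row labelled 0 can be sketched like this, assuming the default integer index:

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"]})

# axis=0 tells drop that 0 is an index label, not a column name.
without_first = df.drop(0, axis=0)

print(without_first.index.tolist())
```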

To delete multiple rows in a dataframe, pass a list of index labels. With the default index, these are the same as the row positions.
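A sketch for several rows, again assuming the default integer index on made-up data:

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"]})

# Rows labelled 0 and 2 are removed; the remaining label is kept as-is.
middle_only = df.drop([0, 2], axis=0)

print(middle_only)
```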

The changes made above do not modify the original dataframe, because drop returns a new dataframe. To change the dataframe itself, you need to set the argument inplace=True. Once you do, the columns or rows are dropped from the dataframe permanently, and the only way to restore the dropped data is to reload it. So, be cautious when deleting rows or columns from a dataframe.
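The difference between the two behaviours can be sketched as follows, on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Country": ["Greece", "Italy"]})

# Without inplace, the original dataframe still has the column.
df.drop("Country", axis=1)
columns_before = df.columns.tolist()

# With inplace=True, the column is removed from df itself and None is returned.
result = df.drop("Country", axis=1, inplace=True)
columns_after = df.columns.tolist()

print(columns_before, columns_after, result)
```

An alternative that avoids inplace is to reassign the result, for example df = df.drop("Country", axis=1).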

We use the rename method of dataframe to rename the columns and index of a dataframe.

You can observe that the column names are changed to lower case.
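One way to lower-case every column name is to pass a function to rename. This is a sketch under the assumption that str.lower was the function used; the data is made up.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102], "Name": ["Ana", "Ben"]})

# The function is applied to each column label in turn.
lowered = df.rename(columns=str.lower)

print(lowered.columns.tolist())
```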

You can observe that the columns are changed to their new names.
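Renaming specific columns is usually done with a dictionary that maps old names to new ones. A sketch, where the new names are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102], "Name": ["Ana", "Ben"], "Salary": [1, 2]})

# Only the columns listed in the dictionary are renamed.
renamed = df.rename(columns={"Emp id": "employee_id", "Name": "full_name"})

print(renamed.columns.tolist())
```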

To rename the index, we pass a dictionary that maps the old index labels to the new ones.
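The same dictionary approach works for index labels through the index argument. A sketch with made-up labels:

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"]})

# Labels not listed in the dictionary (here 2) are left unchanged.
relabelled = df.rename(index={0: "first", 1: "second"})

print(relabelled.index.tolist())
```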

Alternatively, we can rename the index as below.
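The alternative meant here is not shown, so the following is only one possibility: assigning a new list of labels directly to the index attribute. The labels are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"]})

# The list must contain exactly one label per row.
df.index = ["first", "second", "third"]

print(df.index.tolist())
```

This form replaces every label at once and changes the dataframe itself.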

To set the index of the dataframe to a desired column, we can use the set_index method of the dataframe.
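A sketch of set_index, assuming a dataframe with an ‘Emp id’ column as mentioned later in this tutorial (the values are made up):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102, 103], "Name": ["Ana", "Ben", "Cleo"]})

# The chosen column becomes the index and leaves the regular columns.
indexed = df.set_index("Emp id")

print(indexed)
```

Like drop, set_index returns a new dataframe unless inplace=True is passed.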

To restore the original index, we can use reset_index.

The reset_index method replaces the existing index with a new index starting from 0. By default, the old index is kept as a regular column.
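The default behaviour of reset_index can be sketched as follows. Note that the old index comes back as an ordinary column; the data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102, 103], "Name": ["Ana", "Ben", "Cleo"]})
indexed = df.set_index("Emp id")

# The index is replaced by 0, 1, 2 and "Emp id" becomes a column again.
restored = indexed.reset_index()

print(restored)
```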

If you want to drop the old index instead of keeping it as a column, you need to set the drop keyword to True.

You can observe that the old index ‘Emp id’ is dropped and the index starts again from 0.
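With drop=True the old index is discarded rather than turned back into a column. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Emp id": [101, 102, 103], "Name": ["Ana", "Ben", "Cleo"]})
indexed = df.set_index("Emp id")

# drop=True throws the "Emp id" index away instead of restoring it as a column.
dropped_index = indexed.reset_index(drop=True)

print(dropped_index)
```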

In this tutorial, we learned about inserting columns, deleting rows and columns, renaming columns and working with the index.

Hope you enjoyed the tutorial. Keep reading, keep learning!
