Warnings in Pandas

ramesh chourasia Nov 13 2020 · 3 min read

SettingWithCopyWarning is one of the most common obstacles people run into when learning pandas. Part of the confusion is that pandas does not guarantee the same outcome for two lines of code that look almost identical: one may modify the original data, while the other silently works on a copy.

This article explains why the warning is generated and shows you how to solve it.

The first thing to understand is that SettingWithCopyWarning is a warning, not an error. It tells us that our operation might not have worked as expected and that we should check the result to make sure we haven't made a mistake.

Because the code still runs, it is tempting to simply ignore the warning. This is bad practice, and SettingWithCopyWarning should never be ignored. We should always take some time to understand why we are getting the warning before taking action.
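To see the difference between a warning and an error, here is a minimal sketch using only Python's standard warnings module, not pandas. The function risky_update is hypothetical and stands in for any operation that might not do what we expect: the warning is recorded, but execution carries on, so nothing stops us from using a wrong result unless we check.

```python
import warnings


def risky_update(values):
    """Hypothetical helper: returns an updated copy and warns that the
    caller's original list is NOT modified."""
    warnings.warn("original list was not modified", UserWarning)
    return [v * 2 for v in values]


data = [1, 2, 3]

# Record warnings instead of only printing them, so we can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    result = risky_update(data)

# Execution continued past the warning; an error would have stopped here.
print(result)  # [2, 4, 6]
print(data)    # [1, 2, 3]  (the original is unchanged)
print([str(w.message) for w in caught])
```

An error would have interrupted the program; the warning only reports the problem, which is why it is easy to overlook.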

import pandas as pd

Reading the CSV file and showing the first five rows

movies = pd.read_csv('http://bit.ly/imdbratings')
movies.head()

Example 1

Showing the total number of null values in column "content_rating"

movies['content_rating'].isnull().sum()

Showing the rows where column 'content_rating' has null values, i.e. NaN

movies[movies['content_rating'].isnull()]

Showing the count of each unique value in content_rating. We can see that "NOT RATED" appears 65 times. These entries really represent missing values, so it is best to replace them with NaN.

movies['content_rating'].value_counts()
import numpy as np
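The reason "NOT RATED" has to be replaced is that pandas treats it as an ordinary string, not as a missing value. The sketch below uses a small hypothetical DataFrame (only the column name content_rating comes from the article) to show that isnull() counts only real missing values, while value_counts() lists "NOT RATED" like any other rating.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDb file; only the
# column name content_rating is taken from the article.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "content_rating": ["R", None, "NOT RATED", "PG-13", "NOT RATED"],
})

# Only genuine missing values (None/NaN) are counted as null...
n_null = movies["content_rating"].isnull().sum()
print(n_null)  # 1

# ...while "NOT RATED" is just a string and appears in the counts.
counts = movies["content_rating"].value_counts()
print(counts["NOT RATED"])  # 2
```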

Trying to overwrite "NOT RATED" with NaN values by chaining two indexing operations

movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan
This raises a SettingWithCopyWarning.

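The problem with the line above is that it performs two operations: "movies[movies['content_rating']=='NOT RATED']" first selects the matching rows, and "['content_rating']=np.nan" then assigns to the result of that selection. pandas cannot be sure whether the first step returned a view or a copy of movies. If it is a view, the assignment changes the underlying data; if it is a copy, the assignment does not affect the DataFrame at all. A copy is what we get in this case (selecting rows with a boolean mask returns a copy), hence the warning.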

movies['content_rating'].isnull().sum()

The output of the above code has not changed, which means that "movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan" made no changes to movies at all. This is exactly why warnings should not be ignored.
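The same failure can be reproduced offline on a small hypothetical DataFrame (only the column name comes from the article). The sketch records any warnings so the assignment does not fail silently, then checks that the original frame still has the same number of missing values. The exact warning class, if one is recorded, depends on the pandas version: older releases emit SettingWithCopyWarning, while releases with copy-on-write enabled use a separate chained-assignment warning.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the article.
movies = pd.DataFrame({
    "content_rating": ["R", None, "NOT RATED", "PG-13", "NOT RATED"],
})
mask = movies["content_rating"] == "NOT RATED"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained indexing: step 1 (movies[mask]) builds a new DataFrame,
    # step 2 (["content_rating"] = ...) assigns into that temporary object.
    movies[mask]["content_rating"] = np.nan

# The original DataFrame was not modified: still only one missing value.
print(movies["content_rating"].isnull().sum())  # 1

# Names of any warning classes that were recorded (version dependent).
print([w.category.__name__ for w in caught])
```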

The correct way to write this assignment, which also avoids the warning, is to use the .loc accessor. It selects the rows and the column in a single operation on movies itself:

movies.loc[movies['content_rating'] == 'NOT RATED', 'content_rating'] = np.nan

This time no warning is raised.

movies['content_rating'].isnull().sum()

The output has now changed to 68, because the 65 "NOT RATED" entries are counted as missing values too.
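Here is the same fix applied to a small hypothetical DataFrame (only the column name comes from the article). Because .loc receives the row mask and the column label in one call, the assignment writes directly into movies, and the null count grows by the number of "NOT RATED" rows, just as the article's count rose to 68.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the article.
movies = pd.DataFrame({
    "content_rating": ["R", None, "NOT RATED", "PG-13", "NOT RATED"],
})
before = movies["content_rating"].isnull().sum()  # 1 real missing value

# One .loc call selects rows (boolean mask) and column together,
# so the assignment modifies movies itself rather than a temporary copy.
movies.loc[movies["content_rating"] == "NOT RATED", "content_rating"] = np.nan

after = movies["content_rating"].isnull().sum()
print(before, after)  # 1 3
```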

Example 2

Showing the rows where column "star_rating" is greater than 9

top_movies = movies.loc[movies['star_rating'] > 9]
top_movies

Using the assignment operator to replace the value 142 with 150 in the duration column of the first row (index label 0)

top_movies.loc[0, 'duration'] = 150
This raises a SettingWithCopyWarning.
top_movies

In this output we can see that the duration has changed from 142 to 150, even though a warning was shown.

The problem lies in the line that creates top_movies: pandas cannot tell whether top_movies is a view of movies or a copy, so it cannot tell whether changing top_movies was also meant to change movies.
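A small hypothetical DataFrame (only the column names come from the article) shows what actually happens: the filtered subset is a separate object, so the change lands in top_movies only and movies keeps its original value. Depending on the pandas version, the assignment may or may not record a SettingWithCopyWarning; the data outcome is the same either way.

```python
import warnings

import pandas as pd

# Hypothetical toy data; column names come from the article.
movies = pd.DataFrame({
    "star_rating": [9.3, 9.2, 8.5, 9.1],
    "duration": [142, 175, 120, 200],
})

# Boolean filtering produces a new DataFrame (index labels 0, 1, 3 here).
top_movies = movies.loc[movies["star_rating"] > 9]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # May emit SettingWithCopyWarning in pandas versions without copy-on-write.
    top_movies.loc[0, "duration"] = 150

print(top_movies.loc[0, "duration"])  # 150: the subset changed
print(movies.loc[0, "duration"])      # 142: the original did not
print([w.category.__name__ for w in caught])
```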

We can improve the previous line by adding the copy method, which tells pandas explicitly that top_movies is an independent copy:

top_movies = movies.loc[movies['star_rating'] > 9].copy()
top_movies.loc[0, 'duration'] = 150
top_movies.head()
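Here is the .copy() version on the same kind of hypothetical toy data (only the column names come from the article). Because the copy is explicit, pandas no longer has a reason to raise SettingWithCopyWarning, and it is clear from the code that only top_movies is meant to change.

```python
import warnings

import pandas as pd

# Hypothetical toy data; column names come from the article.
movies = pd.DataFrame({
    "star_rating": [9.3, 9.2, 8.5, 9.1],
    "duration": [142, 175, 120, 200],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes top_movies an explicit, independent DataFrame.
    top_movies = movies.loc[movies["star_rating"] > 9].copy()
    top_movies.loc[0, "duration"] = 150

print(top_movies.loc[0, "duration"])  # 150
print(movies.loc[0, "duration"])      # 142
print([w.category.__name__ for w in caught])
```

If the intent were instead to change movies itself, the assignment would target movies directly with .loc, as in Example 1.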

So the two examples above show how ignoring warnings can lead to a wrong representation of the data.
