SettingWithCopyWarning is one of the most common obstacles
people run into when learning pandas. Even pandas itself does not
guarantee one single outcome for two lines of code that may look identical.
This article explains why the warning is generated and shows you how to resolve it.
The first thing to understand is that
SettingWithCopyWarning is a warning, and not an error.
It informs us that our operation might not have worked as expected and that we should check the result to make sure there isn't any mistake.
Ignoring the warning is bad practice.
SettingWithCopyWarning should never be ignored. We should always take some time to understand why we are getting the warning before taking action.
import pandas as pd
Reading the CSV file
Showing the first five rows
Showing the total number of null values in the column "content_rating"
Showing the rows where the column "content_rating" has null values, i.e. NaN
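The two steps above could look roughly like the sketch below. The original CSV is not reproduced here, so the small "movies" DataFrame and its "title" values are made-up stand-ins; only the column name "content_rating" comes from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movies data set (assumption, not the real file).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", np.nan, "NOT RATED", "PG-13"],
})

# Total number of null values in the "content_rating" column.
null_count = movies["content_rating"].isna().sum()
print(null_count)

# Only the rows where "content_rating" is null.
null_rows = movies[movies["content_rating"].isna()]
print(null_rows)
```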
Showing the unique values of "content_rating". We can see that "NOT RATED" appears 65 times. These entries really represent missing values, so it is best to replace them with NaN.
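One way to get such a per-value count is value_counts. This is a sketch on a made-up stand-in DataFrame, so the counts differ from the 65 reported for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movies data set (assumption, not the real file).
movies = pd.DataFrame({
    "content_rating": ["R", "NOT RATED", np.nan, "NOT RATED", "PG-13"],
})

# Count every distinct value; dropna=False also lists the missing values.
counts = movies["content_rating"].value_counts(dropna=False)
print(counts)
```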
import numpy as np
Overwriting "NOT RATED" with NaN values
The problem with the line of code above is that it involves two operations: first a selection of rows, then an assignment to a column of that selection. pandas cannot always tell whether the first operation returns a view or a copy. If it is a view, the assignment changes the underlying data; if it is a copy, the original DataFrame is not affected. The latter is what happens in this case, hence the warning.
So, we can see that the output is still unchanged, which means the code "movies[movies['content_rating']=='NOT RATED']['content_rating']=np.nan" has not made any changes. This is why it is advisable not to ignore warnings.
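The effect can be reproduced on a small example. This is only a sketch: the DataFrame is a made-up stand-in for the real data, and the exact warning text depends on the pandas version installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movies data set (assumption, not the real file).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", "NOT RATED", "PG", "NOT RATED"],
})

# Chained assignment: the boolean selection returns a copy, and the
# assignment is applied to that temporary copy, not to "movies".
# pandas emits a warning on this line.
movies[movies["content_rating"] == "NOT RATED"]["content_rating"] = np.nan

# The original DataFrame is unchanged.
nulls_after = movies["content_rating"].isna().sum()
not_rated_after = (movies["content_rating"] == "NOT RATED").sum()
print(nulls_after, not_rated_after)
```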
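The best way to fix the line of code above and avoid the warning is to use the "loc" indexer, which selects the rows and the column in a single operation.
We see that there is no warning when using "loc".
We can see that the number of null values has changed to 68, so the assignment took effect.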
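A minimal sketch of the "loc" version, again on a made-up stand-in DataFrame rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movies data set (assumption, not the real file).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "content_rating": ["R", np.nan, "NOT RATED", "NOT RATED"],
})

# Row selection and column selection happen in one "loc" call,
# so the assignment is applied directly to "movies".
movies.loc[movies["content_rating"] == "NOT RATED", "content_rating"] = np.nan

# One value was already missing and two "NOT RATED" entries were replaced.
nulls_after = movies["content_rating"].isna().sum()
print(nulls_after)
```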
Showing the rows where column "star_rating" is greater than 9
Using the assignment operator to assign the value 150 in place of 142 in the "duration" column of the first row
In this output we can see that the duration has been changed from 142 to 150, even though a warning was shown.
The problem lies in the code "top_movies=movies.loc[movies['star_rating']>9]", which leaves pandas unable to tell whether top_movies is a view or a copy of movies.
Improving the previous line of code by adding the copy() method
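A sketch of the explicit-copy version follows. The DataFrame is a made-up stand-in; the column names and the values 142 and 150 are taken from the text, everything else is assumed for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the movies data set (assumption, not the real file).
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "star_rating": [9.3, 9.2, 8.9],
    "duration": [142, 175, 200],
})

# copy() makes it explicit that top_movies is an independent DataFrame.
top_movies = movies.loc[movies["star_rating"] > 9].copy()

# Assigning to the copy raises no warning and leaves "movies" untouched.
top_movies.loc[0, "duration"] = 150

print(top_movies.loc[0, "duration"])
print(movies.loc[0, "duration"])
```

Making the copy explicit states the intent: changes to top_movies are not meant to flow back into movies.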
So, the two examples above show how ignoring the warning can lead to a wrong representation of the data.