Netflix: Content Analysis using NLP and Plotly Express

#wordcloud #analytics #nlp #choropleth #netflix

Ameya Harshe Apr 05 2021 · 2 min read
Netflix and Chill!

First, we will import the required libraries:

1. pandas

2. numpy

3. wordcloud

4. plotly

5. matplotlib

Then, we will load the dataset using the read_csv() function from the pandas library
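The loading step can be sketched as below. The post does not name the file or its columns, so the column names here (director, cast, country, release_year, rating, listed_in, description) and the two sample rows are assumptions for illustration only; in practice you would pass the path of your own CSV file to the same function.

```python
import io

import pandas as pd

# Made-up stand-in for the real dataset file; the column names are assumptions.
SAMPLE_CSV = """title,director,cast,country,release_year,rating,listed_in,description
Title A,Jane Doe,"Actor One, Actor Two",India,2019,TV-14,"International Movies, Dramas",A family must find a new life.
Title B,,Actor Three,United States,2020,PG-13,Comedies,Two friends find trouble.
"""


def load_titles(source):
    """Read the catalogue from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_titles(io.StringIO(SAMPLE_CSV))
print(df.shape)    # (rows, columns)
print(df.head())   # first few records
```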

Next, we will count the null values in each column by calling isna() and summing the result, and then drop every row that contains a null value using the dropna() function
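A minimal sketch of the null check and clean-up, using a small made-up dataframe in place of the real one (the column names are assumptions):

```python
import pandas as pd

# Tiny made-up frame with a few missing values.
df = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C"],
    "director": ["Jane Doe", None, "John Roe"],
    "country": ["India", "United States", None],
})

# isna() marks each missing cell as True; sum() counts them per column.
null_counts = df.isna().sum()
print(null_counts)

# dropna() removes every row that has at least one missing value.
clean = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(clean), "rows after")
```

Note that dropna() with its default settings discards a whole row even if only one column is missing, so it is worth comparing the row counts before and after.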

Now, we will perform NLP (Natural Language Processing) using WordCloud and STOPWORDS from the wordcloud library

The main question:

What is NLP?

NLP is a subfield of Artificial Intelligence concerned with interactions between computers and human language

Then, we will use matplotlib to plot the wordcloud images for:

1. Actors

2. Directors

3. Content description 

4. Genre
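The word cloud step for any one of these columns might look like the sketch below. The helper names (column_to_text, build_wordcloud, show_wordcloud), the column name and the sample text are all hypothetical; only WordCloud, STOPWORDS and the matplotlib calls come from the libraries listed earlier.

```python
import pandas as pd


def column_to_text(series):
    """Join every non-null entry of a text column into one long string."""
    return " ".join(series.dropna().astype(str))


def build_wordcloud(text):
    """Turn text into a WordCloud, ignoring common English stop words."""
    from wordcloud import WordCloud, STOPWORDS
    return WordCloud(
        stopwords=set(STOPWORDS),
        background_color="white",
        width=800,
        height=400,
    ).generate(text)


def show_wordcloud(cloud, title):
    """Display a generated word cloud with matplotlib."""
    import matplotlib.pyplot as plt
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Made-up descriptions; the real column would come from the dataset.
df = pd.DataFrame({
    "description": [
        "A family must find a new life.",
        None,
        "Two friends find trouble.",
    ]
})
text = column_to_text(df["description"])
```

Calling show_wordcloud(build_wordcloud(text), "Content description") would then draw the image; repeating this with the actor, director and genre columns gives the other three clouds.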

The most common first names among actors are:

1. James

2. David

3. Richard

4. Michael

5. John


The most common first names among directors are:

1. Peter

2. Michael

3. John

4. Chris

5. David
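The word clouds show these names visually; as a rough cross-check, the same ranking could be counted directly. The sketch below is not part of the original analysis. It assumes each cell holds a comma-separated list of full names, and the helper name top_first_names and the sample names are made up.

```python
from collections import Counter

import pandas as pd


def top_first_names(series, n=5):
    """Count the first word of every comma-separated name in a column."""
    counts = Counter()
    for cell in series.dropna():
        for name in str(cell).split(","):
            parts = name.split()
            if parts:                      # skip empty fragments
                counts[parts[0]] += 1
    return counts.most_common(n)


# Made-up cast column for illustration.
cast = pd.Series([
    "John Smith, David Lee",
    "John Doe, David Kim",
    None,
    "Michael Ray, John Poe",
])
top_two = top_first_names(cast, n=2)
print(top_two)
```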


Genres that appear most often, and so are presumably preferred by audiences, are:

1. International Movies

2. International Dramas (Most popular: Korean Drama)

3. Movies

4. Comedies


Keywords that appear most often in the content descriptions:

1. Find

2. Life

3. Family

Movie Description

Now, we will analyse which type of content, based on rating, is watched in different countries around the world

For that, we will use Plotly Express, the high-level interface of the plotly library. It offers more than 30 functions, each of which builds a complete figure in a single call, so it needs far less code than graph_objects from the same library

First, we will create a new dataframe and group it by country, rating and release year using the groupby() function
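One way to build such a grouped dataframe is sketched below. The post does not say which aggregate it computes, so counting the titles in each group with size() is an assumption, as are the column names and sample rows.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset.
df = pd.DataFrame({
    "country": ["India", "India", "United States", "India"],
    "rating": ["TV-14", "TV-14", "PG-13", "TV-Y7"],
    "release_year": [2019, 2019, 2020, 2019],
})

# Count how many titles fall into each (country, rating, year) combination.
grouped = (
    df.groupby(["country", "rating", "release_year"])
      .size()
      .reset_index(name="count")
)
print(grouped)
```

reset_index() turns the group keys back into ordinary columns, which is the flat shape that Plotly Express functions expect.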

After dataframe creation, we will create a choropleth map that will show what type of rating-based content is watched in different countries

What is a choropleth map?

It is a thematic map in which predefined geographic areas, such as countries, are shaded or patterned according to the value of a statistical variable
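A minimal sketch of the map with Plotly Express follows. It assumes the grouped dataframe has country, rating, release_year and count columns, and it matches countries by ISO-3 code (the default for px.choropleth) through a small hypothetical lookup table; the exact arguments used for the map in this post are not shown, and the counts below are made up.

```python
import pandas as pd

# Hypothetical name-to-code lookup; a real one would cover every country in the data.
ISO3 = {"India": "IND", "United States": "USA", "Russia": "RUS"}

# Made-up grouped data for illustration only.
grouped = pd.DataFrame({
    "country": ["India", "United States", "Russia"],
    "rating": ["TV-Y7", "PG-13", "TV-14"],
    "release_year": [2019, 2019, 2019],
    "count": [12, 30, 5],
})
grouped["iso3"] = grouped["country"].map(ISO3)


def build_choropleth(frame):
    """Colour each country by rating, with one animation frame per release year."""
    import plotly.express as px
    return px.choropleth(
        frame,
        locations="iso3",              # ISO-3 codes are the default location mode
        color="rating",
        hover_name="country",
        hover_data=["count"],
        animation_frame="release_year",
        title="Content rating by country",
    )
```

Calling build_choropleth(grouped).show() opens the interactive map. Because rating is a text column, each rating gets its own discrete colour rather than a continuous scale.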

Choropleth map

Key insights from the map above:

1. India: 7+ (TV)

2. USA: Un-rated, 7+ and General audiences (G)

3. Russia: 14+ (TV)

4. Bangladesh: 14+ (TV)

5. UK: PG-13, General audiences (G), R-Rated

6. South America: TV-14, General audiences (G)

This would help in setting up recommendations next time a consumer from any country opens Netflix!

Have a great day folks!

Netflix and Chill...

Ameya Harshe

PGDM in Big Data Analytics 2019-21

Goa Institute of Management
