In this blog, we will describe how we handle a business problem related to Customer churn by in-depth study of the business, analyzing the data to explain the churn rate and then create a interactive dashboard to facilitate decision making. The specific process includes
(2) Business Case
(3) Requirement Gathering and Documentation
(4) Data Summary and Exploratory Analysis
In the last decade, we have seen an exponential rise in digitization with the number of mobile units increasing from 5 billion in 2010 to 8.5 billion in 2020.
In developing nations like India, we had a massive surge of mobility users since 2015 with the lowering of data prices. With the enormous increase in the number of customers using telephone services, the existing operators are losing share in today's market because of tougher competition and highly demanding customers. The marketing division for a telco company wants to attract more new customers and avoid contract termination from existing customers (churn rate). For the telco company to expand its clientele, its growth rate (number of new customers) must exceed its churn rate (number of customers existing). Some of the factors that caused existing customers to leave their telco companies are better price offers, faster internet services, and a more secure online experience from other companies.
A common misconception we have is that a customer is said to have churned when he/she moves from one operator to another. However, a customer can churn even while remaining with an operator. Any monetary loss to the telecom operator can be termed as a customer churn. The different segments of intra-operator churn can be seen below
A high churn rate will adversely affect a company’s profits and impede growth. Since the cost of acquiring new customers is much higher than retaining its existing customers, the company can use the churn rate analysis to provide discounts, special offers, and superior products to keep current customers.
It is also equally important for a business analyst to understand the business from the eyes of the subscribers of the telecom operator and for this we need to develop a decision cycle of a customer based on which we can focus primarily on the customers whose churn can be prevented like unsatisfied customers or conditional churner.
A telecom company wants to overhaul their business strategy. In order to do so the sales manager has requested three specific details: 1. Summarizing the profile of a churner compared to the entire database. 2. Churn risk associated with each customer. 3. Major factors leading to churn 4. Critical Segment of Churning customers 5. Interactive dashboard for each customer displaying the necessary details.
Requirement Gathering and Documentation
As a business analyst the first step would be to gather all the requirements necessary for the analysis and document it. Documentation is the most critical part of a project development as it is the foundation of all further project deliverables describing what inputs and outputs are associated with each process function. I have created a Business Requirement documentation for our business case as shown below. (PS: Names are changed due to confidentiality issues).
Data Summary and Exploratory Analysis
Data Acquisition plays a crucial role in the final result of our analysis. As a business analyst, it is of paramount importance that we know what we need. Based on the business case and the business requirement we filter out the required data.
The provided dataset consists of the information below:
Data Summary and Exploratory Analysis
After data acquisition, the raw data needs to be cleaned and transformed before we can try and draw any insights from it. We can use multiple platforms to do so, but I have implemented the process using Power BI.
The steps involved in Data Pre Processing are:
1. Handling Missing Values: Based on the number of missing values, we can choose different strategies to deal with them.
2. Data Validation: We need to check the data types of each column and make sure they are compatible with our business understanding.
Once the data source is connected we have to perform these cleaning and transformation steps in Power Query Editor. Based on the raw file I have created two different tables,
1. Complete Subscriber database.
2. Complete Churner Database.
This is because one of the customer's demand is to compare the profiles of a churners with the total subscribers.
1. Handling Missing Values: In our dataset we have only 11 rows with missing values out 7043, so deleting those rows won't have any significant impact on our result. The blank rows are removed by choosing the remove empty option in data filtering.
Now after analyzing the dataset we can see that all the data fields are in correct format. Once the Data transformation steps are completed we will Load the cleaned data in our model using the close and apply option in Power Query Editor so that we can create a schema diagram.
In order to analyze the churning risk of a subscriber we need to create models to predict whether a subscriber will churn or not based on its accuracy. This can be done using Machine Learning techniques. Here, I have used Logistic recognition to do the modeling. Logistic regression models need to be scripted in programming languages like R or Python. But Power BI will help us in creating models as well. Power BI can use Python scripts to create prediction tables and probability score associated with it.
This can be done by the following steps:
1. Get Data 2. Choose Python Script 3. Paste the script required for modelling
After running above script, dataset will be loaded .
The Python scripting can easily be done using pre-written code snippets by adjusting the parameters and variables according to our use case and understanding the concept of Logistic regression.
Final Code that we will run :
After running above script you will get below output :
We can then use Power Query Editor to merge the variables from cleaned initial data and the prediction values and then can create new variables to show the Prediction Probabilities.
In order to get the Risk Category we use conditional column as
IF Probability of churn less than 40% then non risky,
between 40% to 60% : Low Risky,
60% to 80%: Risky,
Greater than 80%: High risky
The final table looks like this
Once the dataset is loaded in Power BI Desktop, we have to create the schema diagram using the star schema mechanism. The Customer ID field of both the Transformed and Predicted table are connected on a one to one relationship as shown below.
Once the data modelling is done we have to create certain calculated column which would allow us to better depict the Customer ID in the Customer profile reporting. Here the first 4 characters of Customer ID is extracted and a new column is created for better visual representation.
ID = LEFT('Telco-Churn-Analysis-Predictions'[customerID], 4)
We will start the exploratory analysis and dashboarding together in Power BI.
As per the client's demands and based on the acceptance criteria we have created six separate dashboard namely Home, Summary, Churn Reasons, Churn Factors, Customer Details and Ask a question These dashboard together form the Customer Churn Analysis Report.
The background image as well as the layout customization plays a vital role in increasing the aesthetic appeal of any dashboard or visual. Power BI has certain limitation in this regard and hence we use a graphic designing tool called Snappa.
The Home Page depicts all the dashboard names as well as buttons devised for easier page navigation. This helps the end user to get to the specific dashboard and get the required information.
The Summary dashboard contains the visuals showing the relationship of all the acquired user data with its effect on churning. Here, we perform the actual exploratory data analysis to get an idea of some peculiar features of a churner when compared with the entire subscriber base.
We have created a subdivision in our dashboard comparing certain characteristics of churners and all the subscribers. We have performed univariate and bivariate analysis to extract the insights from the subscriber data. The salient features of the dashboard are:
1. Card Visual to depict total number of subscriber and the churner. The data shows that roughly 27% of the subscriber base have churned.
2. Demographic comparison of all subscribers and churners are made like Gender division of all subscriber and churner, Senior citizen categorization, Partner categorization as well as the tenure of a subscriber and churner with the organization using Donut Chart, Funnel Chart and Bar chart.
From the visualization, we can conclude that:
2.1: Gender has no impact on churning.
2.2: Subscribers in the tenure bucket of 0-20 i.e. NEW customers are churning much more as compared to long term customer.
2.3: Subscribers who do not have partners are 40% more likely to churn as compared to partnered subscriber.
2.4: Higher percentage (around11% more) of Senior citizen are in the churner base then in the subscriber base.
3. In the Phone Service and usage data we can infer that:
3.1 Subscribers with no online security are 30 % more likely to churn.
3.2: Subscribers without any tech support assistance are 32% more likely to churn.
3.3: Subscribers on fiber optics are 25 % more likely to churn as well.
4. In the Contract and Payment data we can infer that:
4.1 Subscribers on monthly plan form 90% of all the churners and are very highly likely to churn as compared to just 55% of all subscriber.
4.2: Subscribers paying through electronic cheques are 24% more likely to churn.
4.3: Churner's average monthly revenue is $10 more than monthly ARPU.
3. Churn Reasons:
The Churn Reason dashboard contains the categories of a churner and the revenue details associated with them. This dashboard satisfies the client's demands of categorizing churners based on their likelihood of churn. We have created a custom categories based on the churn probabilities of each customer and then binned them in Non Risky, Less risky, Risky and High Risky customers. The visual depicts:
3.1: Total no of Risky customer is 2057.
3.2: Average risk score(percentage probability of churning) is 26.57.
3.3: Revenue at risk because of these customer is $2.42 Million.
3.4: Of all the risky subscribers, 692 are highly risky and requires immediate attention followed by 1365 risky.
3.5 Out of the total $2.42 million at risk, Risky subscribers contribute around $2.12 million.
These are very important and business driving insights which will help the client to focus on the most critical subscriber base and also develop strategies to minimize their at-risk revenue.
4. Churn Factors:
The most important aspect of a business analysis project is to find the root cause of the client's problem. In our project, we have to determine the churn factors. Normally, determining the actual factors responsible for churn is a tedious data science challenge. However Power BI takes care of this issue for us and we can find out the churn factors using the Key Influencer visual.
Key Influencers visual is a AI driven visual that helps to find and establish the factors that drive a metric based on their relative importance. It helps to understand the elements which influence the specified target variables. We use the churn field as the 'analyze by' and all the other fields in 'explain by' option which explains the churn (Yes/No) based on the subscriner data. In the visual below it can be observed that
1.Being in Monthly Contract increases the likelihood of churning by 6.3 times.
2. No Online Security increases the likelihood of churning by 3.63 times.
3. No tech Support increases the likelihood of churning by 3.51 times.
... and many more such factors can be observed.
Another very important visual offered by Power BI for AI driven solution is the Top Segment visual. The top segment visual creates certain segments of customer i.e. customer clusters satisfying a set of criteria.
It also shows the churn likelihood of customers in these segemnt when compared with the entire churner base.
In our dashboard, Power BI has created five such segment.
1. The first Segment consist of people whose:
2. The second Segment consist of people whose:
3. The third Segment consist of people whose:
4. The fourth Segment consist of people whose:
5. The fifth Segment consist of people whose:
This visual makes it tremendously easy for the client to target their different business strategies at diffent sections of subscriber resulting in the least amount of churn and most return on investment.
5. Customer Details:
The Customer Details dashboard acts as a Customer Profile portal with all the necessary details of the customer like:
The dashboard also contains the Risk Category, Total revenue and Churn index(percentage probability of churning)
The visual gives a perfect snapshot of the customer details with filters based on ID. This will be very useful for a Incident Mangement team as well as service evaluation teams.
6.Ask a question:
The 'Ask a question' visual comes in handy during a business evaluation meeting or a strategy development meetings. This visual gives the details of churner based on any independant data field involving any number of calculations.
Since not every visual can be depicted in a limited-size dashboard, the 'Ask a question' visual develops the necessary graph as and when it is needed.
Once the final report is prepared, we publish the dashboard to Power BI service and schedule automated data refresh based on the client's specification.
All the tasks mentioned in the BRD has been completed as per the specified acceptance criteria. The project is then evaluated and continuously monitored for any future changes. This concludes a Business Analysis project of Customer Churn Prediction analysis.
Thanks for reading the article, please feel free to post your comments and share the feedback!!!.