Data cleaning and formatting
Data cleansing, or data cleaning, is the process of identifying and correcting or removing inaccurate records from a dataset, table, or database. It involves recognising incomplete, unreliable, inaccurate, or irrelevant parts of the data and then replacing, modifying, or removing the dirty or coarse data.
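As a minimal sketch of the cleaning steps described above, the following uses only the Python standard library. The records, field names, and the valid age range are hypothetical, and the choice to drop rows with impossible ages while filling missing ages with the median is one possible policy, not the only reasonable one. In practice a library such as pandas is typically used for this.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"id": 1, "age": 34, "city": " london "},
    {"id": 2, "age": None, "city": "Paris"},
    {"id": 2, "age": None, "city": "Paris"},   # exact duplicate
    {"id": 3, "age": -5, "city": "Berlin"},    # impossible age
    {"id": 4, "age": 51, "city": "PARIS"},
]

def clean(records, min_age=0, max_age=120):
    """Remove duplicates and invalid rows, normalise text, fill missing ages."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:            # removing: exact duplicate record
            continue
        seen.add(key)
        age = rec["age"]
        if age is not None and not min_age <= age <= max_age:
            continue               # removing: value outside the valid range
        # modifying: strip whitespace and unify capitalisation
        kept.append({**rec, "city": rec["city"].strip().title()})
    # replacing: fill missing ages with the median of the known ages
    known_ages = [r["age"] for r in kept if r["age"] is not None]
    fill = statistics.median(known_ages)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in kept]

cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```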
Exploratory data analysis
Exploratory data analysis refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.
It is an approach for summarizing, visualizing, and becoming intimately familiar with the important characteristics of a dataset.
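The summary-statistics side of this can be sketched with the standard library alone. The sample values below are hypothetical, and the 1.5 x IQR rule used to flag anomalies is a common convention chosen here for illustration, not something the text prescribes.

```python
import statistics

# Hypothetical sample: daily order counts with one suspicious spike.
orders = [12, 15, 11, 14, 13, 16, 12, 95, 14, 13]

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values further than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

summary = summarize(orders)
outliers = iqr_outliers(orders)
print(summary)
print("possible anomalies:", outliers)
```

A large gap between the mean and the median, as in this sample, is itself a hint that a few extreme values are pulling the mean.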
Feature engineering and selection
The well-known principle of "garbage in, garbage out" applies fully to any ML task.
Feature extraction and feature engineering: Transformation of raw data into features suitable for modelling.
Feature Transformations: Transformation of data to improve the accuracy of the algorithm.
Feature Selection: Removing unnecessary features.
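The three steps listed above can be sketched as follows. The raw records, field names, and the specific choices (log of income, standardisation, dropping constant columns) are hypothetical illustrations; which transformations actually help depends on the data and the algorithm.

```python
import math
import statistics

# Hypothetical raw records.
raw = [
    {"signup": "2023-01-15", "income": 30000, "country": "US"},
    {"signup": "2023-06-02", "income": 120000, "country": "US"},
    {"signup": "2023-11-20", "income": 45000, "country": "US"},
]

def engineer(record):
    """Feature engineering: derive numeric features from raw fields."""
    _year, month, _day = record["signup"].split("-")
    return {
        "signup_month": int(month),
        "log_income": math.log(record["income"]),
        "is_us": 1 if record["country"] == "US" else 0,
    }

def drop_constant(rows):
    """Feature selection: remove columns that take a single value."""
    keep = [name for name in rows[0] if len({row[name] for row in rows}) > 1]
    return [{name: row[name] for name in keep} for row in rows]

def standardize(rows, name):
    """Feature transformation: rescale a column to zero mean, unit variance."""
    values = [row[name] for row in rows]
    mean, spread = statistics.mean(values), statistics.pstdev(values)
    return [{**row, name: (row[name] - mean) / spread} for row in rows]

features = [engineer(r) for r in raw]
features = drop_constant(features)            # "is_us" never varies here
features = standardize(features, "log_income")
print(features)
```

Selection runs before standardisation here so that a constant column, whose spread is zero, is never divided by.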
Compare multiple algorithms
It is important to compare the performance of multiple machine learning algorithms consistently.
When you work on an ML project, you often end up with multiple good models to choose from, each with different performance characteristics.
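One way to keep such a comparison consistent is to score every candidate on exactly the same data splits with the same metric. The sketch below assumes two hypothetical candidates (a mean baseline and a least-squares line), made-up data, and mean absolute error as the metric.

```python
import statistics

# Hypothetical data, roughly y = 2x + 1 with small noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2, 16.8, 19.1]

def fit_mean(train_x, train_y):
    """Baseline: always predict the mean of the training targets."""
    mean_y = statistics.mean(train_y)
    return lambda x: mean_y

def fit_line(train_x, train_y):
    """Least-squares straight line through the training points."""
    mx, my = statistics.mean(train_x), statistics.mean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def k_folds(n, k):
    """Yield (train_indices, test_indices); fold i tests every k-th index."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test

def cv_mae(fit, xs, ys, k=5):
    """Mean absolute error over k folds; folds are identical for every model."""
    errors = []
    for train, test in k_folds(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        errors += [abs(model(xs[j]) - ys[j]) for j in test]
    return statistics.mean(errors)

candidates = {"mean baseline": fit_mean, "linear": fit_line}
scores = {name: cv_mae(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```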
Hyperparameter tuning
In ML, hyperparameter optimization, or tuning, is the problem of choosing a set of optimal hyperparameters for a learning algorithm.
A hyperparameter is a parameter whose value is used to control the learning process.
Hyperparameters are not model parameters, and they cannot be learned directly from the data.
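A small grid search illustrates the idea. In this sketch the hyperparameter is k, the number of neighbours in a k-nearest-neighbour regressor; the data, the grid of candidate values, and the use of a separate validation set are all assumptions made for illustration.

```python
# Hypothetical 1-D data: y rises roughly in step with x.
train = [(0, 0.2), (1, 0.9), (2, 2.3), (3, 2.8),
         (4, 4.1), (5, 5.2), (6, 5.8), (7, 7.1)]
valid = [(0.5, 0.5), (2.5, 2.5), (4.5, 4.5), (6.5, 6.5)]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_mse(train, valid, k):
    """Mean squared error on the validation set for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

# k is not fitted from the training data; each candidate value is tried
# and the one with the lowest validation error is kept.
grid = [1, 2, 4, 8]
scores = {k: validation_mse(train, valid, k) for k in grid}
best_k = min(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```

Scoring each candidate on data the model was not fitted on is what keeps the search from simply rewarding the value that memorises the training set.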
Evaluate the models
Evaluation helps to find the model that best represents our data and to estimate how well the chosen model will work in the future.
Evaluating model performance on the data used for training is not acceptable in ML because it easily produces over-optimistic estimates and favours overfitted models.
There are two methods of evaluating models in ML:
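Two widely used approaches are hold-out validation and k-fold cross-validation; treating these as the two methods meant above is an assumption. The sketch below uses made-up data and a 1-nearest-neighbour model that simply memorises its training set, which makes the gap between training error and held-out error easy to see.

```python
import random

# Hypothetical noisy data: y = x + noise, generated with a fixed seed.
rng = random.Random(0)
data = [(x, x + rng.uniform(-1, 1)) for x in range(20)]

def one_nn(train):
    """A 1-nearest-neighbour model that memorises the training set."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]

def mae(model, rows):
    """Mean absolute error of a model on (x, y) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

# Hold-out: shuffle once and keep 25% of the rows aside for testing.
shuffled = data[:]
rng.shuffle(shuffled)
train, test = shuffled[:15], shuffled[15:]
model = one_nn(train)
train_error = mae(model, train)    # 0.0: every point is its own neighbour
holdout_error = mae(model, test)   # honest estimate on unseen rows

def cross_val_mae(rows, k=5):
    """k-fold cross-validation: each row is used for testing exactly once."""
    fold_errors = []
    for i in range(k):
        test_fold = rows[i::k]
        train_fold = [row for j, row in enumerate(rows) if j % k != i]
        fold_errors.append(mae(one_nn(train_fold), test_fold))
    return sum(fold_errors) / k

cv_error = cross_val_mae(shuffled)
print(train_error, holdout_error, cv_error)
```

Hold-out is cheaper, while cross-validation averages over several splits and so depends less on which rows happened to land in the test set.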
Deploy the model
Model interpretability helps debug the model by analyzing what the model really thinks is important.
The purpose of deploying your model is to make predictions from a trained ML model available to the outside world.
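At its simplest, deployment means saving the trained model and loading it in a separate process that answers prediction requests. The sketch below assumes a hypothetical model that is just a slope and an intercept, stores it as JSON in a temporary directory, and uses a made-up request format; a real service would normally sit behind a web framework or a model-serving tool.

```python
import json
import os
import tempfile

# Hypothetical trained model: a line y = slope * x + intercept.
trained = {"slope": 2.0, "intercept": 1.0}

# 1. After training: persist the model parameters to disk.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(model_path, "w") as f:
    json.dump(trained, f)

# 2. At serving time: load the parameters once and reuse them per request.
def load_model(path):
    with open(path) as f:
        params = json.load(f)
    return lambda x: params["slope"] * x + params["intercept"]

def handle_request(model, body):
    """Turn a JSON request such as '{"x": 3}' into a JSON response."""
    try:
        x = float(json.loads(body)["x"])
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a numeric 'x'"})
    return json.dumps({"prediction": model(x)})

model = load_model(model_path)
response = handle_request(model, '{"x": 3}')
bad_response = handle_request(model, "not json")
print(response)
print(bad_response)
```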
Conclusions and documentation
Documentation helps to tell the narrative behind the decisions made.
This helps with reproducibility and also helps ensure successful project completion.
It is important to record information that can help support the proper treatment plan and the reasoning for such services.