Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data in order to gain insight from it. It provides a collection of tools for answering important questions about data. Descriptive statistical methods, for example, transform raw observations into information that you can understand and share.
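As a small illustration of turning raw observations into shareable information, the sketch below summarizes a list of numbers with Python's standard statistics module. The observations are hypothetical values made up for this example, and the choice of summary measures is only one reasonable option.

```python
from statistics import mean, median, pstdev

# Hypothetical raw observations (illustrative values, not real data).
observations = [2, 4, 4, 4, 5, 5, 7, 9]

# Descriptive statistics: a handful of numbers that describe the whole sample.
summary = {
    "count": len(observations),
    "mean": mean(observations),
    "median": median(observations),
    "population_std": pstdev(observations),  # population standard deviation
    "min": min(observations),
    "max": max(observations),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these values the mean is 5, the median is 4.5, and the population standard deviation is 2.0, which already says more about the data than the raw list does.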
Why is Statistics Important to Machine Learning?
Problem Framing. Requires the use of exploratory data analysis and data mining.
Data Understanding. Requires the use of summary statistics and data visualization.
Data Cleaning. Requires the use of outlier detection, imputation, and more.
Data Selection. Requires the use of data sampling and feature selection methods.
Data Preparation. Requires the use of data transforms, scaling, encoding, and much more.
Model Evaluation. Requires experimental design and resampling methods.
Model Configuration. Requires the use of statistical hypothesis tests and estimation statistics.
Model Selection. Requires the use of statistical hypothesis tests and estimation statistics.
Model Presentation. Requires the use of estimation statistics such as confidence intervals.
Model Predictions. Requires the use of estimation statistics such as prediction intervals.
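A few of the steps above can be sketched in plain Python. The example below is a minimal sketch, not a recommended pipeline: it assumes missing values are marked with None, uses median imputation for data cleaning, z-score standardization for data preparation, and a normal-approximation confidence interval for classification accuracy for model presentation. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, median, pstdev


def impute_median(values):
    """Data cleaning: replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Data preparation: rescale to zero mean and unit population standard deviation.

    Assumes the values are not all identical (otherwise the deviation is zero).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def accuracy_confidence_interval(correct, total, confidence=0.95):
    """Model presentation: normal-approximation interval for classification accuracy."""
    accuracy = correct / total
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(accuracy * (1 - accuracy) / total)
    # Clamp to [0, 1] because accuracy is a proportion.
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)


# Hypothetical feature column with one missing value.
raw = [10, 12, None, 14, 16]
cleaned = impute_median(raw)
scaled = standardize(cleaned)

# Hypothetical evaluation result: 88 correct predictions out of 100.
low, high = accuracy_confidence_interval(correct=88, total=100)

print(cleaned)
print(scaled)
print(round(low, 3), round(high, 3))
```

The normal approximation is a simple choice that works reasonably well for large test sets; for small samples or accuracies near 0 or 1, other interval methods may be more appropriate.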
Types of data:
When you collect quantitative data, the numbers you record represent real amounts that can be added, subtracted, divided, and so on. There are two types of quantitative variables: discrete (counts of individual items or values) and continuous (measurements that can take any value within a range).
Categorical variables represent groupings of some kind. They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things.
There are three types of categorical variables: binary (yes/no outcomes), nominal (groups with no rank or order between them), and ordinal (groups that are ranked in a specific order).
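The distinction between the three categorical types matters in practice because each is usually turned into numbers differently. The sketch below shows one common convention for each: a 0/1 flag for binary values, integer ranks for ordinal values, and one-hot vectors for nominal values. The helper names and the example categories are hypothetical and chosen only for illustration.

```python
def encode_binary(value, positive="yes"):
    """Binary: map the positive outcome to 1 and anything else to 0."""
    return 1 if value == positive else 0


def encode_ordinal(value, ordered_levels):
    """Ordinal: use the position in the ranked list, so order is preserved."""
    return ordered_levels.index(value)


def encode_one_hot(value, categories):
    """Nominal: one indicator per category, so no artificial order is implied."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical example values for each categorical type.
subscribed = encode_binary("yes")
size_rank = encode_ordinal("medium", ["small", "medium", "large"])
color_vector = encode_one_hot("green", ["red", "green", "blue"])

print(subscribed)     # 1
print(size_rank)      # 1
print(color_vector)   # [0, 1, 0]
```

Encoding a nominal variable as integer ranks would suggest an order that does not exist, which is why the one-hot form is used for it here.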