Data Science with Statistics

##statistics ##datascience ##mean ##median ##ml

Kriti Sinha Jan 13 2021 · 1 min read

“Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.” - Josh Wills

With this blog you can easily learn the basics of statistics: mean, median, mode, variance, and standard deviation, each computed with several different methods.

What is statistics?

It is the study of a data set in order to organize, summarize, and analyse it, and to draw conclusions from it.

In any data science project, analysing the data gives us the first level of insight.

Statistics plays a major role in that analysis through its different measures and methods.

Types of Statistics

1. Descriptive

2. Inferential

Descriptive

1. It helps us to organize and summarize data using numbers and graphs to look for a pattern in the data set.

2. Measures of Central tendency: Mean, Median, Mode.

3. Measures of variability: Standard Deviation, Variance & Range.
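The range appears in the list above but is not computed in the worked problem that follows. A minimal sketch, assuming the 20 marks from Problem Statement 1 and using only built-in functions (the variable names are illustrative):

```python
# Marks from Problem Statement 1
marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]

# Range = largest value - smallest value
data_range = max(marks) - min(marks)
print("Range is:", data_range)
```

For these marks the largest value is 10 and the smallest is 4, so the range is 6.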

Inferential

1. Sample data is used to make an inference or draw a conclusion about the population.

2. Probability is used to determine the confidence interval and margin of error, which indicate how reliable that conclusion is.
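As a rough illustration of the second point, the sketch below estimates a 95% confidence interval and margin of error for the mean of the 20 marks used in Problem Statement 1. It assumes the class is a random sample from a larger population and uses a normal approximation. With only 20 values, a t-distribution would give a slightly wider interval. The variable names are illustrative and `NormalDist` requires Python 3.8 or later.

```python
import math
import statistics
from statistics import NormalDist

# Marks from Problem Statement 1, treated here as a sample (an assumption)
marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]

n = len(marks)
sample_mean = statistics.mean(marks)
sample_sd = statistics.stdev(marks)  # sample standard deviation (divides by n - 1)

# z value for a 95% confidence level under the normal approximation
z = NormalDist().inv_cdf(0.975)

# Standard error of the mean, then the margin of error
standard_error = sample_sd / math.sqrt(n)
margin_of_error = z * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Margin of error: {margin_of_error:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The interval is centred on the sample mean, and the margin of error shrinks as the sample size grows because the standard error divides by the square root of `n`.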


#Problem Statement 1:
#The marks awarded for an assignment set for a Year 8 class of 20 students were as
#follows:
#6 7 5 7 7 8 7 6 9 7 4 10 6 8 8 9 5 6 4 8

import statistics
import math
import pandas as pd
import numpy as np

# The mean is the average of the given numbers.

# Method 1: by hand
# mean = (sum of all values) / (count of values)
# 6+7+5+7+7+8+7+6+9+7+4+10+6+8+8+9+5+6+4+8 = 137
# 137 / 20 = 6.85
# So the mean of 6 7 5 7 7 8 7 6 9 7 4 10 6 8 8 9 5 6 4 8 is 6.85

# Method 2: built-in sum() and len()
num = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
n = len(num)
total = sum(num)
mean = total / n
mean

6.85

# Method 3: statistics module
statistics.mean([6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8])

6.85

# Median: the middle number in a sorted set of numbers

# Method 1: sort the list and pick the middle value(s)
num = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
n = len(num)
num.sort()

if n % 2 == 0:
    # even count: average the two middle values
    median1 = num[n//2]
    median2 = num[n//2 - 1]
    median = (median1 + median2)/2
else:
    # odd count: take the single middle value
    median = num[n//2]

print("Median is: " + str(median))

Median is: 7.0

# Method 2: statistics module
statistics.median([6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8])

7.0

# Mode: the number that occurs most often within a set of numbers

# Method 1: count occurrences with Counter
from collections import Counter

# list of numbers to calculate the mode
num = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
n = len(num)
data = Counter(num)

# keep every value whose count equals the highest count
highest_count = max(data.values())
mode = [k for k, v in data.items() if v == highest_count]

if len(mode) == n:
    # every value occurs exactly once, so there is no mode
    get_mode = "No mode found"
else:
    get_mode = "Mode is: " + ', '.join(map(str, mode))

print(get_mode)

Mode is: 7

# Method 2: statistics module
statistics.mode([6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8])

7
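When several values tie for the highest count, `statistics.mode` returns only the first one it encounters (Python 3.8 and later). A short sketch of `statistics.multimode`, which returns every tied value instead. The second list is a made-up example, not part of the problem statement:

```python
from statistics import multimode

# Marks from Problem Statement 1: 7 is the only most frequent value
marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
marks_modes = multimode(marks)
print(marks_modes)

# Hypothetical data in which 4 and 5 both occur twice
tied = [4, 4, 5, 5, 6]
tied_modes = multimode(tied)
print(tied_modes)
```

The result is always a list, so a data set with a single mode still comes back as a one-element list.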

# Standard deviation measures how spread out the numbers are.
# For each value x, compute the squared deviation (x - mean)^2,
# e.g. for the first value: 6 - 6.85 = -0.85, and (-0.85)^2 = 0.7225
# Sample variance = (sum of all squared deviations) / (n - 1)
# Standard deviation = square root of the variance

# Method 1: pandas (Series.std() divides by n - 1 by default)
sr = pd.Series([6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8])
print(sr)
answer = sr.std()
print("The standard deviations of the given numbers are:")
print(answer)

0      6
1      7
2      5
3      7
4      7
5      8
6      7
7      6
8      9
9      7
10     4
11    10
12     6
13     8
14     8
15     9
16     5
17     6
18     4
19     8
dtype: int64
The standard deviations of the given numbers are:
1.6311119875071343


# Method 2: square root of the sample variance
from statistics import variance

num = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
variance_of_num = variance(num)
standard_deviation = variance_of_num ** 0.5
standard_deviation

1.6311119875071343


# Method 3: statistics module
statistics.stdev([6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8])

1.6311119875071343
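All three methods above return the sample standard deviation, which divides the sum of squared deviations by n - 1. The sketch below works through that calculation step by step and also shows the population version, which divides by n. Variable names are illustrative.

```python
import math
import statistics

marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]
n = len(marks)
mean = sum(marks) / n

# Squared deviation of each mark from the mean: (x - mean)^2
squared_deviations = [(x - mean) ** 2 for x in marks]

# Sample variance divides by n - 1; population variance divides by n
sample_variance = sum(squared_deviations) / (n - 1)
population_variance = sum(squared_deviations) / n

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print("Sample standard deviation:", sample_sd)
print("Population standard deviation:", population_sd)

# Cross-check against the statistics module
print(math.isclose(sample_sd, statistics.stdev(marks)))
print(math.isclose(population_sd, statistics.pstdev(marks)))
```

If the 20 marks are the whole class rather than a sample, `statistics.pstdev` (or `sr.std(ddof=0)` in pandas) is the matching choice, and it gives a slightly smaller value.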



