Monday, 3 September 2018

Statistical Functions - Central Tendency and Variation in R language

Irawen September 03, 2018 R

Descriptive statistics :-

First-hand tools that give first-hand information about the data.

Central tendency of data (Mean, median, mode, geometric mean, harmonic mean etc.)
Variation in data (variance, standard deviation, standard error, mean deviation etc.)

Central tendency of the data

Gives an idea about the central (typical) value of the data
The data is clustered around what value?

Data: x1, x2, ..., xn
x : Data vector

Arithmetic mean :-

mean (x)

Geometric mean :-

prod (x) ^ (1/length (x) )
(length (x) is equal to the number of elements in x)
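
As a minimal sketch of the two expressions above: the vector x below is made up purely for illustration, and the log-scale form of the geometric mean is an alternative that is not in the original post (it only works for strictly positive data).

```r
# Hypothetical positive data vector, chosen only for illustration
x <- c(2, 4, 8)

# Arithmetic mean: sum of the values divided by their count
am <- mean(x)

# Geometric mean: n-th root of the product of the values
gm <- prod(x)^(1 / length(x))

# Same quantity computed on the log scale (assumes all values > 0);
# this avoids overflow in prod() when x is long or has large values
gm_log <- exp(mean(log(x)))

print(am)      # 14/3, about 4.67
print(gm)      # cube root of 64, i.e. 4 up to floating point rounding
print(gm_log)  # agrees with gm up to floating point rounding
```

For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, which is what the output shows here.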

Median :-

Value such that the number of observations above it is equal to the number of observations below it.
median (x)

Example :-
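
A small sketch of median (x), using two made-up vectors (not data from the original post) to show the odd and even cases:

```r
# Hypothetical vectors, chosen only for illustration
x_odd  <- c(7, 1, 5, 3, 9)   # sorted: 1 3 5 7 9
x_even <- c(7, 1, 5, 3)      # sorted: 1 3 5 7

# Odd number of observations: the middle value of the sorted data
m_odd <- median(x_odd)

# Even number of observations: the average of the two middle values
m_even <- median(x_even)

print(m_odd)   # 5
print(m_even)  # (3 + 5) / 2 = 4
```

The data do not need to be sorted first; median() handles the ordering internally.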

Variability

Spread and scatter of the data around a point, preferably the mean value.

Data set 1: 360, 370, 380
mean = (360 + 370 + 380) /3 = 370
Data set 2: 10, 100, 1000
mean = (10 + 100 + 1000) /3 = 370

How to differentiate between the two data sets?

x : data vector

Variance :-
var (x)
(var (x) uses the divisor n - 1, where n = length (x))

Standard deviation :-
Positive square root of the variance.
sqrt (var (x) )

Another variant of the variance :-
If we want the divisor to be n, then use
   ( (n-1) /n) * var (x)
where n = length (x)
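
To see how these measures separate the two data sets above, here is a minimal sketch. The names set1, set2 and var_n are illustrative helpers, not part of the original post; sd() is base R's built-in standard deviation.

```r
# The two data sets from the text; both have mean 370
set1 <- c(360, 370, 380)
set2 <- c(10, 100, 1000)

print(mean(set1))  # 370
print(mean(set2))  # 370

# Sample variance (divisor n - 1)
v1 <- var(set1)    # 100
v2 <- var(set2)    # 299700

# Standard deviation: positive square root of the variance
s1 <- sqrt(var(set1))   # 10
s2 <- sqrt(var(set2))   # about 547.45

# Hypothetical helper: variance with divisor n instead of n - 1
var_n <- function(x) {
  n <- length(x)
  ((n - 1) / n) * var(x)
}

print(c(v1, v2))
print(c(s1, s2))
print(c(var_n(set1), var_n(set2)))  # about 66.67 and 199800
```

The means are identical, but the variance of the second data set is several thousand times larger, which is exactly the difference the mean alone cannot show. sd (x) returns the same value as sqrt (var (x) ).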

Range:
maximum(x1, x2, ..., xn) - minimum(x1, x2, ..., xn)
max (x) - min (x)
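
A short sketch of the range, reusing the two data sets from the variability discussion (the variable names are illustrative). One point worth knowing: R's own range() returns the minimum and maximum as a pair rather than their difference, so diff(range(x)) is an equivalent way to get the number.

```r
# The two data sets from the variability discussion
set1 <- c(360, 370, 380)
set2 <- c(10, 100, 1000)

# Range as defined in the text: maximum minus minimum
r1 <- max(set1) - min(set1)
r2 <- max(set2) - min(set2)

print(r1)  # 20
print(r2)  # 990

# range() returns c(min, max); diff() turns that pair into the difference
print(range(set2))        # 10 1000
print(diff(range(set2)))  # 990
```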

Interquartile range:
Third quartile (x1, x2, ..., xn) - First quartile (x1, x2, ..., xn)
   IQR (x)

Quartile deviation:
[Third quartile (x1, x2, ..., xn) - First quartile (x1, x2, ..., xn)]/2
   = Interquartile range/2
IQR (x) /2

Example :-
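
A minimal sketch of the interquartile range and quartile deviation, using a made-up vector (not data from the original post). It assumes R's default quartile definition in quantile(); other quartile conventions can give slightly different values on the same data.

```r
# Hypothetical data vector, chosen only for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# First and third quartiles under R's default quantile definition
q <- quantile(x, probs = c(0.25, 0.75))
q1 <- q[["25%"]]   # 3
q3 <- q[["75%"]]   # 7

# Interquartile range: third quartile minus first quartile
iqr_manual <- q3 - q1
iqr_builtin <- IQR(x)

# Quartile deviation: half of the interquartile range
qd <- IQR(x) / 2

print(c(q1, q3))      # 3 7
print(iqr_manual)     # 4
print(iqr_builtin)    # 4
print(qd)             # 2
```

Because both measures depend only on the middle half of the data, a single extreme observation changes them far less than it changes the range.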