Monday 10 September 2018

Statistical Functions - Correlation and Example in R Language

Irawen September 10, 2018 R No comments

Descriptive Statistics :

First-hand tools that give first-hand information.

Central tendency of data
Variation in data
Structure and shape of data tendency
Relationship study (correlation coefficient, rank correlation, correlation ratio, regression etc.)

Bivariate Data

Quantitative measures provide a numerical measure of the relationship.

Graphical plots provide first hand visual information about the nature and degree of relationship between two variables.

Relationship can be linear or nonlinear.

x, y : Two data vectors

Data: x = (x1, x2, ..., xn), y = (y1, y2, ..., yn)

cov (x,y) : covariance between x and y

var (x) : Variance of x

Correlation coefficient

Measures the degree of linear relationship between the two variables.

cor (x,y) : correlation between x and y

Example :-

Covariance:
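
As a minimal sketch of the cov and var commands, the covariance can be computed both from its definition and with the built-in function. The two short vectors are made-up illustrative values, not data from this post. Note that cov ( ) and var ( ) in R use the divisor n - 1.

```r
# Hypothetical illustrative data (an assumption, not from the post)
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 9)

n <- length(x)

# Covariance computed from its definition, with divisor n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

cov_xy <- cov(x, y)   # built-in covariance between x and y
var_x  <- var(x)      # variance of x, the same as cov(x, x)

print(cov_xy)
print(var_x)
```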

Example :-

Correlation coefficient:
Exact positive linear dependence

> cor ( c(1,2,3,4) , c(1,2,3,4) )
[1] 1
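
The opposite case, exact negative linear dependence, can be sketched in the same way. The block also checks that the correlation coefficient equals the covariance divided by the product of the two standard deviations. All values are hypothetical and chosen only for illustration.

```r
x <- c(1, 2, 3, 4)

# Exact negative linear dependence: y falls by a fixed amount as x rises
y_neg <- c(8, 6, 4, 2)
r_neg <- cor(x, y_neg)

# Hypothetical values that are roughly, but not exactly, linear in x
y <- c(2, 4, 6, 9)
r        <- cor(x, y)
r_manual <- cov(x, y) / (sd(x) * sd(y))   # covariance scaled by both standard deviations

print(r_neg)
print(r)
```

A correlation close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.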

Data on Daily water Demand

Statistical Function bivariate three dimensional plot in R Language

Irawen September 10, 2018 R No comments

Bivariate Plot :

Provide first hand visual information about the nature and degree of relationship between two variables.

Relationship can be linear or nonlinear.

We discuss several types of plots through examples.

Scatter Plot :

plot command:
x, y : Two data vectors
plot (x,y)
plot (x, y, type)

Get more details on the type argument from help: help ("plot")
Other options:

main             an overall title for the plot.
sub              a sub title for the plot.
xlab             a title for the x axis.
ylab             a title for the y axis.
asp              the y/x aspect ratio.
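
As a rough sketch of how these options combine with the type argument, the call below draws points and lines with titles for the plot and both axes. It uses only the first ten observations of the water demand example from this post; the title strings are illustrative assumptions.

```r
# First ten observations of the water demand example in this post
water <- c(33710, 31666, 33495, 32758, 34067, 36069, 37497, 33044, 35216, 35383)
temp  <- c(23, 25, 25, 26, 27, 28, 30, 26, 29, 32)

pdf(NULL)  # draw to a null device; remove this line to see the plot on screen

# type = "o" overplots points and lines; the other arguments set the titles
plot(water, temp, type = "o",
     main = "Water demand and temperature",
     sub  = "First ten days",
     xlab = "Daily water demand (million liters)",
     ylab = "Temperature (centigrade)")

invisible(dev.off())  # close the device again
```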

Example :

Daily water demand in a city depends upon weather temperature.

We know from experience that water consumption increases as the weather temperature increases.

Data for 27 days were collected as follows:
Daily water demand (in million liters)
water <- c (33710, 31666, 33495, 32758, 34067, 36069, 37497, 33044, 35216, 35383, 37066, 38037, 38495, 39895, 41311, 42849, 43038, 43873, 43923, 45078, 46935, 47951, 46085, 48003, 45050, 42924, 46061)

Temperature (in centigrade)
temp <- c (23, 25, 25, 26, 27, 28, 30, 26, 29, 32, 33, 34, 35, 38, 39, 42, 43, 44, 45, 45.5, 45, 46, 44, 44, 41, 37, 40)

Plot command:

x, y : Two data vectors
Various types of plots can be drawn.

plot (x, y)

plot (water, temp)

plot (water, temp, "l")

"l" for lines,

plot (water, temp, "o")

"o" for both points and lines, 'overplotted'

plot (water, temp, "h")

"h" for 'histogram'-like (or 'high-density') vertical lines

plot (water, temp, "s")

"s" for stair steps.

Smooth Scatter plot

scatter.smooth (x, y) provides scatter plot with smooth curve

Example: scatter.smooth (water, temp)
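
A short sketch of what scatter.smooth ( ) draws: the smooth curve is a loess fit, and the same curve can be obtained as numbers with loess.smooth ( ), which uses the same default settings. Only the first ten observations of the water demand example are used here to keep the block short.

```r
# First ten observations of the water demand example in this post
water <- c(33710, 31666, 33495, 32758, 34067, 36069, 37497, 33044, 35216, 35383)
temp  <- c(23, 25, 25, 26, 27, 28, 30, 26, 29, 32)

pdf(NULL)  # draw to a null device; remove this line to see the plot on screen
scatter.smooth(water, temp)   # scatter plot with a smooth loess curve
invisible(dev.off())

# Coordinates of the smooth curve, evaluated on a grid of x values
fit <- loess.smooth(water, temp)
str(fit)
```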

Matrix Scatter plot

The command pairs ( ) allows the simple creation of a matrix of scatter plots.

> pairs ( cbind (water, temp) )

3 Dimensional Scatter Plot:

scatterplot3d ( ) plots a three dimensional (3D) point cloud

> install.packages ("scatterplot3d")

> library (scatterplot3d)

> setwd ("c:/RCourse/")

> data3d <- read.csv ("data-age-height-weight.csv")

> data3d

> scatterplot3d (data3d [, 1: 3])

More functions

contour ( ) for contour lines
dotchart ( ) for dot charts (replacement for bar charts)
image ( ) pictures with colors as third dimension
mosaicplot ( ) mosaic plot for (multidimensional) diagrams of categorical variables (contingency tables)
persp ( ) perspective surfaces over the x-y plane

Association Rule Mining in R Language

Irawen September 09, 2018 R No comments

Association Rule Mining

In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
It is intended to identify strong rules discovered in databases using different measures of interestingness.
For example, a rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat.
Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements.

Constraints on the measures below are used in R to select the most useful rules from all candidate rules. After analyzing these values for all the rules, the best rules for WB were obtained.

E.g. :- Consider rule: {Jack the Ripper (1988)} => {Strawberry Blonde}
Let Jack the Ripper =X and Strawberry Blonde =Y, Then

Support (X U Y) = No. of transactions involving both Jack the Ripper and Strawberry Blonde / Total no. of transactions

Confidence = No. of transactions where Strawberry Blonde was also bought when Jack the Ripper was bought / No. of transactions where Jack the Ripper was bought

Lift = Ratio of the observed support to the support expected if X and Y were independent
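
The three measures can be sketched in base R on a tiny set of transactions. The transactions, item names and the helper function support ( ) are hypothetical and serve only to illustrate the definitions; this is not the data or the code used for the rules above.

```r
# Hypothetical transactions (illustrative only, not data from this post)
transactions <- list(
  c("onions", "potatoes", "burger"),
  c("onions", "potatoes"),
  c("potatoes", "burger"),
  c("onions", "potatoes", "burger"),
  c("milk")
)

X <- c("onions", "potatoes")   # left-hand side of the rule X => Y
Y <- "burger"                  # right-hand side of the rule

# Hypothetical helper: fraction of transactions containing every item in 'items'
support <- function(items) {
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}

support_xy <- support(c(X, Y))            # transactions with both X and Y / all transactions
confidence <- support_xy / support(X)     # how often Y appears when X appears
lift       <- confidence / support(Y)     # observed support relative to independence

print(c(support = support_xy, confidence = confidence, lift = lift))
```

A lift above 1 suggests that X and Y occur together more often than would be expected if they were independent.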

Statistical Function-Boxplots, Skewness and Kurtosis in R Language

Irawen September 07, 2018 R No comments

Summary of observation

In R, quartiles, minimum and maximum values can be easily obtained by the summary command

summary (x) x: data vector
It gives information on

minimum
maximum
first quartile
second quartile (median),
mean and
third quartile.

Boxplot

Boxplot is a graph which summarizes the distribution of a variable by using its median, quartiles, minimum and maximum values.

boxplot ( ) draws a box plot
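
A small sketch of both commands on one made-up data vector (the values are an assumption chosen so that one observation lies far from the rest). Besides drawing the plot, boxplot ( ) returns the numbers it used, including any points plotted separately as outliers.

```r
# Hypothetical data with one unusually large value
x <- c(2, 4, 4, 5, 7, 9, 30)

s <- summary(x)   # minimum, quartiles, median, mean and maximum
print(s)

pdf(NULL)  # draw to a null device; remove this line to see the plot on screen
b <- boxplot(x, main = "Boxplot of x")
invisible(dev.off())

print(b$stats)  # whisker ends, hinges and median used for the box
print(b$out)    # observations drawn as separate points
```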

Descriptive Statistics:

First-hand tools that give first-hand information.

Structure and shape of data tendency (symmetricity, skewness, kurtosis etc.)
Relationship study (correlation coefficient, rank correlation, correlation ratio, regression etc.)

Skewness

Measures the shift of the hump of frequency curve.
Coefficient of skewness based on values x1,x2,....,xn.

Kurtosis

Measures the peakedness of the frequency curve.
Coefficient of kurtosis based on values x1,x2,...,xn.

Skewness and Kurtosis

First we need to install a package 'moments'
> install.packages ("moments")
> library (moments)
skewness ( ) : computes coefficient of skewness
kurtosis ( ) : computes coefficient of kurtosis
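
For readers without the package, the coefficients can be sketched in base R from the central moments. This assumes the usual moment-based definitions (third and fourth central moments with divisor n, scaled by powers of the second); it is worth checking the help pages of the moments package to confirm that its functions use the same convention. The data vector is hypothetical.

```r
# Hypothetical data with a long right tail
x <- c(1, 2, 3, 4, 10)

# Central moments with divisor n
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)

skew <- m3 / m2^(3/2)   # positive here because the tail is on the right
kurt <- m4 / m2^2       # a normal distribution has the value 3 on this scale

print(skew)
print(kurt)
```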

Basics Calculations: Matrix Operations in R Language

Irawen September 05, 2018 R No comments

In R, a 4 × 2 matrix x can be created with the following command:

> x <- matrix (nrow=4,   ncol=2, data=c(1,2,3,4,5,6,7,8) )

> x
                [,1]       [,2]
[1,]             1          5
[2,]             2          6
[3,]             3          7
[4,]             4          8

Properties of a Matrix

We can get specific properties of a matrix:

> dim (x)         # tells the dimension of the matrix
[1] 4 2

> nrow (x)        # tells the number of rows
[1] 4

> ncol (x)        # tells the number of columns
[1] 2

> mode (x) # Informs the type or storage mode of an object, e.g., numerical, logical etc.

[1] "numeric"

attributes provides all the attributes of an object

> attributes (x) # Informs the dimension of matrix

$dim
[1] 4 2

Help on the Object "Matrix"

To know more about these important objects, we use R-help on "matrix".

> help ("matrix")

matrix package:base R Documentation

Matrices

Description :

'matrix' creates a matrix from the given set of values.

'as.matrix' attempts to turn its argument into a matrix.

'is.matrix' tests if its argument is a (strict) matrix. It is generic: you can write methods to handle specific classes of objects, see Internal Methods.

Then we get an overview on how a matrix can be created and what parameters are available:

Usage :

matrix (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

as.matrix (x)

is.matrix (x)

Arguments :

data: an optional data vector.

nrow: the desired number of rows

ncol: the desired number of columns

byrow: logical. If 'FALSE' (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.

dimnames: A 'dimnames' attribute for the matrix: a 'list' of length 2.

x: an R object.
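
A brief sketch of the byrow and dimnames arguments listed above, together with as.matrix and is.matrix. The row and column names are arbitrary illustrative labels.

```r
# Fill the matrix row by row and attach row and column names
y <- matrix(data = 1:8, nrow = 4, ncol = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3", "r4"), c("c1", "c2")))
print(y)

second_row <- y["r2", ]   # elements can be addressed by name

v <- as.matrix(1:3)       # a vector becomes a one-column matrix
print(dim(v))
print(is.matrix(v))
```

With byrow = FALSE (the default) the same data would instead fill the first column with 1 to 4 and the second with 5 to 8, as in the matrix x shown earlier.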

Finally, references and cross-references are displayed...

References :

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S Language_. Wadsworth & Brooks/Cole.

Statistical Functions - Central Tendency and Variation in R language

Irawen September 03, 2018 R No comments

Descriptive statistics :-

First-hand tools that give first-hand information.

Central tendency of data (Mean, median, mode, geometric mean, harmonic mean etc.)
Variation in data (variance, standard deviation, standard error, mean deviation etc.)

Central tendency of the data

Gives an idea about the mean value of the data
The data is clustered around what value?

Data: x1, x2, ..., xn
x : Data vector

Arithmetic mean:
mean (x)

Geometric mean:
prod (x) ^ (1/length (x) )
(length (x) is equal to the number of elements in x)
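
A quick sketch of the two commands above on a made-up vector of positive values. The log-based form of the geometric mean is an added alternative, not from the post; it gives the same result and tends to be safer for long vectors, where prod (x) can overflow.

```r
# Hypothetical positive data
x <- c(1, 2, 4, 8)

am <- mean(x)                      # arithmetic mean
gm <- prod(x)^(1 / length(x))      # geometric mean, as in the post
gm_log <- exp(mean(log(x)))        # equivalent form via logarithms

print(am)
print(gm)
```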

Median :-

Value such that the number of observations above it is equal to the number of observations below it.
median (x)

Example :-
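
A minimal illustration with made-up values: for an odd number of observations the median is the middle value after sorting, and for an even number it is the average of the two middle values.

```r
# Odd number of observations: sorted values are 1, 3, 5
med_odd <- median(c(5, 1, 3))

# Even number of observations: sorted values are 1, 3, 5, 8
med_even <- median(c(5, 1, 3, 8))

print(med_odd)
print(med_even)
```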

Variability

Spread and scatteredness of data around any point, preferably the mean value.

Data set 1: 360, 370, 380
mean = (360 + 370 + 380) /3 = 370
Data set 2: 10, 100, 1000
mean = (10 + 100 + 1000) /3 = 370

How to differentiate between the two data sets?

x : data vector
var (x)
positive square root of variance : standard deviation
sqrt (var (x) )
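
Applied to the two data sets above, the commands can be sketched as follows. Both sets have the same mean, but the variance and standard deviation separate them clearly.

```r
set1 <- c(360, 370, 380)
set2 <- c(10, 100, 1000)

# Both data sets have the same mean
mean1 <- mean(set1)
mean2 <- mean(set2)

# Variance (divisor n - 1) and standard deviation
var1 <- var(set1)
var2 <- var(set2)
sd1  <- sqrt(var1)
sd2  <- sqrt(var2)

print(c(var1, var2))
print(c(sd1, sd2))
```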

Variance, another variant:

var (x) uses the divisor n - 1. If we want the divisor to be n, then use
   ( (n-1) /n) * var (x)
where n = length (x)

Range:
maximum(x1, x2, ....., xn) - minimum(x1, x2, ...., xn)
max (x) - min (x)

Interquartile range:
Third quartile (x1, x2, ..., xn) - First quartile (x1, x2, ...., xn)
   IQR (x)

Quartile deviation:
[Third quartile (x1, x2, ..., xn) - First quartile (x1, x2, ..., xn)]/2
   = Interquartile range/2
IQR (x) /2

Example :-
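
A small sketch of the measures above on one made-up data vector (the values are an assumption for illustration only), including the variance with divisor n.

```r
# Hypothetical data
x <- c(3, 7, 8, 5, 12, 14, 21, 13, 18)
n <- length(x)

range_x <- max(x) - min(x)          # range
iqr_x   <- IQR(x)                   # third quartile minus first quartile
qd_x    <- IQR(x) / 2               # quartile deviation
var_n   <- ((n - 1) / n) * var(x)   # variance with divisor n instead of n - 1

print(c(range = range_x, IQR = iqr_x, QD = qd_x, var_n = var_n))
```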
