# R for Everyone: Advanced Analytics and Graphics

Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone, Second Edition, is the solution.

Jared P. Lander is the Chief Data Scientist of Lander Analytics, a New York-based data science firm that specializes in statistical consulting and training services. He is the organizer of the New York Open Statistical Programming Meetup (the world's largest R meetup) and the New York R Conference, and an adjunct professor of statistics at Columbia University. With an M.A. in statistics from Columbia University and a B.S. in mathematics from Muhlenberg College, he has experience in both academic research and industry. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world. His writings on statistics have been featured in publications such as Forbes and the Wall Street Journal.

# Basic Calculations in R Language

- The assignment operators are the left arrow <- (a less-than sign followed by a dash) and the equal sign =.
> x <- 20    # assigns the value 20 to x
> x = 20     # also assigns the value 20 to x
Initially only <- was available in R.

> x = 20       # assigns the value 20 to x
> y = x + 2    # assigns the value x + 2 (here 22) to y
> z = x + y    # assigns the value x + y (here 42) to z

Comments: The character # marks the beginning of a comment. All characters until the end of the line are ignored.
> # mu is the mean
> # x <- 20 is treated as comment only.

R is case sensitive: capital and small letters are different.
> x <- 20 and > X <- 20 create two different variables, x and X.
The command c(1,2,3,4,5) combines the numbers 1, 2, 3, 4 and 5 into a vector.

R as a calculator

> 2 + 3              # Command
[1] 5                  # Output

> 2 * 3              # Command
[1]  6                 # Output

Multiplication and Division x * y , x / y

> c(2, 3, 5, 7) * 3
[1]  6  9 15 21

Addition and Subtraction x + y , x - y

> c(2, 3, 5, 7) + 10
[1] 12 13 15 17
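
The element-wise behaviour shown above can be tried directly. The following is a minimal sketch; the variable names v, scaled and shifted are illustrative and do not come from the original notes.

```r
# Combine numbers into a vector with c()
v <- c(2, 3, 5, 7)

# Arithmetic with a single number is applied to every element
scaled  <- v * 3     # 6 9 15 21
shifted <- v + 10    # 12 13 15 17

# Arithmetic between two vectors of equal length works element by element
ratio <- v / c(2, 3, 5, 7)   # 1 1 1 1

print(scaled)
print(shifted)
print(ratio)
```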

# Business Analytics with R Language

Definition
"Study of business data using statistical technique and programming for creating decision support and insights for achieving business goals"

Who uses it? How?
- Across domains
* Dashboards
* Models
- Across a company

Who creates it ? How?

→ Business Intelligence is a set of theories, methodologies, processes, architecture, and technologies that transform raw data into meaningful and useful information for business purposes.

What is Data Science ?
→ Science of Studying Data:

# PHP: The Complete Reference

PHP is a server-side programming language used mainly for web development and also as a general-purpose programming language. It has become hugely popular on the Internet. PHP: The Complete Reference, as the name suggests, is a complete reference guide to PHP.
This book explains how to personalize the PHP work space, define operators and variables, manipulate strings and arrays, and use HTML. It also covers how to access database information, track client-side preferences using cookies, execute FTP and e-mail transactions, and publish your applications to the Web. Additionally, it covers PHP's next-generation Web 2.0 design features, including AJAX, XML and RSS.
One can also learn to use PHP's object-oriented tools to build blogs, guest books and feedback pages with server-side file storage. PHP: The Complete Reference is a step-by-step guide to mastering PHP, covering each aspect in detail from the basic to the most advanced level. This book was published by McGraw-Hill Education on 30 November 2007 and is available in paperback.
Key Features
• Detailed coverage of PHP's next-generation Web 2.0 design features, including AJAX, XML and RSS is included.

Your One-Stop Guide to Web Development with PHP--Covers PHP 5.2
Build dynamic, cross-browser Web applications with PHP--the server-side programming language that's taken the Internet by storm. Through detailed explanations and downloadable code examples, this comprehensive guide shows you, step-by-step, how to configure PHP, create PHP-enabled Web pages, and put every advanced development tool to work.
PHP: The Complete Reference explains how to personalize the PHP work space, define operators and variables, manipulate strings and arrays, deploy HTML forms and buttons, and process user input. You'll learn how to access database information, track client-side preferences using cookies, execute FTP and e-mail transactions, and publish your applications to the Web. You'll also get in-depth coverage of PHP's next-generation Web 2.0 design features, including AJAX, XML, and RSS.
• Install PHP and set up a customized development environment
• Work with variables, operators, loops, strings, arrays, and functions
• Integrate HTML controls, text fields, forms, radio buttons, and checkboxes
• Accept and validate user-entered data from Web pages
• Simplify programming using PHP's object-oriented tools
• Build blogs, guest books, and feedback pages with server-side file storage
• Write MySQL scripts that retrieve, modify, and update database information
• Set cookies, perform FTP transactions, and send e-mails from PHP sessions
• Build AJAX-enabled Web pages
• Draw graphics on the server

About the Author: Steven Holzner is the author of over 100 technology books, many of which are bestsellers. His works mainly pertain to online applications and components of Ajax, including JavaScript, XML, browser objects and Web services. Steven also teaches programming classes at Fortune 500 companies and has been on the faculty at Cornell University and MIT. His well-known works include Ajax For Dummies and the Ajax Visual Blueprint. Steven has also worked as a contributing editor for PC Magazine.

# Tasks in Data Mining for R Language

Anomaly Detection
→ Identification of unusual patterns, outliers, which help us in understanding the variation in data.
Example:-

Association Rule Mining
→ Also referred to as market basket analysis, this method is used for discovering interesting "association" patterns among the variables.
Example :- The beer-diaper syndrome

Clustering
→ Identifying groups/classes in data which are similar to each other.
The similarity inside the "cluster" is high and between the "clusters" is low.
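
As a rough illustration of clustering, the sketch below applies k-means (one common clustering method, chosen here as an assumption and not named in the notes) to R's built-in iris data. The choice of three clusters and the seed value are also assumptions made only for this example.

```r
# Use the four numeric measurements of the built-in iris data
measurements <- scale(iris[, 1:4])   # standardise so no column dominates

set.seed(42)                         # k-means starts from random centres
fit <- kmeans(measurements, centers = 3, nstart = 20)

# Cluster label (1, 2 or 3) assigned to each flower
head(fit$cluster)

# A small within-cluster sum of squares and a large between-cluster
# sum of squares correspond to "high similarity inside, low between"
fit$tot.withinss
fit$betweenss
```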

Classification
→ Classification is the process of identifying which category an observation belongs to.
Example:-

Regression
→ With the help of regression, we can identify the extent of the relationship among variables.
It describes how the "dependent" variable varies with changes in the "independent" variable.
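
A minimal sketch of regression in R, assuming the built-in mtcars data as a stand-in example: fuel economy (mpg) plays the role of the dependent variable and car weight (wt) the independent variable.

```r
# Fit a simple linear regression: mpg explained by wt
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)                 # intercept and slope
summary(fit)$r.squared    # share of the variation in mpg explained by wt

# Predict mpg for a hypothetical car weighing 3000 lb (wt is in 1000 lb)
predict(fit, newdata = data.frame(wt = 3))
```

A negative slope here means that heavier cars tend to have lower fuel economy.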

Who uses R?

1. FACEBOOK :- For behavior analysis related to status updates and profile pictures.
3. TWITTER :- For data visualization and semantic clustering.

• Data Importing :- Import the "Houses for sale" dataset.
• Data Pre-processing :- Understand the structure of data and find correlation between different data entities.
• Data Mining :- Use Linear Regression to predict the rates of houses.
• Pattern Evaluation :- Evaluate which model fits better for the dataset.

# Data Mining using R Language

Why Data Mining ?

- I have this financial data with me, I need to find out if any of the transactions are fraudulent.
- I have this email data with me, I need to check how many of the mails are spam.
- I have this telecom data with me, I need to find out how many of the customers will churn.

Data Mining to the rescue!
How do I obtain knowledge from this data?
→ Hey, you can use data mining techniques to find interesting insights in the data.

What is Data Mining?
→ Data Mining is the computing process of discovering patterns in large datasets, involving methods at the intersection of machine learning, statistics, and database systems.

How should the Mined Information be?

New :- The extracted information should give us new patterns and relationships among the data entities.

Correct :- Just as everything that glitters is not gold, not all the mined information will be correct/valid. The mined information needs to be evaluated for its correctness before we use it for any other purpose.

Potentially useful :- As we extract useful products such as petrol, diesel etc. from crude oil, similarly, the mined information from raw data should be useful and relevant to us.

Knowledge Discovery in Databases (KDD)

1. Data Selection :-
a) Data from
b) Data Warehouse
c) Target Data

2. Data Pre-Processing :-
a) The selected data must be appropriate for mining tasks
b) Simple operations such as summarizing, aggregation, normalization can be done to transform/consolidate the data such that it is suitable for mining.

3. Data Mining :-
a) This is the most important step in the KDD process
b) Intelligent operations such as clustering, classification, and regression are applied in order to extract patterns.

4. Pattern Evaluation :-
Once the data mining techniques have been applied, the obtained results need to be evaluated for their accuracy.

5. Knowledge Representation :-
The identified patterns must be represented using simple, aesthetic graphs.
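
The simple pre-processing operations named in step 2 (summarizing, aggregation, normalization) can be sketched in base R as follows. The built-in mtcars data and the helper name min_max are assumptions made only for illustration.

```r
# Summarizing: basic statistics for one column
summary(mtcars$mpg)

# Aggregation: mean miles per gallon for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Normalization: rescale a column to the range [0, 1] (min-max scaling)
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
hp_scaled <- min_max(mtcars$hp)
range(hp_scaled)
```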

# Data Visualization in R Language

Data visualization helps organizations unleash the power of their most valuable assets:
- Their data and
- Their people

1. Pie Chart :-
Pie charts are best used when you are trying to compare parts of a whole.

2. Bar Chart :-
Bar charts are used to compare things between different groups or to track changes over time.

3. Boxplot :-
Boxplots are used to summarize data from multiple sources and display the results in a single graph.

4. Histogram :-
Histograms are used to plot how often values occur in a continuous data set that has been divided into classes, called bins.

5. Line Graph :-
Line graphs are used to track changes over short and long periods of time.

6. Scatter Plot :-
Scatter plots show how much one variable is affected by another.
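
Each of the six chart types has a base R plotting function. The sketch below is one possible illustration using built-in datasets (mtcars and AirPassengers), which are assumptions chosen for convenience; the titles and labels are likewise illustrative.

```r
pdf(NULL)  # null device so the sketch runs without a display; omit in an interactive session

cyl_counts <- table(mtcars$cyl)   # number of cars per cylinder count

pie(cyl_counts, main = "Share of cars by cylinders")             # 1. pie chart
barplot(cyl_counts, main = "Cars by cylinders", xlab = "cyl")    # 2. bar chart
bp <- boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cyl")     # 3. boxplot
h  <- hist(mtcars$mpg, breaks = 8, main = "Distribution of mpg") # 4. histogram
plot(AirPassengers, main = "Monthly air passengers")             # 5. line graph
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg",
     main = "mpg against weight")                                # 6. scatter plot

dev.off()
```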

# Fundamental Concepts of R Language

Variables in R

Variables are nothing but reserved memory locations used to store values. This means that when you create a variable, you reserve some space in memory.

Data Operators

1. Arithmetic Operators
2. Assignment Operators
3. Relational Operators
4. Logical Operators
5. Special Operators

1. Arithmetic Operators

(" + ") → Adds two operands, or acts as unary plus.
> 2 + 3
[1] 5
> +2
[1] 2
(" - ") → Subtracts the right operand from the left, or acts as unary minus.
> 3 - 1
[1] 2
> -2
[1] -2
(" * ") → Multiplies two operands.
> 2 * 3
[1] 6
(" / ") → Divides the left operand by the right; the result is a numeric (double) value.
> 6 / 3
[1] 2
(" ^ ") → Raises the left operand to the power of the right.
> 2 ^ 3
[1] 8
(" %% ") → Remainder of the division of the left operand by the right.
> 5 %% 2
[1] 1
(" %/% ") → Integer division: the quotient rounded down to a whole number (toward the left on the number line).
> 7 %/% 3
[1] 2
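
The two division-related operators are easy to confuse, so here is a small sketch that puts them side by side. The variable names a and b are illustrative only.

```r
a <- 7
b <- 3

a / b      # ordinary division: 2.333333
a %/% b    # integer division: 2
a %% b     # remainder: 1

# Quotient and remainder fit together: a == (a %/% b) * b + a %% b
(a %/% b) * b + a %% b    # 7

# With a negative operand the quotient is rounded down, not toward zero
(-7) %/% 3   # -3
(-7) %% 3    # 2
```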

2. Assignment Operators

(" = ") → x = <right operand>
> x = 5
> x
[1] 5
(" <- ") → x <- <right operand>
> x <- 15
> x
[1] 15
(" <<- ") → x <<- <right operand> (assigns in an enclosing environment, typically the global one)
> x <<- 2
> x
[1] 2
(" -> ") → <left operand> -> x
> 25 -> x
> x
[1] 25
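
At the top level <- and <<- behave the same, so the difference only shows up inside a function. The following sketch, with the hypothetical names counter, increment_local and increment_global, is one way to see it.

```r
counter <- 0    # a variable in the global environment

increment_local <- function() {
  counter <- counter + 1     # creates a new local variable; the outer one is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1    # modifies the variable found in an enclosing environment
  counter
}

increment_local()     # returns 1
counter               # still 0
increment_global()    # returns 1
counter               # now 1
```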

3. Relational Operators

(" > ") → TRUE if the left operand is greater than the right
> 2 > 3
[1] FALSE
(" < ") → TRUE if the left operand is less than the right
> 2 < 3
[1] TRUE
(" == ") → TRUE if the left operand is equal to the right
> 2 == 2
[1] TRUE
(" != ") → TRUE if the left operand is not equal to the right
> 2 != 3
[1] TRUE
(" >= ") → TRUE if the left operand is greater than or equal to the right operand
> 2 >= 3
[1] FALSE
(" <= ") → TRUE if the left operand is less than or equal to the right operand
> 2 <= 3
[1] TRUE

4. Logical Operators

(" & ") → Element-wise AND: TRUE only if both x and y are TRUE (non-zero numbers count as TRUE)
> 2 & 3
[1] TRUE
(" | ") → Element-wise OR: TRUE if at least one of x and y is TRUE
> 2 | 3
[1] TRUE
(" ! ") → NOT: TRUE if x is FALSE, FALSE otherwise
> !1
[1] FALSE
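
R's logical operators work element by element on vectors. The sketch below uses two made-up logical vectors x and y to show this, and also mentions && (not covered in the notes above) as the single-value form usually used inside if().

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y     # TRUE FALSE FALSE  (element-wise AND)
x | y     # TRUE TRUE TRUE    (element-wise OR)
!x        # FALSE TRUE FALSE  (negation)

# Numbers are coerced to logical: 0 is FALSE, any other number is TRUE
2 & 3     # TRUE
0 | 0     # FALSE

# && works on single values and is the usual choice inside if()
both_true <- x[1] && y[1]   # TRUE
```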

5. Special Operators

(" : ") → Creates a sequence of consecutive numbers as a vector
> x <- 2:8
> x
[1] 2 3 4 5 6 7 8
(" %in% ") → Tests whether an element belongs to a vector
> x <- 2:8
> y <- 5
> y %in% x
[1] TRUE
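
A short sketch of how these two operators are typically combined. The use of seq() and the name evens are additions for illustration and are not part of the notes above.

```r
x <- 2:8                      # same as c(2, 3, 4, 5, 6, 7, 8)
length(x)                     # 7

# seq() generalises : when a step other than 1 is needed
evens <- seq(2, 8, by = 2)    # 2 4 6 8

# %in% is vectorised over its left operand
c(1, 5, 9) %in% x             # FALSE TRUE FALSE

# Keep only the elements of x that also appear in evens
x[x %in% evens]               # 2 4 6 8
```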

Data Types
We do not need to declare variables before using them.

Vectors :-
A vector is a sequence of data elements of the same basic type.
Example :
vtr = c(1,3,5,7,9)
or
vtr <- c(1,3,5,7,9)
There are six atomic vector types (logical, integer, double, complex, character and raw), also termed classes of vectors.
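
The atomic type of a value can be inspected with typeof(). The sketch below is illustrative; the variable name mixed is an assumption.

```r
typeof(TRUE)              # "logical"
typeof(5L)                # "integer"
typeof(5.5)               # "double"
typeof(2+3i)              # "complex"
typeof("R")               # "character"
typeof(charToRaw("R"))    # "raw"

# A vector holds one type only: mixing types in c() coerces
# everything to the most general type involved
mixed <- c(1, "a", TRUE)
typeof(mixed)             # "character"
```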

Lists :-

Lists are R objects which can contain elements of different types, such as numbers, strings, vectors and even another list.
> n = c(2,3,5)
> s = c("aa", "bb", "cc", "dd", "ee")
> x = list(n, s, TRUE)
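
Once a list exists, its elements can be reached by position or by name. In this sketch the element names numbers, strings and flag are hypothetical and were added only to show access by name.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
x <- list(numbers = n, strings = s, flag = TRUE)

x[[2]]        # the second element itself: the character vector s
x$numbers     # access by name: 2 3 5
x["flag"]     # single brackets return a list of length 1
str(x)        # compact overview of the element types
```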

Arrays :-

Arrays are R data objects which can store data in more than two dimensions.
The array() function takes vectors as input and uses the values in the dim parameter to create an array.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
result <- array(c(vector1, vector2), dim = c(3,3,2))
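
To see what the array above contains, the sketch below rebuilds it and indexes into it. The indexing lines are additions for illustration.

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)

# The 9 values fill the first 3 x 3 slice column by column and are
# recycled to fill the second slice
result <- array(c(vector1, vector2), dim = c(3, 3, 2))

dim(result)        # 3 3 2
result[, , 1]      # the first 3 x 3 matrix
result[2, 3, 1]    # row 2, column 3 of the first matrix: 14
```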

Matrices :-

Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout.
A Matrix is created using the matrix() function.
matrix(data, nrow, ncol, byrow, dimnames)

- data is the input vector which becomes the data elements of the matrix.
- nrow is the number of rows to be created
- ncol is the number of columns to be created.
- byrow is a logical value. If TRUE, the vector elements are arranged by row.
- dimnames is the names assigned to the rows and columns.
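
A minimal sketch of a matrix() call that uses all five parameters. The row and column names (r1, r2, a, b, c) are made up for the example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)
#    a b c
# r1 1 2 3
# r2 4 5 6

# With byrow = FALSE (the default) the same data fill column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)

m["r2", "b"]    # 5
by_col[2, 1]    # 2
```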

Factors:-

Factors are data objects which are used to categorize data and store it as levels.
They can store both strings and integers.
They are useful in data analysis for statistical modeling.

data <- c("East","West","East","North","North","East","West","West","East")
factor_data <- factor(data)
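
Continuing the example above, the levels and their counts can be inspected as in the following sketch.

```r
data <- c("East","West","East","North","North","East","West","West","East")
factor_data <- factor(data)

levels(factor_data)        # "East" "North" "West" (sorted alphabetically)
table(factor_data)         # East 4, North 2, West 3
as.integer(factor_data)    # the integer codes stored behind the labels
```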

Data Frames :-

A data frame is a table or two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
emp_id = c(1:5)
emp_name = c("Rick","Dan","Michelle","Ryan","Gary")
salary = c(623.3,515.2,611.0,729.0,843.25)
emp.data <- data.frame(emp_id, emp_name, salary)
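
A short sketch of common operations on the data frame above. The salary threshold of 700 and the name high_paid are assumptions chosen only to demonstrate row filtering.

```r
emp.data <- data.frame(
  emp_id   = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25)
)

str(emp.data)              # column names and types
emp.data$salary            # one column as a vector

# Rows that satisfy a condition
high_paid <- emp.data[emp.data$salary > 700, ]
print(high_paid)           # the rows for Ryan and Gary
```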

Flow Control Statements

if → Evaluates a single condition
if .. else → Evaluates a condition and selects one of two groups of statements
switch → Checks an expression against a set of known possibilities and selects the matching statement
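
The following sketch shows the flow control statements in use. The values and labels are invented for the example.

```r
x <- 7

# if .. else: choose between two groups of statements
if (x > 5) {
  size <- "large"
} else {
  size <- "small"
}

# switch: pick the value whose name matches the expression
kind <- "b"
label <- switch(kind,
                a = "first option",
                b = "second option",
                "unknown option")   # the unnamed last value is the default

print(size)
print(label)
```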

Loops :-

repeat → Repeats things until a break statement is reached
while → Repeats things as long as the loop condition is true
for → Repeats things a given number of times
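
A minimal sketch of the three loop forms; the variable names and limits are illustrative.

```r
# for: run a fixed number of times
total <- 0
for (i in 1:5) {
  total <- total + i
}
total    # 15

# while: keep going as long as the condition holds
n <- 1
while (n < 100) {
  n <- n * 2
}
n        # 128

# repeat: has no condition of its own; leave it with break
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
count    # 3
```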

# Introduction of R Language

Why do we need Analytics ?

Data analytics helps organizations harness their data and use it to identify new opportunities.
- Cost reduction
- Better marketing & product analysis
- Organization analysis
- Faster, better decision making

→ Business Analytics examines large and different types of data to uncover hidden patterns, correlations and other insights.

What is Data Visualization?

→ Visualization allows us visual access to huge amounts of data in easily digestible visuals.

→ Well designed data graphics are usually the simplest and at the same time, the most powerful.

Why R ?

→ Programming and Statistical Language
Apart from being used as a statistical language, it can also be used as a programming language for analytical purposes.

→ Data Analysis and Visualization
Apart from being one of the most dominant analytics tools, R also is one of the most popular tools used for data visualization.

→ Simple and Easy to Learn
R is simple and easy to learn, read and write.

→ Free and Open Source
R is an example of FLOSS (Free/Libre and Open Source Software), which means one can freely distribute copies of this software, read its source code, modify it, etc.

# Java - The Complete Reference by Herbert Schildt

This book is a comprehensive guide to the Java language, describing its syntax, keywords and fundamental programming principles. Significant portions of the Java API library are also examined. This book is for all programmers, whether you are a novice or an experienced pro. The beginner will find its carefully paced discussions and many examples especially helpful.