# Do you know the difference between a Data Analyst, a Data Scientist and a Data Engineer?

### Data Analyst

A data analyst sits between business intelligence and data science. They provide vital information to business stakeholders.

Data Management in SQL (PostgreSQL)

Data Analysis in SQL (PostgreSQL)

Exploratory Analysis Theory

Statistical Experimentation Theory

### Data Scientist Associate

A data scientist is a professional responsible for collecting, analyzing and interpreting extremely large amounts of data.

R / Python Programming

### Data Manipulation in R/Python

1.1 Calculate metrics to effectively report characteristics of data and relationships between features

● Calculate measures of center (e.g. mean, median, mode) for variables using R or Python.

● Calculate measures of spread (e.g. range, standard deviation, variance) for variables using R or Python.

● Calculate skewness for variables using R or Python.

● Calculate missingness for variables and explain its influence on reporting characteristics of data and relationships in R or Python.

● Calculate the correlation between variables using R or Python.
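
The metrics in 1.1 can be sketched with the Python standard library alone. The `ages` and `incomes` lists are made-up illustration data, and skewness is computed here as the population moment ratio, which is one of several common definitions.

```python
import math
import statistics

# Hypothetical sample data; None marks a missing value.
ages = [23, 25, 25, 31, 38, None, 52]
incomes = [31, 35, 36, 44, 50, 58, 75]  # in thousands, illustrative only

observed = [a for a in ages if a is not None]

# Measures of center
mean_age = statistics.mean(observed)
median_age = statistics.median(observed)
mode_age = statistics.mode(observed)

# Measures of spread
age_range = max(observed) - min(observed)
std_age = statistics.stdev(observed)     # sample standard deviation
var_age = statistics.variance(observed)  # sample variance

# Skewness as the ratio of the third central moment to the
# second central moment raised to the power 1.5.
n = len(observed)
m2 = sum((a - mean_age) ** 2 for a in observed) / n
m3 = sum((a - mean_age) ** 3 for a in observed) / n
skewness = m3 / m2 ** 1.5

# Missingness: share of values that are missing.
missing_share = sum(a is None for a in ages) / len(ages)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Only rows where both variables are present can enter the correlation.
pairs = [(a, i) for a, i in zip(ages, incomes) if a is not None]
r = pearson([a for a, _ in pairs], [i for _, i in pairs])
```

Dropping the row with the missing age changes every statistic above, which is why missingness should be reported alongside them.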

1.2 Create data visualizations in a coding language to demonstrate the characteristics of data

● Create and customize bar charts using R or Python.

● Create and customize box plots using R or Python.

● Create and customize line graphs using R or Python.

● Create and customize histograms using R or Python.
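
A minimal sketch of the four chart types in 1.2, assuming the third-party matplotlib library is installed. All data values and titles are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
regions = ["North", "South", "West"]
sales = [120, 95, 140]
order_values = [12, 15, 15, 18, 21, 22, 22, 25, 31, 48]
months = [1, 2, 3, 4]
visits = [200, 240, 230, 310]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bar chart: compare a value across categories.
axes[0, 0].bar(regions, sales, color="steelblue")
axes[0, 0].set_title("Sales by region")
axes[0, 0].set_ylabel("Sales")

# Box plot: summarize median, quartiles and extreme values.
axes[0, 1].boxplot(order_values)
axes[0, 1].set_title("Order value spread")

# Line graph: show change over an ordered variable.
axes[1, 0].plot(months, visits, marker="o")
axes[1, 0].set_title("Visits per month")
axes[1, 0].set_xlabel("Month")

# Histogram: show the distribution of one variable.
axes[1, 1].hist(order_values, bins=5, edgecolor="black")
axes[1, 1].set_title("Order value distribution")

fig.tight_layout()
```

Calling `fig.savefig` with a file name writes the figure to disk.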

1.3 Create data visualizations in a coding language to represent the relationships between features

● Create and customize scatterplots using R or Python.

● Create and customize heatmaps using R or Python.

● Create and customize pivot tables using R or Python.
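
Of the items in 1.3, the pivot table can be sketched without a plotting library. The example below uses only the standard library and made-up `orders` records; a data frame library would normally do this in one call.

```python
from collections import defaultdict

# Hypothetical order records: (region, product, units).
orders = [
    ("North", "A", 10),
    ("North", "B", 4),
    ("South", "A", 7),
    ("South", "A", 3),
    ("South", "B", 6),
]

# Rows are regions, columns are products, cells are summed units.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, units in orders:
    pivot[region][product] += units

# Print the table with one column per product.
products = sorted({product for _, product, _ in orders})
print("region", *products, sep="\t")
for region in sorted(pivot):
    print(region, *(pivot[region][p] for p in products), sep="\t")
```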

1.4 Identify and reduce the impact of characteristics of data

● Identify when imputation methods should be used and implement them to reduce the impact of missing data on analysis or modeling using R or Python.

● Describe when a transformation to a variable is required and implement corresponding transformations using R or Python.

● Describe the differences between types of missingness and identify relevant approaches to handling types of missingness.

● Identify and handle outliers using R or Python.
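
One possible way to approach 1.4, using the standard library. Median imputation, a log transformation and the 1.5 × IQR rule are common choices rather than the only ones, and the `values` list is hypothetical.

```python
import math
import statistics

# Hypothetical right-skewed measurements; None marks a missing value.
values = [12, 14, None, 15, 13, 16, 14, 120]

observed = [v for v in values if v is not None]

# Median imputation: the median is less sensitive to the extreme value than the mean.
median_value = statistics.median(observed)
imputed = [median_value if v is None else v for v in values]

# Log transformation to compress the long right tail.
log_values = [math.log(v) for v in imputed]

# Flag outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in imputed if v < lower or v > upper]
```

Whether to remove, cap or keep a flagged value depends on whether it is a data error or a genuine observation.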

### Statistical Fundamentals in R/Python

2.1 Perform standard data import, joining and aggregation tasks

● Import data from flat files into R or Python.

● Import data from databases into R or Python.

● Aggregate numeric, categorical variables and dates by groups using R or Python.

● Combine multiple tables by rows or columns using R or Python.

● Filter data based on different criteria using R or Python.
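
The tasks in 2.1 can be illustrated with the standard library. The CSV text, the `regions` table and all names in it are invented for this sketch, and an in-memory SQLite database stands in for a real database server.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Flat file: a CSV held in memory here; a real file would be opened with open(path, newline="").
csv_text = "customer,region,amount\nann,North,20\nbob,South,35\ncat,North,15\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Database: an in-memory SQLite table standing in for a real database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (region TEXT, manager TEXT)")
conn.executemany("INSERT INTO regions VALUES (?, ?)",
                 [("North", "Dana"), ("South", "Eli")])
managers = dict(conn.execute("SELECT region, manager FROM regions"))
conn.close()

# Combine the two tables by column (a join on region).
joined = [{**row, "manager": managers[row["region"]]} for row in rows]

# Aggregate a numeric variable by group.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += float(row["amount"])

# Filter on a criterion.
large_orders = [row for row in joined if float(row["amount"]) >= 20]
```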

2.2 Perform standard cleaning tasks to prepare data for analysis

● Match strings in a dataset with specific patterns using R or Python.

● Convert values between data types in R or Python.

● Clean categorical and text data by manipulating strings in R or Python.

● Clean date and time data in R or Python.
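
A short sketch of the cleaning tasks in 2.2. The records, the ID pattern and the two date formats are assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical raw records with inconsistent formatting.
raw = [
    {"id": "A-001", "city": " new york ", "price": "1,200.50", "date": "2023-01-15"},
    {"id": "B-17", "city": "NEW YORK", "price": "980", "date": "15/02/2023"},
]

# Assumed ID format: one capital letter, a dash, three digits.
id_pattern = re.compile(r"^[A-Z]-\d{3}$")

def parse_date(text):
    """Try each assumed date format in turn."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

clean = []
for record in raw:
    clean.append({
        "id": record["id"],
        "id_valid": bool(id_pattern.match(record["id"])),    # pattern matching
        "city": record["city"].strip().title(),              # text cleanup
        "price": float(record["price"].replace(",", "")),    # string to number
        "date": parse_date(record["date"]),                  # string to date
    })
```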

2.3 Assess data quality and perform validation tasks

● Identify and replace missing values using R or Python.

● Perform different types of data validation tasks (e.g. consistency, constraints, range validation, uniqueness) using R or Python.

● Identify and validate data types in a data set using R or Python.
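
The validation tasks in 2.3 might look like the following. The records and the 0 to 120 age range are hypothetical, and filling with the mean is just one replacement strategy.

```python
# Hypothetical records to validate.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 2, "age": 210, "email": "c@example.com"},
]

issues = []
seen_ids = set()
for index, record in enumerate(records):
    # Uniqueness check on the id column.
    if record["id"] in seen_ids:
        issues.append((index, "duplicate id"))
    seen_ids.add(record["id"])

    # Missing value, data type and range checks on the age column.
    age = record["age"]
    if age is None:
        issues.append((index, "missing age"))
    elif not isinstance(age, int):
        issues.append((index, "age is not an integer"))
    elif not 0 <= age <= 120:
        issues.append((index, "age out of range"))

# Replace missing ages with the mean of the valid ages.
valid_ages = [r["age"] for r in records
              if isinstance(r["age"], int) and 0 <= r["age"] <= 120]
fill_value = sum(valid_ages) / len(valid_ages)
filled = [{**r, "age": fill_value if r["age"] is None else r["age"]}
          for r in records]
```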

2.4 Collect data from non-standard formats by modifying existing code

● Adapt provided code to import data from an API using R or Python.

● Identify the structure of HTML and JSON data and parse them into a usable format for data processing and analysis using R or Python.
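
For 2.4, the sketch below parses a JSON body and an HTML table with the standard library. Both strings are made up; in practice the JSON text would come from an API response rather than a literal, and the `CellCollector` class is a hypothetical helper.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response body; a real call would fetch this text over HTTP.
api_body = '{"results": [{"name": "Ann", "score": 91}, {"name": "Bob", "score": 78}]}'
payload = json.loads(api_body)
scores = {item["name"]: item["score"] for item in payload["results"]}

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())

html_text = ("<table><tr><td>Ann</td><td>91</td></tr>"
             "<tr><td>Bob</td><td>78</td></tr></table>")
collector = CellCollector()
collector.feed(html_text)
```

After `feed`, `collector.rows` holds one list of cell strings per table row.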

### Importing & Cleaning in R/Python

3.1 Prepare data for modeling by implementing relevant transformations.

● Create new features from existing data (e.g. categories from continuous data, combining variables with external data) using R or Python.

● Explain the importance of splitting data and split data for training, testing, and validation using R or Python.

● Explain the importance of scaling data and implement scaling methods using R or Python.

● Transform categorical data for modeling using R or Python.

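
A minimal sketch of the preparation steps in 3.1 using the standard library. The rows, the age bands and the 75/25 split are assumptions; a separate validation set would be carved out of the training portion in the same way.

```python
import random
import statistics

# Hypothetical rows: (age, plan).
rows = [(22, "basic"), (35, "pro"), (47, "basic"), (51, "team"),
        (29, "pro"), (63, "team"), (41, "basic"), (38, "pro")]

# New feature: a category derived from a continuous variable.
def age_band(age):
    return "under_30" if age < 30 else "30_to_49" if age < 50 else "50_plus"

# One-hot encode the categorical plan column.
plans = sorted({plan for _, plan in rows})

def one_hot(plan):
    return [1 if plan == p else 0 for p in plans]

# Shuffle with a fixed seed, then split 75% / 25% into train and test.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
cut = int(len(shuffled) * 0.75)
train, test = shuffled[:cut], shuffled[cut:]

# Standardize age using statistics from the training set only,
# so that no information from the test set leaks into the model.
train_ages = [age for age, _ in train]
mu, sigma = statistics.mean(train_ages), statistics.pstdev(train_ages)

def scale(age):
    return (age - mu) / sigma

train_features = [[scale(age)] + one_hot(plan) for age, plan in train]
```
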
3.2 Implement standard modeling approaches for supervised learning problems.

● Identify regression problems and implement models using R or Python.

● Identify classification problems and implement models using R or Python.

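
To illustrate the distinction in 3.2: a regression problem predicts a continuous value, a classification problem predicts a label. The sketch below fits a one-variable least-squares line and a 1-nearest-neighbor classifier by hand on made-up data; a modeling library would normally be used instead.

```python
import math

# Regression: fit y = slope * x + intercept by ordinary least squares.
hours = [1, 2, 3, 4, 5]        # hypothetical feature
scores = [52, 54, 61, 64, 69]  # hypothetical continuous target

mean_x = sum(hours) / len(hours)
mean_y = sum(scores) / len(scores)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
         / sum((x - mean_x) ** 2 for x in hours))
intercept = mean_y - slope * mean_x
predicted_score = slope * 6 + intercept  # prediction for 6 hours

# Classification: 1-nearest-neighbor on labeled 2D points.
labeled = [((1.0, 1.2), "low"), ((0.8, 0.9), "low"),
           ((3.1, 3.0), "high"), ((2.9, 3.4), "high")]

def classify(point):
    """Return the label of the closest labeled point."""
    nearest = min(labeled, key=lambda item: math.dist(item[0], point))
    return nearest[1]
```
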
3.3 Implement approaches for unsupervised learning problems.

● Identify clustering problems and implement approaches for them using R or Python.

● Explain dimensionality reduction techniques and implement the techniques using R or Python.

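
For the clustering item in 3.3, here is a small k-means sketch (Lloyd's algorithm) in plain Python. The points and the hand-picked starting centroids are assumptions; dimensionality reduction is not covered by this example.

```python
import math

# Hypothetical 2D points forming two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Starting centroids are chosen by hand here; implementations usually pick them randomly.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```
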
3.4 Use suitable methods to assess the performance of a model.

● Select metrics to evaluate regression models and calculate the metrics using R or Python.

● Select metrics to evaluate classification models and calculate the metrics using R or Python.

● Select metrics and visualizations to evaluate clustering models and implement them using R or Python.
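
Common regression and classification metrics from 3.4, computed by hand on hypothetical predictions. Clustering metrics are left out of this sketch.

```python
import math

# Regression metrics on hypothetical actual vs. predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
errors = [a - p for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)               # mean absolute error
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))   # root mean squared error

# Classification metrics on hypothetical binary labels (1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
```

Accuracy alone can mislead when classes are imbalanced, which is why precision and recall are reported next to it.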

### Machine Learning Fundamentals in R/Python

4.2 Demonstrate best practices in production code, including version control, testing, and package development.

● Describe the basic flow and structures of package development in R or Python.

● Explain how to document code in packages or modules in R or Python.

● Explain the importance of testing and write testing statements in R or Python.

● Explain the importance of version control and describe key concepts of versioning
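
A brief sketch of the documentation and testing items in 4.2: a function documented with a docstring and two `unittest` test cases. The `normalize` function is a hypothetical example, not part of any syllabus material.

```python
import unittest

def normalize(values):
    """Scale a list of numbers to the range [0, 1].

    Raises ValueError if the list is empty or all values are equal.
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]

class NormalizeTests(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            normalize([])
```

Saved in a module, the tests can be run with `python -m unittest`.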

### Data Engineer

A data engineer collects, stores, and pre-processes data for easy access and use within an organization. An associate certification is available.

Data Management in SQL (PostgreSQL)

Exploratory Analysis Theory
