Data Scientist

Introduction

Data Scientist, Decision Scientist and Business Analyst are among the new job roles of today’s era.

We have data all around us, and a person who can use this data to provide better insights is called a data scientist.

Data science is an interdisciplinary field that uses scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

Data science requires not just computer programming skills but also statistics and logical reasoning.

With the arrival of the Internet of Things (IoT), data science is set to grow further.
Data scientists are in high demand today, and that demand is expected to continue in the future.

Major tools and technologies in this field include R, SAS, SQL, Python, Hadoop, Hive and Tableau.

Who should attend

  • IT professionals looking for a career in data science and business analytics
  • Software developers looking for a career in data science and business analytics
  • Professionals who are currently working in data and business analytics
  • Computer Science Graduates
  • Financial and Business Analysts
  • University & College Graduates/students looking for a career as a data scientist
  • Anyone with a statistics background
  • Research Associates

Course Outline: Data Scientist with R Programming (55 Hours)

Course Overview

  • Understanding data science
  • Getting tools ready
  • Capstone project outline
  • R programming basics
  • Data retrieval and cleansing
  • Exploratory data analysis
  • Statistical inference
  • Regression and classification modeling
  • Practical machine learning
  • Creating reports
  • Building interactive web applications
  • Capstone project

Understanding Data Science

  • Need for data science
  • History
  • Applications in various industries
  • Future possibilities
  • What is data?
  • What do data scientists do?
  • What questions should you be asking?
  • Data science focus areas
  • Data science tools

Getting Tools Ready

  • Downloading and installing R
  • Downloading and installing RStudio
  • Understanding CRAN
  • Understanding RStudio
  • Installing and loading R Packages
  • Workspace and files
  • Console and evaluations
  • Debugging

Capstone Project Outline

  • Search for project ideas
  • Retrieve the data
  • Clean and explore the data
  • Build a model
  • Tell the story

R Programming Basics

  • Objects and data structures
    • Vectors and lists
    • Matrices and data frames
    • Arrays
    • Factors
  • Subsetting
  • Control structures
  • Logic
  • Looking into the data
  • Split-apply-combine functions
  • Creating user-defined functions
  • Date and time
  • Simulation
  • Generating random numbers
  • Reading tabular data
  • Base plotting
  • Scoping

Data Retrieval and Cleansing

  • Raw and tidy data
  • Loading data
    • Download files
    • Reading different file types
    • Reading from databases
    • Web scraping
  • Subsetting and sorting
  • Summarizing data
  • Creating new variables
  • Regular expressions
  • Editing text variables
  • Working with dates
  • data.table package
  • dplyr package
  • tidyr package
  • lubridate package

Exploratory Data Analysis

  • Purpose of exploratory graphs
  • Graphics devices
  • Base plotting
    • Functions and parameters
    • Types of plots
    • Multiple plotting
  • ggplot2 plotting
    • Functions and parameters
    • qplot
    • ggplot
  • Clustering
    • Hierarchical
    • K-means
  • Dimension reduction
    • Singular value decomposition
    • Principal component analysis

Statistical Inference

  • Overview
  • Probability
    • General
    • Conditional
    • Bayes’ rule
  • Random variables
    • Probability mass function
    • Probability density function
    • Cumulative distribution function
    • Independence
  • Expected values and mean
  • Variance
  • Distributions
    • Normal
    • Binomial
    • Poisson
  • Asymptotic (large-sample) theory
    • Law of large numbers
    • Central limit theorem
    • Confidence intervals
  • Hypothesis testing

Regression and Classification Modeling

  • Regression versus classification
  • Linear regression
  • Residuals
  • ANOVA
  • Classification algorithms
    • Logistic regression
    • Decision tree
    • K-nearest neighbor
    • Naïve Bayes
    • Random forest
    • Support vector machine

Practical Machine Learning

  • Understanding unsupervised learning
  • Understanding supervised learning
  • Prediction study design
  • Types of errors
  • Cross validation
  • caret package
    • Functionality
    • Slicing data
    • Training options
    • Plot predictors
    • Preprocessing
  • Fitting predictive models
  • Determining best model
  • Predicting the outcome

Creating Reports

  • Structure of a data analysis report
  • Organization of data analysis files
  • What is Markdown?
  • R Markdown
    • Basics
    • Output formats
    • Notebooks
    • Slide presentations
    • Dashboards
    • Websites
    • Interactive documents

Building Interactive Web Applications

  • Shiny applications
    • Structure
    • User interface
    • Server
    • Distributing the application
  • Building a Shiny application (demo)

Capstone Project

  • Peer presentations
  • Project findings and results
  • Preparing for a data scientist career
  • Course discussion and assessment
