Data Scientist

Introduction

Data Scientist, Decision Scientist, and Business Analyst are job titles that describe a new kind of career opportunity today.

Data is all around us, and a person who can use this data to provide better insight is called a data scientist.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, whether structured or unstructured.

Data science involves not just computer programming skills but also statistics and logical reasoning.

With the arrival of the Internet of Things (IoT), data science is set to grow.
Data scientists are in high demand now, and that demand is expected to continue.

Major tools and technologies in this field include R, SAS, SQL, Python, Hadoop, Hive, and Tableau.

Who should attend

  • IT professionals looking for a career in data science and business analytics
  • Software developers looking for a career in data science and business analytics
  • Professionals who are currently working in data and business analytics
  • Computer Science Graduates
  • Financial and Business Analysts
  • University & College Graduates/students looking for a career as a data scientist
  • Anyone with a statistics background
  • Research Associates

Course Outline – Data Scientist with R Programming – 50 Hours

Course overview

  • Understanding Data Science Arena
  • Getting the Tools Ready
  • R Programming Basics
  • Data Logistics in R
  • Exploratory Data Analysis – EDA
  • Understanding and Building Models
  • Understanding Forecasting
  • Analytics with Text
  • Creating Data Solution
  • Capstone Project

Understanding Data Science Arena

  • Need for Data Science
  • History
  • Application in various industries
  • Future possibilities
  • Where do you stand?
  • What is Business Analytics and what does a Business Analyst do?
  • BALC (Business Analytics Life Cycle)
  • What tools should be in your arsenal?
  • Business Analytics Focus Areas
  • Business Analytics Tools

Getting the Tools Ready

  • Downloading and Installing R
  • Downloading and Installing RStudio
  • Understanding CRAN
  • Understanding RStudio
  • Installing and loading R Packages
  • Introduction to GitHub and setting up your account
  • Understanding R Markdown

R Programming Basics

  • Data Structures in R
  • Working with various data structures
  • Control Structures in Programming
  • Loop functions – apply, lapply, split, etc.
  • Creating user defined functions
  • Generating Random numbers
  • Data Splitting: training and test
  • Data Summary and dealing with missing data
  • Working with Data Frames using dplyr

Data Logistics in R

  • Basics of data input and output
  • Connecting to different data sources such as MySQL, HDFS, XML, JSON, Excel, CSV, and others
  • Connecting with databases

Exploratory Data Analysis – EDA

  • Understanding Summary Statistics – Mean, Median, Mode, Quantiles, Percentiles, etc.
  • Box plots, scatter plots, and histograms; multiple scatter plots
  • Types of data
  • Probability Distribution: Discrete and Continuous
  • Variance and Standard Deviation
  • Random Variable
  • Normal Distribution
  • Z-score and p-value; understanding Q-Q plots
  • Hypothesis Testing
  • Central limit theorem, Confidence interval
  • Chi-Square and ANOVA
  • PCA, LDA & MLE
  • Information Gain and Weight of Evidence (WOE)
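
As a rough illustration of the summary statistics and z-score topics listed above, here is a minimal sketch. It is written in Python using only the standard library, not in R (the language used in the course), and the sample values and the helper name `z_score` are made up for illustration.

```python
import statistics

# Hypothetical sample data, invented for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)        # arithmetic mean
median = statistics.median(values)    # middle value of the sorted data
mode = statistics.mode(values)        # most frequent value
std_dev = statistics.pstdev(values)   # population standard deviation


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# z-score of the largest observation in the sample
z_of_max = z_score(max(values), mean, std_dev)

print(mean, median, mode, std_dev, z_of_max)
```

The sketch treats the data as a whole population (`pstdev`); for a sample drawn from a larger population, `statistics.stdev` would be the usual choice.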

Understanding and Building Models

  • Understanding Unsupervised learning
  • Understanding Supervised Learning
  • Linear regression
  • Logistic Regression
  • Clustering Algorithms: K-Means
  • Classification Algorithms: KNN
  • Decision Tree Algorithms
  • Random Forest
  • Support Vector Machine
  • Basics of Neural Network Algorithms

Understanding Forecasting

  • What is a Time Series?
  • ARIMA model
  • MA model
  • Understanding ACF plots
  • Seasonality and trend in data
  • Handling seasonality and trends in model
  • Forecasting using a model

Analytics with Text

  • Reading Text Data in R
  • Text Preprocessing – Cleaning
  • Create Document Term Matrix
  • Basic Text Analysis
      – Using Term Frequency (TF)
      – Using Term Frequency-Inverse Document Frequency (TF-IDF)
      – Sentiment Analysis

  • Create a word cloud of positive words
  • Create a word cloud of negative words
  • Plot of positive vs. negative words
  • Case Study – Sentiment Analysis for IPL Teams
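
To give a feel for the TF and TF-IDF topics listed above, here is a minimal sketch. It is written in plain Python rather than R (the language used in the course). The three-document corpus and the function names are hypothetical, and the unsmoothed `log(N / df)` weighting is only one common variant of IDF; libraries often apply smoothing or normalization on top of it.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only
docs = [
    "great match great win",
    "poor batting poor fielding",
    "great bowling poor batting",
]

# Simplest possible tokenization: split on whitespace
tokenized = [doc.split() for doc in docs]


def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(corpus):
    """log(N / df) per term, where df is the number of documents containing it."""
    n_docs = len(corpus)
    doc_freq = Counter(term for tokens in corpus for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


idf = inverse_document_frequency(tokenized)

# One {term: weight} dictionary per document
tf_idf = [
    {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
    for tokens in tokenized
]

for weights in tf_idf:
    print(weights)
```

Terms that appear in fewer documents (such as "match") receive a higher IDF than terms spread across the corpus (such as "great"), which is the intuition behind weighting raw term frequency by IDF.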

Creating Data Solution

  • Creating Packages in R
  • Build using R Markdown
  • Overview of R Shiny
  • Scheduling R code

Capstone Project

  • Search for a project
  • Getting the data
  • Data manipulation and EDA
  • Build a predictive model
  • Tell the Story
  • Peer and final review

Practice Exam Questions & Review along with internal assessment

Resume preparation with a sample mock interview

 

 
