Duration: 5 days / 40 hours
Time: 9am to 6pm
Course Code: CRS-Q-0040898-ICT
Exam Code:
Cloudera Certified Associate (CCA) Data Analyst - (CCA159)
Hortonworks HDP Certified Developer - (HDPCD)

Funding available for this course:

  • Enhanced Training Support for SMEs
  • NICF-SF
  • SkillsFuture Credit
  • SkillsFuture Mid-Career Enhanced Subsidy

 

What Will Be Taught For This Big Data Course?

This five-day instructor-led course takes participants beyond basic Big Data concepts and gives them a head start with Hadoop. It also covers data analysis using the Hadoop ecosystem, and is aimed at data analysts, business intelligence specialists, developers and system architects.

 

 

Statement of Attainment (SOA) from SSG

Participants will receive the SOA from SSG upon completion of training and assessment.

Module 1 - Basics of Big Data and Understanding Hadoop

  • Why we need Hadoop
  • Why Hadoop is in demand in the market today
  • Where expensive SQL-based tools are failing
  • Key points: why Hadoop is a leading tool in the current IT industry
  • Definition of Big Data
  • Hadoop nodes
  • Introduction to Hadoop Release-1
  • Hadoop Daemons in Hadoop Release-1
  • Introduction to Hadoop Release-2
  • Hadoop Daemons in Hadoop Release-2
  • Hadoop Cluster and Racks
  • Hadoop Cluster Demo
  • New projects on Hadoop
  • How open source tools can run jobs in less time
  • Hadoop storage: HDFS (Hadoop Distributed File System)
  • Hadoop processing framework (MapReduce / YARN)
  • Alternatives to MapReduce
  • Why NoSQL is in greater demand than SQL
  • Distributed warehouse for HDFS
  • Hadoop Ecosystem and its usages
  • Data import/Export tools

Module 2 - Hadoop Distributed File System (HDFS) and Ingestion Tools

  • Hadoop installation
  • Introduction to Hadoop FS and the processing environment’s UIs
  • How to read and write files
  • Basic Unix commands for Hadoop
  • Hadoop FS shell
  • Hadoop releases practical
  • Hadoop daemons practical

Module 3 - Pig Programming

  • Pig-UDFs
  • Pig Use cases
  • Pig Assignment
  • Complex Use cases on Pig
  • Real time scenarios on Pig
  • When we should use Pig
  • When we shouldn’t use Pig

Module 4 - Hive Programming

  • Hive Introduction
  • Meta storage and meta store
  • Introduction to Derby Database
  • Hive Data types
  • HQL
  • DDL, DML and sub languages of Hive
  • Internal, external and Temp tables in Hive
  • Differences between a SQL-based data warehouse and Hive

Module 5 - Advanced Hive Programming

  • Hive releases
  • Why Hive is not the best solution for OLTP
  • OLAP in Hive
  • Partitioning
  • Bucketing
  • Hive Architecture
  • Thrift Server
  • Hue Interface for Hive
  • How to analyze data using Hive script
  • Differences between Hive and Impala
  • UDFs in Hive
  • Complex Use cases in Hive
  • Hive Advanced Assignment

Module 6 - Hadoop 2 and YARN

  • How to load streaming data without a fixed schema
  • How to load unstructured and semi-structured data in Hadoop
  • Introduction to Flume
  • Hands-on on Flume
  • How to load Twitter data in HDFS using Hadoop
  • Introduction to Oozie
  • How to schedule jobs using Oozie
  • What kind of jobs can be scheduled using Oozie
  • How to schedule jobs which are time based
  • Hadoop releases
  • Where to get Hadoop and other components to install
  • Introduction to YARN
  • Significance of YARN

Module 7 - HCatalogue

  • Introduction to NoSQL
  • Why NoSQL when SQL has been on the market for many years
  • NoSQL-based databases on the market
  • CAP Theorem
  • ACID vs. CAP
  • OLTP solutions with different capabilities
  • Which NoSQL-based solution can handle specific requirements
  • Examples of companies that use NoSQL-based databases
  • HBase Architecture of column families

Module 8 - Introduction to Spark Core

  • Introduction to Spark
  • Basic features of Spark and Scala available in Hue
  • Why demand for Spark is increasing in the market
  • How we can use Spark with the Hadoop ecosystem
  • Datasets for practice purposes

Module 9 - Emerging Technologies in Big Data and Ecosystem

  • YARN
  • Emerging Technologies of Big Data
  • Emerging use cases e.g. IoT, Industrial Internet, New Applications
  • Certifications and job opportunities

Assessment Format & Duration: (only applicable to learners taking the SF grant)

Assessment                    Mode         Duration
Practical Performance         Summative    5 hours
Oral & Written Assessment     Summative    3.5 hours
Total                                      8.5 hours

Who Should Attend This Big Data Training?

This course is intended for executives, managers, consultants, business analysts, operations personnel, programmers, architects, administrators and data analysts who want a foundational overview of the key components required to understand and analyse Big Data effectively. Familiarity with computers and business applications is assumed. Programming experience is beneficial but not required.

Pre-requisite

Prior knowledge of SQL is highly recommended. Linux knowledge will be helpful.

                                                 w/o GST   w/ GST
Course Fee                                       $3,000    $3,210
Singapore Citizen & PR aged ≥ 21 years           $2,400    $2,610
Singapore Citizen aged ≥ 40 years
(SkillsFuture Mid-Career Enhancement Funding)    $1,000    $1,210

 


 

                                                 w/o GST   w/ GST
Course Fee                                       $3,000    $3,210
Singapore Citizen & PR aged ≥ 21 years           $1,000    $1,210
Singapore Citizen aged ≥ 40 years
(SkillsFuture Mid-Career Enhancement Funding)    $1,000    $1,210

 


Exam:

Course fees listed above are exclusive of exam fees.

                                                             w/o GST   w/ GST
Cloudera Certified Associate (CCA) Data Analyst - (CCA159)   $495      $529.65
Hortonworks HDP Certified Developer - (HDPCD)                $425      $454.75

 

 

 

 

Trainees shall be bound by the Terms and Conditions of any applicable funding scheme.

Please ensure that you have read our Terms and Conditions before submitting the enrolment form.







Please click on the course date to enrol.


  • CL: Classroom Learning
  • VILT: Virtual Instructor-Led Training
  • GTR: Guaranteed To Run
  • Sat: Saturday
  • Wkn: Weekend
Note: Courses are conducted via classroom unless stated otherwise beside the course dates.