About the Course
Course Objective
Big Data is one of the fastest-growing and most promising fields, given the technologies available in the market today. To make the most of these opportunities, you need structured training with an up-to-date curriculum that reflects current industry requirements and best practices.
Beyond a strong theoretical understanding, you need hands-on experience with real-world Big Data projects, using different Big Data and Hadoop tools as part of the solution strategy. This Big Data Analytics course is curated to provide in-depth knowledge of Big Data and Hadoop Ecosystem tools such as HDFS, YARN, MapReduce, Hive, and Pig.
This course will help you gain a comprehensive understanding of the tools in the Hadoop Ecosystem, including Pig, Hive, Sqoop, Flume, Oozie, and HBase.
Course Eligibility
There are no prerequisites for the Big Data & Hadoop course. Prior knowledge of Core Java and SQL is helpful but not mandatory.
Package Requisites
A CloudLab environment accessible through a web browser.
Course Modules
Module 1: Understanding Big Data and Hadoop
In this module, you will understand:
- What Big Data is
- The limitations of traditional solutions to Big Data problems
- How Hadoop solves those Big Data problems
- The Hadoop Ecosystem
- Hadoop Architecture
- HDFS (see the sketch after this list)
- The anatomy of a file read and write
- How MapReduce works
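As a taste of what is covered, here is a minimal sketch of a client reading a file through the HDFS Java API; the file path is a hypothetical placeholder, and the cluster address is assumed to come from a core-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Picks up cluster settings (e.g. fs.defaultFS) from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // "/user/demo/sample.txt" is a hypothetical example file
        try (FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The open() call asks the NameNode for block locations and then streams the data directly from DataNodes, which is exactly the read anatomy this module walks through.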
Module 2: Hadoop Architecture and HDFS
In this module, you will learn:
- Hadoop cluster architecture
- Important configuration files of a Hadoop cluster
- Data loading techniques using Sqoop & Flume (see the sketch after this list)
- How to set up single-node and multi-node Hadoop clusters
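As a preview of the data loading material, here is a minimal sketch that drives a Sqoop import from Java through Sqoop's runTool entry point; the JDBC URL, credentials, table, and target directory are all hypothetical placeholders, and the same arguments work with the sqoop command-line tool.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportDemo {
    public static void main(String[] args) {
        // Equivalent to running "sqoop import ..." on the command line;
        // connection details, table, and target directory are hypothetical
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost/sales",
            "--username", "demo",
            "--password", "demo",
            "--table", "customers",
            "--target-dir", "/user/demo/customers",
            "--num-mappers", "1"
        };
        System.exit(Sqoop.runTool(sqoopArgs));
    }
}
```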
Module 3: Hadoop MapReduce Framework
In this module, you will understand:
- The Hadoop MapReduce framework comprehensively
- How MapReduce works on data stored in HDFS
You will also learn advanced MapReduce concepts such as Input Splits, Combiner & Partitioner; the classic word-count sketch below shows a Combiner in action.
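This is the canonical word-count job, a minimal sketch wiring a Mapper, a Combiner, and a Reducer together; input and output paths are passed on the command line. Note how the Reducer doubles as the Combiner, which works here because counting is associative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the line
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-aggregates map output before the shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```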
Module 4: Advanced Hadoop MapReduce
In this module, you will learn advanced MapReduce concepts such as:
- Counters (see the sketch after this list)
- Distributed Cache
- MRUnit
- Reduce Join
- Custom Input Format
- Sequence Input Format
- XML parsing
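To make one of these concepts concrete, here is a minimal sketch of a mapper that uses MapReduce Counters to track record quality; the expectation of three comma-separated fields is a hypothetical assumption about the input.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // Counter names are user-defined; this enum is illustrative
    enum RecordQuality { VALID, MALFORMED }

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length == 3) { // hypothetical: expect three comma-separated fields
            ctx.getCounter(RecordQuality.VALID).increment(1);
            ctx.write(value, NullWritable.get());
        } else {
            // Count and skip bad records instead of failing the job
            ctx.getCounter(RecordQuality.MALFORMED).increment(1);
        }
    }
}
```

Counter totals are aggregated across all tasks by the framework and reported in the job's final status output.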
Module 5: Apache Pig
In this module, you will learn:
- Apache Pig
- The types of use cases where Pig fits well
- The tight coupling between Pig and MapReduce
- Pig Latin scripting
- Pig running modes
- Pig UDFs
- Pig Streaming
- Testing Pig scripts
You will also work on a healthcare dataset (see the sketch below).
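As a preview, here is a minimal sketch that embeds Pig Latin in Java through PigServer, running in local mode (one of Pig's running modes); the patients.csv file and its schema are hypothetical stand-ins for the healthcare dataset.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigLocalDemo {
    public static void main(String[] args) throws Exception {
        // Local mode for quick testing; use ExecType.MAPREDUCE to run on a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);
        // "patients.csv" and its schema are hypothetical
        pig.registerQuery("records = LOAD 'patients.csv' USING PigStorage(',') "
                + "AS (id:int, state:chararray, charges:double);");
        pig.registerQuery("by_state = GROUP records BY state;");
        pig.registerQuery("avg_charges = FOREACH by_state "
                + "GENERATE group, AVG(records.charges);");
        // Writes the result relation out to a directory
        pig.store("avg_charges", "avg_charges_out");
    }
}
```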
Module 6: Apache Hive
This module will help you understand:
- Hive concepts
- Hive data types
- Loading and querying data in Hive (see the sketch after this list)
- Running Hive scripts
- Hive UDFs
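As a preview, here is a minimal sketch that creates, loads, and queries a Hive table over the HiveServer2 JDBC interface; the connection URL, table, and file path are assumptions, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; host, port, and database are hypothetical
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS visits (id INT, dept STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            // Moves a file already in HDFS into the table's location
            stmt.execute("LOAD DATA INPATH '/user/demo/visits.csv' INTO TABLE visits");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT dept, COUNT(*) FROM visits GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```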
Module 7: Advanced Apache Hive and HBase
In this module, you will understand advanced Apache Hive concepts such as:
- UDFs
- Dynamic partitioning (see the sketch below)
- Hive indexes and views
- Optimizations in Hive
You will also acquire in-depth knowledge of Apache HBase, HBase architecture, HBase running modes, and its components.
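To make dynamic partitioning concrete, here is a minimal sketch, again over JDBC, in which Hive routes each row to a partition based on a column value instead of the partition being named by hand; the visits source table reuses the hypothetical example from Module 6.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DynamicPartitionDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // Enable dynamic partitioning for this session
            stmt.execute("SET hive.exec.dynamic.partition = true");
            stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");
            stmt.execute("CREATE TABLE IF NOT EXISTS visits_by_dept (id INT) "
                    + "PARTITIONED BY (dept STRING)");
            // The partition column comes last in the SELECT;
            // Hive creates one partition per distinct dept value
            stmt.execute("INSERT OVERWRITE TABLE visits_by_dept PARTITION (dept) "
                    + "SELECT id, dept FROM visits");
        }
    }
}
```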
Module 8: Advanced Apache HBase
This module will cover advanced Apache HBase concepts. We will see demos on HBase Bulk Loading & HBase Filters. You will also learn what ZooKeeper is, how to monitor a cluster, and why HBase uses ZooKeeper. A filtered-scan sketch follows.
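Here is a minimal sketch of a filtered scan using the HBase Java client; the patients table and the row-key prefix are hypothetical, and the ZooKeeper quorum is assumed to be configured in hbase-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterDemo {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml; the ZooKeeper quorum for cluster discovery comes from there
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("patients"))) { // hypothetical table
            Scan scan = new Scan();
            // Server-side filter: only row keys starting with "2023"
            scan.setFilter(new PrefixFilter(Bytes.toBytes("2023")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```

Because the filter runs inside the RegionServers, only matching rows cross the network to the client.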
Module 9: Processing Distributed Data with Apache Spark
In this module, you will learn what Apache Spark, SparkContext, and the Spark Ecosystem are. You will learn how to work with Resilient Distributed Datasets (RDDs) in Apache Spark. You will run an application on a Spark cluster and compare the performance of MapReduce and Spark; the sketch below revisits word count on the Java RDD API for a direct comparison.
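This is a minimal word-count sketch on Spark's Java RDD API, deliberately the same problem as the Module 3 MapReduce version so the two programming models can be compared side by side; the HDFS paths are hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // local[*] for a quick test; on a real cluster the master is set via spark-submit
        SparkConf conf = new SparkConf().setAppName("word count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///user/demo/sample.txt"); // hypothetical input
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum); // the whole pipeline stays in memory between stages
            counts.saveAsTextFile("hdfs:///user/demo/counts"); // hypothetical output directory
        }
    }
}
```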
Outcomes
- Understand MapReduce Framework
- Implement complex business solutions using MapReduce
- Learn data ingestion techniques using Sqoop and Flume
- Perform ETL operations & data analytics using Pig and Hive
- Implement Partitioning, Bucketing, and Indexing in Hive
- Understand HBase, a NoSQL database in Hadoop, along with HBase Architecture & Mechanisms
- Integrate HBase with Hive
- Schedule jobs using Oozie
- Implement best practices for Hadoop development
- Understand Apache Spark and its Ecosystem
- Learn how to work with RDD in Apache Spark