About the Course
Course Objective
Big Data is one of the fastest-growing and most promising fields, given the technologies available in the market today. To make the most of these opportunities, you need structured training with an up-to-date curriculum that reflects current industry requirements and best practices.
Beyond a strong theoretical understanding, you need hands-on experience with real-world Big Data projects, using different Big Data and Hadoop tools as part of the solution strategy. This Big Data Analytics course is curated to provide in-depth knowledge of Big Data and Hadoop Ecosystem tools such as HDFS, YARN, MapReduce, Hive, and Pig.
This course will help you gain a comprehensive understanding of the tools in the Hadoop Ecosystem, including Pig, Hive, Sqoop, Flume, Oozie, and HBase.
Course Eligibility
There are no prerequisites for the Big Data & Hadoop course. Prior knowledge of Core Java and SQL is helpful but not mandatory.
Package Requisites
A CloudLab environment accessible through a web browser.
Course Modules
Module 1: Understanding Big Data and Hadoop
In this module, you will understand:
- What Big Data is
- The limitations of traditional solutions to Big Data problems
- How Hadoop solves those Big Data problems
- The Hadoop Ecosystem
- Hadoop Architecture
- HDFS (see the sketch after this list)
- The anatomy of a file read and write
- How MapReduce works
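As a taste of what is covered, here is a minimal sketch of a client reading a file through the HDFS Java API; the file path is a hypothetical placeholder, and the cluster address is assumed to come from a core-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Picks up cluster settings (e.g. fs.defaultFS) from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // "/user/demo/sample.txt" is a hypothetical example file
        try (FSDataInputStream in = fs.open(new Path("/user/demo/sample.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The open() call asks the NameNode for block locations and then streams the data directly from DataNodes, which is exactly the read anatomy this module walks through.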
Module 2: Hadoop Architecture and HDFS
In this module, you will learn:
- Hadoop cluster architecture
- Important configuration files of a Hadoop cluster
- Data loading techniques using Sqoop & Flume (see the sketch after this list)
- How to set up single-node and multi-node Hadoop clusters
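As a preview of the data loading material, here is a minimal sketch that drives a Sqoop import from Java through Sqoop's runTool entry point; the JDBC URL, credentials, table, and target directory are all hypothetical placeholders, and the same arguments work with the sqoop command-line tool.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportDemo {
    public static void main(String[] args) {
        // Equivalent to running "sqoop import ..." on the command line;
        // connection details, table, and target directory are hypothetical
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://dbhost/sales",
            "--username", "demo",
            "--password", "demo",
            "--table", "customers",
            "--target-dir", "/user/demo/customers",
            "--num-mappers", "1"
        };
        System.exit(Sqoop.runTool(sqoopArgs));
    }
}
```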
Module 3: Hadoop MapReduce Framework
In this module, you will understand:
- The Hadoop MapReduce framework comprehensively
- How MapReduce works on data stored in HDFS
You will also learn advanced MapReduce concepts such as Input Splits, Combiner & Partitioner; the classic word-count sketch below shows a Combiner in action.
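This is the canonical word-count job, a minimal sketch wiring a Mapper, a Combiner, and a Reducer together; input and output paths are passed on the command line. Note how the Reducer doubles as the Combiner, which works here because counting is associative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the line
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-aggregates map output before the shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```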
Module 4: Advanced Hadoop MapReduce
In this module, you will learn advanced MapReduce concepts such as:
- Counters (see the sketch after this list)
- Distributed Cache
- MRUnit
- Reduce Join
- Custom Input Format
- Sequence Input Format
- XML parsing
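To make one of these concepts concrete, here is a minimal sketch of a mapper that uses MapReduce Counters to track record quality; the expectation of three comma-separated fields is a hypothetical assumption about the input.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // Counter names are user-defined; this enum is illustrative
    enum RecordQuality { VALID, MALFORMED }

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length == 3) { // hypothetical: expect three comma-separated fields
            ctx.getCounter(RecordQuality.VALID).increment(1);
            ctx.write(value, NullWritable.get());
        } else {
            // Count and skip bad records instead of failing the job
            ctx.getCounter(RecordQuality.MALFORMED).increment(1);
        }
    }
}
```

Counter totals are aggregated across all tasks by the framework and reported in the job's final status output.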
Module 5: Apache Pig
In this module, you will learn:
- Apache Pig
- The types of use cases where Pig fits well
- The tight coupling between Pig and MapReduce
- Pig Latin scripting
- Pig running modes
- Pig UDFs
- Pig Streaming
- Testing Pig scripts
You will also work on a healthcare dataset (see the sketch below).
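As a preview, here is a minimal sketch that embeds Pig Latin in Java through PigServer, running in local mode (one of Pig's running modes); the patients.csv file and its schema are hypothetical stand-ins for the healthcare dataset.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigLocalDemo {
    public static void main(String[] args) throws Exception {
        // Local mode for quick testing; use ExecType.MAPREDUCE to run on a cluster
        PigServer pig = new PigServer(ExecType.LOCAL);
        // "patients.csv" and its schema are hypothetical
        pig.registerQuery("records = LOAD 'patients.csv' USING PigStorage(',') "
                + "AS (id:int, state:chararray, charges:double);");
        pig.registerQuery("by_state = GROUP records BY state;");
        pig.registerQuery("avg_charges = FOREACH by_state "
                + "GENERATE group, AVG(records.charges);");
        // Writes the result relation out to a directory
        pig.store("avg_charges", "avg_charges_out");
    }
}
```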
Module 6: Apache Hive
This module will help you understand:
- Hive concepts
- Hive data types
- Loading and querying data in Hive (see the sketch after this list)
- Running Hive scripts
- Hive UDFs
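As a preview, here is a minimal sketch that creates, loads, and queries a Hive table over the HiveServer2 JDBC interface; the connection URL, table, and file path are assumptions, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL; host, port, and database are hypothetical
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS visits (id INT, dept STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            // Moves a file already in HDFS into the table's location
            stmt.execute("LOAD DATA INPATH '/user/demo/visits.csv' INTO TABLE visits");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT dept, COUNT(*) FROM visits GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```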
Module 7: Advanced Apache Hive and HBase
In this module, you will understand advanced Apache Hive concepts such as:
- UDFs
- Dynamic partitioning (see the sketch below)
- Hive indexes and views
- Optimizations in Hive
You will also acquire in-depth knowledge of Apache HBase, HBase architecture, HBase running modes, and its components.
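To make dynamic partitioning concrete, here is a minimal sketch, again over JDBC, in which Hive routes each row to a partition based on a column value instead of the partition being named by hand; the visits source table reuses the hypothetical example from Module 6.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DynamicPartitionDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // Enable dynamic partitioning for this session
            stmt.execute("SET hive.exec.dynamic.partition = true");
            stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");
            stmt.execute("CREATE TABLE IF NOT EXISTS visits_by_dept (id INT) "
                    + "PARTITIONED BY (dept STRING)");
            // The partition column comes last in the SELECT;
            // Hive creates one partition per distinct dept value
            stmt.execute("INSERT OVERWRITE TABLE visits_by_dept PARTITION (dept) "
                    + "SELECT id, dept FROM visits");
        }
    }
}
```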
Module 8: Advanced Apache HBase
This module will cover advanced Apache HBase concepts. We will see demos on HBase Bulk Loading & HBase Filters. You will also learn what ZooKeeper is, how to monitor a cluster, and why HBase uses ZooKeeper. A filtered-scan sketch follows.
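Here is a minimal sketch of a filtered scan using the HBase Java client; the patients table and the row-key prefix are hypothetical, and the ZooKeeper quorum is assumed to be configured in hbase-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterDemo {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml; the ZooKeeper quorum for cluster discovery comes from there
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("patients"))) { // hypothetical table
            Scan scan = new Scan();
            // Server-side filter: only row keys starting with "2023"
            scan.setFilter(new PrefixFilter(Bytes.toBytes("2023")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```

Because the filter runs inside the RegionServers, only matching rows cross the network to the client.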
Module 9: Processing Distributed Data with Apache Spark
In this module, you will learn what Apache Spark, SparkContext, and the Spark Ecosystem are. You will learn how to work with Resilient Distributed Datasets (RDDs) in Apache Spark. You will run an application on a Spark cluster and compare the performance of MapReduce and Spark; the sketch below revisits word count on the Java RDD API for a direct comparison.
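This is a minimal word-count sketch on Spark's Java RDD API, deliberately the same problem as the Module 3 MapReduce version so the two programming models can be compared side by side; the HDFS paths are hypothetical.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // local[*] for a quick test; on a real cluster the master is set via spark-submit
        SparkConf conf = new SparkConf().setAppName("word count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///user/demo/sample.txt"); // hypothetical input
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum); // the whole pipeline stays in memory between stages
            counts.saveAsTextFile("hdfs:///user/demo/counts"); // hypothetical output directory
        }
    }
}
```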
Outcomes
- Understand MapReduce Framework
- Implement complex business solutions using MapReduce
- Learn data ingestion techniques using Sqoop and Flume
- Perform ETL operations & data analytics using Pig and Hive
- Implement Partitioning, Bucketing, and Indexing in Hive
- Understand HBase, a NoSQL database in Hadoop, along with HBase Architecture & Mechanisms
- Integrate HBase with Hive
- Schedule jobs using Oozie
- Implement best practices for Hadoop development
- Understand Apache Spark and its Ecosystem
- Learn how to work with RDD in Apache Spark