This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.
DAY 1: AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM
OBJECTIVES
Understanding Hadoop and HDFS
Ingesting Data into HDFS
The MapReduce Framework
LABS
Starting an HDP Cluster
Demonstration: Understanding Block Storage
Using HDFS Commands
Importing RDBMS Data into HDFS
Exporting HDFS Data to an RDBMS
Importing Log Data into HDFS Using Flume
Demonstration: Understanding MapReduce
Running a MapReduce Job
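The "Running a MapReduce Job" lab exercises the classic map/shuffle/reduce flow. A minimal sketch of that flow in the Hadoop Streaming style, written in plain Python so it runs locally without a cluster (the sample sentences are made up for illustration):

```python
# Word count in the Hadoop Streaming style: the mapper emits "word\t1"
# pairs, and the reducer sums counts per word, relying on the shuffle
# phase having sorted its input by key. Here both phases run locally.
from itertools import groupby

def mapper(lines):
    """Emit one 'word\t1' record per word, like a streaming mapper."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(records):
    """Sum counts per word; input records must be sorted by word."""
    keyed = (r.split("\t") for r in records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    data = ["big data on hadoop", "big data with pig and hive"]
    # sorted() stands in for the shuffle/sort between map and reduce.
    for line in reducer(sorted(mapper(data))):
        print(line)
```

On a real cluster the same mapper and reducer scripts would be submitted with the Hadoop Streaming jar, and HDFS plus the shuffle phase would replace the in-memory list and `sorted()` call.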
DAY 2: AN INTRODUCTION TO APACHE PIG
OBJECTIVES
Introduction to Apache Pig
Advanced Apache Pig Programming
LABS
Demonstration: Understanding Apache Pig
Getting Started with Apache Pig
Exploring Data with Apache Pig
Splitting a Dataset
Joining Datasets with Apache Pig
Preparing Data for Apache Hive
Demonstration: Computing Page Rank
Analyzing Clickstream Data
Analyzing Stock Market Data Using Quantiles
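The "Analyzing Stock Market Data Using Quantiles" lab is done in Apache Pig; the underlying idea, binning values by quartile cut points, can be sketched in plain Python. The closing prices below are made-up sample data, not from the course:

```python
# Quartile binning: compute the three cut points that split a set of
# prices into four equal-sized groups, then assign each price a quartile.
from statistics import quantiles
from bisect import bisect

closing_prices = [31.2, 30.8, 33.5, 29.9, 35.1, 34.7, 32.0, 30.1]

# Three cut points dividing the data into quartiles Q1..Q4.
cuts = quantiles(closing_prices, n=4)

def quartile(price):
    """Return the 1-based quartile a price falls into."""
    return bisect(cuts, price) + 1

for p in sorted(closing_prices):
    print(f"{p:>6.2f} -> Q{quartile(p)}")
```

In the Pig version of this analysis the cut points would typically come from a grouped aggregate over the full stock dataset rather than an in-memory list.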
DAY 3: AN INTRODUCTION TO APACHE HIVE
OBJECTIVES
Apache Hive Programming
Using HCatalog
Advanced Apache Hive Programming
LABS
Understanding Hive Tables
Understanding Partition and Skew
Analyzing Big Data with Apache Hive
Demonstration: Computing NGrams
Joining Datasets in Apache Hive
Computing NGrams of Emails in Avro Format
Using HCatalog with Apache Pig
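The "Computing NGrams" demonstration uses Hive's built-in n-gram support to find the most frequent n-word sequences in text. The same idea, sketched in plain Python over a made-up sample sentence:

```python
# Find the k most common n-word sequences (n-grams) in a text, the
# concept behind Hive's ngrams() aggregate shown in the demonstration.
from collections import Counter

def top_ngrams(text, n, k):
    """Return the k most common n-grams as (word-tuple, count) pairs."""
    words = text.lower().split()
    # Slide an n-word window across the text by zipping shifted copies.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams).most_common(k)

sample = "big data needs big tools and big data needs big ideas"
print(top_ngrams(sample, n=2, k=3))
```

In the Day 3 labs the input would be Hive tables (including emails stored in Avro format) rather than a Python string, but the counting logic is the same.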
DAY 4: WORKING WITH SPARK CORE, SPARK SQL AND OOZIE
OBJECTIVES
Advanced Apache Hive Programming (Continued)
Hadoop 2 and YARN
Introduction to Spark Core and Spark SQL
Defining Workflow with Oozie
LABS
Advanced Apache Hive Programming
Running a YARN Application
Getting Started with Apache Spark
Exploring Apache Spark SQL
Defining an Apache Oozie Workflow
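The "Defining an Apache Oozie Workflow" lab wires actions together in a workflow XML file. A minimal sketch of such a workflow, with a single Hive action; the workflow name, script name, and property placeholders are illustrative, not taken from the course materials:

```xml
<!-- Minimal Oozie workflow sketch: start -> Hive action -> end,
     with a kill node on failure. Names and paths are placeholders. -->
<workflow-app name="daily-analysis" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>analysis.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Oozie resolves the `${jobTracker}` and `${nameNode}` placeholders from a job properties file supplied when the workflow is submitted.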