
HDP Developer: Quick Start - Hortonworks Official Curriculum

Short Course by Agilitics Pte. Ltd.
On-Site / Short Course
Ended Dec 12, 2018
USD 2,800.00

Details

COURSE OVERVIEW

This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark.

Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; using Pig and Hive to perform data analytics on Big Data; and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.

Outline

COURSE CONTENT

DAY 1: AN INTRODUCTION TO APACHE HADOOP AND HDFS

OBJECTIVES

  • The Case for Hadoop

  • The Hadoop Ecosystem

  • The HDFS Architecture

  • Ingesting Data Into HDFS

  • Parallel Processing Fundamentals

  • YARN Architecture

  • Introduction to Apache Pig

LABS

  • Starting an HDP Cluster

  • Using HDFS Commands

  • Demonstration: Understanding Apache Pig

  • Getting Started with Apache Pig

  • Exploring Data with Pig

DAY 2: ADVANCED APACHE PIG PROGRAMMING

OBJECTIVES

  • Advanced Apache Pig Programming

  • Introduction to Apache Hive

  • Using HCatalog

LABS

  • Splitting a Dataset

  • Joining Datasets

  • Preparing Data for Apache Hive

  • Understanding Apache Hive Tables

  • Demonstration: Understanding Partitions and Skew

  • Analyzing Big Data with Apache Hive

  • Demonstration: Computing NGrams

  • Joining Datasets in Apache Hive

  • Computing NGrams of Emails in Avro Format

  • Using HCatalog with Apache Pig

DAY 3: ADVANCED APACHE HIVE PROGRAMMING

OBJECTIVES

  • Advanced Apache Hive Programming

  • An Overview of Apache Zeppelin and Apache Spark

  • An Introduction to RDD Programming

  • An Introduction to Pair RDDs

LABS

  • Advanced Apache Hive Programming

  • Introduction to Apache Spark REPLs and Apache Zeppelin

  • Creating and Manipulating RDDs

  • Creating and Manipulating Pair RDDs

DAY 4: WORKING WITH PAIR RDDS AND BUILDING YARN APPLICATIONS

OBJECTIVES

  • An Introduction to Pair RDDs (Continued)

  • An Introduction to Spark SQL

  • Caching and Persisting

  • Building and Submitting Applications to YARN

LABS

  • Creating and Saving DataFrames and Tables

  • Working with DataFrames

  • Building and Submitting Applications to YARN

Schedules

No. of Days: 4
Total Hours: 32
No. of Participants: 100
Agilitics Pte. Ltd. is a Singapore-headquartered Big Data analytics, business intelligence, and Agile consulting firm. Agilitics Pte. Ltd. is a partner of all premium Big Data vendors and has an exclusive clientele. Our motto is to deliver analytics with Agility + Quality.