Perform Data Engineering on Microsoft HDInsight Training (20775)

Course Type: Intermediate
Course Number: 8491
Duration: 5 Days

The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

This is a Microsoft Official Course (MOC) delivered by a Learning Tree expert instructor.

You Will Learn How To

  • Explain Microsoft R
  • Transform and clean big data sets

Important Course Information

Requirements

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality
  • Working knowledge of relational databases

Redeem Your Microsoft Training Vouchers (SATV)

Course Outline

  • Module 1: Getting Started with HDInsight

This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
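
As a rough illustration of the MapReduce paradigm named above, the sketch below counts words in plain Python using separate map, shuffle, and reduce steps. It is not Hadoop or HDInsight code and is not taken from the course materials; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files in cluster storage.
    sample = ["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across many nodes and the shuffle moves data between them; this single-process version only shows how the three phases fit together.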

Lessons

  • Big Data
  • Hadoop
  • MapReduce
  • HDInsight

Lab : Querying Big Data

  • Query data with Hive
  • Visualize data with Excel

After completing this module, students will be able to:

  • Describe Big data
  • Describe Hadoop
  • Describe MapReduce
  • Describe HDInsight

  • Module 2: Deploying HDInsight Clusters

At the end of this module the student will be able to deploy HDInsight clusters.

Lessons

  • HDInsight cluster types
  • Managing HDInsight Clusters
  • Managing HDInsight Clusters with PowerShell

Lab : Managing HDInsight clusters with the Azure Portal

  • Create an HDInsight Hadoop Cluster
  • Customize HDInsight using a script action
  • Customize HDInsight using Bootstrap
  • Delete an HDInsight cluster

After completing this module, students will be able to:

  • Describe HDInsight cluster types.
  • Describe the creation, management, and deletion of HDInsight clusters with the Azure portal
  • Describe the creation, management, and deletion of HDInsight clusters with PowerShell

  • Module 3: Authorizing Users to Access Resources

This module covers permissions and the assignment of permissions.

Lessons

  • Non-domain Joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters

Lab : Authorizing Users to Access Resources

  • Configure a domain-joined HDInsight cluster
  • Configure Hive policies

After completing this module, students will be able to:

  • Describe how to authorize user access to objects
  • Describe how to authorize users to execute code
  • Describe how to manage domain-joined HDInsight clusters

  • Module 4: Loading data into HDInsight

This module covers loading data into HDInsight.

Lessons

  • HDInsight Storage
  • Data loading tools
  • Performance and reliability

Lab : Loading Data into HDInsight

  • Loading data using Sqoop
  • Loading data using AzCopy
  • Loading data using AdlCopy
  • Use HDInsight to compress data

After completing this module, students will be able to:

  • Describe HDInsight storage configurations and architectures
  • Describe options for loading data into HDInsight
  • Describe benefits of compression and pre-processing in HDInsight

  • Module 5: Troubleshooting HDInsight

This module describes how to troubleshoot HDInsight.

Lessons

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite

Lab : Troubleshooting HDInsight

  • Analyze HDInsight logs
  • Analyze YARN logs
  • Monitor resources with Operations Management Suite

After completing this module, students will be able to:

  • Analyze HDInsight logs
  • Analyze YARN logs
  • Analyze Heap dumps
  • Use the Operations Management Suite to monitor resources

  • Module 6: Implementing Batch Solutions

This module describes how to implement batch solutions.

Lessons

  • Apache Hive storage
  • Querying with Hive and Pig
  • Operationalize HDInsight

Lab : Implementing Batch Solutions

  • Load data into a Hive table
  • Query data with Hive and Pig

After completing this module, students will be able to:

  • Describe Apache Hive storage
  • Query data using Hive and Pig
  • Operationalize HDInsight

  • Module 7: Design Batch ETL solutions for big data with Spark

This module describes how to design batch ETL solutions for big data with Spark.

Lessons

  • What is Spark?
  • ETL with Spark
  • Spark performance

Lab : Design Batch ETL solutions for big data with Spark

  • Create an HDInsight cluster with access to Data Lake Store
  • Use HDInsight Spark cluster to analyze data in Data Lake Store
  • Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
  • Managing resources for Apache Spark cluster on Azure HDInsight

After completing this module, students will be able to:

  • Describe Spark and when to use it
  • Describe the use of ETL with Spark
  • Analyze Spark performance

  • Module 8: Analyze Data with Spark SQL

This module describes how to analyze data by using Spark SQL. In it, you will learn to explain the differences between RDDs, Datasets, and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning, and persistence. You will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, and then submit Spark jobs remotely to a Spark cluster.

Lessons

  • Implementing iterative and interactive queries.
  • Perform exploratory data analysis.

Lab : Analyze data with Spark SQL

  • Implement interactive queries
  • Perform exploratory data analysis

After completing this module, students will be able to:

  • Implement interactive queries.
  • Perform exploratory data analysis.

  • Module 9: Analyze Data with Hive and Phoenix

This module describes how to analyze data with Hive and Phoenix.

Lessons

  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

Lab : Analyze data with Hive and Phoenix

  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

After completing this module, students will be able to:

  • Implement interactive queries with interactive Hive
  • Perform exploratory data analysis using Hive
  • Perform interactive processing by using Apache Phoenix

  • Module 10: Stream Analytics

This module introduces Azure Stream Analytics.

Lessons

  • Stream analytics
  • Process streaming data from stream analytics
  • Managing stream analytics jobs

Lab : Implement Stream Analytics

  • Process streaming data with stream analytics
  • Managing stream analytics jobs

After completing this module, students will be able to:

  • Describe stream analytics and its capabilities
  • Process streaming data with stream analytics
  • Manage stream analytics jobs

  • Module 11: Implementing Streaming Solutions with Kafka and HBase

In this module, you will learn how to use Kafka to build streaming solutions. You will also see how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.

Lessons

  • Building and Deploying a Kafka Cluster
  • Publishing, Consuming, and Processing data using the Kafka Cluster
  • Using HBase to store and Query Data

Lab : Implementing Streaming Solutions with Kafka and HBase

  • Create a virtual network and gateway
  • Create a Storm cluster for Kafka
  • Create a Kafka producer
  • Create a streaming processor client topology
  • Create a Power BI dashboard and streaming dataset
  • Create an HBase cluster
  • Create a streaming processor to write to HBase

After completing this module, students will be able to:

  • Build and deploy a Kafka Cluster.
  • Publish data to a Kafka Cluster, consume data from a Kafka Cluster, and perform stream processing using the Kafka Cluster.
  • Save streamed data to HBase, and perform queries using the HBase API.

  • Module 12: Develop big data real-time processing solutions with Apache Storm

This module explains how to develop big data real-time processing solutions with Apache Storm.

Lessons

  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Lab : Developing big data real-time processing solutions with Apache Storm

  • Stream data with Storm
  • Create Storm topologies

After completing this module, students will be able to:

  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

  • Module 13: Create Spark Streaming Applications

This module describes Spark Streaming; explains how to use discretized streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.
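
To give a feel for the sliding window operations covered in this module, the sketch below simulates windowed counts over a sequence of micro-batches in plain Python. It does not use the Spark Streaming API and is not taken from the course labs; the function `windowed_counts`, its parameters `window_length` and `slide_interval` (both measured in batches here), and the sample batches are assumptions chosen for illustration.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length, slide_interval):
    """Yield (batch_index, Counter) each time the window slides.

    batches: iterable of lists, one list of events per micro-batch.
    window_length: how many of the most recent batches each window covers.
    slide_interval: emit a result once every this many batches.
    """
    # A bounded deque drops the oldest batch automatically when it is full.
    window = deque(maxlen=window_length)
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            counts = Counter()
            for windowed_batch in window:
                counts.update(windowed_batch)
            yield index, counts


if __name__ == "__main__":
    # Hypothetical stream: each inner list is one micro-batch of event keys.
    stream = [["a"], ["a", "b"], ["b"], ["c"]]
    for batch_index, counts in windowed_counts(stream, window_length=3, slide_interval=2):
        print(batch_index, dict(counts))
```

The point of the sketch is that each emitted result covers several recent batches, so consecutive windows overlap whenever the window length exceeds the slide interval.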

Lessons

  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualization

Lab : Building a Spark Streaming Application

  • Installing Required Software
  • Building the Azure Infrastructure
  • Building a Spark Streaming Pipeline

After completing this module, students will be able to:

  • Describe Spark Streaming and how it works.
  • Use discretized streams (DStreams).
  • Work with sliding window operations.
  • Apply the concepts to develop Spark Streaming applications.
  • Describe Structured Streaming.

Convenient Ways to Attend This Instructor-Led Course

Hassle-Free Enrollment: No advance payment required to reserve your seat.
Tuition due 30 days after you attend your course.

On Demand + Instructor Coaching
Tuition — $895

With a blend of video, text, hands-on labs, and knowledge checks, you will receive the same high quality content as the live event, but you can attend on your own time, at your own pace.

PLUS, we include access to a Microsoft Certified Trainer (MCT) to help you prepare for your certification exam and help you apply your new skills immediately… Learning Tree knows how to bring learning to life!

  • Flexibility to take the course on your own time, at your own pace
  • Forever access to the digital course materials – for any refreshers
  • You will receive a code with your purchase. The code may be redeemed for online access to this On Demand course for up to six months
  • Upon course activation, the MOC On Demand videos and labs are available for three months
  • 2 FREE hours of individual coaching from an MCT Learning Tree Instructor
  • This delivery is also eligible for Microsoft Software Assurance Training Vouchers (SATVs)
  • NOTE: Only live, in-class training is eligible for NASBA CPEs; on-demand training is not eligible for CPE credit

For enrolling multiple subscribers at the same time, contact us »

Private Team Training

Enrolling at least 3 people in this course? Consider bringing this (or any course that can be custom designed) to your preferred location as a private team training.

For details, call 1-888-843-8733 or Click Here »

Tuition

  • In Classroom or Online (Standard): $3190
  • In Classroom or Online (Government): $2833
  • On Demand: $895*
  • Private Team Training: Contact Us »

*prices exclude applicable taxes


Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
*Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm

Enhance Your Credentials with Professional Certification

Learning Tree's comprehensive training and exam preparation guarantees that you will gain the knowledge and confidence to achieve professional certification and advance your career.

Earn 29 Credits from NASBA

This course qualifies for 29 CPE credits from the National Association of State Boards of Accountancy CPE program. NOTE: Only live, in-class attendance qualifies for NASBA CPEs.
