Hadoop Architecture & Administration Training for Big Data Solutions

Level: Intermediate
Rating: 4.7/5 (based on 36 reviews)

In this Hadoop Architecture and Administration big data training course, you gain the skills to install, configure, and manage the Apache Hadoop platform and its associated ecosystem, and build a Hadoop big data solution that satisfies your business and data science requirements. You will learn to install and build a Hadoop cluster capable of processing very large data sets, then configure and tune the Hadoop environment to ensure high throughput and availability.

Additionally, this course teaches attendees how to allocate, distribute, and manage resources; monitor the Hadoop file system, job progress, and overall cluster performance; and exchange information with relational databases.

Key Features of this Hadoop Administration for Big Data Training

  • After-course instructor coaching benefit
  • Learning Tree end-of-course exam included
  • After-course computing sandbox included

You Will Learn How To

  • Architect a Hadoop solution to satisfy your business requirements
  • Install and build a Hadoop cluster capable of processing large data and executing data science jobs
  • Configure and tune the Hadoop environment to ensure high throughput and availability
  • Allocate, distribute, and manage resources
  • Monitor the file system, job progress, and overall cluster performance

Certifications/Credits:

23 CPE Credits

Choose the Training Solution That Best Fits Your Individual Needs or Organizational Goals

LIVE, INSTRUCTOR-LED

In Class & Live, Online Training

  • 4-day instructor-led training course
  • After-course instructor coaching benefit
  • Learning Tree end-of-course exam included
  • Earn 23 NASBA credits (live, in-class training only)
View Course Details & Schedule

Standard $2990

Government $2659

RESERVE SEAT

PRODUCT #1252

TRAINING AT YOUR SITE

Team Training

  • Bring this or any training to your organization
  • Full-scale program development
  • Delivered when, where, and how you want it
  • Blended learning models
  • Tailored content
  • Expert team coaching

Customize Your Team Training Experience

CONTACT US

Save More On Training with FlexVouchers – A Unique Training Savings Account

Our FlexVouchers help you lock in your training budget without committing to the traditional 1 voucher = 1 course, classroom-only model. FlexVouchers expand your purchasing power to modern blended solutions and services that are completely customizable. For details, please call 888-843-8733 or chat live.

In Class & Live, Online Training

Time Zone Legend: Eastern, Central, Mountain, and Pacific Time Zones

Note: This course runs for 4 Days *

*Events with the Partial Day Event clock icon run longer than normal but provide the convenience of half-day sessions.

  • Jan 12 - 15 9:00 AM - 4:30 PM EST New York / Online (AnyWare) Reserve Your Seat

  • Mar 29 - Apr 1 9:00 AM - 4:30 PM EDT Ottawa / Online (AnyWare) Reserve Your Seat

  • Jul 20 - 23 9:00 AM - 4:30 PM EDT New York / Online (AnyWare) Reserve Your Seat

Guaranteed to Run

When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event will run on its scheduled date and time. Guaranteed.

Partial Day Event

Learning Tree offers a flexible schedule program. If you cannot attend full-day sessions, this option provides four-hour sessions per day instead of a full day.

Hadoop Administration Course Information

  • Recommended Experience

    • Working knowledge of Linux
    • Working knowledge of Java

Hadoop Administration Course Outline

  • Introduction to Data Storage and Processing

    Installing the Hadoop Distributed File System (HDFS)

    • Defining key design assumptions and architecture
    • Configuring and setting up the file system
    • Issuing commands from the console
    • Reading and writing files

    Setting the stage for MapReduce

    • Reviewing the MapReduce approach
    • Introducing the computing daemons
    • Dissecting a MapReduce job
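
A minimal sketch of the console interaction covered under "Issuing commands from the console" and "Reading and writing files" above, driven from Python. The paths and file name are placeholders, and it assumes the hdfs client is on the PATH and pointed at a running cluster.

```python
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' file system command and return its output."""
    result = subprocess.run(["hdfs", "dfs", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

hdfs("-mkdir", "-p", "/user/student/demo")                  # create a directory
hdfs("-put", "local_sales.csv", "/user/student/demo")       # copy a local file into HDFS
print(hdfs("-ls", "/user/student/demo"))                    # list the directory
print(hdfs("-cat", "/user/student/demo/local_sales.csv"))   # read the file back
```
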
  • Defining Hadoop Cluster Requirements

    Planning the architecture

    • Selecting appropriate hardware
    • Designing a scalable cluster

    Building the cluster

    • Installing Hadoop daemons
    • Optimizing the network architecture
  • Configuring a Cluster

    Preparing HDFS

    • Setting basic configuration parameters
    • Configuring block allocation, redundancy and replication

    Deploying MapReduce

    • Installing and setting up the MapReduce environment
    • Delivering redundant load balancing via Rack Awareness
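
To make "Setting basic configuration parameters" and "Configuring block allocation, redundancy and replication" concrete, here is a minimal sketch that generates the relevant hdfs-site.xml properties from Python. dfs.replication and dfs.blocksize are standard HDFS property names; the values and output location are illustrative assumptions, not recommendations.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: 3 replicas per block, 128 MB block size
settings = {
    "dfs.replication": "3",
    "dfs.blocksize": "134217728",
}

configuration = ET.Element("configuration")
for name, value in settings.items():
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

# Writes a local fragment; merge it into your cluster's hdfs-site.xml
ET.ElementTree(configuration).write("hdfs-site.xml",
                                    encoding="utf-8", xml_declaration=True)
```
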
  • Maximizing HDFS Robustness

    Creating a fault-tolerant file system

    • Isolating single points of failure
    • Maintaining High Availability
    • Triggering manual failover
    • Automating failover with ZooKeeper

    Leveraging NameNode Federation

    • Extending HDFS resources
    • Managing the namespace volumes

    Introducing YARN

    • Critiquing the YARN architecture
    • Identifying the new daemons
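
As a preview of "Triggering manual failover" above, below is a minimal sketch that queries NameNode HA state and requests a manual failover with the standard hdfs haadmin tool. The service IDs nn1 and nn2 are placeholders for whatever your dfs.ha.namenodes setting defines, and the cluster is assumed to already be configured for High Availability.

```python
import subprocess

def haadmin(*args):
    """Run an 'hdfs haadmin' command and return its trimmed output."""
    return subprocess.run(["hdfs", "haadmin", *args],
                          capture_output=True, text=True, check=True).stdout.strip()

print("nn1:", haadmin("-getServiceState", "nn1"))  # e.g. "active" or "standby"
print("nn2:", haadmin("-getServiceState", "nn2"))

# Promote the standby (nn2) and demote the current active (nn1)
print(haadmin("-failover", "nn1", "nn2"))
```
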
  • Managing Resources and Cluster Health

    Allocating resources

    • Setting quotas to constrain HDFS utilization
    • Prioritizing access to MapReduce using schedulers

    Maintaining HDFS

    • Starting and stopping Hadoop daemons
    • Monitoring HDFS status
    • Adding and removing data nodes

    Administering MapReduce

    • Managing MapReduce jobs
    • Tracking progress with monitoring tools
    • Commissioning and decommissioning compute nodes
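
A minimal sketch of the quota commands behind "Setting quotas to constrain HDFS utilization" above. The directory and limits are placeholder assumptions; -setQuota caps the number of files and directories, while -setSpaceQuota caps raw (replicated) space consumed.

```python
import subprocess

def run(*cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

project_dir = "/user/analytics/project1"   # placeholder directory

run("hdfs", "dfsadmin", "-setQuota", "100000", project_dir)    # cap file/directory count
run("hdfs", "dfsadmin", "-setSpaceQuota", "1t", project_dir)   # cap raw space at 1 TB
print(run("hdfs", "dfs", "-count", "-q", project_dir))         # report quotas and usage
```
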
  • Maintaining a Cluster

    Employing the standard built-in tools

    • Managing and debugging processes using JVM metrics
    • Performing Hadoop status checks

    Tuning with supplementary tools

    • Assessing performance with Ganglia
    • Benchmarking to ensure continued performance
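
The "Performing Hadoop status checks" topic above boils down to a handful of standard commands; here is a minimal sketch that runs two of them from Python. It assumes administrative access to a running cluster.

```python
import subprocess

def run(*cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(run("hdfs", "dfsadmin", "-report"))   # capacity, live/dead DataNodes, usage
print(run("hdfs", "fsck", "/"))             # file system health and block status
```
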
  • Extending Hadoop

    Simplifying information access

    • Enabling SQL-like querying with Hive
    • Installing Pig to create MapReduce jobs

    Integrating additional elements of the ecosystem

    • Imposing a tabular view on HDFS with HBase
    • Leveraging memory with Spark
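
To illustrate "Enabling SQL-like querying with Hive" above, here is a minimal sketch that submits a HiveQL query from Python. It assumes HiveServer2 is running on its default port (10000), that the third-party PyHive package is installed, and that a table named sales exists; all of those names are placeholders for your own environment.

```python
from pyhive import hive  # third-party package: pip install pyhive

conn = hive.connect(host="hiveserver.example.com", port=10000, username="student")
cursor = conn.cursor()

# HiveQL is compiled into jobs that run on the cluster
cursor.execute("SELECT region, COUNT(*) AS orders FROM sales GROUP BY region")
for region, orders in cursor.fetchall():
    print(region, orders)

conn.close()
```
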
  • Implementing Data Ingress and Egress

    Facilitating generic input/output

    • Moving bulk data into and out of Hadoop
    • Transmitting HDFS data over HTTP with WebHDFS

    Acquiring application-specific data

    • Collecting multi-sourced log files with Flume
    • Importing and exporting relational information with Sqoop
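
A minimal sketch of "Transmitting HDFS data over HTTP with WebHDFS" above, using Python's requests library. The NameNode host, port (9870 is the Hadoop 3.x default; older releases use 50070), user name, and paths are assumptions to replace with your cluster's values, and simple (non-Kerberos) authentication is assumed.

```python
import requests

BASE = "http://namenode.example.com:9870/webhdfs/v1"
USER = "student"

# List a directory
resp = requests.get(f"{BASE}/user/{USER}/demo",
                    params={"op": "LISTSTATUS", "user.name": USER})
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])

# Read a file; WebHDFS redirects the read to a DataNode and requests follows it
data = requests.get(f"{BASE}/user/{USER}/demo/local_sales.csv",
                    params={"op": "OPEN", "user.name": USER})
print(data.text[:200])
```
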
  • Planning for Backup, Recovery and Security

    • Coping with inevitable hardware failures
    • Securing your Hadoop cluster


Hadoop Administration Training FAQs

  • Can I learn Hadoop Architecture and Administration online?

    Yes! We know a busy work schedule may prevent you from getting to one of our classrooms, which is why we offer convenient online training that meets you wherever you are.

  • Where does MongoDB fit in my data science training?

    A data science algorithm ingests data from an appropriate storage technology, such as a relational database, MongoDB, or the Hadoop Distributed File System, into R or Python for data wrangling and model building. If the amount of data is large, execution is performed in parallel using Spark. The results are often presented to end users on dashboards.
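
    As a rough illustration of that flow, the minimal PySpark sketch below reads a large file from HDFS, performs the aggregation in parallel on the cluster, and pulls only the small summary back into plain Python for modelling or dashboarding. The path and column names are placeholders, and pyspark is assumed to be installed.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-demo").getOrCreate()

    # Parallel read and aggregation happen on the cluster
    sales = spark.read.csv("hdfs:///user/student/demo/sales.csv",
                           header=True, inferSchema=True)
    summary = sales.groupBy("region").sum("amount")

    # Collect the (now small) result into plain Python for wrangling or visualization
    for row in summary.collect():
        print(row["region"], row["sum(amount)"])

    spark.stop()
    ```
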

Questions about which training is right for you?

call 888-843-8733
chat Live Chat




100% Satisfaction Guaranteed

Your Training Comes with a 100% Satisfaction Guarantee!*

*Partner-delivered courses may have different terms that apply. Ask for details.
