Hadoop Programming with Java for Big Data Solutions



4 Days

The availability of large data sets presents new opportunities and challenges to organizations of all sizes. In this course, you will implement a strategy for developing Hadoop jobs and extracting business value from large and varied data sets. This Apache Hadoop development training is essential for programmers who want to augment their programming skills to use Hadoop for a variety of big data solutions.

You Will Learn How To

  • Write, customize, and deploy Java MapReduce jobs to summarize data
  • Develop Hive and Pig queries to simplify data analysis
  • Test and debug jobs using MRUnit
  • Monitor task execution and cluster health

Important Course Information


  • Java experience at the level of:
    • Course 471, Java Programming Introduction, or at least six months of Java programming experience

Course Outline

Introduction to Hadoop

  • Identifying the business benefits of Hadoop
  • Surveying the Hadoop ecosystem
  • Selecting a suitable distribution

Parallelizing Program Execution

Meeting the challenges of parallel programming

  • Investigating parallelizable challenges: algorithms, data and information exchange
  • Estimating the storage and complexity of Big Data

Parallel programming with MapReduce

  • Dividing and conquering large-scale problems
  • Uncovering jobs suitable for MapReduce
  • Solving typical business problems

Implementing Real-World MapReduce Jobs

Applying the Hadoop MapReduce paradigm

  • Configuring the development environment
  • Exploring the Hadoop distribution
  • Creating the components of MapReduce jobs
  • Introducing the Hadoop daemons
  • Analyzing the stages of MapReduce processing: splitting, mapping, shuffling and reducing
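
As an illustration of the four stages named above (splitting, mapping, shuffling and reducing), here is a minimal sketch in plain Java. It does not use the Hadoop API: it only simulates the data flow in memory for a word count, and the class name `MapReduceStages`, its method names, and the sample input are all hypothetical choices made for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the MapReduce stages (not Hadoop code).
public class MapReduceStages {

    // Splitting: cut the input into chunks, one per simulated map task.
    static List<List<String>> split(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            splits.add(lines.subList(i, Math.min(i + linesPerSplit, lines.size())));
        }
        return splits;
    }

    // Mapping: emit a (word, 1) pair for every word in one split.
    static List<Map.Entry<String, Integer>> map(List<String> split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : split) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffling: gather the values of all map outputs by key, ordered by key.
    static SortedMap<String, List<Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reducing: collapse each key's list of values into a single total.
    static SortedMap<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    // Runs the four stages in order.
    static SortedMap<String, Integer> wordCount(List<String> lines, int linesPerSplit) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (List<String> split : split(lines, linesPerSplit)) {
            mapOutputs.add(map(split));
        }
        return reduce(shuffle(mapOutputs));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big jobs", "data jobs", "big clusters");
        System.out.println(wordCount(lines, 2)); // prints {big=3, clusters=1, data=2, jobs=2}
    }
}
```

On a real cluster each stage would run as distributed tasks across nodes; the sketch only shows how data moves from one stage to the next.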

Building complex MapReduce jobs

  • Selecting and employing multiple mappers and reducers
  • Leveraging built-in mappers, reducers and partitioners
  • Analyzing time series data with secondary sort
  • Streaming tasks through various programming languages

Customizing MapReduce

Solving common data manipulation problems

  • Executing algorithms: parallel sorts, joins and searches
  • Analyzing log files, social media data and e-mails

Implementing partitioners and comparators

  • Identifying network-bound, CPU-bound and disk I/O-bound parallel algorithms
  • Dividing the workload efficiently using partitioners
  • Controlling grouping and sort order with comparators
  • Collecting metrics with counters

Persisting Big Data with Distributed Data Stores

Making the case for distributed data

  • Achieving high performance data throughput
  • Recovering from media failure through redundancy

Interfacing with Hadoop Distributed File System (HDFS)

  • Breaking down the structure and organization of HDFS
  • Loading raw data and retrieving results
  • Reading and writing data programmatically
  • Manipulating Hadoop SequenceFile types
  • Sharing reference data with DistributedCache

Structuring data with HBase

  • Migrating from structured to unstructured storage
  • Applying NoSQL concepts with schema on read
  • Connecting to HBase from MapReduce jobs
  • Comparing HBase to other types of NoSQL data stores

Simplifying Data Analysis with Query Languages

Unleashing the power of SQL with Hive

  • Structuring databases, tables, views and partitions
  • Integrating MapReduce jobs with Hive queries
  • Querying with HiveQL
  • Accessing Hive servers through JDBC
  • Extending HiveQL with User-Defined Functions (UDF)

Executing workflows with Pig

  • Developing Pig Latin scripts to consolidate workflows
  • Integrating Pig queries with Java
  • Interacting with data through the grunt console
  • Extending Pig with User-Defined Functions (UDF)

Managing and Deploying Big Data Solutions

Testing and debugging Hadoop code

  • Logging significant events for auditing and debugging
  • Debugging in local mode
  • Validating requirements with MRUnit

Deploying, monitoring and tuning performance

  • Deploying to a production cluster
  • Optimizing performance with administrative tools
  • Monitoring job execution through web user interfaces

Convenient Ways to Attend This Instructor-Led Course

Hassle-Free Enrollment: No advance payment required to reserve your seat.
Tuition due 30 days after you attend your course.

In the Classroom or Live, Online

Tuition: Standard $2990, Government $2659

  • Feb 13 - 16 (4 Days), 9:00 AM - 4:30 PM EST: Herndon, VA / Online (AnyWare)
  • Mar 6 - 9 (4 Days), 9:00 AM - 4:30 PM EST: New York / Online (AnyWare)
  • Apr 3 - 6 (4 Days), 9:00 AM - 4:30 PM EDT: Toronto / Online (AnyWare)
  • May 8 - 11 (4 Days), 9:00 AM - 4:30 PM EDT: Ottawa / Online (AnyWare)
  • Aug 7 - 10 (4 Days), 9:00 AM - 4:30 PM EDT: Herndon, VA / Online (AnyWare)

Guaranteed to Run

Private Team Training

Enrolling at least 3 people in this course? Consider bringing this (or any course that can be custom designed) to your preferred location as a private team training.

For details, call 1-888-843-8733.




Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

After-Course Computing Sandbox
You'll be given remote access to a preconfigured virtual machine for you to redo your hands-on exercises, develop/test new code, and experiment with the same software used in your course.

Free Course Exam
You can take your Learning Tree course exam on the last day of your course or online at any time after class and receive a Certificate of Achievement with the designation "Awarded with Distinction."


Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
*Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm

FREE Online Course Exam (if applicable) – Last Day: 3:30 pm – 4:30 pm
By successfully completing your FREE online course exam, you will:

  • Have a record of your growth and learning results
  • Bring proof of your progress back to your organization
  • Earn credits toward industry certifications (if applicable)

Enhance Your Credentials with Professional Certification

Learning Tree's comprehensive training and exam preparation guarantee that you will gain the knowledge and confidence to achieve professional certification and advance your career.

Earn 23 Credits from NASBA

This course qualifies for 23 CPE credits from the National Association of State Boards of Accountancy CPE program.

Very easy to follow and understand. Spoke clearly and progressed at an extremely manageable pace.

- W. Huber, Software Engineer
Lockheed Martin
