Hadoop Programming with Java for Big Data Solutions






4 Days


The availability of large data sets presents new opportunities and challenges to organizations of all sizes. In this course, you will implement a strategy for developing Hadoop jobs and extracting business value from large and varied data sets. This Apache Hadoop development training is essential for programmers who want to augment their programming skills to use Hadoop for a variety of big data solutions.

You Will Learn How To

  • Write, customize, and deploy Java MapReduce jobs to summarize data
  • Develop Hive and Pig queries to simplify data analysis
  • Test and debug jobs using MRUnit
  • Monitor task execution and cluster health

Important Course Information


  • Java experience at the level of:
    • Course 471, Java Programming Introduction, or at least six months of Java programming experience

Course Outline

Introduction to Hadoop

  • Identifying the business benefits of Hadoop
  • Surveying the Hadoop ecosystem
  • Selecting a suitable distribution

Parallelizing Program Execution

Meeting the challenges of parallel programming

  • Investigating parallelizable challenges: algorithms, data and information exchange
  • Estimating the storage and complexity of Big Data

Parallel programming with MapReduce

  • Dividing and conquering large-scale problems
  • Uncovering jobs suitable for MapReduce
  • Solving typical business problems

Implementing Real-World MapReduce Jobs

Applying the Hadoop MapReduce paradigm

  • Configuring the development environment
  • Exploring the Hadoop distribution
  • Creating the components of MapReduce jobs
  • Introducing the Hadoop daemons
  • Analyzing the stages of MapReduce processing: splitting, mapping, shuffling and reducing
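
The four stages named in the last item above (splitting, mapping, shuffling and reducing) can be illustrated with a minimal word-count sketch in plain Java. This is not Hadoop code: it uses only the standard library and runs in a single process, and the class and method names (MapReduceStagesSketch, map, shuffle, reduce, wordCount) are hypothetical names chosen for illustration. It also assumes that each line of input is one split, which is a simplification of how a real cluster divides data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the MapReduce stages in one JVM, without Hadoop.
class MapReduceStagesSketch {

    // Map stage: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle stage: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: collapse the list of values for one key into a single total.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs all four stages in order, treating each input line as one split (an assumption).
    static Map<String, Integer> wordCount(String input) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String split : input.split("\n")) {        // splitting
            mapped.addAll(map(split));                  // mapping
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) { // shuffling
            totals.put(entry.getKey(), reduce(entry.getValue()));                    // reducing
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {big=2, clusters=1, data=1}
        System.out.println(wordCount("big data\nbig clusters"));
    }
}
```

In an actual Hadoop job the map and reduce steps would typically be written as Mapper and Reducer classes, while the framework itself handles splitting and shuffling across the cluster; the sketch only mirrors the data flow.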

Building complex MapReduce jobs

  • Selecting and employing multiple mappers and reducers
  • Leveraging built-in mappers, reducers and partitioners
  • Analyzing time series data with secondary sort
  • Streaming tasks through various programming languages

Customizing MapReduce

Solving common data manipulation problems

  • Executing algorithms: parallel sorts, joins and searches
  • Analyzing log files, social media data and e-mails

Implementing partitioners and comparators

  • Identifying network-bound, CPU-bound and disk I/O-bound parallel algorithms
  • Dividing the workload efficiently using partitioners
  • Controlling grouping and sort order with comparators
  • Collecting metrics with counters

Persisting Big Data with Distributed Data Stores

Making the case for distributed data

  • Achieving high performance data throughput
  • Recovering from media failure through redundancy

Interfacing with Hadoop Distributed File System (HDFS)

  • Breaking down the structure and organization of HDFS
  • Loading raw data and retrieving results
  • Reading and writing data programmatically
  • Manipulating Hadoop SequenceFile types
  • Sharing reference data with DistributedCache

Structuring data with HBase

  • Migrating from structured to unstructured storage
  • Applying NoSQL concepts with schema on read
  • Connecting to HBase from MapReduce jobs
  • Comparing HBase to other types of NoSQL data stores

Simplifying Data Analysis with Query Languages

Unleashing the power of SQL with Hive

  • Structuring databases, tables, views and partitions
  • Integrating MapReduce jobs with Hive queries
  • Querying with HiveQL
  • Accessing Hive servers through JDBC
  • Extending HiveQL with User-Defined Functions (UDF)

Executing workflows with Pig

  • Developing Pig Latin scripts to consolidate workflows
  • Integrating Pig queries with Java
  • Interacting with data through the grunt console
  • Extending Pig with User-Defined Functions (UDF)

Managing and Deploying Big Data Solutions

Testing and debugging Hadoop code

  • Logging significant events for auditing and debugging
  • Debugging in local mode
  • Validating requirements with MRUnit

Deploying, monitoring and tuning performance

  • Deploying to a production cluster
  • Optimizing performance with administrative tools
  • Monitoring job execution through web user interfaces

Course Schedule

Attend this live, instructor-led course In-Class or Online via AnyWare.

Hassle-Free Enrollment: No advance payment required.
Tuition due 30 days after your course.

  • May 23 - 26: Toronto/AnyWare
  • Aug 15 - 18: Herndon, VA/AnyWare
  • Oct 3 - 6: New York/AnyWare
  • Oct 10 - 13: Toronto/AnyWare
  • Nov 6 - 9: AnyWare
  • Feb 13 - 16: Herndon, VA/AnyWare

Guaranteed to Run

Bring this Course to Your Organization and Train Your Entire Team
For more information, call 1-888-843-8733.






Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

After-Course Computing Sandbox
You'll be given remote access to a preconfigured virtual machine where you can redo your hands-on exercises, develop and test new code, and experiment with the same software used in your course.

Free Course Exam
You can take your course exam on the last day of your course and receive a Certificate of Achievement with the designation "Awarded with Distinction."



Call 1-888-843-8733.

An experienced training advisor will happily answer any questions you may have and alert you to any tuition savings to which you or your organization may be entitled.

Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm

FREE Online Course Exam (if applicable) – Last Day: 3:30 pm – 4:30 pm
By successfully completing your FREE online course exam, you will:

  • Have a record of your growth and learning results
  • Bring proof of your progress back to your organization
  • Earn credits toward industry certifications (if applicable)
  • Make progress toward one or more Learning Tree Specialist & Expert Certifications (if applicable)

Enhance Your Credentials with Professional Certification

Learning Tree's comprehensive training and exam preparation guarantee that you will gain the knowledge and confidence to achieve professional certification and advance your career.

Earn 23 Credits from NASBA

This course qualifies for 23 CPE credits from the National Association of State Boards of Accountancy CPE program.
