Extracting Business Value From Big Data With Pig and Hive

Level: Intermediate
Rating: 4.81/5 based on 36 reviews

This course teaches you to use Pig and Hive to prepare and analyze large data sets on Hadoop so that you can make more informed and timely business decisions. You will learn to increase productivity by avoiding the low-level Java coding characteristic of MapReduce, and to begin extracting business value quickly. In this Pig & Hive for Big Data training course, you will learn to gain access to previously inaccessible data, gather and feed data into Hadoop for storage, transform and filter data using Pig, and extract value using Hive and Spark SQL.

Key Features of this Pig and Hive for Big Data Training

  • After-course instructor coaching benefit
  • Learning Tree end-of-course exam included
  • After-course computing sandbox included

You Will Learn How To

  • Manipulate complex data sets stored in Hadoop for competitive advantage
  • Automate the transfer of data into Hadoop storage with Flume and Sqoop
  • Filter data with Extract-Transform-Load (ETL) operations using Pig
  • Query multiple data sets for analysis with Pig and Hive
  • Perform real-time queries on Hadoop data with Tez and Spark SQL

Certifications/Credits:

CPE 23 Credits

Choose the Training Solution That Best Fits Your Individual Needs or Organizational Goals

LIVE, INSTRUCTOR-LED

In Class & Live, Online Training

  • 4-day instructor-led training course
  • After-course instructor coaching benefit
  • Learning Tree end-of-course exam included
  • Earn 23 NASBA credits (live, in-class training only)

Standard $2990

Government $2659

PRODUCT #1254

TRAINING AT YOUR SITE

Team Training

  • Bring this or any training to your organization
  • Full-scale program development
  • Delivered when, where, and how you want it
  • Blended learning models
  • Tailored content
  • Expert team coaching

Customize Your Team Training Experience

Save More On Training with FlexVouchers – A Unique Training Savings Account

Our FlexVouchers help you lock in your training budgets without having to commit to a traditional 1 voucher = 1 course classroom-only attendance. FlexVouchers expand your purchasing power to modern blended solutions and services that are completely customizable. For details, please call 888-843-8733 or chat live.

In Class & Live, Online Training

Note: This course runs for 4 days

  • Dec 3 - 6, 9:00 AM - 4:30 PM EST, Greenbelt, MD / Online (AnyWare)

  • Jan 21 - 24, 9:00 AM - 4:30 PM EST, Alexandria, VA / Online (AnyWare)

  • Feb 18 - 21, 9:00 AM - 4:30 PM EST, Greenbelt, MD / Online (AnyWare)

  • Mar 24 - 27, 9:00 AM - 4:30 PM EDT, Herndon, VA / Online (AnyWare)

  • May 5 - 8, 9:00 AM - 4:30 PM EDT, Greenbelt, MD / Online (AnyWare)

  • Jul 21 - 24, 9:00 AM - 4:30 PM EDT, Alexandria, VA / Online (AnyWare)

  • Sep 1 - 4, 9:00 AM - 4:30 PM EDT, Greenbelt, MD / Online (AnyWare)

  • Sep 22 - 25, 9:00 AM - 4:30 PM EDT, Herndon, VA / Online (AnyWare)

Guaranteed to Run

When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event — date, time, location — will run. Guaranteed.

Pig and Hive Course Information

  • Recommended Experience

    • Knowledge of databases and SQL

Pig and Hive Course Outline

  • The Hadoop Ecosystem

    • Hadoop overview
    • Surveying the Hadoop components
    • Defining the Hadoop architecture
  • Exploring HDFS and MapReduce

    Storing data in HDFS

    • Achieving reliable and secure storage
    • Monitoring storage metrics
    • Controlling HDFS from the Command Line

    Parallel processing with MapReduce

    • Detailing the MapReduce approach
    • Transferring algorithms not data
    • Dissecting the key stages of a MapReduce job

    Automating data transfer

    • Facilitating data Ingress and Egress
    • Aggregating data with Flume
    • Configuring data fan in and fan out
    • Moving relational data with Sqoop
  • Executing Data Flows with Pig

    Describing characteristics of Apache Pig

    • Contrasting Pig with MapReduce
    • Identifying Pig use cases
    • Pinpointing key Pig configurations

    Structuring unstructured data

    • Representing data in Pig's data model
    • Running Pig Latin commands at the Grunt Shell
    • Expressing transformations in Pig Latin Syntax
    • Invoking Load and Store functions
  • Performing ETL with Pig

    Transforming data with Relational Operators

    • Creating new relations with joins
    • Reducing data size by sampling
    • Extending Pig with user-defined functions

    Filtering data with Pig

    • Consolidating data sets with unions
    • Partitioning data sets with splits
    • Injecting parameters into Pig scripts
  • Manipulating Data with Hive

    Leveraging business advantages of Hive

    • Factoring Hive into components
    • Imposing structure on data with Hive

    Organizing data in Hive Data Warehouse

    • Creating Hive databases and tables
    • Contrasting available data types in Hive
    • Loading and storing data efficiently with SerDes

    Designing data layout for maximum performance

    • Populating tables from queries
    • Partitioning Hive Tables for optimal queries
    • Composing HiveQL queries
  • Extracting Business Value with HiveQL

    Performing joins on unstructured data

    • Distinguishing joins available in Hive
    • Optimizing join structure for performance

    Pushing HiveQL to the limit

    • Sorting, distributing and clustering data
    • Reducing query complexity with views
    • Improving query performance with indexes

    Deploying Hive in production

    • Designing Hive schemas
    • Setting up data compression
    • Debugging Hive scripts

    Streamlining storage management with HCatalog

    • Unifying the data view with HCatalog
    • Leveraging HCatalog to access the Hive metastore
    • Communicating via the HCatalog interfaces
    • Populating a Hive table from Pig
  • Interacting with Hadoop Data in Real Time

    • Performing low-latency queries with Impala
    • Leveraging the Tez execution engine to improve performance
    • Reducing data access time with Spark SQL

Pig and Hive for Big Data Training FAQs

  • What do Pig and Hive contribute to Big Data Processing?

    Low-level Hadoop programming is done in Java. Pig and Hive make programming easier by letting the programmer write scripts in a simpler language, Pig Latin or HiveQL. Those scripts are compiled and optimized internally, and the equivalent Java code is generated and executed without the programmer having to write it.

  • What is Pig in big data?

    Apache Pig is a platform for analyzing large data sets. Programs are written in a high-level language, Pig Latin. Pig's infrastructure converts them into sequences of Java MapReduce programs, which are then executed on Hadoop. Without writing Java, one can use Pig to leverage Hadoop's ability to process data in parallel.
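
    To make the MapReduce stages mentioned above concrete, here is a minimal sketch in plain Python. It is not Pig or Hadoop code: it runs in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen only for illustration. It shows the kind of map, shuffle, and reduce work that a higher-level script saves the programmer from writing by hand.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would per record."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three stages into one job (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Invented sample input, for illustration only.
    print(word_count(["pig hive pig", "hive hadoop"]))
```

    On a real cluster the map and reduce stages run in parallel across many machines; the sketch only mirrors the logical flow.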

  • What is Hive in big data?

    Apache Hive is data warehouse software that translates commands written in a SQL-like language, HiveQL, into Hadoop MapReduce jobs that are then executed on Hadoop. Without writing Java one can use Hive to leverage Hadoop's ability to process data in parallel. 

  • When should one use Pig and when should one use Hive?

    Pig is typically used early in the data pipeline to clean and structure data. Hive is typically used later when there is structure and well-defined fields. Since Hive has the concepts of tables, rows and columns it integrates easily with BI tools.
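
    The division of labor described above can be sketched in plain Python under some loud assumptions: the raw lines, the field layout, and the function names (clean, total_by_product) are invented for illustration, and SQLite from the Python standard library stands in for Hive, whose HiveQL dialect and execution model differ. The first step cleans and structures messy records, as Pig typically would; the second queries the resulting table with SQL, as Hive typically would.

```python
import sqlite3

# Invented raw input: pipe-delimited sales records, one of them malformed.
RAW_LINES = [
    "2024-01-05 | Widget |  19.50 ",
    "bad record without fields",
    "2024-01-06|gadget|5.00",
    "2024-01-07 | widget | 10.50",
]


def clean(lines):
    """Early-pipeline step: drop malformed lines and normalize the fields."""
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # skip records that do not have exactly three fields
        day, product, amount = parts
        try:
            yield day, product.lower(), float(amount)
        except ValueError:
            continue  # skip records whose amount is not numeric


def total_by_product(rows):
    """Later-pipeline step: load structured rows into a table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT product, SUM(amount) FROM sales "
        "GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(total_by_product(clean(RAW_LINES)))
```

    Once the data sits in tables with rows and columns, it can be queried with ordinary SQL-style statements, which is the property that makes integration with BI tools straightforward.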

  • Can I learn to use Pig and Hive to extract value from big data online?

    Yes! We know your busy work schedule may prevent you from getting to one of our classrooms, which is why we offer convenient live, online training that you can attend from wherever you are.

Questions about which training is right for you?

Call 888-843-8733 or chat live.

100% Satisfaction Guaranteed

Your Training Comes with a 100% Satisfaction Guarantee!*

  • If you are not 100% satisfied, you pay no tuition!
  • No advance payment required for most products.
  • Tuition can be paid later by invoice - OR - at the time of checkout by credit card.

*Partner-delivered courses may have different terms that apply. Ask for details.
