Course Outline
Section 1: Introduction to Hadoop
- Hadoop history and core concepts
- Ecosystem overview
- Distributions
- High-level architecture
- Common Hadoop myths
- Challenges associated with Hadoop
- Hardware and software considerations
- Lab: Initial exploration of Hadoop
Section 2: HDFS
- Design principles and architecture
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communications and heartbeat mechanisms
- Data integrity
- Read and write paths
- High Availability (HA) and Federation for NameNode
- Labs: Interacting with HDFS
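Two of the core concepts above, fixed-size blocks and replication, can be sketched with plain-JDK arithmetic (no Hadoop libraries needed). This is an illustrative sketch only, assuming the HDFS defaults of 128 MB blocks and a replication factor of 3; the actual placement of the three replicas follows the default rack-aware policy (first replica on the writer's node, second on a different rack, third on another node in that remote rack).

```java
// Illustrative sketch of HDFS block splitting and replication cost,
// assuming default settings (dfs.blocksize = 128 MB, dfs.replication = 3).
public class HdfsSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB default

    // Number of blocks a file occupies (the last block may be partial).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Raw bytes stored across the cluster once every block is replicated.
    static long storedBytes(long fileSizeBytes, int replicationFactor) {
        return fileSizeBytes * replicationFactor;
    }

    public static void main(String[] args) {
        long file = 300L * 1024 * 1024;            // a 300 MB file
        System.out.println(blockCount(file));      // ceil(300/128) = 3 blocks
        System.out.println(storedBytes(file, 3));  // 900 MB on disk cluster-wide
    }
}
```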
Section 3: MapReduce
- Concepts and architecture
- Daemons (MRv1): JobTracker and TaskTracker
- Execution phases: Driver, Mapper, Shuffle/Sort, Reducer
- MapReduce Version 1 versus Version 2 (YARN)
- MapReduce internals
- Introduction to Java MapReduce programming
- Labs: Executing a sample MapReduce program
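The execution phases above can be simulated in plain Java without a cluster. The sketch below models word count: the map phase emits (word, 1) pairs, the shuffle/sort phase groups them by sorted key, and the reduce phase sums each group. The class and method names are illustrative; a real Hadoop job extends `org.apache.hadoop.mapreduce.Mapper` and `Reducer` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-JDK simulation of the MapReduce phases for word count.
public class WordCountSketch {

    // Map phase: each input line becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // Shuffle/sort phase: group values by key, keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> out = new LinkedHashMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : Arrays.asList("the quick brown fox", "the lazy dog")) {
            mapped.addAll(map(line));
        }
        System.out.println(reduce(shuffle(mapped))); // "the" maps to 2, every other word to 1
    }
}
```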
Section 4: Pig
- Pig compared to Java MapReduce
- Pig job flow
- Pig Latin language
- ETL processes with Pig
- Transformations and Joins
- User-defined functions (UDFs)
- Labs: Writing Pig scripts for data analysis
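A typical ETL-style job flow in Pig Latin looks like the sketch below: LOAD, GROUP, FOREACH with an aggregate, ORDER, and STORE. The file paths and field names are hypothetical, chosen only to illustrate the flow.

```
-- Hypothetical paths and fields: summarize bytes transferred per client IP.
logs     = LOAD '/data/access_log.txt'
           USING PigStorage(' ')
           AS (ip:chararray, url:chararray, bytes:long);
by_ip    = GROUP logs BY ip;
traffic  = FOREACH by_ip GENERATE group AS ip, SUM(logs.bytes) AS total_bytes;
ordered  = ORDER traffic BY total_bytes DESC;
top10    = LIMIT ordered 10;
STORE top10 INTO '/output/top_talkers';
```

Unlike the equivalent Java MapReduce program, each relation here is a named intermediate result, and Pig compiles the whole script into one or more MapReduce jobs automatically.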
Section 5: Hive
- Architecture and design
- Data types
- SQL support within Hive
- Creating Hive tables and executing queries
- Partitions
- Joins
- Text processing capabilities
- Labs: Various exercises on data processing using Hive
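Partitioning, noted above, is Hive's main tool for limiting how much data a query scans. The fragment below is a hypothetical example (table name, columns, and paths are invented): the table is partitioned by date, so the final query reads only one partition's files.

```
-- Hypothetical table: page views partitioned by date, stored as ORC.
CREATE TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date STRING)
STORED AS ORC;

-- Load one day's staged data into its own partition.
LOAD DATA INPATH '/staging/2015-01-01'
INTO TABLE page_views PARTITION (view_date = '2015-01-01');

-- The WHERE clause on the partition column prunes all other partitions.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2015-01-01'
GROUP BY url;
```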
Section 6: HBase
- Concepts and architecture
- HBase versus RDBMS versus Cassandra
- HBase Java API
- Handling time series data in HBase
- Schema design
- Labs: Interacting with HBase via the shell; programming with the HBase Java API; schema design exercise
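Schema design for time series data in HBase comes down largely to row-key design. The plain-Java sketch below (no `hbase-client` dependency; names and bucket count are illustrative) shows one common pattern: a small salt prefix spreads a monotonically increasing key across regions to avoid write hot-spotting, and `Long.MAX_VALUE - timestamp` makes the newest readings sort first in a scan.

```java
// Sketch of a salted, reverse-timestamp HBase row key for time series data.
// SALT_BUCKETS and the key layout are illustrative design choices, not an API.
public class TimeSeriesRowKey {
    static final int SALT_BUCKETS = 16;

    static String rowKey(String sensorId, long epochMillis) {
        // Salt derived from the sensor id keeps one sensor's rows together
        // while spreading different sensors across regions.
        int salt = Math.floorMod(sensorId.hashCode(), SALT_BUCKETS);
        // Reversed, zero-padded timestamp: lexicographic order = newest first.
        long reversed = Long.MAX_VALUE - epochMillis;
        return String.format("%02d|%s|%019d", salt, sensorId, reversed);
    }

    public static void main(String[] args) {
        String earlier = rowKey("sensor-42", 1_000L);
        String later   = rowKey("sensor-42", 2_000L);
        // Same sensor, same salt: the later reading sorts first.
        System.out.println(later.compareTo(earlier) < 0); // true
    }
}
```

With this layout, "latest N readings for a sensor" becomes a short forward scan from the sensor's key prefix rather than a full-table scan.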
Requirements
- Proficiency in the Java programming language, as most coding exercises are conducted in Java.
- Familiarity with the Linux environment, including the ability to navigate the command line and edit files using vi or nano.
Lab Environment
Zero Install: Students are not required to install Hadoop software on their personal machines. A functional Hadoop cluster will be provided for use.
Participants will need:
- An SSH client (Linux and Mac systems come with SSH clients by default; PuTTY is recommended for Windows users).
- A web browser to access the cluster, with Firefox being the recommended option.
28 Hours
Testimonials (1)
Hands-on exercises. The class should have been 5 days, but the 3 days helped clear up a lot of questions I already had from working with NiFi.