Hadoop For Administrators Training Course

Course Code

hadoopadm1

Duration

21 hours (usually 3 days including breaks)

Requirements

  • comfortable with basic Linux system administration
  • basic scripting skills

Knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.

Lab environment

Zero Install: There is no need to install Hadoop software on students' machines! A working Hadoop cluster will be provided for students.

Students will need the following

Overview

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases of Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot, and optimize Hadoop. They will also practice bulk data loading into the cluster, become familiar with various Hadoop distributions, and install and manage Hadoop ecosystem tools. The course concludes with a discussion of securing the cluster with Kerberos.

"... The materials were very well prepared and thoroughly covered. The lab was very helpful and well organized."
- Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

Lectures and hands-on labs, with an approximate balance of 60% lectures and 40% labs.


Course Outline

  • Introduction
    • Hadoop history, concepts
    • Ecosystem
    • Distributions
    • High level architecture
    • Hadoop myths
    • Hadoop challenges (hardware / software)
    • Labs: discuss your Big Data projects and problems
  • Planning and installation
    • Selecting software, Hadoop distributions
    • Sizing the cluster, planning for growth
    • Selecting hardware and network
    • Rack topology
    • Installation
    • Multi-tenancy
    • Directory structure, logs
    • Benchmarking
    • Labs: cluster install, run performance benchmarks
  • HDFS operations
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
    • Health monitoring
    • Command-line and browser-based administration
    • Adding storage, replacing defective drives
    • Labs: getting familiar with HDFS command lines
  • Data ingestion
    • Flume for logs and other data ingestion into HDFS
    • Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
    • Hadoop data warehousing with Hive
    • Copying data between clusters (distcp)
    • Using S3 as complementary to HDFS
    • Data ingestion best practices and architectures
    • Labs: setting up and using Flume and Sqoop
  • MapReduce operations and administration
    • Parallel computing before MapReduce: comparing HPC and Hadoop administration
    • MapReduce cluster loads
    • Nodes and daemons (JobTracker, TaskTracker)
    • MapReduce UI walkthrough
    • MapReduce configuration
    • Job config
    • Optimizing MapReduce
    • Fool-proofing MR: what to tell your programmers
    • Labs: running MapReduce examples
  • YARN: new architecture and new capabilities
    • YARN design goals and implementation architecture
    • New actors: ResourceManager, NodeManager, Application Master
    • Installing YARN
    • Job scheduling under YARN
    • Labs: investigate job scheduling
  • Advanced topics
    • Hardware monitoring
    • Cluster monitoring
    • Adding and removing servers, upgrading Hadoop
    • Backup, recovery and business continuity planning
    • Oozie job workflows
    • Hadoop high availability (HA)
    • Hadoop Federation
    • Securing your cluster with Kerberos
    • Labs: set up monitoring
  • Optional tracks
    • Cloudera Manager for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
    • Ambari for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
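Sizing the cluster and planning for growth (covered under "Planning and installation") can be approximated with simple arithmetic on data volume, growth rate, and the HDFS replication factor. The sketch below is a minimal first-order estimate, not the course's own method. It assumes compound monthly growth, the HDFS default replication factor of 3, and a hypothetical 25% of raw disk kept free as headroom for intermediate data, logs, and the OS. The node layout (12 disks of 4 TB each) and the function names are likewise hypothetical.

```python
import math


def required_raw_tb(initial_tb, monthly_growth, months, replication=3, headroom=0.25):
    """Estimate raw disk (TB) needed at the end of the planning horizon.

    initial_tb     -- logical (unreplicated) data stored today, in TB
    monthly_growth -- fractional growth per month, e.g. 0.05 for 5% (assumed compounding)
    months         -- planning horizon in months
    replication    -- HDFS replication factor (3 is the HDFS default)
    headroom       -- fraction of raw disk kept free (hypothetical 25%)
    """
    # Logical data volume after compound growth over the horizon
    logical_tb = initial_tb * (1 + monthly_growth) ** months
    # Every HDFS block is stored `replication` times
    replicated_tb = logical_tb * replication
    # Only (1 - headroom) of the raw disk is planned for HDFS data
    return replicated_tb / (1 - headroom)


def nodes_needed(raw_tb, disks_per_node=12, disk_tb=4.0):
    """Round up to whole DataNodes, given a hypothetical per-node disk layout."""
    per_node_tb = disks_per_node * disk_tb
    return math.ceil(raw_tb / per_node_tb)


if __name__ == "__main__":
    raw = required_raw_tb(initial_tb=100, monthly_growth=0.05, months=12)
    print(f"raw capacity: {raw:.1f} TB, DataNodes: {nodes_needed(raw)}")
```

For example, 100 TB of data with no growth needs 100 × 3 / 0.75 = 400 TB of raw disk, or 9 DataNodes at 48 TB each. A real sizing exercise would also account for compression, non-HDFS storage, and rack layout.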
