Course Outline

Foundations of Cloud Operations on AWS

  • Operational roles and responsibilities in the cloud.
  • AWS account structure, organizations, and multi-account strategy.
  • Core operational services: CloudWatch, CloudTrail, and AWS Config.

Infrastructure as Code and Provisioning

  • Principles of IaC and immutable infrastructure.
  • Provisioning with Terraform and AWS CloudFormation.
  • Managing state, modules, and environment promotion.

CI/CD and Deployment Strategies

  • Designing CI/CD pipelines for cloud-native applications.
  • Blue/green, canary, and rolling deployments.
  • Automating rollback, health checks, and release validation.

Monitoring, Observability, and Alerting

  • Metrics, logs, and traces: shipping, storing, and analyzing data.
  • Using CloudWatch, X-Ray, and third-party observability tools.
  • Defining SLOs/SLIs, alerting policies, and on-call practices.

Security Operations and Identity Management

  • IAM best practices, the principle of least privilege, and cross-account access.
  • Secrets management, KMS, and secure parameter stores.
  • Operational security: patching strategies, vulnerability scanning, and audit trails.

Resilience, Backup, and Disaster Recovery

  • Designing for fault tolerance and high availability.
  • Backup strategies, snapshot automation, and restore procedures.
  • Disaster recovery planning and runbook creation.

Cost Optimization and Governance

  • Cost visibility: billing, tagging, and cost allocation strategies.
  • Rightsizing, Reserved Instances and Savings Plans, and budgeting controls.
  • Governance: policies, guardrails, and automation for compliance.

Containers, Serverless, and Runtime Operations

  • Operational considerations for ECS, EKS, and Lambda.
  • Service discovery, autoscaling, and resource limits.
  • Logging, tracing, and debugging containerized workloads.

Incident Response, Playbooks, and Chaos Engineering

  • Runbook-driven incident response and postmortem practices.
  • Automating remediation and self-healing patterns.
  • Introduction to chaos experiments for validating resilience.

Hands-on Workshop: Operate a Sample Workload

  • Deploy a sample application using IaC and a CI/CD pipeline.
  • Implement monitoring, alerts, and an automated remediation script.
  • Simulate incidents and practice runbook-based response.

Summary and Next Steps

Requirements

  • A basic understanding of cloud concepts and networking.
  • Familiarity with the Linux command line and scripting.
  • Experience with source control (Git) and basic CI/CD concepts.

Audience

  • Cloud operations engineers.
  • SREs and platform engineers.
  • DevOps engineers and technical team leads.

Duration

  • 21 hours.
