Lesson 6 - AWS SysOps Administrator Associate

EC2 Status Checks & Recovery

Master EC2 status checks to identify hardware and software issues, and configure automated recovery options using CloudWatch Alarms and Auto Scaling.

What You'll Learn

Understanding EC2 Status Checks

EC2 performs automated status checks every minute to identify hardware and software issues with your instances. Understanding these checks is crucial for maintaining high availability and implementing automated recovery. There are three types of status checks, each monitoring different aspects of your instance's health.

Key Point

EC2 performs three types of status checks: System, Instance, and Attached EBS—each requiring different remediation actions.

Vocabulary

Key Terms

System Status Check: Monitors problems with AWS systems that require AWS to repair, such as issues with the underlying host hardware or hypervisor

Instance Status Check: Monitors software and network configuration issues that you can fix by rebooting or modifying the instance

Attached EBS Status Check: Monitors the reachability and I/O completion status of EBS volumes attached to the instance

Status Check Types and Resolution

The three status check types and their remediation approaches

Status Checks (EC2 Health Monitoring)

• System Status: AWS infrastructure issues. Remediation: Stop & Start (migrates to new host).
• Instance Status: Guest OS/config issues. Remediation: Reboot, or fix configuration.
• Attached EBS: Volume health issues. Remediation: Reboot or Replace (replace affected volumes).

Comparison

Status Check Details

Check Type	What It Monitors	Remediation
System Status	Physical host, power, network	Stop and Start (migrates to new host)
Instance Status	Guest OS, memory, networking config	Reboot or fix instance configuration
Attached EBS	EBS volume reachability and I/O	Reboot instance or replace volumes

Stop/Start is different from Reboot—Stop/Start moves the instance to new hardware

Stop vs Reboot

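Stop and Start typically moves an EBS-backed instance to different underlying hardware, which resolves System Status failures. Reboot keeps the instance on the same host, so it only helps with Instance Status issues. Keep in mind that a stop and start releases an auto-assigned public IPv4 address (an Elastic IP stays associated) and erases instance store data, whereas a reboot does neither. Know the difference!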
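The mapping from failed check to remediation described above can be sketched in a few lines of Python. This is only an illustration, not an AWS API: the function name plan_remediation, the check keys, and the action labels are hypothetical. The status string "impaired" is the value EC2 reports for a failed check.

```python
# Hypothetical labels for the remediation each status check calls for.
REMEDIATION = {
    "system": "stop_start",                      # moves the instance to a new host
    "instance": "reboot",                        # or fix the guest configuration
    "attached_ebs": "reboot_or_replace_volume",  # replace affected volumes
}


def plan_remediation(check_status: dict[str, str]) -> list[str]:
    """Return the remediation for every check whose status is 'impaired'."""
    return [
        action
        for check, action in REMEDIATION.items()
        if check_status.get(check) == "impaired"
    ]


if __name__ == "__main__":
    # Only the system check failed, so a stop and start is suggested.
    print(plan_remediation({"system": "impaired", "instance": "ok", "attached_ebs": "ok"}))
```

In practice the status values would come from the EC2 console or the DescribeInstanceStatus API; here they are passed in as a plain dictionary to keep the sketch self-contained.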

Check AWS Health Dashboard

For System Status failures, check the AWS Health Dashboard (formerly the Personal Health Dashboard) for any scheduled maintenance or known issues. AWS may have already scheduled maintenance to repair the underlying infrastructure.

CloudWatch Metrics for Status Checks

Status check results are published to CloudWatch as metrics at 1-minute intervals. You can create alarms on these metrics to trigger automated recovery actions or notifications. The key metrics are StatusCheckFailed_System, StatusCheckFailed_Instance, StatusCheckFailed_AttachedEBS, and StatusCheckFailed (combined).

Key Point

Status check metrics are published to CloudWatch every minute and can trigger automated recovery actions.

Status Check CloudWatch Metrics

StatusCheckFailed_System: 1 if system status check failed, 0 if passed
StatusCheckFailed_Instance: 1 if instance status check failed, 0 if passed
StatusCheckFailed_AttachedEBS: 1 if EBS status check failed, 0 if passed
StatusCheckFailed: 1 if any status check failed, 0 if all passed
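Because each metric is a 0-or-1 value published once per minute, deciding that a check is failing comes down to looking at the most recent datapoints. The sketch below assumes the datapoints have already been fetched and are supplied oldest-first. The function name failing_checks and the input shape are hypothetical, and the default of two consecutive periods is a common choice rather than a requirement.

```python
STATUS_METRICS = (
    "StatusCheckFailed_System",
    "StatusCheckFailed_Instance",
    "StatusCheckFailed_AttachedEBS",
)


def failing_checks(series: dict[str, list[float]], periods: int = 2) -> list[str]:
    """Return the metrics whose last `periods` one-minute datapoints are all >= 1."""
    failing = []
    for name in STATUS_METRICS:
        recent = series.get(name, [])[-periods:]
        # Require a full window so a single missing datapoint is not treated as a failure.
        if len(recent) == periods and all(value >= 1 for value in recent):
            failing.append(name)
    return failing


if __name__ == "__main__":
    sample = {
        "StatusCheckFailed_System": [0, 1, 1],    # failed for the last two minutes
        "StatusCheckFailed_Instance": [0, 0, 1],  # failed only in the last minute
    }
    print(failing_checks(sample))
```

A CloudWatch alarm performs this kind of evaluation for you; the sketch only makes the "N consecutive periods" idea concrete.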

Automated Recovery Options

There are two main approaches to automated recovery: CloudWatch Alarms with EC2 actions and Auto Scaling Groups. Each has different characteristics and preserves different aspects of your instance configuration.

Key Point

Choose CloudWatch Alarm recovery to preserve IP addresses, or Auto Scaling for more robust instance management.

Comparison

Recovery Options Comparison

Feature	CloudWatch Alarm Recovery	Auto Scaling Group
How It Works	Recovers same instance to new host	Launches new instance
Private IP	Preserved	Not preserved (new instance)
Public IP	Preserved	Not preserved
Elastic IP	Preserved	Not preserved (must reassociate)
Metadata	Preserved	Based on launch template
Placement Group	Preserved	Based on launch template

Set min/max/desired to 1 in ASG for single-instance recovery
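As a rough sketch of the single-instance Auto Scaling approach, the helper below builds the parameters for the Auto Scaling CreateAutoScalingGroup call with min, max, and desired capacity all set to 1. The helper name single_instance_asg_params, the group and launch template names, the subnet IDs, and the 300-second grace period are placeholders chosen for illustration. The dictionary keys follow the boto3 create_auto_scaling_group parameter names.

```python
def single_instance_asg_params(
    group_name: str,
    launch_template_name: str,
    subnet_ids: list[str],
) -> dict:
    """Build create_auto_scaling_group parameters for a self-healing group of one."""
    return {
        "AutoScalingGroupName": group_name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # min = max = desired = 1: an unhealthy instance is replaced, never scaled out.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Several subnets let the replacement launch in another Availability Zone.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # assumed value, in seconds
    }


if __name__ == "__main__":
    params = single_instance_asg_params(
        "web-single", "web-template", ["subnet-aaaa1111", "subnet-bbbb2222"]
    )
    print(params)
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("autoscaling").create_auto_scaling_group(**params). Because the replacement is a new instance, its IP addresses and instance ID will differ, as the comparison table notes.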

Creating a CloudWatch Alarm for Recovery

Select the Metric

In CloudWatch, choose the StatusCheckFailed_System metric for your EC2 instance.

Configure the Alarm

Set the threshold (typically >= 1 for 2 or more consecutive periods) to trigger when the check fails.

Add EC2 Action

Choose 'Recover this instance' as an alarm action. This uses the EC2 recover action.

Add SNS Notification

Optionally add an SNS notification to alert your team when recovery occurs.

Test and Monitor

Monitor the alarm and verify it's in the OK state when your instance is healthy.
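The console steps above correspond roughly to a single CloudWatch PutMetricAlarm request. The following sketch builds the parameters for that request. The helper name build_recovery_alarm, the alarm name format, and the use of the Maximum statistic are assumptions made for illustration; the instance ID, region, and SNS topic ARN in the usage example are placeholders.

```python
def build_recovery_alarm(instance_id: str, region: str, sns_topic_arn: str = "") -> dict:
    """Build put_metric_alarm parameters that recover an instance on system check failure."""
    # The recover action is addressed through a region-specific automate ARN.
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        actions.append(sns_topic_arn)  # optional team notification

    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # status check metrics arrive every minute
        "EvaluationPeriods": 2,  # two consecutive failing periods
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


if __name__ == "__main__":
    params = build_recovery_alarm(
        "i-0123456789abcdef0",
        "us-east-1",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",
    )
    print(params)
```

With boto3 available, the dictionary could be passed as boto3.client("cloudwatch").put_metric_alarm(**params). Building the parameters separately from the API call keeps the alarm definition easy to review and test without AWS credentials.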

CloudWatch Alarm Recovery Flow

How automated recovery works with CloudWatch

• EC2 Instance: the monitored instance
• CloudWatch: monitors StatusCheckFailed_System
• Alarm Triggered: the check failed
• Alarm Actions: Recover + Notify
• Instance Recovered: migrated to new host
• SNS Notification: team alerted

Recovery Limitations

EC2 instance recovery is only supported for instances using EBS storage (not instance store), with specific instance types, and in a VPC. Check AWS documentation for current limitations.

Reflection

Pause & Ponder

How would you design a high-availability solution using EC2 status checks?

• Consider when CloudWatch Alarm recovery is sufficient vs. when you need Auto Scaling
• Think about applications that require IP address persistence
• How would you handle recovery for stateful applications with data on EBS?
