A CloudOps engineer needs to set up alerting and remediation for a web application. The application consists of Amazon EC2 instances that have AWS Systems Manager Agent (SSM Agent) installed. Each EC2 instance runs a custom web server. The EC2 instances run behind a load balancer and write logs locally.
The CloudOps engineer must implement a solution that restarts the web server software automatically if specific web errors are detected in the logs.
Which combination of steps will meet these requirements? (Select THREE.)
A. Install the Amazon CloudWatch agent on the EC2 instances.
B. Create an AWS CloudTrail metric filter for the web logs. Configure an alarm for the specific errors.
C. Create an Amazon CloudWatch metric filter for the web logs. Configure an alarm for the specific errors.
D. Publish alarm findings to Amazon Simple Email Service (Amazon SES). Invoke an AWS Lambda function to restart the web server software.
E. Create an Amazon EventBridge rule that responds to the alarm. Configure the rule to invoke an AWS Systems Manager Automation runbook to restart the web server software.
F. Create an Amazon Simple Notification Service (Amazon SNS) notification that responds to the alarm. Configure the notification to invoke an AWS Systems Manager Automation runbook to restart the web server software.
Explanation:
The correct answers are A, C, and E. Per AWS documentation on monitoring and automation, the workflow for automated operational remediation is:
The Amazon CloudWatch agent is installed on each EC2 instance (Option A) to collect the local web server logs and send them to Amazon CloudWatch Logs.
A CloudWatch metric filter (Option C) is then defined on the log group to match specific error strings or patterns (for example, HTTP 5xx status codes or “Service Unavailable”). Each match increments a custom metric, and a CloudWatch alarm on that metric enters the ALARM state when the error count crosses its threshold.
When the alarm changes state, an Amazon EventBridge rule (Option E) matches the alarm state change event and invokes an AWS Systems Manager Automation runbook, which restarts the web server process on the affected instance through SSM Agent.
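The three steps above can be sketched as configuration data. This is a minimal illustration, not a definitive setup: the log path, log group, metric, and alarm names are hypothetical, and the dictionaries only mirror the shapes of the CloudWatch agent configuration file, the CloudWatch Logs `put_metric_filter` and CloudWatch `put_metric_alarm` request parameters, and an EventBridge event pattern. No AWS calls are made.

```python
import json

# Hypothetical names, chosen only for illustration (not from the question).
LOG_FILE = "/var/log/webserver/error.log"
LOG_GROUP = "web-app/error-logs"
METRIC_NAMESPACE = "WebApp"
METRIC_NAME = "WebServerErrors"
ALARM_NAME = "web-server-errors"

# Option A: CloudWatch agent configuration that ships the local log file
# to a CloudWatch Logs log group, one log stream per instance.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": LOG_FILE,
                        "log_group_name": LOG_GROUP,
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    }
}

# Option C, part 1: metric filter parameters. The quoted pattern matches
# log events that contain the exact phrase; each match emits a value of 1.
metric_filter = {
    "logGroupName": LOG_GROUP,
    "filterName": "web-server-errors",
    "filterPattern": '"Service Unavailable"',
    "metricTransformations": [
        {
            "metricName": METRIC_NAME,
            "metricNamespace": METRIC_NAMESPACE,
            "metricValue": "1",
            "defaultValue": 0,
        }
    ],
}

# Option C, part 2: alarm parameters. The alarm fires when at least one
# matching error is counted within a 60-second period (assumed threshold).
alarm = {
    "AlarmName": ALARM_NAME,
    "Namespace": METRIC_NAMESPACE,
    "MetricName": METRIC_NAME,
    "Statistic": "Sum",
    "Period": 60,
    "EvaluationPeriods": 1,
    "Threshold": 1,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "TreatMissingData": "notBreaching",
}

# Option E: EventBridge event pattern that matches this alarm entering ALARM.
# The rule's target would be the Automation runbook that restarts the server.
event_pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "alarmName": [ALARM_NAME],
        "state": {"value": ["ALARM"]},
    },
}

if __name__ == "__main__":
    print(json.dumps(agent_config, indent=2))
    print(json.dumps(event_pattern, indent=2))
```

The names are shared across the three pieces on purpose: the alarm must reference the same namespace and metric name that the metric filter publishes, and the event pattern must reference the same alarm name, otherwise the chain from log line to runbook is broken.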
This approach follows the event-driven automation pattern: detection and remediation both run without manual intervention, which shortens downtime.
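Option B is incorrect because AWS CloudTrail records API activity and does not provide metric filters for application logs. Option D is incorrect because Amazon SES is an email-sending service and is unrelated to log-based application monitoring and automated remediation workflows. Option F is incorrect because an Amazon SNS notification cannot directly invoke a Systems Manager Automation runbook.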
Reference: AWS Cloud Operations & Systems Manager Guide, Section: Automated Remediation using CloudWatch, EventBridge, and Systems Manager Automation