AWS CloudWatch lets you raise an alarm when a metric goes above or below a given threshold. But what if you want the alarm only when the value is between two thresholds? That’s where metric math comes in.
A Simple Alarm
In CloudFormation, you can define an alarm quite easily:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
        - arn:aws:sns:eu-west-1:123456789:someSnsTopic
      AlarmDescription: "Some description"
      ComparisonOperator: GreaterThanOrEqualToThreshold
      DatapointsToAlarm: 1
      Dimensions:
        - Name: ApiName
          Value: "myApi"
      EvaluationPeriods: 1
      MetricName: Count
      Namespace: AWS/ApiGateway
      Period: 60
      Statistic: Sum
      Threshold: 5000
      TreatMissingData: notBreaching
This will raise an alarm when the number of requests per minute to the given API reaches or exceeds 5000.
What if you want this alarm, but also a different alarm when the count goes over 10000? If you define the two alarms just like this, both alarms will go off when you have 10000 or more requests per minute, because 10000 is also more than 5000.
The solution is to use metric math. With this you define an alarm with a list of metrics:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
        - arn:aws:sns:eu-west-1:123456789:someSnsTopic
      AlarmDescription: "Some description"
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      DatapointsToAlarm: 1
      EvaluationPeriods: 1
      Metrics:
        - Id: "requests_over_5000"
          Label: "Requests per minute between 5000 and 10000"
          Expression: "IF(m1 >= 5000 AND m1 < 10000, 1, 0)" # reference the id of the metric below
          ReturnData: true # true for the one metric that you'll use in your alarm
        - Id: "m1"
          Label: "Invocation Count"
          MetricStat:
            Metric:
              Namespace: AWS/ApiGateway
              MetricName: Count
              Dimensions:
                - Name: ApiName
                  Value: "myApi"
            Period: 60
            Stat: Sum
          ReturnData: false # false for any 'supporting' metric
      TreatMissingData: notBreaching
We define a metric for the request count just like in a normal alarm, but we put it in the Metrics list of the alarm. We also set ReturnData to false and give it an Id (only numbers, letters and underscores, and it must start with a lowercase letter).
Then we add another metric to the list. This time we set ReturnData to true because this is the metric that will be used to evaluate the alarm. Instead of giving it a MetricStat, we set an Expression. If the count is at least 5000 and below 10000, the expression returns 1 for this metric (requests_over_5000); otherwise it returns 0. An expression can reference the ids of other metrics in the list. In our example, this is m1.
Back to the alarm itself. We define that the alarm should go off when the value is greater than 0. Based on our metric (requests_over_5000), the value will be either 0 (we’re under 5000, or at 10000 or above) or 1 (we’re between the two values). So it’s sort of a boolean: if it’s 1, the alarm goes off; if it’s 0, it doesn’t.
As a last step, we can add a regular alarm like in our first example to trigger when we have 10000 or more requests per minute, which is exactly where the band of the metric math alarm ends.
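As a rough sketch, that upper alarm could look like the first example with only the threshold changed. This is an illustration under assumptions rather than a definitive template: the SNS topic and the description are placeholders, and you may well want a different topic for this alarm. It uses GreaterThanOrEqualToThreshold so that exactly 10000 requests is covered, since the metric math expression only returns 1 while m1 < 10000.

```yaml
# Sketch of the upper alarm: fires at 10000 or more requests per minute.
# The topic ARN and description are placeholders, reused from the examples above.
Type: AWS::CloudWatch::Alarm
Properties:
  ActionsEnabled: true
  AlarmActions:
    - arn:aws:sns:eu-west-1:123456789:someSnsTopic
  AlarmDescription: "10000 or more requests per minute"
  ComparisonOperator: GreaterThanOrEqualToThreshold
  DatapointsToAlarm: 1
  Dimensions:
    - Name: ApiName
      Value: "myApi"
  EvaluationPeriods: 1
  MetricName: Count
  Namespace: AWS/ApiGateway
  Period: 60
  Statistic: Sum
  Threshold: 10000 # picks up where the 5000 to 10000 band stops
  TreatMissingData: notBreaching
```

With both alarms in place, a minute with 7000 requests triggers only the metric math alarm, and a minute with 12000 requests triggers only this one.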