AWS CloudWatch allows you to raise alarms when certain values are above or below a given threshold. But what if you want the alarm only when a value is between two thresholds? That’s where metric math comes in.
A Simple Alarm
In CloudFormation, you can define an alarm quite easily:
Type: AWS::CloudWatch::Alarm
Properties:
  ActionsEnabled: true
  AlarmActions:
    - arn:aws:sns:eu-west-1:123456789:someSnsTopic
  AlarmDescription: "Some description"
  ComparisonOperator: GreaterThanOrEqualToThreshold
  DatapointsToAlarm: 1
  Dimensions:
    - Name: ApiName
      Value: "myApi"
  EvaluationPeriods: 1
  MetricName: Count
  Namespace: AWS/ApiGateway
  Period: 60
  Statistic: Sum
  Threshold: 5000
  TreatMissingData: notBreaching
This will raise an alarm when the number of requests per minute to the given API reaches or exceeds 5000.
What if you want this alarm, but a different alarm when it goes over 10000? If you define the two alarms just like this, both alarms will go off when you have 10000 or more requests per minute, because 10000 is also more than 5000.
Metric Math
The solution is to use metric math. With this you define an alarm with a list of metrics:
Type: AWS::CloudWatch::Alarm
Properties:
  ActionsEnabled: true
  AlarmActions:
    - arn:aws:sns:eu-west-1:123456789:someSnsTopic
  AlarmDescription: "Some description"
  ComparisonOperator: GreaterThanThreshold
  Threshold: 0
  DatapointsToAlarm: 1
  EvaluationPeriods: 1
  Metrics:
    - Id: "requests_over_5000"
      Label: "Requests per minute between 5000 and 10000"
      Expression: "IF(m1 >= 5000 AND m1 < 10000, 1, 0)" # reference the id of the metric below
      ReturnData: true # true for the one metric that you'll use in your alarm
    - Id: "m1"
      Label: "Invocation Count"
      MetricStat:
        Metric:
          Namespace: AWS/ApiGateway
          MetricName: Count
          Dimensions:
            - Name: ApiName
              Value: "myApi"
        Period: 60
        Stat: Sum
      ReturnData: false # false for any 'supporting' metric
  TreatMissingData: notBreaching
We define a metric for the request count just like in a normal alarm, but we put it in the Metrics list of the alarm. We also set ReturnData to false and give it an Id (only letters, numbers and underscores, and it must start with a lowercase letter).
Then we add another metric to the list. This time we set ReturnData to true because this is the metric that will be used to evaluate the alarm. Instead of giving it a MetricStat, we set an Expression.
If the count is at least 5000 and below 10000, we return 1 for this metric (requests_over_5000); otherwise we return 0. The expression can reference the Ids of other metrics in our list. In our example, this is m1.
Back to the alarm itself. We define that the alarm should go off when the value is greater than 0. Based on our metric (requests_over_5000), the value will be either 0 (we’re under 5000, or at 10000 or above) or 1 (we’re between the two values). So it’s sort of a boolean: if it’s 1, the alarm goes off; if it’s 0, it doesn’t.
As a last step, we can add a regular alarm like in our first example to trigger when we have 10000 or more requests per minute.
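A minimal sketch of what that second alarm could look like, assuming it is added as another resource in the same template. The logical ID HighTrafficAlarm and the description are made up for illustration, and the SNS topic ARN is the same placeholder used above; in practice you would probably point it at a different topic so the two alarms can be told apart.

```yaml
# Hypothetical logical ID, chosen only for this example.
HighTrafficAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    ActionsEnabled: true
    AlarmActions:
      # Placeholder topic from the examples above; swap in your own.
      - arn:aws:sns:eu-west-1:123456789:someSnsTopic
    AlarmDescription: "Requests per minute at or above 10000"
    # "Or equal" so this alarm picks up exactly where the
    # band expression (m1 < 10000) stops returning 1.
    ComparisonOperator: GreaterThanOrEqualToThreshold
    DatapointsToAlarm: 1
    Dimensions:
      - Name: ApiName
        Value: "myApi"
    EvaluationPeriods: 1
    MetricName: Count
    Namespace: AWS/ApiGateway
    Period: 60
    Statistic: Sum
    Threshold: 10000
    TreatMissingData: notBreaching
```

With both alarms in place, a minute with 7000 requests should only trigger the metric math alarm, and a minute with 12000 requests should only trigger this one.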