Release Health Monitoring
During the distribution of a release, it's crucial to ensure its health is continuously monitored. This monitoring process involves observing various metrics tracked by a dedicated monitoring tool.
With Tramline, you can configure health rules to check if these tracked metrics cross certain thresholds.
Tramline polls the latest metrics from your monitoring integration every 5 minutes. If any of the metrics fall outside their configured thresholds, Tramline marks the release as unhealthy and sends alert notifications to your Slack channel(s).
These configured health rules help you give an immediate, at-a-glance understanding of how your release is fairing with the users as it is being rolled out.
You can use this information to make a decision about the rollout increase, decrease, or halt.
Configurationโ
The release health rules can be configured in the Configure section of your train under Release Health.
Rules can be defined for your train at a platform level if you have a cross-platform app (separate rules for iOS and Android).
Each rule has two components:
- Trigger events โ The events that trigger the health of the release to switch from healthy to unhealthy. A release is considered unhealthy of any of the configured metrics cross their threshold values.
- Filter conditions - The conditions that determine when a rule is actionable during the process of the release rollout.
In this example, the Overall Stability
rule will be evaluated when the staged rollout
of the release has crossed 5%
and the adoption rate
of the release is above 20%
. The release will be considered unhealthy if the session stability
is below 99.5%
or if the user stability
is below 99%
.
On every fetch of the metrics, the rules are scanned and if the rules are actionable (filter conditions pass), they are evaluated to determine the health of the release.
If a rule fails, Tramline creates a release health event for that release and notifies the users that the release has become unhealthy.
A subsequent notification is sent to the users when the health status of the release changes from unhealthy to healthy and so on.
If the filter conditions are not matched, the rule is not evaluated and does not impact the health of the release.
Supported Health Metricsโ
Session stability: Percentage of sessions in this release that have seen at least one unhandled event computed as:
(1 - (sessions_with_errors / total_sessions_count)) * 100
User stability: Percentage of user sessions with at least one unhandled event computed with:
(1 - (daily_users_with_errors / total_daily_users)) * 100
Error count: Number of errors that happened in this release
New error count: Number of errors introduced in this release
Supported Filter Conditionsโ
Staged rollout: Percentage of users your release has been rollout out to
Adoption rate: Percentage (proportion) of sessions that a given release has seen over the last 24-hour period