Occasionally, Heroku needs to conduct maintenance on the platform that might cause customer-visible changes or require certain features to be disabled temporarily.
When this happens, we use the Heroku Status site (documentation) to tell you what to expect.
The expected impact of the work determines how we notify you. This document describes the types of platform work we carry out and how the impact of each type affects the notifications that appear on the status site.
Types of Work
Work falls into one of these categories:
- Routine updates
- Service updates
- Maintenance windows
- Urgent updates
Routine Updates
Routine updates are normal deployments that don't affect stable production apps or the development tools. Because of the redundancy designed into the platform, they have no impact on the functionality of the platform or of customer applications. The same platform features that manage uptime and reliability for customer apps let us make most changes and improvements without interrupting day-to-day operations. For this reason, we don't provide notice of this work.
Service Updates
Service updates are work that interrupts the functionality of deployment workflows and tools. These changes affect the availability of the deployment workflow or tools, but not apps that are already running. They can require the API to be in maintenance mode or interrupt builds in progress. In some cases, they can also prevent single-dyno apps from un-idling.
When we need to perform a service update, we put a notice on the status site at least 3 business days before the work takes place. This announcement includes the scheduled time of the work and the expected impact. We update the status site again to indicate when the work has begun and when it has ended. If any changes to the work are required, we update the status site accordingly.
Maintenance Windows
Maintenance windows are changes to the platform that affect stable production applications. Work of this type is rare, and we do everything we can to avoid it. When we do need it, we schedule the work outside peak hours for the region where it will be performed.
When we need a maintenance window, we put a notice on the status site at least 5 business days in advance. The announcement includes the scheduled time and expected impact. We update the status site again to indicate when the work has begun and when it has ended. If any changes to the work are required, we update the status site accordingly.
Urgent Updates
Urgent updates are changes to the platform that must happen quickly to respond to a problem that could affect the health of the platform or the integrity of customer data. They can fall under any of the other categories. Urgent updates are rare and, by their nature, difficult to categorize. One example is a response to a security issue such as Heartbleed.
When we need to perform an urgent update that can cause development or production impact, we take into account the possible impact, the time of day, and the risks associated with delays when we select a time to do the work. If possible, we provide advance warning on the status site. In all cases, we use the status site to communicate what we’re doing, what impact is possible, and when we’re finished.
Notifications
When we need to perform work that has a development or production impact, we use the status site to notify you in advance by creating an alert message across the top of the status site and sending an email to subscribers.
When we begin work, we update the status site again. We create an incident in the timeline portion of the status site and notify subscribers by email.
If the work is going to take longer than we planned or if we need to provide updates on our progress, we update the status site the same way we do during an incident. When finished, we resolve the incident. Both of these actions create the same notifications that you usually see during an issue.
If something goes wrong during update or maintenance work, we change the status site to show a regular issue. From that point on, we handle notifications the same way we do for any other issue with the platform.