Autoscaling
Last updated October 28, 2024
Heroku’s native Autoscaling feature lets you scale the number of web dynos automatically based on one or more application performance characteristics.
Autoscaling is only available for Performance-, Private-, and Shield-tier dynos. It relies on your application having very little variance in response time. If autoscaling doesn’t meet your needs or isn’t working as expected for your apps, consider a third-party add-on from the Elements marketplace.
Configuration
You can configure the feature from your app’s Resources tab on the Heroku Dashboard:
Click Enable Autoscaling beside your web dyno details to access the configuration options.
Use the slider or text boxes to specify your app’s minimum and maximum allowed number of autoscaled dynos. The minimum dyno limit can’t be less than 1. The displayed costs update as you adjust the dyno range.
Next, set your app’s Desired p95 Response Time. The autoscaling engine uses this value to determine how to scale your dyno count.
Enable Email Notifications to notify the app’s collaborators and team members when your web dyno count reaches the range’s upper limit. We send a maximum of one notification email per day.
After configuring, click Confirm. The current dyno count immediately adjusts to conform to the range.
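To make the range settings concrete, here is a minimal Python sketch of how the confirmed range constrains the dyno count and when a limit notification might go out. It is not Heroku’s implementation: the function names are hypothetical, and it assumes that “one notification email per day” means at most one email in any rolling 24-hour window.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative model of the autoscaling range settings; not Heroku's implementation.


def conform_to_range(current: int, min_dynos: int, max_dynos: int) -> int:
    """Return the dyno count after clicking Confirm: current clamped into [min, max]."""
    if min_dynos < 1:
        raise ValueError("the minimum dyno limit can't be less than 1")
    if max_dynos < min_dynos:
        raise ValueError("the maximum must be at least the minimum")
    return max(min_dynos, min(current, max_dynos))


def should_send_limit_email(count: int, max_dynos: int,
                            last_sent: Optional[datetime], now: datetime) -> bool:
    """Notify when the count reaches the upper limit, at most once per 24 hours.

    Assumption: "one email per day" is modeled as a rolling 24-hour window."""
    if count < max_dynos:
        return False
    return last_sent is None or now - last_sent >= timedelta(days=1)
```

For example, confirming a range of 2 to 4 dynos while 5 are running leaves 4 running, because `conform_to_range(5, 2, 4)` clamps to the upper bound.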
Autoscaling Logic
The dyno manager uses your app’s Desired p95 Response Time to determine when to scale your app. The algorithm uses data from the past hour to calculate the minimum number of web dynos required to achieve the desired response time for 95% of incoming requests at your current request throughput. It doesn’t include WebSocket traffic in its calculations.
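The following Python sketch illustrates only the measurement half of this logic: a nearest-rank p95 over the past hour that skips WebSocket requests. The `RequestSample` type and the function name are hypothetical, and this section doesn’t describe how Heroku turns the measured value into a required dyno count, so that step is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

WINDOW_S = 3600  # the algorithm looks at data from the past hour


@dataclass
class RequestSample:
    timestamp: float     # seconds since the epoch
    response_ms: float   # response time in milliseconds
    is_websocket: bool   # WebSocket traffic is excluded from the calculation


def p95_response_time(samples: Iterable[RequestSample], now: float) -> Optional[float]:
    """Nearest-rank 95th percentile of non-WebSocket response times in the last hour.

    Returns None when the window contains no eligible requests."""
    window = sorted(
        s.response_ms
        for s in samples
        if not s.is_websocket and now - WINDOW_S <= s.timestamp <= now
    )
    if not window:
        return None
    rank = (95 * len(window) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return window[rank - 1]
```

With 100 recent requests taking 1 ms through 100 ms, the sketch reports 95 ms, regardless of any slow WebSocket connection or any request older than an hour.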
Every time an autoscaling event occurs, a single web dyno is either added to or removed from your app. Autoscaling events always occur at least one minute apart.
The algorithm scales down less aggressively than it scales up. This more gradual scale-down prevents a temporary lull in requests from triggering substantial downscaling, which would result in high latency if demand then spikes upward.
If your app experiences no request throughput for 3 minutes, its web dynos scale down at one-minute intervals until throughput resumes.
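A simplified Python model of these rules follows. The one-dyno step, the one-minute spacing between events, and the three-minute idle rule come from the text; the `down_streak_required` parameter is a hypothetical stand-in for “scales down less aggressively”, because Heroku doesn’t publish how much more conservative its scale-down is. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_S = 60  # autoscaling events occur at least one minute apart
IDLE_S = 180     # no request throughput for 3 minutes triggers idle scale-down


@dataclass
class ScalerState:
    dynos: int
    last_event_at: float = float("-inf")  # time of the last scaling event (seconds)
    below_target_streak: int = 0          # hypothetical: consecutive checks at or under target


def decide(state: ScalerState, now: float, p95_ms: Optional[float], target_ms: float,
           seconds_since_last_request: float, min_dynos: int, max_dynos: int,
           down_streak_required: int = 5) -> int:
    """Apply at most one single-dyno step and return it (-1, 0, or +1).

    down_streak_required is a hypothetical knob that makes scale-down slower
    than scale-up; Heroku doesn't publish its actual scale-down rule."""
    if now - state.last_event_at < COOLDOWN_S:
        return 0  # too soon after the previous event
    if seconds_since_last_request >= IDLE_S:
        step = -1 if state.dynos > min_dynos else 0  # idle: shed one dyno per minute
    elif p95_ms is not None and p95_ms > target_ms:
        state.below_target_streak = 0
        step = 1 if state.dynos < max_dynos else 0   # too slow: add one dyno now
    else:
        state.below_target_streak += 1
        ready = state.below_target_streak >= down_streak_required
        step = -1 if ready and state.dynos > min_dynos else 0
    if step:
        state.dynos += step
        state.last_event_at = now
        state.below_target_streak = 0
    return step
```

The asymmetry shows up directly: a single slow evaluation adds a dyno, while removing one requires several consecutive evaluations at or under the target.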
Sometimes slow requests are caused by downstream bottlenecks rather than by insufficient web resources. In these cases, scaling up the number of web dynos can have a minimal (or even negative) impact on latency. To address these scenarios, autoscaling ceases if the percentage of failed requests is 20% or more. You can monitor the failed request metric using Threshold Alerting.
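The failed-request cutoff can be expressed as a small check. This Python sketch assumes the caller supplies failed and total request counts for the evaluation window; what counts as a failed request is whatever Heroku’s metric defines, and the function name is hypothetical.

```python
FAILED_PERCENT_THRESHOLD = 20  # autoscaling ceases at 20% or more failed requests


def autoscaling_paused(failed_requests: int, total_requests: int) -> bool:
    """True when failed requests make up 20% or more of the window's traffic.

    With no requests there is nothing to measure, so this returns False
    (an assumption: idle apps are handled by the idle scale-down rule instead).
    Integer arithmetic avoids floating-point edge cases at exactly 20%."""
    if total_requests == 0:
        return False
    return failed_requests * 100 >= FAILED_PERCENT_THRESHOLD * total_requests
```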
Monitoring Autoscaling Events
From the Heroku Dashboard
Autoscaling events appear alongside manual scale events in the Events chart of the Heroku Dashboard. In event details, they’re identified by Dyno Autoscaling. Enabling, disabling, and changing autoscaling settings also appear as events.
If a series of autoscaling events occurs within a single time-interval rollup, only the step where scaling changed direction is shown. For example, in the Events chart below, Scaled up to 2 of Performance-M is an intermediate step to the peak of 3 Performance-M dynos, so it doesn’t show.
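One way to approximate the rollup behavior in Python is to keep only the steps after which scaling reverses direction. This is an illustrative reading of the chart, not its documented rule; in particular, also showing the final step of an interval is an assumption, and the function name is hypothetical.

```python
def visible_rollup_steps(counts: list[int]) -> list[int]:
    """Approximate which dyno counts the Events chart shows for one rollup interval.

    counts[0] is the dyno count before the first event; each later entry is the
    count after one autoscaling event. A step is kept when the next step goes the
    other way (a peak or trough) or, by assumption, when it is the last step."""
    shown = []
    for i in range(1, len(counts)):
        direction = counts[i] - counts[i - 1]
        is_last = i == len(counts) - 1
        next_direction = 0 if is_last else counts[i + 1] - counts[i]
        if is_last or (direction > 0) != (next_direction > 0):
            shown.append(counts[i])
    return shown
```

For the scenario in the chart, scaling from 1 to 2 to 3 Performance-M dynos gives `visible_rollup_steps([1, 2, 3]) == [3]`: the intermediate step to 2 is hidden and only the peak appears.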
With Webhooks
You can subscribe to webhook notifications that are sent whenever your app’s dyno formation changes. Webhook notifications related to dyno formation have the api:formation type.
Read App Webhooks for more information on subscribing to webhook notifications.
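If you receive these notifications in your own service, a minimal Python handler might verify each request and extract the new web dyno count. The signature header name (`Heroku-Webhook-Hmac-SHA256`), the base64-encoded HMAC-SHA256 signing scheme, and the payload fields (`resource`, `data.type`, `data.quantity`) are assumptions in this sketch; confirm them against App Webhooks before relying on it. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Compare the received signature (assumed to arrive in the
    Heroku-Webhook-Hmac-SHA256 header) with a base64 HMAC-SHA256 of the raw body."""
    expected = base64.b64encode(
        hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    ).decode()
    return hmac.compare_digest(expected, signature_header)


def web_dyno_count_from_event(raw_body: bytes) -> Optional[int]:
    """Return the new web dyno quantity from a formation event, else None.

    Assumes formation events carry resource == "formation" and a data object
    with "type" and "quantity" fields."""
    event = json.loads(raw_body)
    if event.get("resource") != "formation":
        return None
    data = event.get("data", {})
    if data.get("type") != "web":
        return None
    return data.get("quantity")
```

Verifying the signature against the raw request bytes, before parsing JSON, avoids rejecting valid requests because re-serialized JSON differs from what was signed.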
Disabling Autoscaling
Disable autoscaling by clicking the Disable Autoscaling button on your app’s Resources tab. Then, specify a fixed web dyno count and click Confirm.
Manually scaling with ps:scale through the CLI, or making an equivalent scaling call through the API (for example, from a third-party autoscaling tool), disables autoscaling.
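For example, running `heroku ps:scale web=2` from the CLI is a manual scale. The Python sketch below builds, but doesn’t send, the corresponding Platform API Formation Update request; the app name and token are placeholders, and the function name is hypothetical. Sending a request like this is also a manual scale and would disable autoscaling.

```python
import json
import urllib.request


def build_formation_update(app_name: str, quantity: int, api_token: str) -> urllib.request.Request:
    """Build (without sending) a Platform API Formation Update request for web dynos.

    Passing the result to urllib.request.urlopen() would perform a manual scale,
    which disables autoscaling."""
    body = json.dumps({"quantity": quantity}).encode()
    return urllib.request.Request(
        url=f"https://api.heroku.com/apps/{app_name}/formation/web",
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```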
Known Issues and Limitations
As with any autoscaling utility, there are certain application health scenarios in which autoscaling doesn’t help. You may also need to tune your Postgres connection pool, worker count, or add-on plan(s) to accommodate changes in web dyno formation. For the scenario where the bottleneck lies in downstream components, we designed the mechanism to throttle autoscaling when 20% or more of request throughput results in errors. See Scaling Considerations for additional details.
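A quick way to check database connection headroom is to size for the top of your autoscaling range rather than your typical dyno count. This Python sketch assumes each web thread holds one pooled Postgres connection; the function names and all numbers in the usage example are hypothetical, so substitute your app’s real process, thread, and plan-limit values.

```python
def peak_db_connections(max_web_dynos: int, processes_per_dyno: int,
                        threads_per_process: int, other_connections: int = 0) -> int:
    """Connections in use when autoscaling reaches its maximum, assuming one
    pooled connection per web thread plus connections held elsewhere (workers, etc.)."""
    return max_web_dynos * processes_per_dyno * threads_per_process + other_connections


def fits_connection_limit(plan_limit: int, **sizing: int) -> bool:
    """True if the peak connection count stays within the database plan's limit."""
    return peak_db_connections(**sizing) <= plan_limit
```

With a maximum of 10 web dynos, 2 processes per dyno, 5 threads per process, and 20 connections held by other processes, the peak is 10 × 2 × 5 + 20 = 120 connections. Against a hypothetical plan limit of 120, raising the autoscaling maximum to 11 dynos would exceed it.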
We strongly recommend that you simulate the production experience with load testing, and use Threshold Alerting in conjunction with autoscaling to monitor your app’s end-user experience. See Load Testing Guidelines for Heroku Support notification requirements.
A small number of customers have observed a race condition when using Heroku web autoscaling together with a third-party worker autoscaling utility like HireFire. This race condition unexpectedly disables Heroku’s autoscaling. To prevent this scenario, don’t use these two autoscaling tools together.
Scaling Limits
Each dyno type has its own scaling limit. See Dyno Scaling and Process Limits to learn about these limits.