Autoscaling

Last updated February 27, 2025

Configuration
Autoscaling Logic
Monitoring Autoscaling Events
Disabling Autoscaling
Known Issues and Limitations
Scaling Limits
Additional Reading

Heroku’s native Autoscaling feature lets you scale the number of web dynos automatically based on one or more application performance characteristics.

Autoscaling is only available for Performance-, Private-, and Shield-tier dynos. It works best when your application’s response times have very little variance. If it doesn’t meet your needs, or isn’t working as expected for your apps, consider a third-party add-on from the Elements marketplace. Autoscaling is not yet available for Fir-generation apps.

Configuration

You can configure the feature from your app’s Resources tab on the Heroku Dashboard:

[Image: Autoscaling configuration setting]

Click Enable Autoscaling beside your web dyno details to access the configuration options:
Use the slider or text boxes to specify your app’s minimum and maximum allowed number of autoscaled dynos. The minimum dyno limit can’t be less than 1. The estimated cost updates as you adjust the dyno range.
Next, set your app’s Desired p95 Response Time. The autoscaling engine uses this value to determine how to scale your dyno count.
Enable Email Notifications to notify the app’s collaborators and team members when your web dyno count reaches the range’s upper limit. We send a maximum of one notification email per day.

After configuring, click Confirm. The current dyno count immediately adjusts to conform to the range.
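The range rules above can be sketched as a small validation-and-clamp helper. This is only an illustration: `AutoscaleRange` and `clamp_dyno_count` are hypothetical names, not part of any Heroku API, and the sketch assumes that "conform to the range" means clamping the current count into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscaleRange:
    """Hypothetical container for the dashboard's min/max dyno settings."""
    min_dynos: int
    max_dynos: int

    def __post_init__(self):
        # The minimum dyno limit can't be less than 1.
        if self.min_dynos < 1:
            raise ValueError("min_dynos must be at least 1")
        if self.max_dynos < self.min_dynos:
            raise ValueError("max_dynos must be >= min_dynos")


def clamp_dyno_count(current: int, rng: AutoscaleRange) -> int:
    """Return the dyno count after adjusting it to fit the range (assumed behavior)."""
    return max(rng.min_dynos, min(current, rng.max_dynos))
```

For example, with a range of 2 to 5 dynos, an app currently running 8 web dynos would end up at 5 under this assumption, and an app running 1 would end up at 2.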

[Image: Autoscaling settings setter]

Autoscaling Logic

The dyno manager uses your app’s Desired p95 Response Time to determine when to scale your app. The algorithm uses data from the past hour to calculate the minimum number of web dynos required to achieve the desired response time for 95% of incoming requests at your current request throughput. It doesn’t include WebSocket traffic in its calculations.
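To make the p95 target concrete, here's a minimal sketch of how a 95th-percentile response time can be computed from a list of samples. It uses the nearest-rank method, which is an assumption; the exact percentile calculation the dyno manager uses isn't described here. The function names are hypothetical.

```python
def p95(response_times_ms):
    """Nearest-rank 95th percentile of a non-empty list of response times."""
    if not response_times_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(response_times_ms)
    # Integer form of ceil(0.95 * n): the rank of the smallest sample
    # that has at least 95% of all samples at or below it.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(response_times_ms, desired_p95_ms):
    """True when 95% of requests finished within the desired response time."""
    return p95(response_times_ms) <= desired_p95_ms
```

With 20 samples, one slow outlier doesn't move the p95, but two do. That sensitivity is one reason the feature suits apps whose response times vary little.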

Every time an autoscaling event occurs, a single web dyno is either added or removed from your app. Autoscaling events always occur at least one minute apart.

The algorithm scales down less aggressively than it scales up. This more gradual scale down protects against a situation where substantial downscaling from a temporary lull in requests results in high latency if demand subsequently spikes upward.

If your app experiences no request throughput for 3 minutes, its web dynos scale down at one-minute intervals until throughput resumes.
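The stepping rules above can be sketched as a single decision function. This is a simplified model, not the dyno manager's actual algorithm: `required` stands in for the algorithm's computed minimum dyno count, the idle case is assumed to stop at the configured minimum, and the less aggressive scale-down behavior isn't modeled. All names are hypothetical.

```python
MIN_EVENT_INTERVAL_S = 60   # autoscaling events occur at least one minute apart
IDLE_THRESHOLD_S = 180      # no request throughput for 3 minutes


def next_dyno_count(current, required, min_dynos, max_dynos,
                    seconds_since_last_event, seconds_idle=0):
    """Return the web dyno count after at most one autoscaling event."""
    if seconds_since_last_event < MIN_EVENT_INTERVAL_S:
        return current  # too soon after the previous event
    if seconds_idle >= IDLE_THRESHOLD_S:
        target = min_dynos  # assumption: an idle app steps down toward the minimum
    else:
        target = max(min_dynos, min(required, max_dynos))
    if target > current:
        return current + 1  # each event adds a single dyno
    if target < current:
        return current - 1  # or removes a single dyno
    return current
```

Calling this once per minute with fresh inputs reproduces the one-dyno-at-a-time behavior: an app at 2 dynos that needs 5 reaches its limit over several events rather than in one jump.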

Sometimes slow requests are caused by downstream bottlenecks rather than by a shortage of web resources. In these cases, scaling up the number of web dynos can have a minimal (or even negative) impact on latency. To address these scenarios, autoscaling ceases if the percentage of failed requests is 20% or more. You can monitor the failed request metric using Threshold Alerting.
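The 20% cutoff can be expressed as a simple check. This sketch is illustrative only: the function name is hypothetical, and both the measurement window and the handling of zero traffic are assumptions.

```python
FAILED_REQUEST_CUTOFF_PERCENT = 20  # autoscaling ceases at 20% or more failed requests


def autoscaling_ceases(failed_requests, total_requests):
    """True when the failed-request percentage is at or above the cutoff."""
    if total_requests <= 0:
        return False  # assumption: with no traffic there is no failure rate to act on
    # Integer comparison avoids floating-point rounding right at the 20% boundary.
    return failed_requests * 100 >= FAILED_REQUEST_CUTOFF_PERCENT * total_requests
```

For instance, 20 failures out of 100 requests meets the cutoff, while 19 out of 100 doesn't.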

Monitoring Autoscaling Events

From the Heroku Dashboard

Autoscaling events appear alongside manual scale events in the Events chart of the Heroku Dashboard. In the event details, they’re identified by Dyno Autoscaling. Enabling autoscaling, disabling it, and changing its settings also appear as events.

If a series of autoscaling events occurs within a single time interval rollup, only the step where the scaling changed direction shows. For example, in the Events chart below, “Scaled up to 2 of Performance-M” is an intermediate step on the way to the peak of 3 Performance-M dynos, so it doesn’t show.
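One way to picture the rollup is as a function that keeps only the last dyno count of each run of same-direction events. This is a sketch of the idea, not the dashboard's implementation. The name `collapse_rollup` is hypothetical, and including the final step of the interval is an assumption.

```python
def collapse_rollup(start, counts):
    """Keep only the last dyno count of each run of same-direction events.

    `start` is the dyno count before the interval; `counts` lists the
    dyno count after each autoscaling event within the interval.
    """
    shown = []
    previous = start
    direction = 0  # +1 while scaling up, -1 while scaling down, 0 before any event
    for count in counts:
        step = (count > previous) - (count < previous)
        if step == 0:
            continue  # dyno count unchanged, nothing to record
        if direction != 0 and step != direction:
            shown.append(previous)  # direction changed: previous count was a peak or trough
        direction = step
        previous = count
    if direction != 0:
        shown.append(previous)  # assumption: the final run's end point also shows
    return shown
```

Starting at 1 dyno, the sequence 2, 3, 2 collapses to the peak of 3 followed by 2. The intermediate "scaled up to 2" step is dropped, as in the example above.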

[Image: Monitoring autoscaling events]

With Webhooks

You can subscribe to webhook notifications that are sent whenever your app’s dyno formation changes. Webhook notifications related to dyno formation have the api:formation type.

Read App Webhooks for more information on subscribing to webhook notifications.
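As a rough sketch, a webhook receiver could pick formation changes out of incoming notifications like this. The payload field names used here (`resource`, and `data` with `type` and `quantity`) are assumptions based on the general shape of Heroku webhook payloads, so check them against App Webhooks before relying on them. The function name is hypothetical.

```python
import json


def formation_change(body: str):
    """Return (process type, quantity) for a formation notification, else None."""
    event = json.loads(body)
    # Assumed field: "resource" identifies the kind of entity that changed.
    if event.get("resource") != "formation":
        return None
    # Assumed fields: "data" carries the formation's process type and dyno quantity.
    data = event.get("data", {})
    return data.get("type"), data.get("quantity")
```

A real receiver should also verify the request's signature and handle both autoscaling and manual scale events, because both change the dyno formation.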

Disabling Autoscaling

Disable autoscaling by clicking the Disable Autoscaling button on your app’s Resources tab. Then, specify a fixed web dyno count and click Confirm.

Manually scaling your web dynos also disables autoscaling. This applies whether you run ps:scale through the CLI or make the equivalent call through the API, for example from a third-party autoscaling tool.
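To show what such an API call looks like, here's a sketch that builds, but doesn't send, a formation update request of the kind a third-party tool might make. It assumes the Platform API's formation update endpoint and version 3 `Accept` header; the app name and token are placeholders, and the function name is hypothetical. Sending a request like this is a manual scale, so it disables autoscaling.

```python
import json
import urllib.request


def build_manual_scale_request(app_name, quantity, api_token):
    """Build (without sending) a request that sets a fixed web dyno count."""
    return urllib.request.Request(
        # Assumed endpoint: PATCH /apps/{app}/formation/{process type}
        url=f"https://api.heroku.com/apps/{app_name}/formation/web",
        data=json.dumps({"quantity": quantity}).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

If you run a tool that issues requests like this against your web process type, expect Heroku’s autoscaling to turn off the first time it scales.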

Known Issues and Limitations

As with any autoscaling utility, there are certain application health scenarios for which autoscaling doesn’t help. You may also need to tune your Postgres connection pool, worker count, or add-on plan(s) to accommodate changes in web dyno formation. For the scenario where the bottleneck occurs in downstream components, we designed the mechanism to throttle autoscaling when the request error rate is 20% or more. See Scaling Considerations for additional details.
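The connection pool tuning mentioned above comes down to simple arithmetic: at the top of your autoscaling range, every web process on every dyno can hold its full pool of database connections. The sketch below checks that worst case against a plan's connection limit. The function names and all of the numbers are hypothetical; substitute your own process counts, pool sizes, and plan limit.

```python
def peak_db_connections(max_dynos, processes_per_dyno, pool_size_per_process):
    """Worst-case connection count when the app is scaled to its maximum."""
    return max_dynos * processes_per_dyno * pool_size_per_process


def fits_plan(max_dynos, processes_per_dyno, pool_size_per_process,
              plan_connection_limit):
    """True when the worst case stays within the database plan's limit."""
    peak = peak_db_connections(max_dynos, processes_per_dyno, pool_size_per_process)
    return peak <= plan_connection_limit
```

For example, 8 dynos running 4 processes each with a pool of 5 connections can open 8 × 4 × 5 = 160 connections. Against a hypothetical limit of 120, that doesn't fit, so you'd lower the maximum dyno count, shrink the pool, or move to a larger plan.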

We strongly recommend that you simulate the production experience with load testing, and use Threshold Alerting in conjunction with autoscaling to monitor your app’s end-user experience. See Load Testing Guidelines for Heroku Support notification requirements.

A small number of customers have observed a race condition when using Heroku web autoscaling in conjunction with a third-party worker autoscaling utility like HireFire. This race condition results in the unexpected disabling of Heroku’s autoscaling. To prevent this scenario, don’t use these two autoscaling tools in conjunction.

Autoscaling is not yet supported for Fir-generation apps. Read more about Fir in Heroku Generations, or subscribe to our changelog to track its progress.

Scaling Limits

Different dyno types have different limits to which they can be scaled. See Dyno Scaling and Process Limits to learn about the scaling limits.

Additional Reading
