Optimizing Python Application Concurrency
Last updated April 16, 2024
Web applications that process incoming HTTP requests concurrently make more efficient use of dyno resources than web applications that only process one request at a time. We recommend using web servers that support concurrent request processing for developing and running production services.
The Django and Flask web frameworks feature convenient built-in web servers, but these blocking servers only process a single request at a time. If you deploy with one of these servers on Heroku, your dyno resources are underutilized and your application feels unresponsive.
Instead, we recommend that you use either Gunicorn or Uvicorn, which are performant Python HTTP servers for WSGI or ASGI applications. They allow you to run any Python application concurrently by running multiple Python processes within a single dyno.
Read Deploying Python Applications with Gunicorn to learn how to set up Gunicorn for Python on Heroku.
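As an illustration, a minimal `Procfile` might declare one of the following `web` processes (the project name `myapp` and its module paths are hypothetical, not from this article; a `Procfile` has exactly one `web` entry):

```
# WSGI app served by Gunicorn:
web: gunicorn myapp.wsgi

# Or: ASGI app served by Uvicorn, bound to the port Heroku assigns:
web: uvicorn myapp.asgi:app --host 0.0.0.0 --port $PORT
```

Neither entry passes a worker count: as described below, both servers pick it up from the `WEB_CONCURRENCY` environment variable.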
Default Settings and Behavior
If your application uses multiple buildpacks, ensure that the Python buildpack, as the primary language buildpack of your application, executes after other language buildpacks. Otherwise, the app’s `WEB_CONCURRENCY` can default to the value set by a buildpack that runs after the Python one.
When booting an application, the Heroku Python buildpack automatically detects the CPU and memory specifications of the current dyno type. It sets the `WEB_CONCURRENCY` environment variable to a suitable default value:
Common Runtime
| Dyno Type | Default `WEB_CONCURRENCY` |
| --- | --- |
| Eco, Basic, Standard-1X | 2 |
| Standard-2X | 4 |
| Performance-M | 5 |
| Performance-L | 17 |
| Performance-L-RAM | 9 |
| Performance-XL | 17 |
| Performance-2XL | 33 |
Private Spaces and Shield Private Spaces
| Dyno Type | Default `WEB_CONCURRENCY` |
| --- | --- |
| Private-S / Shield-S | 4 |
| Private-M / Shield-M | 5 |
| Private-L / Shield-L | 17 |
| Private-L-RAM / Shield-L-RAM | 9 |
| Private-XL / Shield-XL | 17 |
| Private-2XL / Shield-2XL | 33 |
Both Gunicorn and Uvicorn automatically use the value of the `WEB_CONCURRENCY` environment variable to control their level of concurrency: the number of child processes the parent process launches.
If your app uses a different web server, you must configure it to use the value set by `WEB_CONCURRENCY` for its number of workers. For example, if the server supports a `--workers` CLI argument, add `--workers $WEB_CONCURRENCY` to the server command listed under the `web` process in your `Procfile`.
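For instance, with a server that accepts a `--workers` flag, the `Procfile` entry might look like the following sketch (Hypercorn is shown purely as an illustrative example of such a server; the module path `myapp.asgi:app` is hypothetical):

```
web: hypercorn myapp.asgi:app --workers $WEB_CONCURRENCY --bind 0.0.0.0:$PORT
```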
The default `WEB_CONCURRENCY` value is not visible in your app’s config vars, since it is set directly in the dyno environment by the Python buildpack when each dyno boots. This ensures the value adjusts automatically if you change the size of an app’s dyno, or use more than one size of dyno at once.
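If your application code needs to read the worker count itself, it can consult the environment with a local fallback. This is only a minimal sketch: the `worker_count` helper and its fallback default of 2 are illustrative assumptions, not part of any Heroku or buildpack API.

```python
import os


def worker_count(env=None, default=2):
    """Return the number of worker processes to launch.

    Uses WEB_CONCURRENCY, which the Heroku Python buildpack sets in the
    dyno environment at boot. Falls back to `default` when the variable
    is absent or malformed, e.g. when running locally.
    """
    if env is None:
        env = os.environ
    try:
        return max(1, int(env.get("WEB_CONCURRENCY", default)))
    except ValueError:
        return default
```

Clamping to a minimum of 1 guards against a misconfigured zero or negative value accidentally disabling all workers.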
Tuning the Concurrency Level
Each app has unique memory, CPU, and I/O requirements, so there’s no such thing as a one-size-fits-all scaling solution. The Heroku Python buildpack provides reasonable default concurrency values for each dyno type, but some apps benefit from fine-tuning the concurrency level. See Application Load Testing for guidance on fine-tuning your application.
To manually set the number of child processes running your application, adjust the `WEB_CONCURRENCY` environment variable by setting a config var.
For instance, to statically set the number of child processes to 8, use:
$ heroku config:set WEB_CONCURRENCY=8
If you set `WEB_CONCURRENCY` to a fixed value, remember to adjust it when you scale to a different dyno type to optimize the use of the available RAM and CPUs on the new dyno type.
For Django-specific recommendations, see Concurrency and Database Connections in Django.