Optimizing Python Application Concurrency
Last updated April 16, 2024
Web applications that process incoming HTTP requests concurrently make more efficient use of dyno resources than web applications that only process one request at a time. We recommend using web servers that support concurrent request processing for developing and running production services.
The Django and Flask web frameworks feature convenient built-in web servers, but these blocking servers only process a single request at a time. If you deploy with one of these servers on Heroku, your dyno resources are underutilized and your application feels unresponsive.
Instead, we recommend that you use either Gunicorn or Uvicorn, which are performant Python HTTP servers for WSGI or ASGI applications. They allow you to run any Python application concurrently by running multiple Python processes within a single dyno.
Read Deploying Python Applications with Gunicorn to learn how to set up Gunicorn for Python on Heroku.
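As an illustration, a minimal `Procfile` might declare one of the following `web` processes (the project name `myapp` and its module paths are hypothetical, not from this article; a `Procfile` has exactly one `web` entry):

```
# WSGI app served by Gunicorn:
web: gunicorn myapp.wsgi

# Or: ASGI app served by Uvicorn, bound to the port Heroku assigns:
web: uvicorn myapp.asgi:app --host 0.0.0.0 --port $PORT
```

Neither entry passes a worker count: as described below, both servers pick it up from the `WEB_CONCURRENCY` environment variable.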
Default Settings and Behavior
If your application uses multiple buildpacks, ensure that the Python buildpack, as the primary language buildpack of your application, executes after other language buildpacks. Otherwise, the app’s `WEB_CONCURRENCY` can default to the value set by a buildpack that runs after the Python one.
When booting an application, the Heroku Python buildpack automatically detects the CPU and memory specifications of the current dyno type. It sets the `WEB_CONCURRENCY` environment variable to a suitable default value:
Common Runtime
| Dyno Type | Default `WEB_CONCURRENCY` |
| --- | --- |
| Eco, Basic, Standard-1X | 2 |
| Standard-2X | 4 |
| Performance-M | 5 |
| Performance-L | 17 |
| Performance-L-RAM | 9 |
| Performance-XL | 17 |
| Performance-2XL | 33 |
Private Spaces and Shield Private Spaces
| Dyno Type | Default `WEB_CONCURRENCY` |
| --- | --- |
| Private-S / Shield-S | 4 |
| Private-M / Shield-M | 5 |
| Private-L / Shield-L | 17 |
| Private-L-RAM / Shield-L-RAM | 9 |
| Private-XL / Shield-XL | 17 |
| Private-2XL / Shield-2XL | 33 |
Both Gunicorn and Uvicorn automatically use the value of the `WEB_CONCURRENCY` environment variable to control their level of concurrency: the number of child processes the parent process launches.
If your app uses a different web server, you must configure it to use the value set by `WEB_CONCURRENCY` for its number of workers. For example, if the server supports a `--workers` CLI argument, add `--workers $WEB_CONCURRENCY` to the server command listed under the `web` process in your `Procfile`.
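For instance, with a server that accepts a `--workers` flag, the `Procfile` entry might look like the following sketch (Hypercorn is shown purely as an illustrative example of such a server; the module path `myapp.asgi:app` is hypothetical):

```
web: hypercorn myapp.asgi:app --workers $WEB_CONCURRENCY --bind 0.0.0.0:$PORT
```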
The default `WEB_CONCURRENCY` value is not visible in your app’s config vars, since it is set directly in the dyno environment by the Python buildpack when each dyno boots. This ensures the value adjusts automatically if you change the size of an app’s dyno, or use more than one size of dyno at once.
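If your application code needs to read the worker count itself, it can consult the environment with a local fallback. This is only a minimal sketch: the `worker_count` helper and its fallback default of 2 are illustrative assumptions, not part of any Heroku or buildpack API.

```python
import os


def worker_count(env=None, default=2):
    """Return the number of worker processes to launch.

    Uses WEB_CONCURRENCY, which the Heroku Python buildpack sets in the
    dyno environment at boot. Falls back to `default` when the variable
    is absent or malformed, e.g. when running locally.
    """
    if env is None:
        env = os.environ
    try:
        return max(1, int(env.get("WEB_CONCURRENCY", default)))
    except ValueError:
        return default
```

Clamping to a minimum of 1 guards against a misconfigured zero or negative value accidentally disabling all workers.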
Tuning the Concurrency Level
Each app has unique memory, CPU, and I/O requirements, so there’s no such thing as a one-size-fits-all scaling solution. The Heroku Python buildpack provides reasonable default concurrency values for each dyno type, but some apps benefit from fine-tuning the concurrency level. See Application Load Testing for guidance on fine-tuning your application.
To manually set the number of child processes running your application, adjust the `WEB_CONCURRENCY` environment variable by setting a config var.
For instance, to statically set the number of child processes to 8, use:
$ heroku config:set WEB_CONCURRENCY=8
If you set `WEB_CONCURRENCY` to a fixed value, remember to adjust it when you scale to a different dyno type to optimize the use of the available RAM and CPUs on the new dyno type.
For Django-specific recommendations, see Concurrency and Database Connections in Django.