Heroku Postgres Logical Backups
Last updated October 16, 2024
Table of Contents
- How Logical Backups Work
- The Performance Impact of Logical Backups
- Capturing Logical Backups on Small Databases with Heroku PGBackups
- Capturing Logical Backups on Larger Databases
- Capturing Logical Backups for Storage in Another Region
- Restoring Logical Backups
- Direct Transfer of Logical Backups Between Heroku Postgres Databases
Capturing a logical backup creates a single snapshot file from your Heroku Postgres database. While all Heroku Postgres databases are protected through continuous physical backups, logical backups are more flexible for data portability.
Unlike physical backups, logical backups aren’t captured automatically because they share database resources with application code. This article covers how to perform logical backups with minimal effect on your database performance.
How Logical Backups Work
Logical backups are captured using pg_dump. You can restore backups with pg_restore or psql, depending on the dump file format.
pg_dump uses a Postgres connection to run a series of SQL COPY statements in a single transaction to produce a consistent snapshot across the database. The file that pg_dump produces is much smaller than the database itself: it contains only the live dataset and instructions on how to re-create indexes, but not the indexes themselves.
Dumps in script format are plain-text files of SQL commands that reconstruct the database to its state at the time of capture when replayed through psql.
Archive file formats are more flexible and performant than script formats. The custom format allows for compression as well as opportunistic reordering of all archived items. You can use pg_restore to restore archive file formats. This article focuses on backups in the compressed, custom archive format.
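For example, assuming DATABASE_URL and TARGET_DATABASE_URL are placeholder connection strings, the two formats are captured and restored roughly like this:
pg_dump "$DATABASE_URL" > backup.sql          # script (plain-text) format
psql "$TARGET_DATABASE_URL" -f backup.sql     # restore by replaying the SQL
pg_dump -F c "$DATABASE_URL" > backup.dump    # custom (compressed archive) format
pg_restore --no-acl --no-owner -d "$TARGET_DATABASE_URL" backup.dump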
The Performance Impact of Logical Backups
The effects of pg_dump grow worse with the size of your database. Balancing your database's performance with your backup requirements is necessary to avoid unpleasant surprises.
Running pg_dump has several effects on your database:
- It consumes as much filesystem I/O as available and contends for resources with concurrently running queries.
- It impedes other operations that require an exclusive lock, such as schema changes.
- It pauses VACUUM and other automatic maintenance processes, since long-running transactions prevent autovacuum from freezing rows visible to those transactions.
- It evicts cached data from OS and filesystem caches.
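To see what is currently running against your database, including a long-running pg_dump, you can list active queries with the pg:ps command (the app name here is a placeholder):
heroku pg:ps --app example-app-name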
Capturing Logical Backups on Small Databases with Heroku PGBackups
Heroku Postgres comes with a pg:backups command that allows you to manually capture or schedule logical backups via the CLI. It performs a pg_dump behind the scenes, resulting in an archive file that can be restored with pg_restore. All backups captured via pg:backups are stored in the U.S. regardless of where your database is located.
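For example, you can capture a one-off backup or schedule daily backups with commands like the following (the app name and schedule time are placeholders):
heroku pg:backups:capture --app example-app-name
heroku pg:backups:schedule DATABASE_URL --at "02:00 America/New_York" --app example-app-name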
Use Heroku PGBackups only on databases up to 20 GB. Contention for the I/O, memory, and CPU needed for backing up a larger database can be prohibitive and cause your backup capture to end prematurely.
For more info, see Heroku PGBackups.
Capturing Logical Backups on Larger Databases
Frequent logical backups of large databases under moderate CPU load can be slow, degrade the performance of other queries, and prevent necessary maintenance operations from running.
For databases over 20 GB, take logical backups against a short-lived fork of your Heroku Postgres database. Capturing a logical backup from a fork preserves the original database’s performance and allows the backup operation to consume as many resources as necessary for the dump to succeed.
Create a short-lived fork of your Heroku Postgres database.
Create a script to run the pg_dump command and transfer the backup file to the destination of your choice (see the example sketch after these steps). The following pg_dump command creates a backup file in the same format as the Heroku PGBackups output.
pg_dump -F c --no-acl --no-owner --quote-all-identifiers $FORKED_DATABASE_URL
Ensure that the script transfers your backup file after the dump is complete and that the pg_dump command points to your forked database. You can optionally use the --jobs flag to parallelize the dump operation; note that --jobs requires the directory output format (-F d) rather than the custom format. You can also adjust the amount of compression with the --compress flag. See more configuration options in the official Postgres docs.
Upload the file to your app so that it’s accessible to your one-off dynos.
Run the script in a detached one-off dyno in the same region as the database. A Performance-L, Private-L, or Shield-L dyno is recommended to provide sufficient compute power. For example:
heroku run:detached --app example-app-name --size=performance-L script.sh
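As a sketch of these steps, the following commands show one way to create the fork and a minimal script.sh. The plan name, attachment name, bucket name, and the use of the AWS CLI for the transfer are assumptions; substitute whatever fits your environment.
heroku addons:create heroku-postgresql:standard-0 --fork DATABASE_URL --app example-app-name
#!/bin/bash
# script.sh: a minimal backup-and-transfer sketch.
# FORKED_DATABASE_URL, the bucket name, and the AWS CLI are assumptions.
set -euo pipefail
BACKUP_FILE="backup-$(date +%Y%m%d%H%M%S).dump"
# Capture the dump in the custom (compressed) archive format.
pg_dump -F c --no-acl --no-owner --quote-all-identifiers "$FORKED_DATABASE_URL" > "$BACKUP_FILE"
# Transfer the backup file to external storage once the dump completes.
aws s3 cp "$BACKUP_FILE" "s3://example-backup-bucket/$BACKUP_FILE"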
Some databases are too large for the execution and transfer of your backup file to complete within the 24-hour limit of a one-off dyno. For these databases, you can retrieve your Postgres connection string and run the script outside of Heroku. If you’re using a Shield or Private database, you must also use PrivateLink or mTLS to facilitate the external connection.
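To retrieve the connection string for the forked database (the attachment name here is a placeholder), you can run:
heroku config:get HEROKU_POSTGRESQL_PINK_URL --app example-app-name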
Capturing Logical Backups for Storage in Another Region
All backups captured by Heroku PGBackups are stored in the U.S. You can run pg_dump yourself to capture and store backups in another location. Follow the instructions in the Capturing Logical Backups on Larger Databases section.
Restoring Logical Backups
Be aware that the existing data in the target database is destroyed before the restore occurs.
The heroku pg:backups:restore command runs pg_restore under the hood: it takes any dump file that is accessible to a dyno running on Heroku and restores it to your Heroku Postgres database.
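For example, you can restore from an existing backup ID or from a publicly accessible URL (the backup ID, URL, and app name are placeholders):
heroku pg:backups:restore b101 DATABASE_URL --app example-app-name
heroku pg:backups:restore "https://example.com/backups/mydb.dump" DATABASE_URL --app example-app-name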
pg_restore uses a Postgres connection to load a dump into a Postgres database. The restore creates the necessary schema, loads data through COPY commands, adds constraints, and creates the indexes and triggers dumped from the source database. These operations can be slow, as each requires disk I/O and computation to write, process, and alter the restored data for normal operation.
Certain parts of the restore can run over parallel connections. If you run pg_restore yourself on a one-off dyno instead of using heroku pg:backups:restore, you can enable parallel connections with the --jobs flag. Because each job is one process, it’s not recommended to run more than one job per vCPU of your database instance. For example, a Standard-4 database has 4 vCPUs, so run pg_restore with a maximum of 4 jobs to avoid resource contention and decreased performance.
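As a sketch, a parallel restore of a custom-format dump onto a Standard-4 database might look like this (the dump file name and connection string are placeholders):
pg_restore --no-acl --no-owner --jobs=4 -d "$DATABASE_URL" backup.dump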
Direct Transfer of Logical Backups Between Heroku Postgres Databases
heroku pg:copy uses a dedicated dyno to pipe the output of pg_dump directly to pg_restore, removing the need to transfer the dump file to a location external to Heroku. The data transfer is also piped through pv (pipe viewer) to provide more visibility into the transfer. See more info on pg:copy in the Upgrading Heroku Postgres Databases article.
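For example, to copy the source app’s database directly into the target app’s database (the app names and attachment names are placeholders):
heroku pg:copy example-source-app::DATABASE_URL DATABASE_URL --app example-target-app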
The pg:copy method requires approximately 3 minutes of app downtime per GB of your database. For large databases, it’s recommended that you follow the instructions in Capturing Logical Backups on Larger Databases instead.
Creating indexes, adding constraints, and other checks at the end of the pg:copy run aren’t reflected in pv’s logging. Although a transfer can appear to hang, it’s working in the background.