Heroku Postgres Logical Backups
Last updated October 16, 2024
Table of Contents
- How Logical Backups Work
- The Performance Impact of Logical Backups
- Capturing Logical Backups on Small Databases with Heroku PGBackups
- Capturing Logical Backups on Larger Databases
- Capturing Logical Backups for Storage in Another Region
- Restoring Logical Backups
- Direct Transfer of Logical Backups Between Heroku Postgres Databases
Capturing a logical backup creates a single snapshot file from your Heroku Postgres database. While all Heroku Postgres databases are protected through continuous physical backups, logical backups are more flexible for data portability.
Unlike physical backups, logical backups aren’t captured automatically because they share database resources with application code. This article covers how to perform logical backups with minimal effect on your database performance.
How Logical Backups Work
Logical backups are captured using pg_dump. You can restore backups with pg_restore or psql, depending on the dump file format.
pg_dump uses a Postgres connection to run a series of SQL COPY statements in a single transaction to produce a consistent snapshot across the database. The file that pg_dump produces is much smaller than the database itself: it contains only the live dataset and instructions on how to re-create indexes, but not the indexes themselves.
Dumps in script format are plain-text files of SQL commands that reconstruct the database to its state at the time of capture when replayed through psql.
Archive file formats are more flexible and performant than script formats. The custom format allows for compression as well as opportunistic reordering of all archived items. You can use pg_restore to restore archive file formats. This article focuses on backups in the compressed, custom archive format.
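For example, assuming DATABASE_URL and TARGET_DATABASE_URL are placeholder connection strings, the two formats are captured and restored roughly like this:
pg_dump "$DATABASE_URL" > backup.sql          # script (plain-text) format
psql "$TARGET_DATABASE_URL" -f backup.sql     # restore by replaying the SQL
pg_dump -F c "$DATABASE_URL" > backup.dump    # custom (compressed archive) format
pg_restore --no-acl --no-owner -d "$TARGET_DATABASE_URL" backup.dump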
The Performance Impact of Logical Backups
The effects of pg_dump grow worse with the size of your database. Balancing your database's performance with your backup requirements is necessary to avoid unpleasant surprises.
Running pg_dump has several effects on your database:
- It consumes as much filesystem I/O as available and contends for resources with concurrently running queries.
- It impedes other operations that require an exclusive lock, such as schema changes.
- It pauses VACUUM and other automatic maintenance processes, since long-running transactions prevent autovacuum from freezing rows visible to those transactions.
- It evicts cached data from OS and filesystem caches.
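To see what is currently running against your database, including a long-running pg_dump, you can list active queries with the pg:ps command (the app name here is a placeholder):
heroku pg:ps --app example-app-name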
Capturing Logical Backups on Small Databases with Heroku PGBackups
Heroku Postgres comes with a pg:backups command that allows you to manually capture or schedule logical backups via the CLI. It performs a pg_dump behind the scenes, resulting in an archive file that can be restored with pg_restore. All backups captured via pg:backups are stored in the U.S. regardless of where your database is located.
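For example, you can capture a one-off backup or schedule daily backups with commands like the following (the app name and schedule time are placeholders):
heroku pg:backups:capture --app example-app-name
heroku pg:backups:schedule DATABASE_URL --at "02:00 America/New_York" --app example-app-name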
Use Heroku PGBackups only on databases up to 20 GB. Contention for the I/O, memory, and CPU needed for backing up a larger database can be prohibitive and cause your backup capture to end prematurely.
For more info, see Heroku PGBackups.
Capturing Logical Backups on Larger Databases
Frequent logical backups of large databases under moderate CPU load can be slow, degrade the performance of other queries, and prevent necessary maintenance operations from running.
For databases over 20 GB, take logical backups against a short-lived fork of your Heroku Postgres database. Capturing a logical backup from a fork preserves the original database’s performance and allows the backup operation to consume as many resources as necessary for the dump to succeed.
Create a short-lived fork of your Heroku Postgres database.
Create a script to run the pg_dump command and transfer the backup file to the destination of your choice (see the example sketch after these steps). The following pg_dump command creates a backup file in the same format as the Heroku PGBackups output.
pg_dump -F c --no-acl --no-owner --quote-all-identifiers $FORKED_DATABASE_URL
Ensure that the script transfers your backup file after the dump is complete and that the pg_dump command points to your forked database. You can optionally use the --jobs flag to parallelize the dump operation; note that --jobs requires the directory output format (-F d) rather than the custom format. You can also adjust the amount of compression with the --compress flag. See more configuration options in the official Postgres docs.
Upload the file to your app so that it’s accessible to your one-off dynos.
Run the script in a detached one-off dyno in the same region as the database. A Performance-L, Private-L, or Shield-L dyno is recommended to provide sufficient compute power. For example:
heroku run:detached --app example-app-name --size=performance-L script.sh
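As a sketch of these steps, the following commands show one way to create the fork and a minimal script.sh. The plan name, attachment name, bucket name, and the use of the AWS CLI for the transfer are assumptions; substitute whatever fits your environment.
heroku addons:create heroku-postgresql:standard-0 --fork DATABASE_URL --app example-app-name
#!/bin/bash
# script.sh: a minimal backup-and-transfer sketch.
# FORKED_DATABASE_URL, the bucket name, and the AWS CLI are assumptions.
set -euo pipefail
BACKUP_FILE="backup-$(date +%Y%m%d%H%M%S).dump"
# Capture the dump in the custom (compressed) archive format.
pg_dump -F c --no-acl --no-owner --quote-all-identifiers "$FORKED_DATABASE_URL" > "$BACKUP_FILE"
# Transfer the backup file to external storage once the dump completes.
aws s3 cp "$BACKUP_FILE" "s3://example-backup-bucket/$BACKUP_FILE"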
Some databases are too large for the execution and transfer of your backup file to complete within the 24-hour limit of a one-off dyno. For these databases, you can retrieve your Postgres connection string and run the script outside of Heroku. If you’re using a Shield or Private database, you must also use PrivateLink or mTLS to facilitate the external connection.
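To retrieve the connection string for the forked database (the attachment name here is a placeholder), you can run:
heroku config:get HEROKU_POSTGRESQL_PINK_URL --app example-app-name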
Capturing Logical Backups for Storage in Another Region
All backups captured by Heroku PGBackups are stored in the U.S. You can run pg_dump yourself to capture and store backups in another location. Follow the instructions in the Capturing Logical Backups on Larger Databases section.
Restoring Logical Backups
Be aware that the existing data in the target database is destroyed before the restore occurs.
The heroku pg:backups:restore command runs pg_restore under the hood: it takes any dump file that is accessible to a dyno running on Heroku and restores it to your Heroku Postgres database.
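For example, you can restore from an existing backup ID or from a publicly accessible URL (the backup ID, URL, and app name are placeholders):
heroku pg:backups:restore b101 DATABASE_URL --app example-app-name
heroku pg:backups:restore "https://example.com/backups/mydb.dump" DATABASE_URL --app example-app-name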
pg_restore uses a Postgres connection to load a dump into a Postgres database. The restore creates the necessary schema, loads data through COPY commands, adds constraints, and creates the indexes and triggers dumped from the source database. These operations can be slow, as each requires disk I/O and computation to write, process, and alter the restored data for normal operation.
Certain parts of the restore can run over parallel connections. If you run pg_restore yourself on a one-off dyno instead of using heroku pg:backups:restore, you can enable parallel connections with the --jobs flag. Because each job is one process, it’s not recommended to run more than one job per vCPU of your database instance. For example, a Standard-4 database has 4 vCPUs, so run pg_restore with a maximum of 4 jobs to avoid resource contention and decreased performance.
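As a sketch, a parallel restore of a custom-format dump onto a Standard-4 database might look like this (the dump file name and connection string are placeholders):
pg_restore --no-acl --no-owner --jobs=4 -d "$DATABASE_URL" backup.dump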
Direct Transfer of Logical Backups Between Heroku Postgres Databases
heroku pg:copy uses a dedicated dyno to pipe the output of pg_dump directly to pg_restore, removing the need to transfer the dump file to a location external to Heroku. The data transfer is also piped through pv (pipe viewer) to provide more visibility into the transfer. See more info on pg:copy in the Upgrading Heroku Postgres Databases article.
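For example, to copy the source app’s database directly into the target app’s database (the app names and attachment names are placeholders):
heroku pg:copy example-source-app::DATABASE_URL DATABASE_URL --app example-target-app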
The pg:copy method requires approximately 3 minutes of app downtime per GB of your database. For large databases, it’s recommended that you follow the instructions in Capturing Logical Backups on Larger Databases instead.
Creating indexes, adding constraints, and other checks at the end of the pg:copy run aren’t reflected in pv’s logging. Although a transfer can appear to hang, it’s working in the background.