Recently, I need to restore a PostgreSQL Database from production to staging. Unfortunately, our staging server is not as powerful as our production server. Specifically, our staging only has ~800GB of disk storage. But the total spaces needed for production database is slightly over 800GB. And due to some technical constraints, we can’t add more spaces to staging. So we choose to do a partial restore, i.e. only restoring data from tables we needed, excluding data we didn’t need. This task turns out to be harder than I expected. I’ve made many mistakes until I find the correct way to do it. Hope you can learn from my mistakes and restore your data correctly.
TL;DR
Don’t use pg_restore -t for a clean restore! Don’t use pg_restore -t for a clean restore! Don’t use pg_restore -t for a clean restore!
- Use pg_restore -lto dump a table of contents of the archive.
- Comment out unneeded DATA from the ToC
- Use pg_restore -Lto restore the whole database except DATA that were excluded in 2
Problems with pg_restore -t
I searched on StackOverflow. The first solution I found was pg_restore -t. So I dropped the staging db, ran the pg_restore -t command immediately.
But I ran into three annoying issues. Only after I’d fixed all of them, did I realize I shouldn’t have used pg_restore -t at the first place.
- Indices were missing. - The first thing I noticed was the staging app became so slow after the restore. Then I checked the indices in staging database and found nothing. This was the moment I realized that the -toption is doing its job too well.(Apparently, another developer has already run into the same issue before: postgresql – pg_restore on a single table not restoring indexes – Server Fault)
 
- The first thing I noticed was the staging app became so slow after the restore. Then I checked the indices in staging database and found nothing. This was the moment I realized that the 
- Constraints were missing. - The next issue happened when I started using the Rails app. Whenever I requested a record, I got the same error: Unknown primary key for tableThis was because all the primary key CONSTRAINTS were missing (not restored) from the database. When these constraints were missing,ActiveRecorddidn’t knowidwas the primary key, which triggered the error.After primary key constraints were restored, this issue was fixed.
 
- The next issue happened when I started using the Rails app. Whenever I requested a record, I got the same error: 
- Sequences/Defaults were missing.- Finally, I got another issue about id. Whenever I saved a record, I got this error:ActiveRecord::StatementInvalid: PG::Error: ERROR: null value in column “id” violates not-null constraintThis was because all the SEQUENCES and DEFAULTS for theidcolumns were missing (not restored) from the database.ActiveRecorduses SEQUENCES and DEFAULTS to make primary keys automatically increment itself when a new record is created in the db. When these two were missing from the database,ActiveRecordcould only assign primary key tonil, which violated the constraint.More interestingly, I only restored theSEQUENCESforidcolumns. But the issue still persisted. So I also tried to reset theSEQUENCES(which didn’t work becauseDEFAULTSwere still missing):
- ActiveRecord::Base.connection.tables.each do |t| ActiveRecord::Base.connection.reset_pk_sequence!(t) end
- After I restored DEFAULTS, the issue was fixed.
 
- Finally, I got another issue about 
After these three annoying issues, I knew I did something wrong. Fortunately, it was only our staging server, so I only wasted my own time and energy.
Reading the documentation for pg_restore -t, I finally realized it wasn’t the right tool for the job.
-t table
--table=table
    Restore definition and/or data of only the named table. For this purpose,
    "table" includes views, materialized views, sequences, and foreign tables.
    Multiple tables can be selected by writing multiple -t switches. This
    option can be combined with the -n option to specify table(s) in a
    particular schema.
        Note
        When -t is specified, pg_restore makes no attempt to restore any other
        database objects that the selected table(s) might depend upon.
        Therefore, there is no guarantee that a specific-table restore into a
        clean database will succeed.
As you may have notices from the Note, pg_restore makes no attempt to restore any other database objects that the selected table(s) might depend upon. (I didn’t read this Note because I thought StackOverflow was trustworthy.)
Besides all the missing indices, constraints, sequences, defaults, and more, pg_restore -t won’t create any tables that were not specified with the -t option. So any db migration related to the missing tables would fail on staging in the future. Which means this solution is absolutely unacceptable in this use case.
What else shall we do?
Introducing pg_restore -l and pg_restore -L
Turns out the correct answer is right below the first one in the same StackOverflow page.
- pg_restore -l psql_backup.dump > db.list
- Passing -loption topg_restorewould produce a table of content for the dump file. This ToC includes all the data and their locations included in the dump file:
; ; Archive created at Mon Sep 14 13:55:39 2009 ; dbname: DBDEMOS ; TOC Entries: 81 ; Compression: -1 ; Dump Version: 1.13-0 ; Format: CUSTOM ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 10.10 ; Dumped by pg_dump version: 10.10 ; ; ; Selected TOC Entries: ; 28920 70946 DATABASE - DBDEMOS pasha 55141; 15509 237 SCHEMA - public pasha 50798; 80741 51835 COMMENT - SCHEMA public pasha 66197; 75700 30831 ACL - public pasha 79603; 43218 86982 TYPE public composite pasha 59759; 64565 4792 EXTENSION - pg_trgm 95301; 10755 17786 COMMENT - EXTENSION pg_trgm 31637; 99705 30851 TABLE public users DBDEMOS 18171; 24739 26392 SEQUENCE public users_id_seq DBDEMOS 81822; 19526 48192 SEQUENCE OWNED BY public users_id_seq DBDEMOS 59215; 11301 6736 DEFAULT public users id DBDEMOS 2227; 53943 37511 TABLE DATA public users DBDEMOS 37029; 1384 1099 SEQUENCE SET public users_id_seq DBDEMOS 14500; 53947 96995 CONSTRAINT public users users_pkey DBDEMOS 26691; 92878 55511 INDEX public index_users_on_email DBDEMOS 85474; 88765 68415 TRIGGER public users username_update DBDEMOS
- Notice that this file contains all types of data in the dump: databases, schemas, comments, access control lists (ACL), extensions, tables, sequences, defaults, table data, constraints, indices, triggers, and so on.
- Comment out data we do not need
- We can change this ToC by commenting out or deleting the part we don’t want to restore. In this case, we don’t want to restore DATAfromuserstable, so we delete it.
; ; Archive created at Mon Sep 14 13:55:39 2009 ; dbname: DBDEMOS ; TOC Entries: 81 ; Compression: -1 ; Dump Version: 1.13-0 ; Format: CUSTOM ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 10.10 ; Dumped by pg_dump version: 10.10 ; ; ; Selected TOC Entries: ; 28920 70946 DATABASE - DBDEMOS pasha 55141; 15509 237 SCHEMA - public pasha 50798; 80741 51835 COMMENT - SCHEMA public pasha 66197; 75700 30831 ACL - public pasha 79603; 43218 86982 TYPE public composite pasha 59759; 64565 4792 EXTENSION - pg_trgm 95301; 10755 17786 COMMENT - EXTENSION pg_trgm 31637; 99705 30851 TABLE public users DBDEMOS 18171; 24739 26392 SEQUENCE public users_id_seq DBDEMOS 81822; 19526 48192 SEQUENCE OWNED BY public users_id_seq DBDEMOS 59215; 11301 6736 DEFAULT public users id DBDEMOS ; 2227; 53943 37511 TABLE DATA public users DBDEMOS 37029; 1384 1099 SEQUENCE SET public users_id_seq DBDEMOS 14500; 53947 96995 CONSTRAINT public users users_pkey DBDEMOS 26691; 92878 55511 INDEX public index_users_on_email DBDEMOS 85474; 88765 68415 TRIGGER public users username_update DBDEMOS
- pg_restore -L db.list
- -Loption would tell- pg_restoreto only restore the data specified in the Table of Content file.
After restoring with this solution, our staging behaves the same as production. No weird issues like Unknown primary key for table anymore!