In this article, we'll see how to set up an automatic backup of a PostgreSQL database from a Linux machine.
First, we'll do some setup. We are going to install the required tools and create a demo database we can use to practice.
Next, we will have a look at the pg_dump, pg_dumpall and pg_restore commands that let us back up and restore our PostgreSQL database.
Then, we will learn how to automate the backup process with a bash script and a cron job.
Finally, we will set up an email notification to monitor whether the backup completed correctly and be alerted if there was any problem during the process.
Setup
Before starting, we need to install the tools we are going to use to back up and restore our database. We will also create a small demo database to use in the examples.
PostgreSQL client
The PostgreSQL client provides the commands we need to back up and restore databases.
To test whether the client is installed on your system, use the psql --version command. You should get this output:
$ psql --version
psql (PostgreSQL) 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1)
If the PostgreSQL client is not installed, you should get an output that looks like this:
$ psql --version
bash: /usr/bin/psql: No such file or directory
To install it on Debian-based systems:
$ sudo apt install postgresql-client
The command should install the postgresql-client-common and postgresql-client-<version> packages. At the time of writing, the version is 10.
Once the client is installed, you can start using the psql command.
For other systems, see this article.
Create a demo database
We are now going to create a database that we'll use in the following example.
We have three ways to interact with our PostgreSQL server:
- Enter the server's command line where we can directly type SQL instructions.
$ psql -h <host_ip> -p <port> -U <username> -d <database_name>
username=>
- Send an SQL command using the -c option:
$ psql -h <host_ip> -p <port> -U <username> -d <database> -c 'SELECT * FROM table'
- Run an SQL script from a file using the -f option:
$ psql -h <host_ip> -p <port> -U <username> -d <database> -f myscript.sql
I will be working on a local PostgreSQL server, which is why I skip the -h option. By default, the client connects to localhost. If your PostgreSQL server is on a remote machine, you need to add the -h <host_ip> option to the commands that follow. The -p option that sets the port can be skipped if the PostgreSQL server listens on the default port (5432).
First, create the demo database:
$ psql -U postgres -c 'CREATE DATABASE my_database;'
CREATE DATABASE
Then, we are going to write a script that creates a table in the my_database database and fills it with some data. We will execute the script using the psql command.
First, create the script.
$ vim create_database.sql
-- Create a users table
CREATE TABLE users (
id serial PRIMARY KEY
, username varchar(50) UNIQUE NOT NULL
, email varchar(100) UNIQUE NOT NULL
);
-- Insert some data in the table
INSERT INTO
users(username, email)
VALUES
('Alice', 'alice@example.com')
, ('Bob', 'bob@acme.com')
, ('Charlie', 'charlie@some_domain.com')
;
Then, we execute the script:
$ psql -U postgres -f create_database.sql
We can check that everything is set:
$ psql -U postgres -d my_database -c 'SELECT * FROM users;'
id | username | email
----+----------+-------------------------
1 | Alice | alice@example.com
2 | Bob | bob@acme.com
3 | Charlie | charlie@some_domain.com
(3 rows)
We're all set! We can now learn how to back up our database.
pg_dump, pg_dumpall and pg_restore
The pg_dump command lets us extract a database into a backup file. The pg_dumpall command extracts an entire cluster. The pg_restore command reloads an archive file into a database.
For example, if I want to back up the my_database database, I would run the following command:
$ pg_dump my_database > backup.sql
I will then have a local backup.sql file that contains all the SQL instructions needed to create my database objects and fill them with the backed-up data.
pg_dump options
Here are some basic options we can use with the pg_dump command.
- -U <username>: specify which user connects to the PostgreSQL server.
- -W or --password: force pg_dump to prompt for a password before connecting.
- -F <format>: specify the format of the output file:
  - p: plain-text SQL (default)
  - c: custom-format archive
  - d: directory-format archive; creates a directory with one file per table
  - t: tar-format archive

The custom, directory and tar formats are suitable for input into pg_restore, while the plain-text SQL file has to be executed with the psql command, as we will see in the next part on restoring databases.

- -a: dump only the data, not the schema.
- -s: dump only the schema, not the data.
- -n <schema>: dump only the specified schema. To dump multiple schemas, type -n schema_1 -n schema_2.
Dumping a remote database
If the database is stored on a remote server, we can specify the host and the user as options.
$ pg_dump -h <host_ip> -U <username> my_database > my_database_backup.sql
We will be prompted to enter the password after hitting enter.
Restoring a database from a backup file
Restoring from a plain-text backup
If you used the default settings to back up your database, you now have a plain-text .sql file.
We are now going to restore my_database into a new database called new_db.
$ psql -U postgres -c 'CREATE DATABASE new_db'
CREATE DATABASE
$ psql -U postgres -d new_db < my_database_backup.sql
SET
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
SET
SET
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
COPY 3
setval
--------
3
(1 row)
ALTER TABLE
ALTER TABLE
ALTER TABLE
We check that the data is there:
$ psql -U postgres -d new_db -c 'SELECT * FROM users;'
id | username | email
----+----------+-------------------------
1 | Alice | alice@example.com
2 | Bob | bob@acme.com
3 | Charlie | charlie@some_domain.com
(3 rows)
Here, we used the psql command to restore the database instead of pg_restore. When creating a backup without specifying the format, the -F option defaults to plain-text SQL (-F p). This generates a plain-text file of SQL commands, which pg_restore cannot process.
To get a file that can be processed by pg_restore, we have to save our backup using the -F option with the c, d, or t value.
Restoring with pg_restore
When the backup was made in the custom, directory or tar format, you will have to use the pg_restore command to restore your database.
pg_restore options
Here are some basic options that can be used with pg_restore:

- -a: restore only the data, not the schema.
- -c: drop the database objects before recreating them.
- -C: create the database before restoring into it.
- -d <dbname>: connect to the database dbname and restore directly into it.
- -t <table>: restore only the named table.
Back up the database:
$ pg_dump -U postgres -F t my_database > my_database_backup.tar
To demonstrate the command, we are going to delete a record from the users table, restore from the backup file, and check that the record is back.
Delete the record:
$ psql -U postgres -d my_database -c 'DELETE FROM users WHERE id = 2;'
DELETE 1
$ psql -U postgres -d my_database -c 'SELECT * FROM users;'
id | username | email
----+----------+-------------------------
1 | Alice | alice@example.com
3 | Charlie | charlie@some_domain.com
(2 rows)
Restore the data from the backup file:
$ pg_restore -U postgres -d my_database -c my_database_backup.tar
$ psql -U postgres -d my_database -c 'SELECT * FROM users;'
id | username | email
----+----------+-------------------------
1 | Alice | alice@example.com
2 | Bob | bob@acme.com
3 | Charlie | charlie@some_domain.com
(3 rows)
Automate this!
So far, we have seen how to back up our database manually. But a useful backup should be made on a regular basis, and we don't want to have to do it by hand.
In the last part of this post, I am going to explain how to automate the process. First, we are going to write a script that backs up the database. Then, we will create a .pgpass file that allows connections to the database without having to type the password. Finally, we will create a cron job that runs our backup script at a specific time.
Writing a bash script
Some scripts for automated backup on Linux have been made available here by the PostgreSQL community.
However, we are going to write a simpler script following their example. The script will back up one database defined in a configuration file. We want a rotated backup that runs daily but keeps only the last n days of backups.
Configuration file
First, we create a pg_backup.config file where we can define what and how we want to back up.
############################
# POSTGRESQL BACKUP CONFIG #
############################
### CONNECTION SETTINGS ####
# The hostname or ip address of the database host
HOSTNAME=<host_ip>
# The user name used to connect to the database. Will default to "postgres"
USERNAME=<user_name>
### BACKUP SETTINGS ###
# Name of the database we want to backup
DATABASE=my_database
# The directory where the backup files will be saved. Will be created if it doesn't exist.
# Must be writable by the user the script is running as
BACKUP_DIR=/path/to/backups/
# Set to "yes" to produce tar-format backup files
ENABLE_TAR_BACKUPS=yes
# Set to "yes" to produce gzipped plain-text format files
ENABLE_PLAIN_BACKUPS=yes
### SETTINGS FOR ROTATED BACKUPS ###
# Which day to take the weekly backup from (1-7 = Monday-Sunday)
DAY_OF_WEEK_TO_KEEP=5
# Number of days to keep daily backups
DAYS_TO_KEEP=7
# How many weeks to keep weekly backups
WEEKS_TO_KEEP=5
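Before wiring this file into the script, you can sanity-check that it loads as expected. The sketch below is an addition of mine, not part of the final script: it sources a throwaway copy of the config and applies the "postgres" default for USERNAME mentioned in the config comments.

```shell
#!/bin/bash
# Write a throwaway copy of the config for this demo;
# in practice, point the script at your real pg_backup.config
cat > /tmp/pg_backup.config <<'EOF'
HOSTNAME=localhost
DATABASE=my_database
BACKUP_DIR=/tmp/backups/
ENABLE_TAR_BACKUPS=yes
ENABLE_PLAIN_BACKUPS=yes
DAY_OF_WEEK_TO_KEEP=5
DAYS_TO_KEEP=7
WEEKS_TO_KEEP=5
EOF

# Clear any environment value so the documented default applies in this demo
unset USERNAME

# Load the settings and default USERNAME to "postgres" as the config comment says
source /tmp/pg_backup.config
USERNAME=${USERNAME:-postgres}

# Fail early if a required setting is missing
if [ -z "$DATABASE" ] || [ -z "$BACKUP_DIR" ]; then
    echo "DATABASE and BACKUP_DIR must be set" 1>&2
    exit 1
fi
echo "Config OK: will back up $DATABASE to $BACKUP_DIR as $USERNAME"
```

A check like this is cheap and catches a misspelled variable name before the nightly job silently fails.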
Backup script
First, we need to get the path to the configuration file, which is passed with the -c option when calling our script.
Load the configuration file
# Get the path to the configuration file from the option passed
while [ $# -gt 0 ]; do
# Look at the value of the first argument. It should be the option "-c"
case $1 in
-c)
# If the option passed is -c, then we set the value that follows
# as the variable that stores the path to the configuration file.
CONFIG_FILE_PATH=$2
shift 2
;;
*)
# Any other option will raise an error and stop the script
echo "Unknown option $1" 1>&2
exit 2
;;
esac
done
# Load the configuration
source "${CONFIG_FILE_PATH}"
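You can exercise this parsing loop on its own before plugging it into the full script. A minimal sketch, where set -- simulates the arguments the script would receive (the path is just a placeholder):

```shell
#!/bin/bash
# Simulate calling the script as: ./pg_backup.sh -c /tmp/pg_backup.config
set -- -c /tmp/pg_backup.config

while [ $# -gt 0 ]; do
    case $1 in
        -c)
            # Store the value following -c as the config file path
            CONFIG_FILE_PATH=$2
            shift 2
            ;;
        *)
            # Any other option raises an error and stops the script
            echo "Unknown option $1" 1>&2
            exit 2
            ;;
    esac
done

echo "Configuration file: $CONFIG_FILE_PATH"
```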
Then we create a function called backup_database() that will actually do the backup.
It will first create a directory where the backup files will be stored:
# The SUFFIX is set with the value of the first parameter passed to the function
# it can be "-daily", "-weekly" or "-monthly"
SUFFIX=$1
# Build the name of the directory from the base name set in the configuration file
# with the current date and the suffix
FINAL_BACKUP_DIR=$BACKUP_DIR"`date +\%Y-\%m-\%d`$SUFFIX/"
# Try to create the directory. If it fails, show an error message and stop the script.
if ! mkdir -p $FINAL_BACKUP_DIR; then
echo "Cannot create backup directory in $FINAL_BACKUP_DIR"
exit 1;
fi
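The date-stamped naming scheme can be tried without touching the database. A small sketch, using a directory under /tmp as a stand-in for BACKUP_DIR:

```shell
#!/bin/bash
BACKUP_DIR=/tmp/pg_backup_demo/
SUFFIX="-daily"

# Same naming scheme as the script: <base><YYYY-MM-DD><suffix>/
FINAL_BACKUP_DIR=$BACKUP_DIR"$(date +%Y-%m-%d)$SUFFIX/"

if ! mkdir -p "$FINAL_BACKUP_DIR"; then
    echo "Cannot create backup directory in $FINAL_BACKUP_DIR"
    exit 1
fi
echo "Created $FINAL_BACKUP_DIR"
```

Because the date is embedded in the directory name, each day's run lands in its own folder, which is what makes the age-based cleanup with find possible later on.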
Then we check the values of the different backup options and run pg_dump for each format set to "yes". Let's look at the tar-format backup:
# Control whether the option is set to "yes"
if [ $ENABLE_TAR_BACKUPS = "yes" ]
then
# Try to run the pg_dump command with the -Ft option
if ! pg_dump -h "$HOSTNAME" -U "$USERNAME" -Ft "$DATABASE" -f $FINAL_BACKUP_DIR"$DATABASE".tar.in_progress; then
# If the attempt failed, show an error message and continue
echo "Failed to produce tar backup of the $DATABASE database"
else
# If the backup succeed, remove the ".in_progress" extension from the file name.
mv $FINAL_BACKUP_DIR"$DATABASE".tar.in_progress $FINAL_BACKUP_DIR"$DATABASE".tar
fi
fi
The backup_database() function will be called for daily, weekly and monthly backups. Monthly backups are made on the first day of the month.
# Get what day of the month is the current day.
DAY_OF_MONTH=`date +%d`
# Check whether today is the first day of the month.
if [ "$DAY_OF_MONTH" -eq 1 ]; then
# Delete expired monthly backups
find $BACKUP_DIR -maxdepth 1 -name "*-monthly" -exec rm -rf '{}' ';'
# Create a new monthly backup
backup_database "-monthly"
# Stop the script
exit 0;
fi
Let's put all of this together to get our backup script. We can add some echo statements to log what the script is doing.
#!/bin/bash
## LOAD CONFIGURATION ##
# get the path to the configuration file from the option passed
while [ $# -gt 0 ]; do
case $1 in
-c)
CONFIG_FILE_PATH=$2
shift 2
;;
*)
echo "Unknown option $1" 1>&2
exit 2
;;
esac
done
# load configurations
source "${CONFIG_FILE_PATH}"
## DATABASE BACKUP ##
function backup_database()
{
# Create the directory where the backup file will be saved
SUFFIX=$1
FINAL_BACKUP_DIR=$BACKUP_DIR"`date +\%Y-\%m-\%d`$SUFFIX/"
echo -e "Creating the backup directory in $FINAL_BACKUP_DIR\n"
if ! mkdir -p $FINAL_BACKUP_DIR; then
echo "Cannot create backup directory in $FINAL_BACKUP_DIR"
exit 1;
fi
if [ $ENABLE_TAR_BACKUPS = "yes" ]
then
echo "Starting tar format backup of the $DATABASE database"
if ! pg_dump -h "$HOSTNAME" -U "$USERNAME" -Ft "$DATABASE" -f $FINAL_BACKUP_DIR"$DATABASE".tar.in_progress; then
echo "Failed to produce tar backup of the $DATABASE database"
else
mv $FINAL_BACKUP_DIR"$DATABASE".tar.in_progress $FINAL_BACKUP_DIR"$DATABASE".tar
fi
fi
if [ $ENABLE_PLAIN_BACKUPS = "yes" ]
then
echo "Starting plain format backup of the $DATABASE database"
if ! pg_dump -h "$HOSTNAME" -U "$USERNAME" -Fp "$DATABASE" | gzip > $FINAL_BACKUP_DIR"$DATABASE".sql.gz.in_progress; then
echo "Failed to produce plain backup of the $DATABASE database"
else
mv $FINAL_BACKUP_DIR"$DATABASE".sql.gz.in_progress $FINAL_BACKUP_DIR"$DATABASE".sql.gz
fi
fi
# Display the files saved in the backup directory
echo -e "\n--------------------------------------------------"
echo -e "Showing files in $FINAL_BACKUP_DIR\n"
ls -lh $FINAL_BACKUP_DIR
}
START_TIME=$SECONDS
echo "##################################################"
echo -e "## BACKUP LOG FOR `date +\%Y-\%m-\%d` ##"
echo -e "##################################################\n"
echo -e "Start time: `date +\%H:\%M`\n"
echo "Database: $DATABASE"
echo "Backup Directory: $BACKUP_DIR"
## MONTHLY BACKUPS
DAY_OF_MONTH=`date +%d`
if [ "$DAY_OF_MONTH" -eq 1 ]; then
echo -e "\n--------------------------------------------------"
echo -e "Creating monthly backup\n"
# Delete expired monthly directories
find $BACKUP_DIR -maxdepth 1 -name "*-monthly" -exec rm -rf '{}' ';'
# Create a new monthly backup
backup_database "-monthly"
exit 0;
fi
## WEEKLY BACKUPS
DAY_OF_WEEK=`date +%u` # 1-7 (Monday-Sunday)
EXPIRED_DAYS=$(( ($WEEKS_TO_KEEP * 7) + 1 ))
if [ "$DAY_OF_WEEK" = "$DAY_OF_WEEK_TO_KEEP" ];
then
echo -e "\n--------------------------------------------------"
echo -e "Creating weekly backup\n"
# Delete expired weekly directories
find $BACKUP_DIR -maxdepth 1 -mtime +$EXPIRED_DAYS -name "*-weekly" -exec rm -rf '{}' ';'
# Create a new weekly backup
backup_database "-weekly"
exit 0;
fi
## DAILY BACKUPS
echo -e "\n--------------------------------------------------"
echo -e "Creating daily backups\n"
# Delete expired daily backups
find $BACKUP_DIR -maxdepth 1 -mtime +$DAYS_TO_KEEP -name "*-daily" -exec rm -rf '{}' ';'
# Create a new daily backup
backup_database "-daily"
echo -e "\n--------------------------------------------------"
ELAPSED_TIME=$(($SECONDS - $START_TIME))
echo "The backup took $(($ELAPSED_TIME/60)) min $(($ELAPSED_TIME%60)) sec"
We can now test the script by running it manually.
$ scripts/pg_backup.sh -c scripts/pg_backup.config
Configuration file: scripts/pg_backup.config
--------------------------
Creating daily backups
Creating the backup directory in /tmp/2019-10-24-daily/
Starting tar format backup of the my_database database
Starting plain format backup of the my_database database
.pgpass
When the connection to the database requires a password, we can create a .pgpass file in our home directory to store the credentials. The file contains one line per connection.
Each line must have the following format:
hostname:port:database:username:password
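For example, a line for the local demo database could look like this (the password shown is a placeholder):

```
localhost:5432:my_database:postgres:my_secret_password
```

Any of the first four fields may also be the wildcard *, which matches any value.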
For security reasons, the file must not be readable by group or others.
$ chmod 0600 ~/.pgpass
The permissions should look like this:
$ ls -al
...
-rw------- 1 user user 45 Oct 24 02:39 .pgpass
...
Create a cron job
At this point, we should be able to back up our database just by running the pg_backup.sh script. This is already less work than typing the backup commands manually.
The next step is to create a cron job to have the script executed by our system automatically at a given time.
First, we have to make our script executable:
$ chmod +x /path/to/pg_backup.sh
Then, to create a cron job, run the crontab -e command. This will open the user's crontab file in your favorite editor. All we have to do is add a line that defines the job we want done.
The line needs to follow this format:
minutes hours day_of_month month day_of_week command
We can use the following operators:
- The asterisk (*): all possible values for the field.
- The comma (,): specifies a list of values (e.g. "1,5,10,15,20,25").
- The dash (-): specifies a range of values. (e.g. "5-15" == "5,6,7,...,13,14,15").
- The separator (/): specifies a step value. "0-23/2" can be used in the hours field to run the command every other hour. Steps are also permitted after an asterisk: "*/2" also means every two hours.
Let's say that we want our database to be backed up every day at 11pm. We would add the following line:
00 23 * * * /path/to/pg_backup.sh -c /path/to/pg_backup.config
If the hour of the backup is not important, we can use the @daily special string. This will execute our script every day at midnight.
@daily /path/to/pg_backup.sh -c /path/to/pg_backup.config
Or, as another example, if nobody works on weekends and no changes are made to the database on those days, we may want to back up the database only on weekdays. In this case, we would add this line
00 23 * * 1-5 /path/to/pg_backup.sh -c /path/to/pg_backup.config
where 1-5 tells the system to run the job only from Monday (1) to Friday (5).
Get a notification email
We're all set! The database should be backed up every day. But we still need to monitor the process to be sure that the backup is made. Don't wait until the day you need your backup to find out that it didn't run correctly and that you don't have any usable backup file. In this last step, we are going to send an email with the output of the backup script.
First, we need to install mailutils
$ sudo apt install mailutils
We can now easily send emails from the command line using the mailx command:
$ mailx -s "my subject" recipient@example.com <<< "Some text goes there"
If we want to send the content of a file, we can use the cat command with a pipe (|):
$ cat filename.txt | mailx -s "content of filename.txt" recipient@example.com
Knowing this, we can modify our cron job command to have the output sent by email.
Go back to the crontab with crontab -e and modify the job as follows.
00 23 * * * /path/to/pg_backup.sh -c /path/to/pg_backup.config 2>&1 | tee /path/to/pg_backup.log | mailx -s "database backup for `date +\%Y-\%m-\%d`" recipient@example.com
In this cron job, the 2>&1 redirects errors (2 is the file descriptor for stderr) to the standard output (1 is the file descriptor for stdout). The combined output is then passed to the tee command, which reads from standard input and writes it to our pg_backup.log file. The output is finally passed to the mailx command, which sends an email with our job's log as the body.
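The logging half of that pipeline is easy to try in isolation. A minimal sketch, with a stub function standing in for the backup script and cat standing in for mailx, so it runs without any mail setup:

```shell
#!/bin/bash
# Stand-in for /path/to/pg_backup.sh: just produces some output
backup_stub() {
    echo "Creating daily backups"
    echo "Backup finished successfully"
}

# tee writes its input to the log file AND passes it along the pipe
# (to cat here; to mailx in the real cron job)
backup_stub 2>&1 | tee /tmp/pg_backup.log | cat
```

If the email ever goes missing, the log file on disk still holds the last run's output.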
Conclusion
Et voila! Our database is now backed up every day automatically. All we have to do is check our email to see if everything went smoothly.
The script we wrote in this article backs up only one database at a time. We can create a new script and a new cron job for each database we want to back up. This may not be the most efficient approach in some settings. Writing our own script is a good way to learn how it works, but I recommend using the scripts made available on the PostgreSQL wiki.