Introduction

Version control is already widely adopted in software development. It is not that common in database management. When you are working with several environments (e.g. development, staging, production) it is not easy to keep the schema of your database synchronized across those environments. You may also want to keep a history of the changes that are made to your schema.

Why do you need to put your database under version control? You cannot use a single database for all the phases of development. If you connect to your production database for development or testing, you're up for a disaster. You however need to have an authoritative version of your database schema so everybody who needs to work on it knows what is the latest version.

Liquibase is an application that can help database managers put their schema under version control. All changes are stored in text files making it easy to also use git with your project. The text files format accepted are XML, YAML, JSON, and SQL. It supports all major relational database systems and NoSQL systems. You can find the list on the Liquibase website.

Going through the basics and some usage examples is quite long. I decided to cut this article into three parts. In this first part, I will explain the key concepts of Liquibase. In part two, I will explain how to use Liquibase to manage the schema of a single database. Finally, in the third and last part, I will show how you can use Liquibase to synchronize your database schema between multiple environments.

Database versioning principles

The three rules of database work

In a series on database versioning that Scott Allen wrote on his blog Ode to Code, he states three rules to follow when working on databases.

Never use a shared database for development work.
The developers may overwrite each other changes or break the code on someone else's machine.
Always have a single, authoritative source for your schema.
Everybody should know where the official schema is and be able to deploy it easily on their machine.
Always version your database.
The goals of versioning are:
- to always be able to propagate the changes from a development environment to the test and production ones.
- to be able to recreate a database as it was at any given point in time.

Steps to versioning databases

Wait for the schema to settle in. Don't start versioning on day 1 of a project At the beginning of a project, there will be a lot of changes made to your schema. There's no need to track all of them. Wait until the schema gets into a more stable state.
Create a baseline This is your starting point for database versioning. From this point on, you will track all the changes that are made to the schema with your versioning system. In this step, you also need to create a table in your database where you will record all the schema changes. It should at least contain the date of the change and the name of the script that was executed. You can add other columns like the author, the version number, etc... We will see later that Liquibase takes care of this.
Push to a repository You need a central place where to store your official database schema so anybody who needs it in your organization knows where to find it.
Write schema change scripts for all changes Any new change that is made to the schema requires a schema change script. There are some rules to follow to make it work well:
- Once the script is published into source control, it cannot be changed.
- The tool that applies the updates is responsible for updating the table you created in step 2 to store the schema changes.
- Always backup your database before applying the changes.

Finally, remember that it's never too late to start doing the right thing. You still can start versioning if the database has been used for years. You start at step 2, you create a baseline and start version control from there.

Liquibase concepts

Now that I went through the basics of database versioning, I can explain some Liquibase concepts that we need to understand before we start using it.

Changelog

The changelog is just a file where all the changes made to the database schema are saved. It is used by Liquibase to look for any changes that have not been made yet and execute them. There are four accepted formats: - SQL - XML - JSON - YAML

While it is possible to write all your changes in a single file, it is best practice to work with multiple files. The most common way to organize your changes is to group them in changelog files by major release. To make it work, you need to have a master changelog file that points to all your changelogs. I will explain how to set up your changelog files in part 2.

Changesets

The changesets are the changes that are saved in the changelog. One changeset should represent a single change, so if you want to create a new table and add a column to another one, you have to write two changesets. You have to tag each changeset with an author and an id attributes like this: author:id. Tags must be unique as they are used to identify your changesets uniquely.

Labels and Contexts

Both contexts and labels allow you to specify which changesets will be deployed when you run the update command. There are some differences so you should carefully choose which one to use. They differ in who has the power to select what changesets Liquibase will execute: the changeset author or the deployment manager. Here is how to know which one is appropriate: - If the changeset author needs to specify the logic based on the environment that Liquibase run in, use context. - If the person running Liquibase needs to choose the changesets to run, use labels. - If you don't need logic, there is no difference between labels and contexts.

When in doubt, use context because it makes deployment easier.

Simply put: - contexts are for choosing the environment. - labels are for choosing changesets. -

DATABASECHANGELOG table

The DATABASECHANGELOG table is a table that Liquibase will create in your database to log all the changes that were made. A new record is added for every changeset that is executed.

The main columns of the table are:
- ID: the id value you entered in your changeset tag.
- AUTHOR: the author value you entered in your changeset tag.
- FILENAME: the name of the changelog file where the changeset was saved.
- DATEEXECUTED: the timestamp of when the changeset was executed.
- COMMENTS: the comments you added to your changeset. This is where you should describe what the changeset do.
- CONTEXTS: the changeset's labels.
- LABELS: the changeset's labels.
- DEPLOYMENT_ID: a unique id that identify all the changesets that were executed at the same time.

DATABASECHANGELOGLOCK table

This is another table created by Liquibase. This one is used by Liquibase to ensure that there's only one instance of Liquibase running at a time.

liquibase.properties file

This is a text file where you can save your Liquibase configurations like the database credentials, the name of your changelog file, etc. ... You can also set the name of the DATABASECHANGELOG and DATABASECHANGELOGLOCK tables to make them match your database naming conventions.
Here are some common values you can set in your properties file:

Property	Definition
changeLogFile	Path to the changelog
referenceDriver	Driver class name for the source database
referenceUrl	Source database for performing comparisons
referenceUsername	Username of source database
referencePassword	Password of source database
driver	Driver class name for the target database
url	Target database that is compared to the source database
username	Username of target database
password	Password for target database
classpath	Directories and jar files to search for changelog and extensions (separated by `:`)
changeSetAuthor	Specifies the author of auto-generated changesets
changeSetContext	Specified the contexts for changesets in the generated changelog.
contexts	Specifies changesets contexts to be executed
databaseChangeLogLockTableName	Specifies the name of the changelog lock table
databaseChangeLogTableName	Specigies the name of the changelog table

Basic commands

Now that we are familiar with the basic concepts, we can have a look at the some commands we will have to use to manage our schema.

status

This command returns the list of pending changesets. Without any option, the command displays the number of changes. We can add the --verbose option to see the changesets that Liquibase will execute.

liquibase \
    --changeLogFile=changelog.xml \
    --url=jdbc:postgresql://localhost:5432/mydb \
    --driver=org.postgresql.Driver \
    --username=postgres \
    --password=postgres \
    status --verbose
1 change sets have not been applied to postgres@jdbc:postgresql://172.17.0.2:5432/mydb
     changelogs/views/00001-user-view.sql::3::Guillaume
Liquibase command 'status' was executed successfully.

snapshot

This command takes a snapshot of the current state of the database. The output is either a JSON or YAML file, depending on the value of the snapshotFormat parameter you entered when running the command.

liquibase \
    --changeLogFile=changelog.sql \
    --username=postgres \
    --password=postgres \
    --url=jdbc:postgresql://172.17.0.2:5432/mydb \
    --outputFile=snapshot.yml \
    --snapshotFormat=yaml \
    snapshot
[...]
Output saved to /liquibase/snapshot.yml
Liquibase command 'snapshot' was executed successfully.

If you want to get a snapshot of the reference database that is specified by referenceUrl in the liquibase.properties file, you have to use the snapshotReference comand.

Here is a sample of a snapshot in YAML format. I am showing only the header, the users table, and two columns.

snapshot:
  created: 2021-12-14T07:15:40.403
  database:
    productVersion: '13.2'
    shortName: postgresql
    majorVersion: '13'
    minorVersion: '2'
    user: postgres
    productName: PostgreSQL
    url: jdbc:postgresql://172.17.0.2:5432/mydb
  objects:
    liquibase.structure.core.Catalog:
    - catalog:
        default: true
        name: mydb
        snapshotId: '9737100'
[...]
    - column:
        name: username
        nullable: true
        order: 2!{java.lang.Integer}
        relation: liquibase.structure.core.Table#9737146
        snapshotId: '9737149'
        type:
          characterOctetLength: 10!{java.lang.Integer}
          columnSize: 10!{java.lang.Integer}
          columnSizeUnit: BYTE!{liquibase.structure.core.DataType$ColumnSizeUnit}
          dataTypeId: 12!{java.lang.Integer}
          radix: 10!{java.lang.Integer}
          typeName: varchar
[...]
    - primaryKey:
        backingIndex: liquibase.structure.core.Index#9737147
        columns:
        - liquibase.structure.core.Column#9737148
        name: users_pkey
        snapshotId: '9737151'
        table: liquibase.structure.core.Table#9737146
[...]
    - table:
        columns:
        - liquibase.structure.core.Column#9737148
        - liquibase.structure.core.Column#9737149
        - liquibase.structure.core.Column#9737150
        default_tablespace: false
        indexes:
        - liquibase.structure.core.Index#9737147
        name: users
        primaryKey: liquibase.structure.core.PrimaryKey#9737151
        schema: liquibase.structure.core.Schema#9737101
        snapshotId: '9737146'

diff

This command compares two databases and generates a text file listing the differences. It will compare tables, views, and indexes. The differences are grouped into three categories: - Missing: objects in the source database that don't exist in the target database. - Unexpected: objects in the target database that don't exist in the source database. - Changed: objects that exist in the target database but are different from the corresponding object in the source database.

When running this command, you need to specify a reference database and a target database.

liquibase \
    --outputFile=diff.txt \
    --driver=org.postgresql.Driver \
    --url=jdbc:postgresql://localhost:5432/db_prod \
    --username=postgres \
    --password=postgres \
    --referenceUrl=jdbc:postgresql//localhost:5432/db_stag \
    --referenceUsername=postgres \
    --referencePassword=postgres \
    diff

It is also possible to use a snapshot as reference. It is useful if you want to see all the changes that have been made to a database since a given point in time.

liquibase \
    --url=offline:postgresql?snapshot=mySnapshot.json \
    --referenceUrl="jdbc:postgresql://localhost:5432/mydb" \
    --referenceUsername=<USERNAME> \
    --referencePassword=<PASSWORD> \
    diff

Here is a sample from a diff.txt file I generated. We can see that three tables are missing. It means that those tables exist in the Reference Database but are not in the Comparison Database.

Reference Database: postgres @ jdbc:postgresql://172.17.0.2:5432/mydb (Default Schema: public)
Comparison Database: postgres @ jdbc:postgresql://172.17.0.4:5432/mydb (Default Schema: public)
Compared Schemas: public
Product Name: EQUAL
Product Version: EQUAL
Missing Catalog(s): NONE
Unexpected Catalog(s): NONE
Changed Catalog(s): NONE
Missing Column(s): 
[...]
Missing Table(s): 
     database_change_log
     database_change_log_lock
     users

changlogSync

This command will mark all undeployed changes in the changelog as executed in the DATABASECHANGELOG table. This way, the next time you run an update, Liquibase will ignore all the changes up to that point. We will typically use it to set a baseline for a new database environment.

liquibase --changelogFile=mychangelog.postgresql.sql changelogSync

diffChangeLog

Like the diff command, diffChangeLog compares two databases and identifies the differences. But diffChangeLog will generate a list of changesets in an XML changelog file. We will typically use it to synchronize two schemas.

liquibase \
   --changeLogFile=dbchangelog.xml \
   --outputFile=diff.txt \
    --driver=org.postgresql.Driver \
    --url=jdbc:postgresql://localhost:5432/db_prod \
    --username=postgres \
    --password=postgres \
    --referenceUrl=jdbc:postgresql//localhost:5432/db_stag \
    --referenceUsername=postgres \
    --referencePassword=postgres \
    diffChangeLog

If you want to generate an SQL changelog file, add the database type in the file name:

--changeLogFile=changelog.postgresql.sql

The categories of objects included in the changelog are:

Catalog
Column
Foreign Key
Index
Primary Key
Schema
Sequence
Procedure
Unique Constraints
View

generateChangeLog

This command generates a changelog file with all the changesets required to recreate the database in its current state.

liquibase \
    --changeLogFile=dbchangelog.xml \
    --driver=org.postgresql.Driver \
    --url=jdbc:postgresql://localhost:5432/mydb \
    --username=postgres \
    --password=postgres \
    generateChangeLog

The extension of the changelog files determines the format of the changelog. Accepted extensions are .xml, .json, .sql. When generating an SQL file, you need to specify the database type: dbchangelog.postgresql.sql.

You can specify which schema to include in the changelog file:

--schema=schema_1,schema_2,schema_3 \
--includeSchema=True

update

This command deploys all the changes in the changelog that have not been deployed yet. Those are the changes that were shown in the output of the status command.

liquibase \
    --changeLogFile=changelog.xml \
    --driver=org.postgresql.Driver \
    --url=jdbc:postgresql://localhost:5432/mydb \
    --liquibaseSchemaName=public \
    --username=postgres \
    --password=postgres \
    update

You can filter the changesets that will be deployed using the --labels and --contexts parameters.

tag

We use the tag command to mark the current state of the database. Liquibase will update the value of the tag column in the last row of the DATABASECHANGELOG table. It then serves as a reference point when running the rollback <tag> command. To set a tag, we have to execute the following command:

liquibase tag <tag_value>

rollback

The rollback command is a way to cancel changes that have been made to your schema. There are three ways to rollback: - rollback <tag>: revert all changes made after the specified tag. It requires that a tag has been set, i.e. there is at list one row in the DATABASECHANGELOG table that contains a value in the tag column. - rollbackToDate: revert all the changes from the current date to the specified date and time. - rollbackCount: revert a specified number of changesets.

Conclusion

There are more commands and concepts to learn, but there's enough here to get starting doing version control on your database schema. In the next part, I will show how to work on a single database project.