How to handle JSON data in PostgreSQL

Introducing JSON data type

PostgreSQL introduced the JSON data type in version 9.2 and the JSONB data type in version 9.4.
JSON (JavaScript Object Notation) is a data-interchange format designed to be easy to read and write by humans. A JSON object is an unordered set of key:value pairs that looks like {key_1:value_1, key_2:value_2, ...}. The value can be a string, a number, a boolean, an object or an array.
Here is an example of JSON object taken from the wikipedia page:

{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}

The values or the firstName, isAlive and age keys are strings, boolean or numbers.
The values of address is a JSON object.
the values of phoneNumbers and children are arrays.

In PostgreSQL, JSON data is stored in plain text while JSONB data is stored in some binary representation. The JSONB data type takes more disk space but is preferable when doing a lot of operations on the JSON data or when indexing the JSON column.

The use of JSON data gives more flexibility as we can add and remove attributes without changing our schema by adding and deleting columns.

Setup the training database

To practice with handling JSON data, let's create a simple table orders that has a info column storing the orders details.

CREATE TABLE orders(
  id serial primary key
  , info json
);

Then, we add some data. To insert data in a JSON column, we simply insert the JSON object as a string: '{"key":"value"}'.

INSERT INTO 
    orders (info)
VALUES
    ('{ "customer": "Alice", "items": {"product": "chocolate","qty": 6}}')
    , ('{ "customer": "Bob", "items": {"product": "beer","qty": 24}}')
    , ('{ "customer": "Charlie", "items": {"product": "milk","qty": 1}}')
    , ('{ "customer": "Doris", "items": {"product": "cheese","qty": 2}}')
    , ('{ "customer": "Emily", "items": {"product": "milk","qty": 2}}')
    , ('{ "customer": "Fabian", "items": {"product": "beer","qty": 6}}')
;

We should now have a table where each row represents an order. The info column contains all the details of the order: customer, items, product, quantity.

tests=# SELECT * FROM orders;
 id |                                info                                
----+--------------------------------------------------------------------
  1 | { "customer": "Alice", "items": {"product": "chocolate","qty": 6}}
  2 | { "customer": "Bob", "items": {"product": "beer","qty": 24}}
  3 | { "customer": "Charlie", "items": {"product": "milk","qty": 1}}
  4 | { "customer": "Doris", "items": {"product": "cheese","qty": 2}}
  5 | { "customer": "Emily", "items": {"product": "milk","qty": 2}}
  6 | { "customer": "Fabian", "items": {"product": "beer","qty": 6}}
(6 rows)

Select data

There are four operators that we can use in PostgreSQL to query JSON data:

-> returns JSON objects.
->> returns JSON objects as text.
#> returns JSON objects at a specified path.
#>> returns JSON objects at a specified path as text.

The `->` and `->>` operators

Because the -> operator returns a JSON object, we can chain operators.
I am going to try to select the products from the orders table using the ->> operator to show the difference:

SELECT 
  info ->> 'items' ->> 'product' AS product
FROM 
  orders
;

ERROR:  operator does not exist: text ->> unknown
LINE 2:   info ->> 'items' ->> 'product' AS product
                           ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

The ->> operator returns items as text so we can't apply the operator ->> on them. That is what operator does not exist: text ->> unknown actually means.
To get the list of products, I need to get the items as a JSON object using ->, then I can use the ->> operator to extract the products:

SELECT 
  info -> 'items' ->> 'product' AS product
FROM 
  orders
;

  product  
-----------
 chocolate
 beer
 milk
 cheese
 milk
 beer
(6 rows)

Note that info -> 'items' -> 'product' AS product would also work.

The `#>` and `#>>` operators

When using the -> and ->> operators, I was adding the key as a text in the right operand. If I wanted to get a list of products, I had to chain the operators like this: info -> 'items' ->> 'product'.
Another way is to specify the path to the key I want to extract:

SELECT 
  info #> '{items, product}' AS product
FROM 
  orders
;

This query is returning the same result as info -> 'items' -> 'product':

  product   
-------------
 "chocolate"
 "beer"
 "milk"
 "cheese"
 "beer"
 "milk"
(6 rows)

info #>> '{items, product}' would return the same result as info -> 'items' ->> 'product'.

The jsonb_pretty function outputs the result in a more readable format.

SELECT 
    jsonb_pretty(info::jsonb) AS orders
FROM
    orders
LIMIT 2
;

          orders          
--------------------------------
 {                             +
     "items": {                +
         "qty": 6,             +
         "product": "chocolate"+
     },                        +
     "customer": "Alice"       +
 }
 {                             +
     "items": {                +
         "qty": 24,            +
         "product": "beer"     +
     },                        +
     "customer": "Bob"         +
 }
(2 rows)

The function also works with operators:

SELECT 
  jsonb_pretty((info -> 'items')::jsonb) AS items
FROM 
  orders
LIMIT 2
;

        items        
----------------------------
 {                         +
     "qty": 6,             +
     "product": "chocolate"+
 }
 {                         +
     "qty": 24,            +
     "product": "beer"     +
 }
(2 rows)

The JSON operators can be used in other parts of a query, like we'll see in the following examples.

SELECT with WHERE clause

Let's list the customers who bought beer.

SELECT 
  info ->> 'customer' AS customer
FROM 
  orders
WHERE
  info -> 'items' ->> 'product' = 'beer'
;

customer 
----------
 Bob
 Fabian
(2 rows)

Use aggregate functions

We can also use the usual aggregate functions on JSON columns. Note that as the operators return values as text, we need to cast them as integers if we want to calculate sums, averages, etc...
Let's get some statistics on the quantities ordered:

SELECT 
  MIN((info -> 'items' ->> 'qty')::INTEGER) AS mini
  , MAX((info -> 'items' ->> 'qty')::INTEGER) AS maxi
  , SUM((info -> 'items' ->> 'qty')::INTEGER) AS total
  , AVG((info -> 'items' ->> 'qty')::INTEGER) AS average
FROM 
  orders
;

 mini | maxi | total |      average       
------+------+-------+--------------------
    1 |   24 |    41 | 6.8333333333333333
(1 row)

GROUP BY statement

Aggregates often come with a GROUP BY statement. Like for the WHERE clause, we can use the -> and ->> operators in the GROUP BY statement. Let's get the total quantity ordered per product:

SELECT 
    info -> 'items' ->> 'product' AS product
    , SUM((info -> 'items' ->> 'qty')::INTEGER) AS total
FROM 
    orders
GROUP BY
    info -> 'items' ->> 'product'
;

  product  | total 
-----------+-------
 milk      |     3
 beer      |    30
 cheese    |     2
 chocolate |     6
(4 rows)

UPDATE data

Changing an existing value

To change a value inside a JSON attribute, we can use the jsonb_set() function. As it is implemented for JSONB data types only, we need to cast our JSON columns as JSONB.
Function:

jsonb_set(target            jsonb
          , path            text[],
          , new_value       jsonb
          [, create_missing boolean])

The function will replace the section of target designated by path with the new_value. If the section doesn't exist and create_missing is set to true (default), the new_value will be added.

Let's change Fabian's name to Bob

UPDATE 
   orders
SET
  info = jsonb_set(info::jsonb, '{customer}', jsonb '"Bob"')
WHERE 
  info ->> 'customer' = 'Fabian'
;

tests=# SELECT * FROM orders WHERE info ->> 'customer' = 'Bob';
 id |                             info                             
----+--------------------------------------------------------------
  2 | { "customer": "Bob", "items": {"product": "beer","qty": 24}}
  6 | {"items": {"qty": 6, "product": "beer"}, "customer": "Bob"}
(2 rows)

We now have two customers named Bob. Note that because the keys in the JSON data are not sorted, the two records look different.

Using the path parameter, we can update data deeper into the JSON hierarchy.
For example, let's set the quantity of cheese bought by Doris to three:

UPDATE 
   orders
SET
  info = jsonb_set(info::jsonb, '{items,qty}', jsonb '3')
WHERE 
  info ->> 'customer' = 'Doris'
  AND info -> 'items' ->> 'product' = 'cheese'
;

tests=# SELECT * FROM orders WHERE info ->> 'customer' = 'Doris';
 id |                              info                               
----+-----------------------------------------------------------------
  4 | {"items": {"qty": 3, "product": "cheese"}, "customer": "Doris"}
(1 row)

Creating a new key

Let's say that I want to add a unit_price to the items. We can use the jsonb_insert function that was introduced in 9.6.
Function:

jsonb_insert(target          jsonb
             , path          text[]
             , new_value     jsonb
             [, insert_after boolean])

The new_value will be inserted in the target in the section designated by path.

I am going to add 5 as the unit_price of chocolate.

UPDATE
    orders
SET 
    info = jsonb_insert(info::jsonb, '{items,unit_price}', jsonb '5')
WHERE 
    info -> 'items' ->> 'product' = 'chocolate'
;

tests=# SELECT * FROM orders WHERE info -> 'items' ->> 'product' = 'chocolate';
 id |                                info                                
----+--------------------------------------------------------------------
  1 | {"items": {"qty": 6, "product": "chocolate", "unit_price": 5}, "customer": "Alice"}
(1 row)

I have now a new attribute that I was able to add without modifying my schema. This is one of the main advantages of using JSON data, it allows for more flexibility. I also added a new attribute to one record without impacting the other records.

Removing a key

Removing a key from a JSON object is very easy and doesn't require the use of a function. If I cast my data as JSONB, I can simply remove the key by typing column = column - key:

UPDATE 
  orders
SET 
  info = info::jsonb - 'customer'
WHERE
  info ->> 'customer' = 'Emily'
;

tests=# SELECT * FROM orders WHERE id = 5;
 id |                   info                   
----+------------------------------------------
  5 | {"items": {"qty": 2, "product": "milk"}}
(1 row)

The record doesn't have a customer anymore.