Introducing JSON data type
PostgreSQL introduced the JSON data type in version 9.2 and the JSONB data type in version 9.4.
JSON (JavaScript Object Notation) is a data-interchange format designed to be easy to read and write by humans. A JSON object is an unordered set of key:value pairs that looks like {key_1:value_1, key_2:value_2, ...}
.
The value
can be a string, a number, a boolean, an object or an array.
Here is an example of JSON object taken from the wikipedia page:
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
},
{
"type": "mobile",
"number": "123 456-7890"
}
],
"children": [],
"spouse": null
}
The values or the firstName, isAlive and age keys are strings, boolean or numbers.
The values of address is a JSON object.
the values of phoneNumbers and children are arrays.
In PostgreSQL, JSON data is stored in plain text while JSONB data is stored in some binary representation. The JSONB data type takes more disk space but is preferable when doing a lot of operations on the JSON data or when indexing the JSON column.
The use of JSON data gives more flexibility as we can add and remove attributes without changing our schema by adding and deleting columns.
Setup the training database
To practice with handling JSON data, let's create a simple table orders
that has a info
column storing the orders details.
CREATE TABLE orders(
id serial primary key
, info json
);
Then, we add some data. To insert data in a JSON column, we simply insert the JSON object as a string: '{"key":"value"}'
.
INSERT INTO
orders (info)
VALUES
('{ "customer": "Alice", "items": {"product": "chocolate","qty": 6}}')
, ('{ "customer": "Bob", "items": {"product": "beer","qty": 24}}')
, ('{ "customer": "Charlie", "items": {"product": "milk","qty": 1}}')
, ('{ "customer": "Doris", "items": {"product": "cheese","qty": 2}}')
, ('{ "customer": "Emily", "items": {"product": "milk","qty": 2}}')
, ('{ "customer": "Fabian", "items": {"product": "beer","qty": 6}}')
;
We should now have a table where each row represents an order. The info
column contains all the details of the order: customer, items, product, quantity.
tests=# SELECT * FROM orders;
id | info
----+--------------------------------------------------------------------
1 | { "customer": "Alice", "items": {"product": "chocolate","qty": 6}}
2 | { "customer": "Bob", "items": {"product": "beer","qty": 24}}
3 | { "customer": "Charlie", "items": {"product": "milk","qty": 1}}
4 | { "customer": "Doris", "items": {"product": "cheese","qty": 2}}
5 | { "customer": "Emily", "items": {"product": "milk","qty": 2}}
6 | { "customer": "Fabian", "items": {"product": "beer","qty": 6}}
(6 rows)
Select data
There are four operators that we can use in PostgreSQL to query JSON data:
->
returns JSON objects.->>
returns JSON objects as text.#>
returns JSON objects at a specified path.#>>
returns JSON objects at a specified path as text.
The ->
and ->>
operators
Because the ->
operator returns a JSON object, we can chain operators.
I am going to try to select the products from the orders table using the ->>
operator to show the difference:
SELECT
info ->> 'items' ->> 'product' AS product
FROM
orders
;
ERROR: operator does not exist: text ->> unknown
LINE 2: info ->> 'items' ->> 'product' AS product
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
The ->>
operator returns items as text so we can't apply the operator ->>
on them. That is what operator does not exist: text ->> unknown
actually means.
To get the list of products, I need to get the items as a JSON object using ->
, then I can use the ->>
operator to extract the products:
SELECT
info -> 'items' ->> 'product' AS product
FROM
orders
;
product
-----------
chocolate
beer
milk
cheese
milk
beer
(6 rows)
Note that info -> 'items' -> 'product' AS product
would also work.
The #>
and #>>
operators
When using the ->
and ->>
operators, I was adding the key as a text in the right operand. If I wanted to get a list of products, I had to chain the operators like this: info -> 'items' ->> 'product'
.
Another way is to specify the path to the key I want to extract:
SELECT
info #> '{items, product}' AS product
FROM
orders
;
This query is returning the same result as info -> 'items' -> 'product'
:
product
-------------
"chocolate"
"beer"
"milk"
"cheese"
"beer"
"milk"
(6 rows)
info #>> '{items, product}'
would return the same result as info -> 'items' ->> 'product'
.
The jsonb_pretty
function outputs the result in a more readable format.
SELECT
jsonb_pretty(info::jsonb) AS orders
FROM
orders
LIMIT 2
;
orders
--------------------------------
{ +
"items": { +
"qty": 6, +
"product": "chocolate"+
}, +
"customer": "Alice" +
}
{ +
"items": { +
"qty": 24, +
"product": "beer" +
}, +
"customer": "Bob" +
}
(2 rows)
The function also works with operators:
SELECT
jsonb_pretty((info -> 'items')::jsonb) AS items
FROM
orders
LIMIT 2
;
items
----------------------------
{ +
"qty": 6, +
"product": "chocolate"+
}
{ +
"qty": 24, +
"product": "beer" +
}
(2 rows)
The JSON operators can be used in other parts of a query, like we'll see in the following examples.
SELECT with WHERE clause
Let's list the customers who bought beer.
SELECT
info ->> 'customer' AS customer
FROM
orders
WHERE
info -> 'items' ->> 'product' = 'beer'
;
customer
----------
Bob
Fabian
(2 rows)
Use aggregate functions
We can also use the usual aggregate functions on JSON columns. Note that as the operators return values as text, we need to cast them as integers if we want to calculate sums, averages, etc...
Let's get some statistics on the quantities ordered:
SELECT
MIN((info -> 'items' ->> 'qty')::INTEGER) AS mini
, MAX((info -> 'items' ->> 'qty')::INTEGER) AS maxi
, SUM((info -> 'items' ->> 'qty')::INTEGER) AS total
, AVG((info -> 'items' ->> 'qty')::INTEGER) AS average
FROM
orders
;
mini | maxi | total | average
------+------+-------+--------------------
1 | 24 | 41 | 6.8333333333333333
(1 row)
GROUP BY statement
Aggregates often come with a GROUP BY statement. Like for the WHERE clause, we can use the ->
and ->>
operators in the GROUP BY statement.
Let's get the total quantity ordered per product:
SELECT
info -> 'items' ->> 'product' AS product
, SUM((info -> 'items' ->> 'qty')::INTEGER) AS total
FROM
orders
GROUP BY
info -> 'items' ->> 'product'
;
product | total
-----------+-------
milk | 3
beer | 30
cheese | 2
chocolate | 6
(4 rows)
UPDATE data
Changing an existing value
To change a value inside a JSON attribute, we can use the jsonb_set()
function. As it is implemented for JSONB data types only, we need to cast our JSON columns as JSONB.
Function:
jsonb_set(target jsonb
, path text[],
, new_value jsonb
[, create_missing boolean])
The function will replace the section of target
designated by path
with the new_value
. If the section doesn't exist and create_missing
is set to true (default), the new_value
will be added.
Let's change Fabian's name to Bob
UPDATE
orders
SET
info = jsonb_set(info::jsonb, '{customer}', jsonb '"Bob"')
WHERE
info ->> 'customer' = 'Fabian'
;
tests=# SELECT * FROM orders WHERE info ->> 'customer' = 'Bob';
id | info
----+--------------------------------------------------------------
2 | { "customer": "Bob", "items": {"product": "beer","qty": 24}}
6 | {"items": {"qty": 6, "product": "beer"}, "customer": "Bob"}
(2 rows)
We now have two customers named Bob. Note that because the keys in the JSON data are not sorted, the two records look different.
Using the path
parameter, we can update data deeper into the JSON hierarchy.
For example, let's set the quantity of cheese bought by Doris to three:
UPDATE
orders
SET
info = jsonb_set(info::jsonb, '{items,qty}', jsonb '3')
WHERE
info ->> 'customer' = 'Doris'
AND info -> 'items' ->> 'product' = 'cheese'
;
tests=# SELECT * FROM orders WHERE info ->> 'customer' = 'Doris';
id | info
----+-----------------------------------------------------------------
4 | {"items": {"qty": 3, "product": "cheese"}, "customer": "Doris"}
(1 row)
Creating a new key
Let's say that I want to add a unit_price to the items.
We can use the jsonb_insert
function that was introduced in 9.6.
Function:
jsonb_insert(target jsonb
, path text[]
, new_value jsonb
[, insert_after boolean])
The new_value
will be inserted in the target
in the section designated by path
.
I am going to add 5 as the unit_price of chocolate.
UPDATE
orders
SET
info = jsonb_insert(info::jsonb, '{items,unit_price}', jsonb '5')
WHERE
info -> 'items' ->> 'product' = 'chocolate'
;
tests=# SELECT * FROM orders WHERE info -> 'items' ->> 'product' = 'chocolate';
id | info
----+--------------------------------------------------------------------
1 | {"items": {"qty": 6, "product": "chocolate", "unit_price": 5}, "customer": "Alice"}
(1 row)
I have now a new attribute that I was able to add without modifying my schema. This is one of the main advantages of using JSON data, it allows for more flexibility. I also added a new attribute to one record without impacting the other records.
Removing a key
Removing a key from a JSON object is very easy and doesn't require the use of a function. If I cast my data as JSONB, I can simply remove the key by typing column = column - key
:
UPDATE
orders
SET
info = info::jsonb - 'customer'
WHERE
info ->> 'customer' = 'Emily'
;
tests=# SELECT * FROM orders WHERE id = 5;
id | info
----+------------------------------------------
5 | {"items": {"qty": 2, "product": "milk"}}
(1 row)
The record doesn't have a customer anymore.
Further reading
json.org
JSON wikipedia article
Explanation of JSONB introduced by PostgreSQL
PostgreSQL JSON Fonctions and Operators