Ahmad Firuze: Managing Hierarchical Data In Postgresql

Berikut ane kumpulin rangkuman/artikel/tanya-jawab seputar cara me-manage data hirarkis pada postgresql.

Menggunakan query standard:

This may have been asked/answered a million times already, but what the
heck...

Basically, I have a table that maintains parent<-->child relationships
within itself.  The parent_id field points to the collection_id field.
A parent_id id of -1 means it's a root record (ie, no parent).  Pretty
simple.

Question is, how do I sort a query so that children follow their parent?

I came up with this, and it works, but I'm sure there's a better way:

SELECT *, CASE WHEN parent_id = -1 THEN collection_id||'' WHEN parent_id
!= -1 THEN parent_id||collection_id END as z FROM collection order by z;

Any advice will be greatly appreciated.

thanks!

eric

Source

Some self join would work best I suppose:

select p.*, c.*
from collection p, collection.c
where c.parent_id = p.collectionid
order by p.collectionid ASC, c.collectionid

Depending on your datset you might need to use an outer join instead.

Jochem

Source

Menggunakan ltree:

The ltree extension

The ltree extension is a great choice for querying hierarchical data. This is especially true for self-referential relationships.

Lets rebuild the above example using ltree. We'll use the page's primary keys as the "labels" within our ltree paths and a special "root" label to denote the top of the tree.

CREATE EXTENSION ltree;

CREATE TABLE section (
    id INTEGER PRIMARY KEY,
    name TEXT,
    parent_path LTREE
);

CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);

We'll add in our data again, this time rather than using the id for the parent, we will construct an ltree path that represents the parent node.

INSERT INTO section (id, name, parent_path) VALUES (1, 'Section 1', 'root');
INSERT INTO section (id, name, parent_path) VALUES (2, 'Section 1.1', 'root.1');
INSERT INTO section (id, name, parent_path) VALUES (3, 'Section 2', 'root');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.1', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.2', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (5, 'Section 2.2.1', 'root.3.4');

Cool. So now we can make use of ltree's operators @> and <@ to answer our original question like:

SELECT * FROM section WHERE parent_path <@ 'root.3';

However we have introduced a few issues.

Our simple parent_id version ensured referential consistancy by making use of the REFERENCES constraint. We lost that by switching to ltree paths.
Ensuring that the ltree paths are valid can be a bit of a pain, and if paths become stale for some reason your queries may return unexpected results or you may "orphan" nodes.

Source
Reference

Menggunakan Array untuk memetakan jalur. Seperti yang dijelaskan pada artikel dibawah ini: Source

Modeling Hierarchies

There are a number of ways you can represent a tree in a relational database. The most obvious approach is for each node to have a reference to its parent. For instance if we might model a giant eight person company like this:

id | name
---+------------
1  | Harold
2  |   Arthur
3  |     Alice
4  |       Rose
5  |     Bob
6  |   Sally
7  |     Mortimer
8  |     Penny

One common approach is to simply store the id of each person's boss. Although this works for simple cases, it requires multiple queries to get all the parents or children of any node. For instance to find all the people who work under Arthur, you need at least one query per level beneath him.

An alternative is to encode the hierarchy as a materialized path. A materialized path is an array that contains all the ids of the record's parents.

CREATE TABLE employees (
    id        integer primary key,
    path      integer[],
    name      text
);

For example, Alice would have a materialized path of [1,2,3] since the people above her are Harold (id:1), Arthur (id:2), and Alice herself is (id:3).

In these examples, we also include the current record's id in its path, this is optional, but is often helpful if you want to get a record with its parents or children. If you do not encode the current record's id in its path, you may need to cast your empty arrays with ARRAY[]::integer[].

Depth

The depth of a record indicates how deeply nested it is. With a materialized path, this is just the length of the array. Postgres provides array_length(array, dim), which will return the length of an array in dimension dim. Since we are using one dimensional arrays, dim will always be 1.

We can now find all the people in the top two levels of our massive eight person company:

SELECT id, name, path, array_length(path,1) as depth
FROM employees 
WHERE array_length(path,1) <= 2

Doing this will give us back just 3 employees:

 id |  name    | path  | depth 
----+----------+-------+-------
  1 | Harold   | {1}   |     1
  2 |   Arthur | {1,2} |     2
  6 |   Sally  | {1,6} |     2

Children

Finding a record's children is easy with a materialized path. We can apply what we learned about the overlap operator from Tagging in Postgres and ActiveRecord to this problem. Since each record contains the ids of all the parent records, we just look for all the records containing the id we are interested in.

For example, if we want to find all the people who work under Arthur (id:2), we would look for all the paths that overlap him (path && ARRAY[2]):

SELECT id, name, path FROM employees
WHERE path && ARRAY[2]

As you can see, each of the employees returned have Arthur's id in their materialized path:

 id |  name      |   path    
----+------------+-----------
  2 | Arthur     | {1,2}
  3 |   Alice    | {1,2,3}
  4 |     Rose   | {1,2,3,4}
  5 |   Bob      | {1,2,5}

Parents

Finding the parents of a record is just as easy as finding their children. Instead of looking for a path that overlaps a given id, we can look for any id that overlaps a given path.

So if we wanted to find all of Alice's superiors, we would look for all the records whose idoverlaps her path ([1,2,3]).

SELECT id, name, path FROM employees
WHERE ARRAY[id] && ARRAY[1,2,3]

This will give you the three people listed in Alice's path:

 id |  name      |  path   
----+------------+---------
  1 | Harold     | {1}
  2 |   Arthur   | {1,2}
  3 |     Alice  | {1,2,3}

Alternatively, since you already know the ids you are looking for you could also query for those records directly:

SELECT id, name, path
FROM employees 
WHERE id in (1,2,3)

Although both approaches yield the same results, Postgres will use different indices, so if this is a performance critical query, benchmark both approaches in your application.

Moving Records

Up to this point, we have focused on querying a hierarchy. Updating a hierarchy can be a little tricky since child records need to be updated as well.

Let's imagine that Alice and everyone working for her are moved under Mortimer (path: [1,6,7]):

id | name         | path
---+--------------+----------
1  | Harold       | {1}
2  |   Arthur     | {1,2}
5  |     Bob      | {1,2,5}
6  |   Sally      | {1,6}
7  |     Mortimer | {1,6,7}
3  |       Alice  | {1,6,7,3}
4  |         Rose | {1,6,7,3,4}
8  |     Penny    | {1,6,8}

Alice's and her subordinates's path will now need to start with Mortimer's path instead of Arthur's. To perform this operation, we will slice off Arthur's path, and replace it with Mortimer's path:

UPDATE employees
SET path = ARRAY[1,6,7] || path[3:array_length(path,1)]
WHERE path && ARRAY[3]

There are a few things going on here, so let's work through this from the bottom up.

We limit the update to the employees who work under Alice by selecting her, and her child records. To slice off Arthur's path, we get the path from Alice's depth (3) to the end of the path. Now we can tack on Mortimer's path ([1,6,7]).

Now you can move people around your organization.

Menggunakan CTE (Common Table Expressions) untuk membentuk tree struktur: Source

USING RECURSIVE COMMON TABLE EXPRESSIONS TO REPRESENT TREE STRUCTURES

Printer Friendly

A very long time ago, we wrote an article on how to use PostgreSQL to show the fully qualified name of an item in an inventory tree. Basically we were modeling a paper products tree. The original article can be found here Using PostgreSQL User-Defined Functions to solve the Tree Problem and was based on PostgreSQL 7.4 technology.
We'll repeat the text here for completeness and demonstrate the PostgreSQL 8.4 that solves this and more efficiently.

The Problem
Suppose you are tracking supplies and have a field called si_item and another called si_parentid. The parent keeps track of what subclass a supply item belongs to. E.g. you have paper parent that has subclasses such as recycled, non-recycled. When someone takes supplies, you want to return the fully qualified name e.g. Paper->Recycled->20 Lb

Below is what the structure of your table looks like.

si_id int, si_parentid int, si_item. In your table are the following entries


si_id si_parentid si_item
1 Paper
2 1 Recycled
3 2 20 lb
4 2 40 lb
5 1 Non-Recycled
6 5 20 lb
7 5 40 lb
8 5 Scraps

si_id	si_parentid	si_item
1		Paper
2	1	Recycled
3	2	20 lb
4	2	40 lb
5	1	Non-Recycled
6	5	20 lb
7	5	40 lb
8	5	Scraps

Solution

CREATE TABLE supplyitem(si_id integer PRIMARY KEY, si_parentid integer, si_item varchar(100));

--load up the table (multirow constructor introduced in 8.2)
INSERT INTO supplyitem(si_id,si_parentid, si_item)
VALUES (1, NULL, 'Paper'),
(2,1, 'Recycled'),
(3,2, '20 lb'),
(4,2, '40 lb'),
(5,1, 'Non-Recycled'),
(6,5, '20 lb'),
(7,5, '40 lb'),
(8,5, 'Scraps');

--Recursive query (introduced in 8.4 returns fully qualified name)
WITH RECURSIVE supplytree AS
(SELECT si_id, si_item, si_parentid, CAST(si_item As varchar(1000)) As si_item_fullname
FROM supplyitem
WHERE si_parentid IS NULL
UNION ALL
SELECT si.si_id,si.si_item,
 si.si_parentid,
 CAST(sp.si_item_fullname || '->' || si.si_item As varchar(1000)) As si_item_fullname
FROM supplyitem As si
 INNER JOIN supplytree AS sp
 ON (si.si_parentid = sp.si_id)
)
SELECT si_id, si_item_fullname
FROM supplytree
ORDER BY si_item_fullname;

Result looks like

si_id |      si_item_fullname
------+-----------------------------
 1    | Paper
 5    | Paper->Non-Recycled
 6    | Paper->Non-Recycled->20 lb
 7    | Paper->Non-Recycled->40 lb
 8    | Paper->Non-Recycled->Scraps
 2    | Paper->Recycled
 3    | Paper->Recycled->20 lb
 4    | Paper->Recycled->40 lb

Oke cukup sekian, nanti kalo nemu trik lagi akan ane update rangkuman ini. Thanks for the authors.

Ahmad Firuze

Wednesday, October 11, 2017

Managing Hierarchical Data In Postgresql