Berikut ane kumpulin rangkuman/artikel/tanya-jawab seputar cara me-manage data hirarkis pada postgresql.
Menggunakan query standard:
This may have been asked/answered a million times already, but what the
heck...
Basically, I have a table that maintains parent<-->child relationships
within itself. The parent_id field points to the collection_id field.
A parent_id id of -1 means it's a root record (ie, no parent). Pretty
simple.
Question is, how do I sort a query so that children follow their parent?
I came up with this, and it works, but I'm sure there's a better way:
SELECT *, CASE WHEN parent_id = -1 THEN collection_id||'' WHEN parent_id
!= -1 THEN parent_id||collection_id END as z FROM collection order by z;
Any advice will be greatly appreciated.
thanks!
eric
Source
Some self join would work best I suppose:
select p.*, c.*
from collection p, collection.c
where c.parent_id = p.collectionid
order by p.collectionid ASC, c.collectionid
Depending on your datset you might need to use an outer join instead.
Jochem
Source
Menggunakan ltree:
The ltree extension
The
ltree extension is a great choice for querying hierarchical data. This is especially true for self-referential relationships.
Lets rebuild the above example using ltree. We'll use the page's primary keys as the "labels" within our ltree paths and a special "root" label to denote the top of the tree.
CREATE EXTENSION ltree;
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_path LTREE
);
CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);
We'll add in our data again, this time rather than using the id
for the parent, we will construct an ltree path that represents the parent node.
INSERT INTO section (id, name, parent_path) VALUES (1, 'Section 1', 'root');
INSERT INTO section (id, name, parent_path) VALUES (2, 'Section 1.1', 'root.1');
INSERT INTO section (id, name, parent_path) VALUES (3, 'Section 2', 'root');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.1', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.2', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (5, 'Section 2.2.1', 'root.3.4');
Cool. So now we can make use of ltree's operators @>
and <@
to answer our original question like:
SELECT * FROM section WHERE parent_path <@ 'root.3';
However we have introduced a few issues.
- Our simple
parent_id
version ensured referential consistancy by making use of the REFERENCES
constraint. We lost that by switching to ltree paths.
- Ensuring that the ltree paths are valid can be a bit of a pain, and if paths become stale for some reason your queries may return unexpected results or you may "orphan" nodes.
Source
Reference
Menggunakan Array untuk memetakan jalur. Seperti yang dijelaskan pada artikel dibawah ini:
Source
Modeling Hierarchies
There are a number of ways you can represent a tree in a relational database. The most obvious approach is for each node to have a reference to its parent. For instance if we might model a giant eight person company like this:
id | name
---+------------
1 | Harold
2 | Arthur
3 | Alice
4 | Rose
5 | Bob
6 | Sally
7 | Mortimer
8 | Penny
One common approach is to simply store the id
of each person's boss. Although this works for simple cases, it requires multiple queries to get all the parents or children of any node. For instance to find all the people who work under Arthur, you need at least one query per level beneath him.
An alternative is to encode the hierarchy as a materialized path. A materialized path is an array that contains all the ids of the record's parents.
CREATE TABLE employees (
id integer primary key,
path integer[],
name text
);
For example, Alice would have a materialized path of [1,2,3]
since the people above her are Harold (id:1
), Arthur (id:2
), and Alice herself is (id:3
).
In these examples, we also include the current record's id
in its path
, this is optional, but is often helpful if you want to get a record with its parents or children. If you do not encode the current record's id in its path, you may need to cast your empty arrays with ARRAY[]::integer[]
.
Depth
The depth of a record indicates how deeply nested it is. With a materialized path, this is just the length of the array. Postgres provides array_length(array, dim)
, which will return the length of an array
in dimension dim
. Since we are using one dimensional arrays, dim
will always be 1.
We can now find all the people in the top two levels of our massive eight person company:
SELECT id, name, path, array_length(path,1) as depth
FROM employees
WHERE array_length(path,1) <= 2
Doing this will give us back just 3 employees:
id | name | path | depth
----+----------+-------+-------
1 | Harold | {1} | 1
2 | Arthur | {1,2} | 2
6 | Sally | {1,6} | 2
Children
Finding a record's children is easy with a materialized path. We can apply what we learned about the overlap operator from
Tagging in Postgres and ActiveRecord to this problem. Since each record contains the ids of all the parent records, we just look for all the records containing the
id
we are interested in.
For example, if we want to find all the people who work under Arthur (id:2
), we would look for all the paths that overlap him (path && ARRAY[2]
):
SELECT id, name, path FROM employees
WHERE path && ARRAY[2]
As you can see, each of the employees returned have Arthur's id
in their materialized path:
id | name | path
----+------------+-----------
2 | Arthur | {1,2}
3 | Alice | {1,2,3}
4 | Rose | {1,2,3,4}
5 | Bob | {1,2,5}
Parents
Finding the parents of a record is just as easy as finding their children. Instead of looking for a path
that overlaps a given id
, we can look for any id
that overlaps a given path
.
So if we wanted to find all of Alice's superiors, we would look for all the records whose id
overlaps her path
([1,2,3]
).
SELECT id, name, path FROM employees
WHERE ARRAY[id] && ARRAY[1,2,3]
This will give you the three people listed in Alice's path
:
id | name | path
----+------------+---------
1 | Harold | {1}
2 | Arthur | {1,2}
3 | Alice | {1,2,3}
Alternatively, since you already know the ids you are looking for you could also query for those records directly:
SELECT id, name, path
FROM employees
WHERE id in (1,2,3)
Although both approaches yield the same results, Postgres will use different indices, so if this is a performance critical query, benchmark both approaches in your application.
Moving Records
Up to this point, we have focused on querying a hierarchy. Updating a hierarchy can be a little tricky since child records need to be updated as well.
Let's imagine that Alice and everyone working for her are moved under Mortimer (path: [1,6,7]
):
id | name | path
---+--------------+----------
1 | Harold | {1}
2 | Arthur | {1,2}
5 | Bob | {1,2,5}
6 | Sally | {1,6}
7 | Mortimer | {1,6,7}
3 | Alice | {1,6,7,3}
4 | Rose | {1,6,7,3,4}
8 | Penny | {1,6,8}
Alice's and her subordinates's path
will now need to start with Mortimer's path instead of Arthur's. To perform this operation, we will slice off Arthur's path, and replace it with Mortimer's path:
UPDATE employees
SET path = ARRAY[1,6,7] || path[3:array_length(path,1)]
WHERE path && ARRAY[3]
There are a few things going on here, so let's work through this from the bottom up.
We limit the update to the employees who work under Alice by selecting her, and her child records. To slice off Arthur's path
, we get the path
from Alice's depth (3
) to the end of the path
. Now we can tack on Mortimer's path ([1,6,7]
).
Now you can move people around your organization.
Menggunakan CTE (Common Table Expressions) untuk membentuk tree struktur:
Source
Printer Friendly
A very long time ago, we wrote an article on how to use PostgreSQL to show the fully qualified name of an item in an inventory tree. Basically we were modeling a paper products tree. The original article can be found here
Using PostgreSQL User-Defined Functions to solve the Tree Problem and was based on PostgreSQL 7.4 technology.
We'll repeat the text here for completeness and demonstrate the PostgreSQL 8.4 that solves this and more efficiently.
The Problem
Suppose you are tracking supplies and have a field called si_item and another called si_parentid. The parent keeps track of what subclass a supply item belongs to. E.g. you have paper parent that has subclasses such as recycled, non-recycled. When someone takes supplies, you want to return the fully qualified name e.g. Paper->Recycled->20 Lb
Below is what the structure of your table looks like.
si_id int, si_parentid int, si_item. In your table are the following entries
si_id | si_parentid | si_item |
1 | | Paper |
2 | 1 | Recycled |
3 | 2 | 20 lb |
4 | 2 | 40 lb |
5 | 1 | Non-Recycled |
6 | 5 | 20 lb |
7 | 5 | 40 lb |
8 | 5 | Scraps |
Solution
CREATE TABLE supplyitem(si_id integer PRIMARY KEY, si_parentid integer, si_item varchar(100));
--load up the table (multirow constructor introduced in 8.2)
INSERT INTO supplyitem(si_id,si_parentid, si_item)
VALUES (1, NULL, 'Paper'),
(2,1, 'Recycled'),
(3,2, '20 lb'),
(4,2, '40 lb'),
(5,1, 'Non-Recycled'),
(6,5, '20 lb'),
(7,5, '40 lb'),
(8,5, 'Scraps');
--Recursive query (introduced in 8.4 returns fully qualified name)
WITH RECURSIVE supplytree AS
(SELECT si_id, si_item, si_parentid, CAST(si_item As varchar(1000)) As si_item_fullname
FROM supplyitem
WHERE si_parentid IS NULL
UNION ALL
SELECT si.si_id,si.si_item,
si.si_parentid,
CAST(sp.si_item_fullname || '->' || si.si_item As varchar(1000)) As si_item_fullname
FROM supplyitem As si
INNER JOIN supplytree AS sp
ON (si.si_parentid = sp.si_id)
)
SELECT si_id, si_item_fullname
FROM supplytree
ORDER BY si_item_fullname;
Result looks like
si_id | si_item_fullname
------+-----------------------------
1 | Paper
5 | Paper->Non-Recycled
6 | Paper->Non-Recycled->20 lb
7 | Paper->Non-Recycled->40 lb
8 | Paper->Non-Recycled->Scraps
2 | Paper->Recycled
3 | Paper->Recycled->20 lb
4 | Paper->Recycled->40 lb
Oke cukup sekian, nanti kalo nemu trik lagi akan ane update rangkuman ini. Thanks for the authors.