HIERARCHICAL structures are everywhere! Your family structure, Ancestral history, your phone and address books, corporate structures – all resemble a well defined hierarchy.
The simplest representation of hierarchies is in the form of an upside down ‘Tree‘. The ‘root‘ of the tree is where it all begins, the ‘root’ expands into multiple ‘tree nodes‘, these ‘tree nodes’ may contain other tree nodes; and at the end of each ‘tree node’ is a ‘leaf node‘.. for more information on data tree structures please refer to this article: Tree (data structure)
In software development, no doubt you would have come across tree structures, and specifically hierarchical data structures, in your career. One pitfall that you would have experienced is that computing the entire tree structure on the fly for any system is a costly process. As the saying goes ‘..the apple never falls far from the tree’, and thus even computing the structure of a single tree node can be just as costly.
This is where a special data type in SQL comes to play: the hierarchyid type! In this article I will explain what the herierachyid type is, and provide examples on how you can use this in your database design for a clean, succinct data structure, and how you will benefit from the massive performance gains by doing so..
The Basics
Tree Structures
A well defined hierarchical structure can be achieved in many different ways within SQL server; ranging from simple self-referencing tables, to more complex one-to-one, or one-to-many relation tables. No matter what your design is you cannot move away from having an EntityID column and a ParentEntityID column (or a similar field referencing its parent entity). These two fields are the fundamental properties that define your hierarchical structure… think of them as your link between one ‘tree branch’ to the next ‘tree branch’. You can traverse up the tree and ultimately reach the ‘root’ entity (remember the tree is upside down!). Or you can traverse down the tree and ultimately reach the ‘leaf nodes’.
Time consuming computations
Computing the structure on-the-fly is pretty demanding, even if its processing within SQL Server. The idea is that if you have a random tree node, and wish to retrieve all descendants (ie every other tree node and leaf node below the current selected node) then the system needs to traverse quite possibly a large number of nodes.. processing each node recursively as it goes, until it ultimately reaches nodes without any child nodes (aka the leaf node).
In practice, if you consider an organisation structure with Managers (represented by tree nodes) and Employees (represented by leaf nodes).. then this is analogous to retrieving all Employees under a certain Manager.. say the CIO and all employees reporting underneath him/her.
The same logic applies if you wish to go up the tree to find all the ancestors of a given leaf node. With the organisation example above, this is like grabbing all Upper Managers for an Employee.
Whatever your requirement is, there is bound to be a need for tree traversal. Often is the case that if you have a poorly designed database structure then your system will be plagued by poor performance.
The Alternative Solution
One way of achieving great performance in such a scenario is by the use of SQL hierarchyid types. hierarchyid types help alleviate the computation stress on your server by pre-computing the relationship amongst records at the time of insertion.
What is stored in the hierarchyid column is the binary representation of the full entity path from its root entity. This is near-identical to the Unix File system and how they represent file paths:
- ‘/’ – denotes the root entity path
- ‘/Files’ – denotes the path to the Files folder
- ‘/Files/test.log’ – denotes the path to a file called test.log inside the Files folder.. and so on
Instead of using alphanumeric values for pathing, SQL Server hierarchyids are represented by numeric values:
- ‘/’ – denotes the root entity path
- ‘/1/’ – denotes the first child added to the structure
- ‘/2/’ – denotes the second child added to the structure
- ‘/2/1/’ – denotes the first sub child added underneath the second child of the root entity
SQL Server automatically calculates the logical path of the entity being inserted. It does this by inspecting the hierarchyid of the Parent entity for which the new entity is to be attached to.
Hierarchyid Methods
SQL Server has built in functions that help you navigate and retrieve data tied to hierarchyid fields.
- GetAncestor (int n) – Gets the n’th ancestor of the entity
- IsDescendantOf (hierarhcyid h) – Returns true if the current entity is a descendant of ‘h’
- ToString () – Returns the logical path of the entity in string format
- GetDescendant (child1, child2) – Returns a child hierarchyid of the current entity. The output varies based on the value of the two input parameters.
These are some of the more common methods that you will be using. The usage syntax is pretty straight forward. All you need to do is simply call the method on a hierarchyid variable or field.
For example:
DECLARE @Manager hierarchyid SELECT @Manager = CAST('/3/1' AS hierarchyid) -- Returns all Employees that is a descendant of @Manager SELECT * FROM Employee E WHERE E.Hierarchy.IsDescendantOf(@Manager) = 1
Practical Example
Consider the following database schema:
Before we proceed to insert sample data, lets visit the INSERT logic for this table.
Insert Logic
The general idea is as follows:
- Check if this is the first ‘root’ entity to be inserted.
- If true: the hierarchyid = to the root path of ‘/’.
- If false: we need to grab the Manager record and use its hierarchyid to determine the next hierarchyid for its next child.
- Once we determine the new child hierarchyid, we can proceed with the insert.
The complex part is determining the next hierarchyid to insert for this new child record.
- GetAncestor(1) = @ManagerHierarchyID: This formula simply filters the records and returns all the children of the record identified with hierarchyid = @ManagerHierarchyID
- NewHierarchyID = MAX(Hierarchy): Returns the last child record in the list
- GetDescendant(@NewHierarchyID, NULL): Generates the next hierarchyid AFTER that of @NewHierarchyID. In other words, whatever the path is defined as the last child record, generate the subsequent hierarchyid. So if @NewHierarchyID = ‘/1/2/’, this method call will return ‘1/3/’
Insert
CREATE PROC [dbo].[uspEmployee_Insert] @EmployeeID VARCHAR(5) , @Firstname VARCHAR(50) , @Surname VARCHAR(50) , @PositionTitle VARCHAR(50) , @ManagerID VARCHAR(5) AS BEGIN DECLARE @ManagerHierarchyID hierarchyid DECLARE @NewHierarchyID hierarchyid IF @ManagerID IS NULL BEGIN SET @NewHierarchyID = hierarchyid::GetRoot() END ELSE BEGIN -- [1] Identify the Manager Record -- [2] Find the LAST child attached to the Manager Record (we need this when calling the GetDescendant method) -- [3] Generate the next hierarchyid that is AFTER the LAST child retrieved above -- [4] Insert the new Employee with the newly generated hierarchyid -- [1] SELECT @ManagerHierarchyID = E.Hierarchy FROM Employee E WHERE E.EmployeeID = @ManagerID -- [2] SELECT @NewHierarchyID = MAX(Hierarchy) FROM Employee E WHERE E.Hierarchy.GetAncestor(1) = @ManagerHierarchyID -- [3] SELECT @NewHierarchyID = @ManagerHierarchyID.GetDescendant(@NewHierarchyID, NULL) END -- [4] INSERT INTO Employee SELECT @EmployeeID , @Firstname , @Surname , @PositionTitle , @NewHierarchyID END
With this completed, we can now create a hierarchical structure for our table!
Data Insert
The follow SQL will generate a small organisation structure as depicted below:
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0001', @Firstname = N'David', @Surname = N'Smith', @PositionTitle = N'CEO', @ManagerID = NULL EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0002', @Firstname = N'Peter', @Surname = N'Levy', @PositionTitle = N'COO', @ManagerID = 'E0001' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0003', @Firstname = N'Trevor', @Surname = N'Bailey', @PositionTitle = N'GTS Manager', @ManagerID = 'E0001' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0004', @Firstname = N'Kurt', @Surname = N'Douglas', @PositionTitle = N'Sales Director', @ManagerID = 'E0001' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0005', @Firstname = N'Ken', @Surname = N'Kumar', @PositionTitle = N'CIO', @ManagerID = 'E0002' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0006', @Firstname = N'Samantha', @Surname = N'Dwellings', @PositionTitle = N'HR General Manager', @ManagerID = 'E0002' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0007', @Firstname = N'Verity', @Surname = N'McDonald', @PositionTitle = N'IT Manager', @ManagerID = 'E0005' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0008', @Firstname = N'Douglas', @Surname = N'Morgan', @PositionTitle = N'IT Developer', @ManagerID = 'E0007' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0009', @Firstname = N'Paul', @Surname = N'Lynch', @PositionTitle = N'BI Developer', @ManagerID = 'E0007' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0010', @Firstname = N'Jenny', @Surname = N'Davison', @PositionTitle = N'DB Developer', @ManagerID = 'E0007' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0011', @Firstname = N'Monica', @Surname = N'Adams', @PositionTitle = N'HR Manager', @ManagerID = 'E0006' EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0012', @Firstname = N'Bree', @Surname = N'Lucas', @PositionTitle = N'HR Advisor', @ManagerID = 'E0011'
You should now have the following records in your Employee table:
Retrieval Logic
This is where the fun starts… In the examples below, I will use Employee E0006 (HR General Manager) as a point of reference.
All Child records
Firstly, to retrieve all the Children for an Employee we use the logic:
- item.GetAncestor(1) = @SourceHierarchyID
The GetAncestor() method returns all records that is an ancestor (ie parent) of the current item. The parameter defines how far up the hierarchy you wish to go. So the above logic returns records ‘one level’ above the current item which is our queried E0006 Employee.
DECLARE @SourceHierarchyID AS hierarchyid SELECT @SourceHierarchyID = E.Hierarchy FROM Employee E WHERE E.EmployeeID = 'E0006' SELECT 'E0006' AS EmployeeID , E.EmployeeID AS ChildEmployeeID , E.Firstname AS ChildFirstname , E.Surname AS ChildSurname , E.PositionTitle AS ChildPositionTitle , E.Hierarchy AS ChildHierarchy , E.Hierarchy.GetLevel() AS Level FROM Employee E WHERE E.Hierarchy.GetAncestor(1) = @SourceHierarchyID ORDER BY E.Hierarchy.GetLevel() ASC
Wrapping this into a completed function:
CREATE FUNCTION [dbo].[udfEmployee_GetAllChildrenByEmployeeID] ( @SourceEmployeeID AS VARCHAR(5) ) RETURNS @tbl TABLE (EmployeeID VARCHAR(5) , ChildEmployeeID VARCHAR(5) , ChildFirstname VARCHAR(50) , ChildSurname VARCHAR(50) , ChildPositionTitle VARCHAR(50) , ChildHierarchy hierarchyid , ChildLevel INT ) AS BEGIN DECLARE @SourceHierarchyID AS hierarchyid SELECT @SourceHierarchyID = E.Hierarchy FROM Employee E WHERE E.EmployeeID = @SourceEmployeeID INSERT INTO @tbl SELECT @SourceEmployeeID AS EmployeeID , E.EmployeeID AS ChildEmployeeID , E.Firstname AS ChildFirstname , E.Surname AS ChildSurname , E.PositionTitle AS ChildPositionTitle , E.Hierarchy AS ChildHierarchy , E.Hierarchy.GetLevel() AS Level FROM Employee E WHERE E.Hierarchy.GetAncestor(1) = @SourceHierarchyID ORDER BY E.Hierarchy.GetLevel() ASC RETURN END
To call this new function, pass it the EmployeeID of the reference entity you want to start with; and it should return all child records as requested:
SELECT * FROM dbo.udfEmployee_GetAllChildrenByEmployeeID ('E0006') t
All Child records: output
All Descendant records
Another common function is the retrieval of all child records, done recursively until the bottom of the tree structure is reached. This is referred to as all the descendants of an entity.
note: IsDescendantOf (@SourceHierarchyID) will return true if the parameter object is a sub child, or the parameter object itself. In otherwords, an entity is a descendant of itself…!
CREATE FUNCTION [dbo].[udfEmployee_GetAllDescendantsByEmployeeID] ( @SourceEmployeeID AS VARCHAR(5) ) RETURNS @tbl TABLE (EmployeeID VARCHAR(5) , ChildEmployeeID VARCHAR(5) , ChildFirstname VARCHAR(50) , ChildSurname VARCHAR(50) , ChildPositionTitle VARCHAR(50) , ChildHierarchy hierarchyid , ChildLevel INT ) AS BEGIN DECLARE @SourceHierarchyID AS hierarchyid SELECT @SourceHierarchyID = E.Hierarchy FROM Employee E WHERE E.EmployeeID = @SourceEmployeeID INSERT INTO @tbl SELECT @SourceEmployeeID AS EmployeeID , E.EmployeeID AS ChildEmployeeID , E.Firstname AS ChildFirstname , E.Surname AS ChildSurname , E.PositionTitle AS ChildPositionTitle , E.Hierarchy AS ChildHierarchy , E.Hierarchy.GetLevel() AS Level FROM Employee E WHERE E.Hierarchy.IsDescendantOf(@SourceHierarchyID) = 1 ORDER BY E.Hierarchy.GetLevel() ASC RETURN END
SELECT * FROM dbo.udfEmployee_GetAllDescendantsByEmployeeID ('E0006') t
All Descendant records: output
All Ancestor records
This is the opposite of All Descendants and simply goes up the tree until the root entity is returned.
CREATE FUNCTION [dbo].[udfEmployee_GetAllAncestorsByEmployeeID] ( @SourceEmployeeID AS VARCHAR(5) ) RETURNS @tbl TABLE (EmployeeID VARCHAR(5) , ChildEmployeeID VARCHAR(5) , ChildFirstname VARCHAR(50) , ChildSurname VARCHAR(50) , ChildPositionTitle VARCHAR(50) , ChildHierarchy hierarchyid , ChildLevel INT ) AS BEGIN DECLARE @SourceHierarchyID AS hierarchyid SELECT @SourceHierarchyID = E.Hierarchy FROM Employee E WHERE E.EmployeeID = @SourceEmployeeID INSERT INTO @tbl SELECT @SourceEmployeeID AS EmployeeID , E.EmployeeID AS ChildEmployeeID , E.Firstname AS ChildFirstname , E.Surname AS ChildSurname , E.PositionTitle AS ChildPositionTitle , E.Hierarchy AS ChildHierarchy , E.Hierarchy.GetLevel() AS Level FROM Employee E WHERE @SourceHierarchyID.IsDescendantOf(E.Hierarchy) = 1 ORDER BY E.Hierarchy.GetLevel() DESC RETURN END
All Ancestor records: output
All Sibling records
A Sibling is considered all entities with the same parent entitiy. So in this example, all siblins of E0006 has the same Manager of E0006 which is E0002.
CREATE FUNCTION [dbo].[udfEmployee_GetAllSiblingsByEmployeeID] ( @SourceEmployeeID AS VARCHAR(5) ) RETURNS @tbl TABLE (EmployeeID VARCHAR(5) , ChildEmployeeID VARCHAR(5) , ChildFirstname VARCHAR(50) , ChildSurname VARCHAR(50) , ChildPositionTitle VARCHAR(50) , ChildHierarchy hierarchyid , ChildLevel INT ) AS BEGIN DECLARE @ManagerHierarchyID AS hierarchyid SELECT @ManagerHierarchyID = E.Hierarchy.GetAncestor(1) FROM Employee E WHERE E.EmployeeID = @SourceEmployeeID INSERT INTO @tbl SELECT @SourceEmployeeID AS EmployeeID , E.EmployeeID AS ChildEmployeeID , E.Firstname AS ChildFirstname , E.Surname AS ChildSurname , E.PositionTitle AS ChildPositionTitle , E.Hierarchy AS ChildHierarchy , E.Hierarchy.GetLevel() AS Level FROM Employee E WHERE @ManagerHierarchyID = E.Hierarchy.GetAncestor(1) ORDER BY E.Hierarchy.GetLevel() DESC RETURN END
All Sibling records: output
Update / Move Logic
Finally, one last thing to cover… is the ability to move an entity and its entire sub-hierarchy into a different branch of the tree. E0006 (HR General Manager) currently reports to E0002 (COO). If we wanted to move E0006 to report under say E0005 (CIO), we need to perform the following logic:
- Grab the Manager record and use its hierarchyid to determine the next hierarchyid for its next child.
- Once we determine the new child hierarchyid, we use GetReparentedValue(@oldH, @newH) to restructure the current entity and its entire sub-tree.
The complete logic to achieve this update is as follows:
ALTER PROCEDURE [dbo].[uspEmployee_Move]( @EmployeeID AS VARCHAR(5) , @NewManagerID AS VARCHAR(5) ) AS BEGIN DECLARE @oldH hierarchyid DECLARE @newH hierarchyid DECLARE @ManagerHierarchyID hierarchyid SELECT @oldH = E.Hierarchy FROM Employee E WHERE E.EmployeeID = @EmployeeID SET TRANSACTION ISOLATION LEVEL SERIALIZABLE BEGIN TRANSACTION -- [1] Identify the Manager Record -- [2] Find the LAST child attached to the Manager Record (we need this when calling the GetDescendant method) -- [3] Generate the next hierarchyid that is AFTER the LAST child retrieved above -- [4] Insert the new Employee with the newly generated hierarchyid -- [1] SELECT @ManagerHierarchyID = Hierarchy FROM Employee e WHERE E.EmployeeID = @NewManagerID -- [2] SELECT @newH = MAX(Hierarchy) FROM Employee E WHERE E.Hierarchy.GetAncestor(1) = @ManagerHierarchyID -- [3] SELECT @newH = @ManagerHierarchyID.GetDescendant(@newH, NULL) UPDATE Employee SET Hierarchy = Hierarchy.GetReparentedValue(@oldH, @newH) WHERE Hierarchy.IsDescendantOf(@oldH) = 1 COMMIT END
Once we have this Stored Procedure created, we can now safely test it out. As above, lets move the entire sub-tree of E0006 to report to E0005. Below you will find the snippet that does this, along with the reversal back to the original structure.
-- Move E0006 (HR General Manager) to report under E0005 (CIO) -- Note the recursive native of the update will also change the hierarhcy for descendants of E0006 EXEC [dbo].[uspEmployee_Move] @EmployeeID = N'E0006', @NewManagerID = N'E0005' -- Move back E0006 (HR General Manager) to report under E0002 (COO) EXEC [dbo].[uspEmployee_Move] @EmployeeID = N'E0006', @NewManagerID = N'E0002'
Update / Move: Before the change
Update / Move: After the change
Conclusion
If you’ve come this far, well done! Hope this article has given you insight into the SQL Server hierarchyid types. Along with that, you should now be in a better position to decide whether hierarchyid types is well suited for your system, or whether you prefer traditional data relationships to define your data structure.
If you have any questions, let me know. 🙂
October 4th, 2012 on 2:20 AM
I don’t know who you actually are and I haven’t done a deep dive on the code you’ve posted yet, but you should use your real name… You’re article is nicely laid out with appropriate section headers, the graphics appear to be appropriate and are well done, and the article is a pretty easy read. Even though I don’t personally care for leading comma’s, your code is very nicely formated, you’ve adopted a very nice and easy to read “casing” and “tabbing” style, and you’ve followed it throughout your code.
It’s not my job to judge but I just had to say “Nice Job” on this one, whoever you are.
–Jeff Moden
October 4th, 2012 on 6:36 AM
Hi Jeff, thanks for your compliments I appreciate it 🙂
I take pride in formatting all the code I write. Makes it easier to read and understand – not just for others, but for myself also when I need to refer back to it every now and then lol 😀 There’s a link at the top of my blog that will take you to my LinkedIn profile.. thanks again for visiting my site!
Happy coding..! ! Cheers.
Jason 🙂