HIERARCHICAL structures are everywhere! Your family structure, Ancestral history, your phone and address books, corporate structures – all resemble a well defined hierarchy.

The simplest representation of hierarchies is in the form of an upside down ‘Tree‘.  The ‘root‘ of the tree is where it all begins, the ‘root’ expands into multiple ‘tree nodes‘, these ‘tree nodes’ may contain other tree nodes; and at the end of each ‘tree node’ is a ‘leaf node‘..  for more information on data tree structures please refer to this article: Tree (data structure)

In software development, no doubt you would have come across tree structures, and specifically hierarchical data structures, in your career. One pitfall that you would have experienced is that computing the entire tree structure on the fly for any system is a costly process. As the saying goes ‘..the apple never falls far from the tree’, and thus even computing the structure of a single tree node can be just as costly.

This is where a special data type in SQL comes to play: the hierarchyid type!  In this article I will explain what the herierachyid type is, and provide examples on how you can use this in your database design for a clean, succinct data structure, and how you will benefit from the massive performance gains by doing so..

The Basics

Tree Structures

A well defined hierarchical structure can be achieved in many different ways within SQL server; ranging from simple self-referencing tables, to more complex one-to-one, or one-to-many relation tables. No matter what your design is you cannot move away from having an EntityID column and a ParentEntityID column (or a similar field referencing its parent entity).  These two fields are the fundamental properties that define your hierarchical structure… think of them as your link between one ‘tree branch’ to the next ‘tree branch’. You can traverse up the tree and ultimately reach the ‘root’ entity (remember the tree is upside down!). Or you can traverse down the tree and ultimately reach the ‘leaf nodes’.

Time consuming computations

Computing the structure on-the-fly is pretty demanding, even if its processing within SQL Server. The idea is that if you have a random tree node, and wish to retrieve all descendants (ie every other tree node and leaf node below the current selected node) then the system needs to traverse quite possibly a large number of nodes.. processing each node recursively as it goes, until it ultimately reaches nodes without any child nodes (aka the leaf node).

In practice, if you consider an organisation structure with Managers (represented by tree nodes) and Employees (represented by leaf nodes).. then this is analogous to retrieving all Employees under a certain Manager..  say the CIO and all employees reporting underneath him/her.

The same logic applies if you wish to go up the tree to find all the ancestors of a given leaf node. With the organisation example above, this is like grabbing all Upper Managers for an Employee.

Whatever your requirement is, there is bound to be a need for tree traversal. Often is the case that if you have a poorly designed database structure then your system will be plagued by poor performance.

The Alternative Solution

One way of achieving great performance in such a scenario is by the use of SQL hierarchyid types. hierarchyid types help alleviate the computation stress on your server by pre-computing the relationship amongst records at the time of insertion.

What is stored in the hierarchyid column is the binary representation of the full entity path from its root entity. This is near-identical to the Unix File system and how they represent file paths:

  • ‘/’ – denotes the root entity path
  • ‘/Files’ – denotes the path to the Files folder
  • ‘/Files/test.log’ – denotes the path to a file called test.log inside the Files folder.. and so on

Instead of using alphanumeric values for pathing, SQL Server hierarchyids are represented by numeric values:

  • ‘/’ – denotes the root entity path
  • ‘/1/’ – denotes the first child added to the structure
  • ‘/2/’ – denotes the second child added to the structure
  • ‘/2/1/’ – denotes the first sub child added underneath the second child of the root entity

SQL Server automatically calculates the logical path of the entity being inserted. It does this by inspecting the hierarchyid of the Parent entity for which the new entity is to be attached to.

Hierarchyid Methods

SQL Server has built in functions that help you navigate and retrieve data tied to hierarchyid fields.

  • GetAncestor (int n) – Gets the n’th ancestor of the entity
  • IsDescendantOf (hierarhcyid h) – Returns true if the current entity is a descendant of ‘h’
  • ToString () – Returns the logical path of the entity in string format
  • GetDescendant (child1, child2) – Returns a child hierarchyid of the current entity. The output varies based on the value of the two input parameters.

These are some of the more common methods that you will be using. The usage syntax is pretty straight forward. All you need to do is simply call the method on a hierarchyid variable or field.

For example:

DECLARE @Manager hierarchyid
SELECT @Manager = CAST('/3/1' AS hierarchyid)

-- Returns all Employees that is a descendant of @Manager
SELECT *
FROM Employee E
WHERE E.Hierarchy.IsDescendantOf(@Manager) = 1

Practical Example

Consider the following database schema:

Before we proceed to insert sample data, lets visit the INSERT logic for this table.

Insert Logic

The general idea is as follows:

  1. Check if this is the first ‘root’ entity to be inserted.
  2. If true: the hierarchyid = to the root path of ‘/’.
  3. If false: we need to grab the Manager record and use its hierarchyid to determine the next hierarchyid for its next child.
  4. Once we determine the new child hierarchyid, we can proceed with the insert.

The complex part is determining the next hierarchyid to insert for this new child record.

  • GetAncestor(1) = @ManagerHierarchyID: This formula simply filters the records and returns all the children of the record identified with hierarchyid = @ManagerHierarchyID
  • NewHierarchyID = MAX(Hierarchy): Returns the last child record in the list
  • GetDescendant(@NewHierarchyID, NULL): Generates the next hierarchyid AFTER that of @NewHierarchyID. In other words, whatever the path is defined as the last child record, generate the subsequent hierarchyid. So if @NewHierarchyID = ‘/1/2/’, this method call will return ‘1/3/’

Insert

CREATE PROC [dbo].[uspEmployee_Insert]
	@EmployeeID		VARCHAR(5)
	, @Firstname	VARCHAR(50)
	, @Surname		VARCHAR(50)
	, @PositionTitle	VARCHAR(50)
	, @ManagerID		VARCHAR(5)
AS
BEGIN

	DECLARE @ManagerHierarchyID hierarchyid
	DECLARE @NewHierarchyID hierarchyid

	IF @ManagerID IS NULL
	BEGIN
		SET @NewHierarchyID = hierarchyid::GetRoot()
	END
	ELSE
	BEGIN

		-- [1] Identify the Manager Record
		-- [2] Find the LAST child attached to the Manager Record (we need this when calling the GetDescendant method)
		-- [3] Generate the next hierarchyid that is AFTER the LAST child retrieved above
		-- [4] Insert the new Employee with the newly generated hierarchyid

		-- [1]
		SELECT @ManagerHierarchyID = E.Hierarchy
		FROM Employee E
		WHERE E.EmployeeID = @ManagerID

		-- [2]
		SELECT @NewHierarchyID = MAX(Hierarchy)
		FROM Employee E
		WHERE E.Hierarchy.GetAncestor(1) = @ManagerHierarchyID

		-- [3]
		SELECT @NewHierarchyID = @ManagerHierarchyID.GetDescendant(@NewHierarchyID, NULL)
	END

	-- [4]
	INSERT INTO Employee
	SELECT @EmployeeID
			, @Firstname
			, @Surname
			, @PositionTitle
			, @NewHierarchyID

END

With this completed, we can now create a hierarchical structure for our table!

Data Insert

The follow SQL will generate a small organisation structure as depicted below:

EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0001',	@Firstname = N'David',		@Surname = N'Smith',		@PositionTitle = N'CEO',			@ManagerID = NULL
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0002',	@Firstname = N'Peter',		@Surname = N'Levy',			@PositionTitle = N'COO',			@ManagerID = 'E0001'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0003',	@Firstname = N'Trevor',		@Surname = N'Bailey',		@PositionTitle = N'GTS Manager',	@ManagerID = 'E0001'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0004',	@Firstname = N'Kurt',		@Surname = N'Douglas',		@PositionTitle = N'Sales Director',	@ManagerID = 'E0001'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0005',	@Firstname = N'Ken',		@Surname = N'Kumar',		@PositionTitle = N'CIO',			@ManagerID = 'E0002'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0006',	@Firstname = N'Samantha',	@Surname = N'Dwellings',	@PositionTitle = N'HR General Manager',	@ManagerID = 'E0002'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0007',	@Firstname = N'Verity',		@Surname = N'McDonald',		@PositionTitle = N'IT Manager',		@ManagerID = 'E0005'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0008',	@Firstname = N'Douglas',	@Surname = N'Morgan',		@PositionTitle = N'IT Developer',	@ManagerID = 'E0007'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0009',	@Firstname = N'Paul',		@Surname = N'Lynch',		@PositionTitle = N'BI Developer',	@ManagerID = 'E0007'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0010',	@Firstname = N'Jenny',		@Surname = N'Davison',		@PositionTitle = N'DB Developer',	@ManagerID = 'E0007'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0011',	@Firstname = N'Monica',		@Surname = N'Adams',		@PositionTitle = N'HR Manager',		@ManagerID = 'E0006'
EXEC [dbo].[uspEmployee_Insert] @EmployeeID = N'E0012',	@Firstname = N'Bree',		@Surname = N'Lucas',		@PositionTitle = N'HR Advisor',		@ManagerID = 'E0011'

You should now have the following records in your Employee table:

Retrieval Logic

This is where the fun starts… In the examples below, I will use Employee E0006 (HR General Manager) as a point of reference.

All Child records

Firstly, to retrieve all the Children for an Employee we use the logic: 

  • item.GetAncestor(1) = @SourceHierarchyID

The GetAncestor() method returns all records that is an ancestor (ie parent) of the current item. The parameter defines how far up the hierarchy you wish to go. So the above logic returns records ‘one level’ above the current item which is our queried E0006 Employee.

	DECLARE @SourceHierarchyID AS hierarchyid

	SELECT @SourceHierarchyID = E.Hierarchy
	FROM Employee E
	WHERE E.EmployeeID = 'E0006'

	SELECT 'E0006'				AS EmployeeID
			, E.EmployeeID		AS ChildEmployeeID
			, E.Firstname		AS ChildFirstname
			, E.Surname			AS ChildSurname
			, E.PositionTitle	AS ChildPositionTitle
			, E.Hierarchy		AS ChildHierarchy
			, E.Hierarchy.GetLevel()	AS Level
	FROM Employee E
	WHERE E.Hierarchy.GetAncestor(1) = @SourceHierarchyID
	ORDER BY E.Hierarchy.GetLevel() ASC

Wrapping this into a completed function:

CREATE FUNCTION [dbo].[udfEmployee_GetAllChildrenByEmployeeID]
(
	@SourceEmployeeID AS VARCHAR(5)
)
RETURNS @tbl TABLE (EmployeeID				VARCHAR(5)
					, ChildEmployeeID		VARCHAR(5)
					, ChildFirstname		VARCHAR(50)
					, ChildSurname			VARCHAR(50)
					, ChildPositionTitle	VARCHAR(50)
					, ChildHierarchy		hierarchyid
					, ChildLevel			INT
					)

AS
BEGIN

	DECLARE @SourceHierarchyID AS hierarchyid

	SELECT @SourceHierarchyID = E.Hierarchy
	FROM Employee E
	WHERE E.EmployeeID = @SourceEmployeeID

	INSERT INTO @tbl
	SELECT @SourceEmployeeID	AS EmployeeID
			, E.EmployeeID		AS ChildEmployeeID
			, E.Firstname		AS ChildFirstname
			, E.Surname			AS ChildSurname
			, E.PositionTitle	AS ChildPositionTitle
			, E.Hierarchy		AS ChildHierarchy
			, E.Hierarchy.GetLevel()	AS Level
	FROM Employee E
	WHERE E.Hierarchy.GetAncestor(1) = @SourceHierarchyID
	ORDER BY E.Hierarchy.GetLevel() ASC

	RETURN

END

To call this new function, pass it the EmployeeID of the reference entity you want to start with; and it should return all child records as requested:

SELECT *
FROM dbo.udfEmployee_GetAllChildrenByEmployeeID ('E0006') t

All Child records: output

All Descendant records

Another common function is the retrieval of all child records, done recursively until the bottom of the tree structure is reached. This is referred to as all the descendants of an entity.

note: IsDescendantOf (@SourceHierarchyID) will return true if the parameter object is a sub child, or the parameter object itself. In otherwords, an entity is a descendant of itself…!

CREATE FUNCTION [dbo].[udfEmployee_GetAllDescendantsByEmployeeID]
(
	@SourceEmployeeID AS VARCHAR(5)
)
RETURNS @tbl TABLE (EmployeeID				VARCHAR(5)
					, ChildEmployeeID		VARCHAR(5)
					, ChildFirstname		VARCHAR(50)
					, ChildSurname			VARCHAR(50)
					, ChildPositionTitle	VARCHAR(50)
					, ChildHierarchy		hierarchyid
					, ChildLevel			INT
					)

AS
BEGIN

	DECLARE @SourceHierarchyID AS hierarchyid

	SELECT @SourceHierarchyID = E.Hierarchy
	FROM Employee E
	WHERE E.EmployeeID = @SourceEmployeeID

	INSERT INTO @tbl
	SELECT @SourceEmployeeID	AS EmployeeID
			, E.EmployeeID		AS ChildEmployeeID
			, E.Firstname		AS ChildFirstname
			, E.Surname			AS ChildSurname
			, E.PositionTitle	AS ChildPositionTitle
			, E.Hierarchy		AS ChildHierarchy
			, E.Hierarchy.GetLevel()	AS Level
	FROM Employee E
	WHERE E.Hierarchy.IsDescendantOf(@SourceHierarchyID) = 1
	ORDER BY E.Hierarchy.GetLevel() ASC

	RETURN

END

 

SELECT *
FROM dbo.udfEmployee_GetAllDescendantsByEmployeeID ('E0006') t

All Descendant records: output

All Ancestor records

This is the opposite of All Descendants and simply goes up the tree until the root entity is returned.

CREATE FUNCTION [dbo].[udfEmployee_GetAllAncestorsByEmployeeID]
(
	@SourceEmployeeID AS VARCHAR(5)
)
RETURNS @tbl TABLE (EmployeeID				VARCHAR(5)
					, ChildEmployeeID		VARCHAR(5)
					, ChildFirstname		VARCHAR(50)
					, ChildSurname			VARCHAR(50)
					, ChildPositionTitle	VARCHAR(50)
					, ChildHierarchy		hierarchyid
					, ChildLevel			INT
					)

AS
BEGIN

	DECLARE @SourceHierarchyID AS hierarchyid

	SELECT @SourceHierarchyID = E.Hierarchy
	FROM Employee E
	WHERE E.EmployeeID = @SourceEmployeeID

	INSERT INTO @tbl
	SELECT @SourceEmployeeID	AS EmployeeID
			, E.EmployeeID		AS ChildEmployeeID
			, E.Firstname		AS ChildFirstname
			, E.Surname			AS ChildSurname
			, E.PositionTitle	AS ChildPositionTitle
			, E.Hierarchy		AS ChildHierarchy
			, E.Hierarchy.GetLevel()	AS Level
	FROM Employee E
	WHERE @SourceHierarchyID.IsDescendantOf(E.Hierarchy) = 1
	ORDER BY E.Hierarchy.GetLevel() DESC

	RETURN

END

All Ancestor records: output

All Sibling records

A Sibling is considered all entities with the same parent entitiy. So in this example, all siblins of E0006 has the same Manager of E0006 which is E0002.

CREATE FUNCTION [dbo].[udfEmployee_GetAllSiblingsByEmployeeID]
(
	@SourceEmployeeID AS VARCHAR(5)
)
RETURNS @tbl TABLE (EmployeeID				VARCHAR(5)
					, ChildEmployeeID		VARCHAR(5)
					, ChildFirstname		VARCHAR(50)
					, ChildSurname			VARCHAR(50)
					, ChildPositionTitle	VARCHAR(50)
					, ChildHierarchy		hierarchyid
					, ChildLevel			INT
					)

AS
BEGIN

	DECLARE @ManagerHierarchyID AS hierarchyid

	SELECT @ManagerHierarchyID = E.Hierarchy.GetAncestor(1)
	FROM Employee E
	WHERE E.EmployeeID = @SourceEmployeeID

	INSERT INTO @tbl
	SELECT @SourceEmployeeID	AS EmployeeID
			, E.EmployeeID		AS ChildEmployeeID
			, E.Firstname		AS ChildFirstname
			, E.Surname			AS ChildSurname
			, E.PositionTitle	AS ChildPositionTitle
			, E.Hierarchy		AS ChildHierarchy
			, E.Hierarchy.GetLevel()	AS Level
	FROM Employee E
	WHERE @ManagerHierarchyID = E.Hierarchy.GetAncestor(1)
	ORDER BY E.Hierarchy.GetLevel() DESC

	RETURN

END

All Sibling records: output

Update / Move Logic

Finally, one last thing to cover… is the ability to move an entity and its entire sub-hierarchy into a different branch of the tree. E0006 (HR General Manager) currently reports to E0002 (COO). If we wanted to move E0006 to report under say E0005 (CIO), we need to perform the following logic:

  1. Grab the Manager record and use its hierarchyid to determine the next hierarchyid for its next child.
  2. Once we determine the new child hierarchyid, we use GetReparentedValue(@oldH, @newH) to restructure the current entity and its entire sub-tree.

The complete logic to achieve this update is as follows:

ALTER PROCEDURE [dbo].[uspEmployee_Move](
	@EmployeeID			AS VARCHAR(5)
	, @NewManagerID		AS VARCHAR(5)
)
AS
BEGIN

	DECLARE @oldH hierarchyid
	DECLARE @newH hierarchyid
	DECLARE @ManagerHierarchyID hierarchyid

	SELECT @oldH = E.Hierarchy
	FROM Employee E
	WHERE E.EmployeeID = @EmployeeID

	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
	BEGIN TRANSACTION

		-- [1] Identify the Manager Record
		-- [2] Find the LAST child attached to the Manager Record (we need this when calling the GetDescendant method)
		-- [3] Generate the next hierarchyid that is AFTER the LAST child retrieved above
		-- [4] Insert the new Employee with the newly generated hierarchyid

		-- [1]
		SELECT @ManagerHierarchyID = Hierarchy
		FROM Employee e
		WHERE E.EmployeeID = @NewManagerID

		-- [2]
		SELECT @newH = MAX(Hierarchy)
		FROM Employee E
		WHERE E.Hierarchy.GetAncestor(1) = @ManagerHierarchyID

		-- [3]
		SELECT @newH = @ManagerHierarchyID.GetDescendant(@newH, NULL)

		UPDATE Employee
		SET Hierarchy = Hierarchy.GetReparentedValue(@oldH, @newH)
		WHERE Hierarchy.IsDescendantOf(@oldH) = 1

	COMMIT
END

Once we have this Stored Procedure created, we can now safely test it out. As above, lets move the entire sub-tree of E0006 to report to E0005.  Below you will find the snippet that does this, along with the reversal back to the original structure.

-- Move E0006 (HR General Manager) to report under E0005 (CIO)
-- Note the recursive native of the update will also change the hierarhcy for descendants of E0006
EXEC	[dbo].[uspEmployee_Move]
		@EmployeeID = N'E0006',
		@NewManagerID = N'E0005'

-- Move back E0006 (HR General Manager) to report under E0002 (COO)
EXEC	[dbo].[uspEmployee_Move]
		@EmployeeID = N'E0006',
		@NewManagerID = N'E0002'

Update / Move: Before the change

Update / Move: After the change

Conclusion

If you’ve come this far, well done!  Hope this article has given you insight into the SQL Server hierarchyid types. Along with that, you should now be in a better position to decide whether hierarchyid types is well suited for your system, or whether you prefer traditional data relationships to define your data structure.

If you have any questions, let me know. :-)