Re: Help with ROW_NUMBER and recursive query



Well, that's a nice picture, but how do you update the values of lft and rgt when there are millions of rows in the table? The model also breaks down when multiple concurrent updates are being processed: the worst-case update needs to modify n - 1 rows (essentially the whole table) and will execute most efficiently while holding a table lock. This model does work reasonably well for queries, however.
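To make the update cost concrete, here is a minimal sketch in Python with the standard sqlite3 module. It assumes the six-row OrgChart sample from the quoted post; the new employee 'Greg' and the helper add_rightmost_child are hypothetical names made up for illustration. The UNIQUE constraints are left out of this sketch because SQLite checks them row by row, so a bulk renumbering can collide part-way through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sketch schema: same columns as the quoted DDL, minus UNIQUE on lft/rgt.
conn.execute(
    "CREATE TABLE OrgChart ("
    " emp_name TEXT NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL CHECK (lft > 0),"
    " rgt INTEGER NOT NULL CHECK (rgt > 1),"
    " CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO OrgChart VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)

def add_rightmost_child(conn, boss, new_emp):
    """Insert new_emp as the last child of boss; return rows renumbered."""
    (boss_rgt,) = conn.execute(
        "SELECT rgt FROM OrgChart WHERE emp_name = ?", (boss,)
    ).fetchone()
    # Open a gap of width 2 at the boss's old rgt position.
    cur = conn.execute(
        "UPDATE OrgChart"
        " SET lft = CASE WHEN lft > ? THEN lft + 2 ELSE lft END,"
        "     rgt = rgt + 2"
        " WHERE rgt >= ?",
        (boss_rgt, boss_rgt),
    )
    renumbered = cur.rowcount
    conn.execute(
        "INSERT INTO OrgChart VALUES (?, ?, ?)",
        (new_emp, boss_rgt, boss_rgt + 1),
    )
    return renumbered

renumbered = add_rightmost_child(conn, "Bert", "Greg")
rows = conn.execute(
    "SELECT emp_name, lft, rgt FROM OrgChart ORDER BY lft"
).fetchall()
print(renumbered, "existing rows renumbered for one insert")
for row in rows:
    print(row)
```

Adding one leaf under Bert renumbers every existing row in this small tree, which is the behaviour that becomes expensive on a large table.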

Also remember that the difference between Ptolemy and Kepler is only in their point of reference (datum); neither is more true than the other, and both are generally acknowledged as false by modern physics.

"--CELKO--" <jcelko212@xxxxxxxxxxxxx> wrote in message news:b9b3ecfc-5889-46ba-ae1d-719d871383f0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are many ways to represent a tree or hierarchy in SQL. One of
them is called the adjacency list model, and it looks like this:

CREATE TABLE OrgChart
(emp_name CHAR(10) NOT NULL PRIMARY KEY,
boss_emp_name CHAR(10) REFERENCES OrgChart(emp_name),
<< horrible cycle constraints >>);

OrgChart
emp_name  boss_emp_name
=======================
'Albert'  NULL
'Bert'    'Albert'
'Chuck'   'Albert'
'Donna'   'Chuck'
'Eddie'   'Chuck'
'Fred'    'Chuck'

This approach will wind up with really ugly code -- CTEs hiding
recursive procedures, horrible cycle prevention code, etc. This
matches the way we did it in old file systems with pointer chains.
Non-RDBMS programmers are comfortable with it because it looks
familiar -- it looks like records and not rows.

Another way of representing trees is to show them as nested sets.

Since SQL is a set oriented language, this is a better model than the
adjacency list approach. Let us define a simple OrgChart table like
this.

CREATE TABLE OrgChart
(emp_name CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt),
<<cycle constraint>> );

OrgChart
emp_name  lft  rgt
==================
'Albert'    1   12
'Bert'      2    3
'Chuck'     4   11
'Donna'     5    6
'Eddie'     7    8
'Fred'      9   10

The (lft, rgt) pairs are like tags in a mark-up language, or parens in
algebra, BEGIN-END blocks in Algol-family programming languages, etc.
-- they bracket a sub-set. This is a set-oriented approach to trees
in a set-oriented language.

The organizational chart would look like this as a directed graph:

              Albert (1, 12)
              /            \
             /              \
     Bert (2, 3)        Chuck (4, 11)
                        /     |     \
                       /      |      \
                      /       |       \
                     /        |        \
          Donna (5, 6)  Eddie (7, 8)  Fred (9, 10)
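The (lft, rgt) numbers in this chart are what a depth-first walk produces if a counter is bumped once on entering a node and once on leaving it. A minimal sketch in Python, assuming the tree is held as a plain dictionary of children; the function name number_tree and the dictionary layout are made up for illustration and are not part of the original post:

```python
def number_tree(children, root):
    """Return {node: (lft, rgt)} from one depth-first walk of the tree."""
    counter = 0
    bounds = {}

    def visit(node):
        nonlocal counter
        counter += 1              # entering the node: assign lft
        lft = counter
        for child in children.get(node, []):
            visit(child)          # children are numbered left to right
        counter += 1              # leaving the node: assign rgt
        bounds[node] = (lft, counter)

    visit(root)
    return bounds

# The sample organization, as parent -> ordered list of children.
children = {
    "Albert": ["Bert", "Chuck"],
    "Chuck": ["Donna", "Eddie", "Fred"],
}
bounds = number_tree(children, "Albert")
for name, (lft, rgt) in sorted(bounds.items(), key=lambda kv: kv[1]):
    print(name, lft, rgt)
```

Because every node consumes exactly two counter values, the root of an n-node tree ends up with rgt = 2 * n.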

The adjacency list table is denormalized in several ways. The
boss_emp_name and emp_name columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized base table. To prove that this is not
normalized, assume that "Chuck" changes his name to "Charles"; you
have to change his name in both columns and several places. The
defining characteristic of a normalized table is that you have one
fact, one place, one time. In short, this is a relationship which
needs a base table of Personnel.

The final problem is that the adjacency list model does not model
subordination. It is a graph and not a hierarchy. Inheritance flows
downhill in a hierarchy, but if I fire Chuck, I disconnect all of his
subordinates from Albert. There are situations (e.g. water pipes)
where this is true, but that is not the expected situation in this
case.

The tree structure can be kept in one table and all the information
about a node can be put in a second table and they can be joined on
employee number for queries.

The nested sets model has some predictable results that we can use for
building queries. The root is always (lft = 1, rgt = 2 * (SELECT
COUNT(*) FROM OrgChart)); leaf nodes always have (lft + 1 = rgt);
subtrees are defined by the BETWEEN predicate; etc. Here are some
common queries which can be used to build others:

1. An employee and all their supervisors, no matter how deep the tree.

SELECT O2.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = :myemployee;

2. The employee and all their subordinates. There is a nice symmetry
here.

SELECT O1.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O2.emp_name = :myemployee;
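The two BETWEEN queries above, plus the root and leaf rules stated before them, can be tried against the sample data. The following is only a sketch, assuming Python's standard sqlite3 module and the six-row OrgChart sample; the ORDER BY clauses and the helper names() are additions for readable output, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrgChart ("
    " emp_name TEXT NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),"
    " rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),"
    " CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO OrgChart VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)

def names(sql, params=()):
    """Run a query and return its first column as a list (helper for this sketch)."""
    return [row[0] for row in conn.execute(sql, params)]

# Query 1: an employee and all their supervisors.
superiors = names(
    "SELECT O2.emp_name"
    " FROM OrgChart AS O1, OrgChart AS O2"
    " WHERE O1.lft BETWEEN O2.lft AND O2.rgt"
    " AND O1.emp_name = :myemployee"
    " ORDER BY O2.lft",
    {"myemployee": "Donna"},
)

# Query 2: an employee and all their subordinates (aliases swapped).
subordinates = names(
    "SELECT O1.emp_name"
    " FROM OrgChart AS O1, OrgChart AS O2"
    " WHERE O1.lft BETWEEN O2.lft AND O2.rgt"
    " AND O2.emp_name = :myemployee"
    " ORDER BY O1.lft",
    {"myemployee": "Chuck"},
)

# Root and leaf rules from the text.
root = names(
    "SELECT emp_name FROM OrgChart"
    " WHERE lft = 1 AND rgt = 2 * (SELECT COUNT(*) FROM OrgChart)"
)
leaves = names("SELECT emp_name FROM OrgChart WHERE lft + 1 = rgt ORDER BY lft")

print("superiors of Donna:", superiors)
print("subordinates of Chuck:", subordinates)
print("root:", root)
print("leaves:", leaves)
```

Both queries include the named employee, because a node's lft always lies BETWEEN its own lft and rgt.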

3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

SELECT O2.emp_name, SUM(S1.salary_amt)
FROM OrgChart AS O1, OrgChart AS O2,
Salaries AS S1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = S1.emp_name
GROUP BY O2.emp_name;
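As a rough check of the roll-up query above, here is a sketch using Python's standard sqlite3 module. The post does not define the Salaries table or give any amounts, so the Salaries layout and every salary figure below are invented for illustration only; the ORDER BY is also an addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrgChart ("
    " emp_name TEXT NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE,"
    " rgt INTEGER NOT NULL UNIQUE,"
    " CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO OrgChart VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)
# Hypothetical Salaries table and amounts (not from the original post).
conn.execute(
    "CREATE TABLE Salaries ("
    " emp_name TEXT NOT NULL PRIMARY KEY REFERENCES OrgChart(emp_name),"
    " salary_amt INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Salaries VALUES (?, ?)",
    [("Albert", 1000), ("Bert", 900), ("Chuck", 900),
     ("Donna", 800), ("Eddie", 700), ("Fred", 600)],
)

# Query 3: total salary of each employee's subtree, the employee included.
totals = dict(conn.execute(
    "SELECT O2.emp_name, SUM(S1.salary_amt)"
    " FROM OrgChart AS O1, OrgChart AS O2, Salaries AS S1"
    " WHERE O1.lft BETWEEN O2.lft AND O2.rgt"
    " AND O1.emp_name = S1.emp_name"
    " GROUP BY O2.emp_name"
    " ORDER BY O2.emp_name"
))
for name, total in totals.items():
    print(name, total)
```

Each total includes the employee's own salary, since the BETWEEN predicate matches a node against itself.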

4. The nested set model has an implied ordering of siblings which the
adjacency list model does not. This is often important in hierarchies
where the eldest child is promoted in the parent's place, the company
has a “last hired, first fired” rule, etc.

Since you will not state your specs or post DDL, I will guess that
what you are after is a level number in a hierarchical entity in your
schema. Here is a query in nested sets that
a. Will port to any SQL. No proprietary “features” in it.
b. Will run 2 to 3 orders of magnitude faster than your attempt.
Read that again so you understand it. Magnitude, not times.
c. Fits in less than ten lines of code, so people can maintain it.
Unlike your monster.

5. To find the level of each emp_name, so you can print the tree as an
indented listing.

SELECT T1.emp_name,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
FROM OrgChart AS T1, OrgChart AS T2
WHERE T2.lft <= T1.lft
GROUP BY T1.emp_name;
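Here is a small sketch of the level query producing the indented listing, assuming Python's standard sqlite3 module and the six-row OrgChart sample. The query is restated against OrgChart(emp_name, lft, rgt); MIN(T1.lft) is selected only so the rows can be sorted into tree order, which is an addition to the query as posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrgChart ("
    " emp_name TEXT NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE,"
    " rgt INTEGER NOT NULL UNIQUE,"
    " CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO OrgChart VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)

# +1 for every node opened at or before T1, -1 for every node already
# closed before T1: what remains is T1's open ancestors plus T1 itself.
level_rows = conn.execute(
    "SELECT T1.emp_name,"
    " SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END"
    "   + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl,"
    " MIN(T1.lft) AS pos"
    " FROM OrgChart AS T1, OrgChart AS T2"
    " WHERE T2.lft <= T1.lft"
    " GROUP BY T1.emp_name"
    " ORDER BY pos"
).fetchall()

levels = [(name, lvl) for name, lvl, _pos in level_rows]
listing = "\n".join("  " * (lvl - 1) + name for name, lvl in levels)
print(listing)
```

Levels are 1-based here, so the root comes out at level 1 and is printed with no indentation.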

An untested version of this using full OLAP functions might be better
able to use the ordering. This will not run in SQL Server because it
still lacks the RANGE subclause.

SELECT T1.emp_name,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
OVER (ORDER BY T1.lft
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
FROM OrgChart AS T1, OrgChart AS T2
WHERE T2.lft <= T1.lft;
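A window function is evaluated after the join, so the untested query above appears to return one row per (T1, T2) pair rather than one row per node. One way to get a running-total formulation that does work is sketched below. It is a different technique from the posted query: each lft becomes a +1 event and each rgt a -1 event, and a running SUM over the events in position order gives the depth. It assumes Python's standard sqlite3 module built against SQLite 3.25 or later (needed for window functions); the derived-table names Events and Running are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrgChart ("
    " emp_name TEXT NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE,"
    " rgt INTEGER NOT NULL UNIQUE,"
    " CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO OrgChart VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)

# Positions are unique across lft and rgt, so a ROWS frame and a RANGE
# frame would give the same running total here.
window_levels = conn.execute(
    "SELECT emp_name, lvl"
    " FROM (SELECT emp_name, pos, delta,"
    "              SUM(delta) OVER (ORDER BY pos"
    "                  ROWS BETWEEN UNBOUNDED PRECEDING"
    "                           AND CURRENT ROW) AS lvl"
    "       FROM (SELECT emp_name, lft AS pos, 1 AS delta FROM OrgChart"
    "             UNION ALL"
    "             SELECT emp_name, rgt AS pos, -1 AS delta FROM OrgChart"
    "            ) AS Events"
    "      ) AS Running"
    " WHERE delta = 1"      # keep only the 'open' event of each node
    " ORDER BY pos"
).fetchall()

for name, lvl in window_levels:
    print(name, lvl)
```

This reads the table twice and sorts 2n event rows instead of joining the table to itself, which is where any benefit from the ordering would come from.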

Since I have been teaching SQL for a few decades, I need to know WHY
people write bad code. Most errors are not random; they are the
result of a bad mental model. Very often the bad model can be made to
work, but at great expense.

In 150 AD, Ptolemy of Alexandria published his theory of epicycles--
the idea that the moon, the sun and the planets moved in circles which
were moving in circles which were moving in circles around the Earth.
This theory explained the motion of celestial objects to an
astonishing degree of precision. It was, however, what computer
programmers call a kludge: a dirty, inelegant solution. Some 1,500
years later, Johannes Kepler, a German astronomer, replaced the whole
complex edifice with three simple laws.

If my guess is right, WHY did you come up with such a screaming
nightmare? Let me go into teacher/doctor mode:

Since you used CROSS APPLY, my guess is that you think in terms of
spreadsheets instead of tables. Their rows and columns are
interchangeable. Sequential numbering is important because of the
coordinate system, hence the assumption that you must use ROW_NUMBER().
The name of the spreadsheet is not part of the language like a table
name is in SQL, so you think it is not important. At most it is a level
number in a 3D mental model. Spreadsheets are declarative, but
computational. That leads to the same “formula” being replicated in
your query.

Is my diagnosis right? Yes? No? (with an explanation of your mindset)
or "Duh - I never considered my mindset" ?
