This topic introduces the use of recursive common table expressions and recursive views.
Some applications work with data that is recursive in nature. A Bill of Materials (BOM) application, for instance, expands parts into their component subparts. For example, a chair might be made of a seat unit and a leg assembly. The seat unit might consist of a seat and two arms. Each of these parts can be further broken down into its subparts until there is a list of all the parts needed to build a chair. This type of query can be handled by using a recursive common table expression or a recursive view.
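As a minimal sketch of the chair expansion described above, the following Python code runs a recursive common table expression in SQLite (not Db2 for i) through the standard `sqlite3` module. The table name `bom`, its columns, and the quantity of one for every part except the arms are hypothetical. SQLite also requires the `RECURSIVE` keyword after `WITH`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical BOM table: each row says `parent` uses `qty` units of `child`.
conn.execute("CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER)")
conn.executemany("INSERT INTO bom VALUES (?, ?, ?)", [
    ("Chair", "Seat unit", 1),
    ("Chair", "Leg assembly", 1),
    ("Seat unit", "Seat", 1),
    ("Seat unit", "Arm", 2),
])

rows = conn.execute("""
    WITH RECURSIVE explode (part, qty, level) AS (
        -- Initialization: the direct components of the chair
        SELECT child, qty, 1 FROM bom WHERE parent = 'Chair'
        UNION ALL
        -- Iteration: expand each part found so far into its subparts,
        -- multiplying quantities down the tree
        SELECT b.child, e.qty * b.qty, e.level + 1
        FROM explode e, bom b
        WHERE e.part = b.parent)
    SELECT part, qty, level FROM explode ORDER BY level, part
""").fetchall()

for part, qty, level in rows:
    print(level, part, qty)
```

The leg assembly has no subparts in this sketch, so it stays a leaf; adding rows with `parent = 'Leg assembly'` would expand it without changing the query.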
The examples in this topic use the following FLIGHTS and TRAINS tables:

```sql
CREATE TABLE FLIGHTS (DEPARTURE CHAR(20),
                      ARRIVAL CHAR(20),
                      CARRIER CHAR(15),
                      FLIGHT_NUMBER CHAR(5),
                      PRICE INT);

INSERT INTO FLIGHTS VALUES('New York', 'Paris', 'Atlantic', '234', 400);
INSERT INTO FLIGHTS VALUES('Chicago', 'Miami', 'NA Air', '2334', 300);
INSERT INTO FLIGHTS VALUES('New York', 'London', 'Atlantic', '5473', 350);
INSERT INTO FLIGHTS VALUES('London', 'Athens', 'Mediterranean', '247', 340);
INSERT INTO FLIGHTS VALUES('Athens', 'Nicosia', 'Mediterranean', '2356', 280);
INSERT INTO FLIGHTS VALUES('Paris', 'Madrid', 'Euro Air', '3256', 380);
INSERT INTO FLIGHTS VALUES('Paris', 'Cairo', 'Euro Air', '63', 480);
INSERT INTO FLIGHTS VALUES('Chicago', 'Frankfurt', 'Atlantic', '37', 480);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Moscow', 'Asia Air', '2337', 580);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Beijing', 'Asia Air', '77', 480);
INSERT INTO FLIGHTS VALUES('Moscow', 'Tokyo', 'Asia Air', '437', 680);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Vienna', 'Euro Air', '59', 200);
INSERT INTO FLIGHTS VALUES('Paris', 'Rome', 'Euro Air', '534', 340);
INSERT INTO FLIGHTS VALUES('Miami', 'Lima', 'SA Air', '5234', 530);
INSERT INTO FLIGHTS VALUES('New York', 'Los Angeles', 'NA Air', '84', 330);
INSERT INTO FLIGHTS VALUES('Los Angeles', 'Tokyo', 'Pacific Air', '824', 530);
INSERT INTO FLIGHTS VALUES('Tokyo', 'Hong Kong', 'Asia Air', '94', 330);
INSERT INTO FLIGHTS VALUES('Washington', 'Toronto', 'NA Air', '104', 250);

CREATE TABLE TRAINS (DEPARTURE CHAR(20),
                     ARRIVAL CHAR(20),
                     RAILLINE CHAR(15),
                     TRAIN CHAR(5),
                     PRICE INT);

INSERT INTO TRAINS VALUES('Chicago', 'Washington', 'UsTrack', '323', 90);
INSERT INTO TRAINS VALUES('Madrid', 'Barcelona', 'EuroTrack', '5234', 60);
INSERT INTO TRAINS VALUES('Washington', 'Boston', 'UsTrack', '232', 50);
```
The following query finds every city you can fly to from Chicago and the number of flights it takes to get there:

```sql
WITH destinations (origin, departure, arrival, flight_count) AS
  (SELECT a.departure, a.departure, a.arrival, 1
     FROM flights a
     WHERE a.departure = 'Chicago'
   UNION ALL
   SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
     FROM destinations r, flights b
     WHERE r.arrival = b.departure)
SELECT origin, departure, arrival, flight_count
  FROM destinations;
```
This query returns the following information:
ORIGIN | DEPARTURE | ARRIVAL | FLIGHT_COUNT |
---|---|---|---|
Chicago | Chicago | Miami | 1 |
Chicago | Chicago | Frankfurt | 1 |
Chicago | Miami | Lima | 2 |
Chicago | Frankfurt | Moscow | 2 |
Chicago | Frankfurt | Beijing | 2 |
Chicago | Frankfurt | Vienna | 2 |
Chicago | Moscow | Tokyo | 3 |
Chicago | Tokyo | Hong Kong | 4 |
This recursive query is written in two parts. The first part of the common table expression is called the initialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the FLIGHTS table that fly directly from Chicago to another city. It also initializes the number of flight legs to one for each row it selects.
The second part of the recursive query joins the rows from the current result set of the common table expression with other rows from the original table. It is called the iterative fullselect. This is where the recursion is introduced. Notice that the rows that have already been selected for the result set are referenced by using the name of the common table expression as the table name and the common table expression result column names as the column names.
In this recursive part of the query, every flight that departs from one of the previously selected arrival cities is selected: the arrival city of the earlier row becomes the departure city of the new row. Each row from this recursive select increments the flight count by one. As these new rows are added to the common table expression result set, they are also fed into the iterative fullselect to generate more result set rows, until an iteration produces no new rows. In the final result, the flight count for each arrival city equals the number of recursive joins it took to reach that city, plus 1.
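The evaluation described above can be sketched in plain Python. This is an illustrative emulation, not how Db2 for i is implemented: the list `result` plays the role of the common table expression result, and `working` holds only the rows added by the previous step, which are the rows the iterative fullselect joins back to FLIGHTS. The sketch keeps only the DEPARTURE and ARRIVAL columns, and it terminates only because this data contains no cycle.

```python
# (departure, arrival) pairs from the FLIGHTS table, in insertion order.
# CARRIER, FLIGHT_NUMBER, and PRICE are omitted because this query does not use them.
FLIGHTS = [
    ("New York", "Paris"), ("Chicago", "Miami"), ("New York", "London"),
    ("London", "Athens"), ("Athens", "Nicosia"), ("Paris", "Madrid"),
    ("Paris", "Cairo"), ("Chicago", "Frankfurt"), ("Frankfurt", "Moscow"),
    ("Frankfurt", "Beijing"), ("Moscow", "Tokyo"), ("Frankfurt", "Vienna"),
    ("Paris", "Rome"), ("Miami", "Lima"), ("New York", "Los Angeles"),
    ("Los Angeles", "Tokyo"), ("Tokyo", "Hong Kong"), ("Washington", "Toronto"),
]


def destinations(start):
    """Emulate the recursive common table expression for one starting city."""
    # Initialization fullselect: direct flights from `start`, flight_count = 1.
    result = [(start, dep, arr, 1) for dep, arr in FLIGHTS if dep == start]
    working = list(result)
    # Iterative fullselect: join the rows added in the previous step to FLIGHTS;
    # the earlier arrival city becomes the new departure city.
    while working:
        working = [
            (origin, dep, arr, count + 1)
            for origin, _, prev_arrival, count in working
            for dep, arr in FLIGHTS
            if dep == prev_arrival
        ]
        result.extend(working)  # new rows join the result and feed the next step
    return result


rows = destinations("Chicago")
for row in rows:
    print(row)
```

The loop stops when a step adds no rows, which is the point at which the recursive query completes.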
The same query can also be defined as a recursive view:

```sql
CREATE VIEW destinations (origin, departure, arrival, flight_count) AS
  SELECT departure, departure, arrival, 1
    FROM flights
    WHERE departure = 'Chicago'
  UNION ALL
  SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
    FROM destinations r, flights b
    WHERE r.arrival = b.departure;
```
The iterative fullselect part of this view definition refers to the view itself. Selecting from this view returns the same rows as the previous recursive common table expression.
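SQLite does not allow a view to refer to itself, so the following sketch (Python with the standard `sqlite3` module, SQLite rather than Db2 for i) wraps the recursive common table expression inside the view instead, then checks that the view and the standalone query return the same rows. The view name `chicago_destinations` is hypothetical, and the CARRIER and FLIGHT_NUMBER columns are omitted.

```python
import sqlite3

# FLIGHTS reduced to (departure, arrival, price).
FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300), ("New York", "London", 350),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530), ("New York", "Los Angeles", 330),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
]

# Body of the recursive common table expression, shared by both forms below.
CTE_BODY = """
    destinations (origin, departure, arrival, flight_count) AS (
        SELECT departure, departure, arrival, 1
        FROM flights WHERE departure = 'Chicago'
        UNION ALL
        SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
        FROM destinations r, flights b
        WHERE r.arrival = b.departure)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)

# The view embeds the recursive CTE rather than referring to itself.
conn.execute(
    f"CREATE VIEW chicago_destinations AS WITH RECURSIVE {CTE_BODY} "
    "SELECT * FROM destinations"
)

query = "SELECT origin, departure, arrival, flight_count FROM {} ORDER BY flight_count, arrival"
from_cte = conn.execute(f"WITH RECURSIVE {CTE_BODY} " + query.format("destinations")).fetchall()
from_view = conn.execute(query.format("chicago_destinations")).fetchall()
print(from_cte == from_view)
```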
Now, to make the query a bit more complicated, suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.
```sql
WITH destinations (departure, arrival, connections, cost) AS
  (SELECT a.departure, a.arrival, 0, price
     FROM flights a
     WHERE a.departure = 'Chicago' OR
           a.departure = 'New York'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1,
          r.cost + b.price
     FROM destinations r, flights b
     WHERE r.arrival = b.departure)
SELECT departure, arrival, connections, cost
  FROM destinations;
```
This query returns the following information:
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Frankfurt | 0 | 480 |
New York | Paris | 0 | 400 |
New York | London | 0 | 350 |
New York | Los Angeles | 0 | 330 |
Chicago | Lima | 1 | 830 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
New York | Madrid | 1 | 780 |
New York | Cairo | 1 | 880 |
New York | Rome | 1 | 740 |
New York | Athens | 1 | 690 |
New York | Tokyo | 1 | 860 |
Chicago | Tokyo | 2 | 1,740 |
New York | Nicosia | 2 | 970 |
New York | Hong Kong | 2 | 1,190 |
Chicago | Hong Kong | 3 | 2,070 |
For each returned row, the results show the starting departure city and the final destination city. The query counts the number of connections needed rather than the total number of flights, and it adds up the total cost of all the flights.
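A sketch of the same query run against SQLite from Python, assuming a FLIGHTS table reduced to DEPARTURE, ARRIVAL, and PRICE. Apart from the `RECURSIVE` keyword that SQLite requires and an outer ORDER BY added to make the row order predictable, the SQL matches the query above.

```python
import sqlite3

# FLIGHTS reduced to (departure, arrival, price); CARRIER and FLIGHT_NUMBER omitted.
FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300), ("New York", "London", 350),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530), ("New York", "Los Angeles", 330),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)

rows = conn.execute("""
    WITH RECURSIVE destinations (departure, arrival, connections, cost) AS (
        SELECT a.departure, a.arrival, 0, a.price
        FROM flights a
        WHERE a.departure = 'Chicago' OR a.departure = 'New York'
        UNION ALL
        -- Each extra leg adds one connection and that leg's price
        SELECT r.departure, b.arrival, r.connections + 1, r.cost + b.price
        FROM destinations r, flights b
        WHERE r.arrival = b.departure)
    SELECT departure, arrival, connections, cost
    FROM destinations
    ORDER BY connections, departure, arrival
""").fetchall()

# Look up the total cost of a route by (starting city, final destination).
cost = {(dep, arr): total for dep, arr, _, total in rows}
print(cost[("Chicago", "Hong Kong")], cost[("New York", "Hong Kong")])  # prints 2070 1190
```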
Now suppose you start in Chicago and can travel by train as well as by air, and you want to know which cities you can reach.
The following query returns that information:
```sql
WITH destinations (departure, arrival, connections, flights, trains, cost) AS
  (SELECT f.departure, f.arrival, 0, 1, 0, price
     FROM flights f
     WHERE f.departure = 'Chicago'
   UNION ALL
   SELECT t.departure, t.arrival, 0, 0, 1, price
     FROM trains t
     WHERE t.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1, r.flights + 1,
          r.trains, r.cost + b.price
     FROM destinations r, flights b
     WHERE r.arrival = b.departure
   UNION ALL
   SELECT r.departure, c.arrival, r.connections + 1, r.flights,
          r.trains + 1, r.cost + c.price
     FROM destinations r, trains c
     WHERE r.arrival = c.departure)
SELECT departure, arrival, connections, flights, trains, cost
  FROM destinations;
```
This query returns the following information:
DEPARTURE | ARRIVAL | CONNECTIONS | FLIGHTS | TRAINS | COST |
---|---|---|---|---|---|
Chicago | Miami | 0 | 1 | 0 | 300 |
Chicago | Frankfurt | 0 | 1 | 0 | 480 |
Chicago | Washington | 0 | 0 | 1 | 90 |
Chicago | Lima | 1 | 2 | 0 | 830 |
Chicago | Moscow | 1 | 2 | 0 | 1,060 |
Chicago | Beijing | 1 | 2 | 0 | 960 |
Chicago | Vienna | 1 | 2 | 0 | 680 |
Chicago | Toronto | 1 | 1 | 1 | 340 |
Chicago | Boston | 1 | 0 | 2 | 140 |
Chicago | Tokyo | 2 | 3 | 0 | 1,740 |
Chicago | Hong Kong | 3 | 4 | 0 | 2,070 |
In this example, two parts of the common table expression provide initialization values to the query: one for flights and one for trains. There are also two recursive references that get from the previous arrival location to the next possible destination: one for continuing by air and the other for continuing by train. The final results show how many connections each route needs and how many of its legs are flights and how many are train trips.
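The following sketch reproduces this result in SQLite from Python. It deliberately restructures the query: older SQLite releases accept only a single recursive SELECT, so the sketch first combines both tables into a hypothetical `legs` common table expression, tagging each leg with a hypothetical `is_flight` column (1 for a flight, 0 for a train), and then uses one initialization SELECT and one recursive SELECT. Only DEPARTURE, ARRIVAL, and PRICE are kept from each table.

```python
import sqlite3

FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300), ("New York", "London", 350),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530), ("New York", "Los Angeles", 330),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
]
TRAINS = [("Chicago", "Washington", 90), ("Madrid", "Barcelona", 60), ("Washington", "Boston", 50)]

conn = sqlite3.connect(":memory:")
for table, data in (("flights", FLIGHTS), ("trains", TRAINS)):
    conn.execute(f"CREATE TABLE {table} (departure TEXT, arrival TEXT, price INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)

rows = conn.execute("""
    WITH RECURSIVE
      -- Every leg from either table, tagged 1 for a flight and 0 for a train
      legs (departure, arrival, price, is_flight) AS (
        SELECT departure, arrival, price, 1 FROM flights
        UNION ALL
        SELECT departure, arrival, price, 0 FROM trains),
      destinations (departure, arrival, connections, flights, trains, cost) AS (
        SELECT departure, arrival, 0, is_flight, 1 - is_flight, price
        FROM legs WHERE departure = 'Chicago'
        UNION ALL
        -- Continue by air or by train; the tag decides which counter grows
        SELECT r.departure, l.arrival, r.connections + 1,
               r.flights + l.is_flight, r.trains + 1 - l.is_flight, r.cost + l.price
        FROM destinations r, legs l
        WHERE r.arrival = l.departure)
    SELECT departure, arrival, connections, flights, trains, cost
    FROM destinations
    ORDER BY connections, arrival
""").fetchall()

for row in rows:
    print(row)
```

Merging the two tables before recursing trades the four-branch shape of the original query for portability; the rows it produces are the same.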
The next two examples show how the order of the result rows differs depending on whether the recursion is processed depth first or breadth first.
The breadth first and depth first options sort the result by the recursive relationship, based on the recursive join column specified in the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great-grandchildren. When the recursion is handled depth first, the full recursive ancestry chain of one child is processed before going to the next child.
In both of these cases, you specify an extra column name that is used by the recursive process to keep track of the depth first or breadth first ordering. This column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If this column is not used in the ORDER BY, the DEPTH FIRST or BREADTH FIRST processing option is ignored.
The selection of which column to use for the SEARCH BY column is important. To have any meaning in the result, it must be the column that is used in the iterative fullselect to join from the initialization fullselect. In this example, ARRIVAL is the column to use.
The following query requests depth first ordering:
```sql
WITH destinations (departure, arrival, connections, cost) AS
  (SELECT f.departure, f.arrival, 0, price
     FROM flights f
     WHERE f.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1,
          r.cost + b.price
     FROM destinations r, flights b
     WHERE r.arrival = b.departure)
  SEARCH DEPTH FIRST BY arrival SET ordcol
SELECT *
  FROM destinations
  ORDER BY ordcol;
```
This query returns the following information:
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Lima | 1 | 830 |
Chicago | Frankfurt | 0 | 480 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Tokyo | 2 | 1,740 |
Chicago | Hong Kong | 3 | 2,070 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
In this result data, you can see that all destinations that are generated from the Chicago-to-Miami row are listed before the destinations from the Chicago-to-Frankfurt row.
Next, you can run the same query but request the result to be ordered breadth first.
```sql
WITH destinations (departure, arrival, connections, cost) AS
  (SELECT f.departure, f.arrival, 0, price
     FROM flights f
     WHERE f.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1,
          r.cost + b.price
     FROM destinations r, flights b
     WHERE r.arrival = b.departure)
  SEARCH BREADTH FIRST BY arrival SET ordcol
SELECT *
  FROM destinations
  ORDER BY ordcol;
```
This query returns the following information:
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Frankfurt | 0 | 480 |
Chicago | Lima | 1 | 830 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
Chicago | Tokyo | 2 | 1,740 |
Chicago | Hong Kong | 3 | 2,070 |
In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The data is identical to the results from the previous query, but in a breadth first order.
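To make the difference between the two orderings concrete, the following plain-Python sketch tags each row with a hypothetical ordering key: a tuple holding the row's position among its siblings at each level. Db2 for i generates ORDCOL internally, its encoding is not described here, and sibling order is not guaranteed; the sketch assumes siblings appear in FLIGHTS insertion order. Sorting by the tuple gives depth first order, and sorting by its length (the level) and then by the tuple gives breadth first order.

```python
# (departure, arrival, price) rows from FLIGHTS, in insertion order.
FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300), ("New York", "London", 350),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530), ("New York", "Los Angeles", 330),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
]


def legs_from(city):
    """Flights leaving `city`, in table order (the assumed sibling order)."""
    return [(arr, price) for dep, arr, price in FLIGHTS if dep == city]


def expand(start):
    """Evaluate the recursive query, tagging each row with a path of sibling positions."""
    working = [(start, arr, 0, price, (i,))
               for i, (arr, price) in enumerate(legs_from(start))]
    rows = list(working)
    while working:
        working = [(dep, arr, hops + 1, cost + price, path + (i,))
                   for dep, prev_arr, hops, cost, path in working
                   for i, (arr, price) in enumerate(legs_from(prev_arr))]
        rows.extend(working)
    return rows


rows = expand("Chicago")
# Depth first: sort by the path itself, so each row's descendants follow it directly.
depth_first = [r[1] for r in sorted(rows, key=lambda r: r[4])]
# Breadth first: sort by level (path length), then by path within each level.
breadth_first = [r[1] for r in sorted(rows, key=lambda r: (len(r[4]), r[4]))]
print(depth_first)
print(breadth_first)
```

Both lists contain the same eight rows; only the sort key differs, which mirrors how the same query returns identical data in a different order when ORDER BY ordcol is applied.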
The key to any recursive process, whether it is a recursive programming algorithm or a query over recursive data, is that the recursion must be finite; otherwise, it never ends. The CYCLE clause safeguards against cyclic data. It stops the recursion from repeating a cycle, and it can optionally set a cycle mark column that helps you find the cyclic data.
For a final example, suppose the data contains a cycle. Adding one more row to the FLIGHTS table creates a flight from Cairo to Paris; because there is already a flight from Paris to Cairo, the two cities now form a cycle. Without accounting for cyclic data like this, it is easy to write a query that goes into an infinite loop.
The following statements add the row and then list the destinations from New York, using the CYCLE clause to detect the cycle:
```sql
INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440);

WITH destinations (departure, arrival, connections, cost, itinerary) AS
  (SELECT f.departure, f.arrival, 1, price,
          CAST(f.departure CONCAT f.arrival AS VARCHAR(2000))
     FROM flights f
     WHERE f.departure = 'New York'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1,
          r.cost + b.price,
          CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000))
     FROM destinations r, flights b
     WHERE r.arrival = b.departure)
  CYCLE arrival SET cyclic_data TO '1' DEFAULT '0'
SELECT departure, arrival, itinerary, cyclic_data
  FROM destinations
  ORDER BY cyclic_data;
```
This query returns the following information:
DEPARTURE | ARRIVAL | ITINERARY | CYCLIC_DATA |
---|---|---|---|
New York | Paris | New York Paris | 0 |
New York | London | New York London | 0 |
New York | Los Angeles | New York Los Angeles | 0 |
New York | Madrid | New York Paris Madrid | 0 |
New York | Cairo | New York Paris Cairo | 0 |
New York | Rome | New York Paris Rome | 0 |
New York | Athens | New York London Athens | 0 |
New York | Tokyo | New York Los Angeles Tokyo | 0 |
New York | Nicosia | New York London Athens Nicosia | 0 |
New York | Hong Kong | New York Los Angeles Tokyo Hong Kong | 0 |
New York | Paris | New York Paris Cairo Paris | 1 |
In this example, the CYCLE clause names ARRIVAL as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value '1' for the row that completes the cycle; all other rows contain the default value '0'. Processing does not continue from a row in which a cycle on ARRIVAL is detected, so the infinite loop does not occur. To check whether your data actually contains a cyclic reference, reference the CYCLIC_DATA column in the outer query.
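SQLite has no CYCLE clause, but the same safeguard can be emulated, as in the following sketch (Python with the standard `sqlite3` module). Each row carries a hypothetical `visited` column listing the arrival cities on its path, delimited by slashes so that one city name cannot match part of another. A new row whose arrival is already in that list gets CYCLIC_DATA = '1', and the recursive SELECT does not expand rows that are already marked. Unlike the Db2 query, the itinerary separates cities with single spaces instead of concatenating CHAR(20) values.

```python
import sqlite3

# FLIGHTS reduced to (departure, arrival, price), including the added Cairo-to-Paris row.
FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300), ("New York", "London", 350),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530), ("New York", "Los Angeles", 330),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
    ("Cairo", "Paris", 440),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)

rows = conn.execute("""
    WITH RECURSIVE destinations (departure, arrival, itinerary, visited, cyclic_data) AS (
        SELECT departure, arrival, departure || ' ' || arrival,
               '/' || arrival || '/', '0'
        FROM flights WHERE departure = 'New York'
        UNION ALL
        SELECT r.departure, b.arrival, r.itinerary || ' ' || b.arrival,
               r.visited || b.arrival || '/',
               -- Mark the row if this arrival city was already reached on the path
               CASE WHEN instr(r.visited, '/' || b.arrival || '/') > 0
                    THEN '1' ELSE '0' END
        FROM destinations r, flights b
        WHERE r.arrival = b.departure
          AND r.cyclic_data = '0')  -- never expand a row already marked as cyclic
    SELECT departure, arrival, itinerary, cyclic_data
    FROM destinations
    ORDER BY cyclic_data, itinerary
""").fetchall()

for row in rows:
    print(row)
```

Without the `r.cyclic_data = '0'` condition, the Paris and Cairo rows would keep feeding each other and the query would not finish.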