How does SQL recursion actually work?
Coming to SQL from other programming languages, the structure of a recursive query looks rather odd. Walk through it step by step, and it seems to fall apart.
Consider the following simple example:
CREATE TABLE #NUMS
(
N BIGINT
)
;
INSERT INTO #NUMS
VALUES
(3),
(5),
(7)
;
WITH R AS
(
SELECT N FROM #NUMS
UNION ALL
SELECT N*N AS N FROM R WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Let's walk through it.
First, the anchor member executes and the result set is put into R. So R is initialized to {3, 5, 7}.
Then, execution drops below the UNION ALL and the recursive member is executed for the first time. It executes on R (that is, on the R that we currently have in hand: {3, 5, 7}). This results in {9, 25, 49}.
What does it do with this new result? Does it append {9, 25, 49} to the existing {3, 5, 7}, label the resulting union R, and then carry on with the recursion from there? Or does it redefine R to be only this new result {9, 25, 49} and do all the union-ing later?
Neither choice makes sense.
If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}.
If R is now only {9, 25, 49}, then we have a mislabeling problem. R is understood to be the union of the anchor member result set and all the subsequent recursive member result sets. Whereas {9, 25, 49} is only a component of R. It is not the full R that we have accrued so far. Therefore, to write the recursive member as selecting from R makes no sense.
************* EDIT *************
I certainly appreciate what @Max Vernon and @Michael S. have detailed below. Namely, that (1) all the components are created up to the recursion limit or null set, and then (2) all the components are unioned together. This is how I understand SQL recursion to actually work.
If we were redesigning SQL, maybe we would enforce a more clear and explicit syntax, something like this:
WITH R AS
(
SELECT N
INTO R[0]
FROM #NUMS
UNION ALL
SELECT N*N AS N
INTO R[K+1]
FROM R[K]
WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Sort of like an inductive proof in mathematics.
The problem with SQL recursion as it currently stands is that it is written in a confusing way. The way it is written says that each component is formed by selecting from R, but it does not mean the full R that has been (or, appears to have been) constructed so far. It just means the previous component.
sql-server cte recursive
add a comment |
Coming to SQL from other programming languages, the structure of a recursive query looks rather odd. Walk through it step by step, and it seems to fall apart.
Consider the following simple example:
CREATE TABLE #NUMS
(
N BIGINT
)
;
INSERT INTO #NUMS
VALUES
(3),
(5),
(7)
;
WITH R AS
(
SELECT N FROM #NUMS
UNION ALL
SELECT N*N AS N FROM R WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Let's walk through it.
First, the anchor member executes and the result set is put into R. So R is initialized to {3, 5, 7}.
Then, execution drops below the UNION ALL and the recursive member is executed for the first time. It executes on R (that is, on the R that we currently have in hand: {3, 5, 7}). This results in {9, 25, 49}.
What does it do with this new result? Does it append {9, 25, 49} to the existing {3, 5, 7}, label the resulting union R, and then carry on with the recursion from there? Or does it redefine R to be only this new result {9, 25, 49} and do all the union-ing later?
Neither choice makes sense.
If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}.
If R is now only {9, 25, 49}, then we have a mislabeling problem. R is understood to be the union of the anchor member result set and all the subsequent recursive member result sets. Whereas {9, 25, 49} is only a component of R. It is not the full R that we have accrued so far. Therefore, to write the recursive member as selecting from R makes no sense.
************* EDIT *************
I certainly appreciate what @Max Vernon and @Michael S. have detailed below. Namely, that (1) all the components are created up to the recursion limit or null set, and then (2) all the components are unioned together. This is how I understand SQL recursion to actually work.
If we were redesigning SQL, maybe we would enforce a more clear and explicit syntax, something like this:
WITH R AS
(
SELECT N
INTO R[0]
FROM #NUMS
UNION ALL
SELECT N*N AS N
INTO R[K+1]
FROM R[K]
WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Sort of like an inductive proof in mathematics.
The problem with SQL recursion as it currently stands is that it is written in a confusing way. The way it is written says that each component is formed by selecting from R, but it does not mean the full R that has been (or, appears to have been) constructed so far. It just means the previous component.
sql-server cte recursive
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines whatWHERE N*N < 10000000would be, how many iterations / rows that is... then computesN * Nfor that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure
– scsimon
4 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago
add a comment |
Coming to SQL from other programming languages, the structure of a recursive query looks rather odd. Walk through it step by step, and it seems to fall apart.
Consider the following simple example:
CREATE TABLE #NUMS
(
N BIGINT
)
;
INSERT INTO #NUMS
VALUES
(3),
(5),
(7)
;
WITH R AS
(
SELECT N FROM #NUMS
UNION ALL
SELECT N*N AS N FROM R WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Let's walk through it.
First, the anchor member executes and the result set is put into R. So R is initialized to {3, 5, 7}.
Then, execution drops below the UNION ALL and the recursive member is executed for the first time. It executes on R (that is, on the R that we currently have in hand: {3, 5, 7}). This results in {9, 25, 49}.
What does it do with this new result? Does it append {9, 25, 49} to the existing {3, 5, 7}, label the resulting union R, and then carry on with the recursion from there? Or does it redefine R to be only this new result {9, 25, 49} and do all the union-ing later?
Neither choice makes sense.
If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}.
If R is now only {9, 25, 49}, then we have a mislabeling problem. R is understood to be the union of the anchor member result set and all the subsequent recursive member result sets. Whereas {9, 25, 49} is only a component of R. It is not the full R that we have accrued so far. Therefore, to write the recursive member as selecting from R makes no sense.
************* EDIT *************
I certainly appreciate what @Max Vernon and @Michael S. have detailed below. Namely, that (1) all the components are created up to the recursion limit or null set, and then (2) all the components are unioned together. This is how I understand SQL recursion to actually work.
If we were redesigning SQL, maybe we would enforce a more clear and explicit syntax, something like this:
WITH R AS
(
SELECT N
INTO R[0]
FROM #NUMS
UNION ALL
SELECT N*N AS N
INTO R[K+1]
FROM R[K]
WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Sort of like an inductive proof in mathematics.
The problem with SQL recursion as it currently stands is that it is written in a confusing way. The way it is written says that each component is formed by selecting from R, but it does not mean the full R that has been (or, appears to have been) constructed so far. It just means the previous component.
sql-server cte recursive
Coming to SQL from other programming languages, the structure of a recursive query looks rather odd. Walk through it step by step, and it seems to fall apart.
Consider the following simple example:
CREATE TABLE #NUMS
(
N BIGINT
)
;
INSERT INTO #NUMS
VALUES
(3),
(5),
(7)
;
WITH R AS
(
SELECT N FROM #NUMS
UNION ALL
SELECT N*N AS N FROM R WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Let's walk through it.
First, the anchor member executes and the result set is put into R. So R is initialized to {3, 5, 7}.
Then, execution drops below the UNION ALL and the recursive member is executed for the first time. It executes on R (that is, on the R that we currently have in hand: {3, 5, 7}). This results in {9, 25, 49}.
What does it do with this new result? Does it append {9, 25, 49} to the existing {3, 5, 7}, label the resulting union R, and then carry on with the recursion from there? Or does it redefine R to be only this new result {9, 25, 49} and do all the union-ing later?
Neither choice makes sense.
If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}.
If R is now only {9, 25, 49}, then we have a mislabeling problem. R is understood to be the union of the anchor member result set and all the subsequent recursive member result sets. Whereas {9, 25, 49} is only a component of R. It is not the full R that we have accrued so far. Therefore, to write the recursive member as selecting from R makes no sense.
************* EDIT *************
I certainly appreciate what @Max Vernon and @Michael S. have detailed below. Namely, that (1) all the components are created up to the recursion limit or null set, and then (2) all the components are unioned together. This is how I understand SQL recursion to actually work.
If we were redesigning SQL, maybe we would enforce a more clear and explicit syntax, something like this:
WITH R AS
(
SELECT N
INTO R[0]
FROM #NUMS
UNION ALL
SELECT N*N AS N
INTO R[K+1]
FROM R[K]
WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N
;
Sort of like an inductive proof in mathematics.
The problem with SQL recursion as it currently stands is that it is written in a confusing way. The way it is written says that each component is formed by selecting from R, but it does not mean the full R that has been (or, appears to have been) constructed so far. It just means the previous component.
sql-server cte recursive
sql-server cte recursive
edited 3 hours ago
UnLogicGuys
asked 5 hours ago
UnLogicGuysUnLogicGuys
925
925
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines whatWHERE N*N < 10000000would be, how many iterations / rows that is... then computesN * Nfor that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure
– scsimon
4 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago
add a comment |
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines whatWHERE N*N < 10000000would be, how many iterations / rows that is... then computesN * Nfor that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure
– scsimon
4 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines what
WHERE N*N < 10000000 would be, how many iterations / rows that is... then computes N * N for that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure– scsimon
4 hours ago
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines what
WHERE N*N < 10000000 would be, how many iterations / rows that is... then computes N * N for that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure– scsimon
4 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago
add a comment |
3 Answers
3
active
oldest
votes
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here.
If you remove the ORDER BY from your query the results are ordered as follows
+---------+
| N |
+---------+
| 3 |
| 5 |
| 7 |
| 49 |
| 2401 |
| 5764801 |
| 25 |
| 625 |
| 390625 |
| 9 |
| 81 |
| 6561 |
+---------+
This is because the execution plan operates very similarly to the following C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
private static readonly Stack<Tuple<long, int>> StackSpool = new Stack<Tuple<long, int>>();
private static void Main(string args)
{
//temp table #NUMS
var nums = new { 3, 5, 7 };
//Anchor member
foreach (var number in nums)
AddToStackSpoolAndEmit(number, 0);
//Recursive part
ProcessStackSpool();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static void ProcessStackSpool()
{
//recursion base case
if (StackSpool.Count == 0)
return;
var row = StackSpool.Pop();
var thisLevel = row.Item2 + 1;
var thisNum = row.Item1 * row.Item1;
Debug.Assert(thisLevel <= 100, "max recursion level exceeded");
if (thisNum < 10000000)
AddToStackSpoolAndEmit(thisNum, thisLevel);
ProcessStackSpool();
}
private static void AddToStackSpoolAndEmit(long number, int recursionLevel)
{
StackSpool.Push(Tuple.Create(number, recursionLevel));
Console.WriteLine(number);
}
}
add a comment |
This is just a (semi) educated guess, and is probably completely wrong. Interesting question, by the way.
T-SQL is a declarative language; perhaps a recursive CTE is translated into a cursor-style operation where the results from the left side of the UNION ALL is appended into a temporary table, then the right side of the UNION ALL is applied to the values in the left side.
So, first we insert the output of the left side of the UNION ALL into the result set, then we insert the results of the right side of the UNION ALL applied to the left side, and insert that into the result set. The left side is then replaced with the output from the right side, and the right side is applied again to the "new" left side. Something like this:
- {3,5,7} -> result set
- recursive statements applied to {3,5,7}, which is {9,25,49}. {9,25,49} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {9,25,49}, which is {81,625,2401}. {81,625,2401} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {81,625,2401}, which is {6561,390625,5764801}. {6561,390625,5764801} is added to result set.
- Cursor is complete, since next iteration results in the WHERE clause returning false.
You can see this behavior in the execution plan for the recursive CTE:

This is step 1 above, where the left side of the UNION ALL is added to the output:

This is the right side of the UNION ALL where the output is concatenated to the result set:

add a comment |
My knowledge is in DB2 specifically but looking at the explain diagrams seems to be the same with SQL Server.
The plan comes from here:
See it on Paste the Plan

The optimizer does not literally run a union all for each recursive query. It takes the structure of the query and assigns the first part of the union all to an "anchor member" then it will run through the second half of the union all (called the "recursive member" recursively until it reaches the limitations defined. After the recursion is complete, the optimizer joins all records together.
The optimizer just takes it as a suggestion to do a pre-defined operation.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226946%2fhow-does-sql-recursion-actually-work%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here.
If you remove the ORDER BY from your query the results are ordered as follows
+---------+
| N |
+---------+
| 3 |
| 5 |
| 7 |
| 49 |
| 2401 |
| 5764801 |
| 25 |
| 625 |
| 390625 |
| 9 |
| 81 |
| 6561 |
+---------+
This is because the execution plan operates very similarly to the following C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
private static readonly Stack<Tuple<long, int>> StackSpool = new Stack<Tuple<long, int>>();
private static void Main(string args)
{
//temp table #NUMS
var nums = new { 3, 5, 7 };
//Anchor member
foreach (var number in nums)
AddToStackSpoolAndEmit(number, 0);
//Recursive part
ProcessStackSpool();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static void ProcessStackSpool()
{
//recursion base case
if (StackSpool.Count == 0)
return;
var row = StackSpool.Pop();
var thisLevel = row.Item2 + 1;
var thisNum = row.Item1 * row.Item1;
Debug.Assert(thisLevel <= 100, "max recursion level exceeded");
if (thisNum < 10000000)
AddToStackSpoolAndEmit(thisNum, thisLevel);
ProcessStackSpool();
}
private static void AddToStackSpoolAndEmit(long number, int recursionLevel)
{
StackSpool.Push(Tuple.Create(number, recursionLevel));
Console.WriteLine(number);
}
}
add a comment |
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here.
If you remove the ORDER BY from your query the results are ordered as follows
+---------+
| N |
+---------+
| 3 |
| 5 |
| 7 |
| 49 |
| 2401 |
| 5764801 |
| 25 |
| 625 |
| 390625 |
| 9 |
| 81 |
| 6561 |
+---------+
This is because the execution plan operates very similarly to the following C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
private static readonly Stack<Tuple<long, int>> StackSpool = new Stack<Tuple<long, int>>();
private static void Main(string args)
{
//temp table #NUMS
var nums = new { 3, 5, 7 };
//Anchor member
foreach (var number in nums)
AddToStackSpoolAndEmit(number, 0);
//Recursive part
ProcessStackSpool();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static void ProcessStackSpool()
{
//recursion base case
if (StackSpool.Count == 0)
return;
var row = StackSpool.Pop();
var thisLevel = row.Item2 + 1;
var thisNum = row.Item1 * row.Item1;
Debug.Assert(thisLevel <= 100, "max recursion level exceeded");
if (thisNum < 10000000)
AddToStackSpoolAndEmit(thisNum, thisLevel);
ProcessStackSpool();
}
private static void AddToStackSpoolAndEmit(long number, int recursionLevel)
{
StackSpool.Push(Tuple.Create(number, recursionLevel));
Console.WriteLine(number);
}
}
add a comment |
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here.
If you remove the ORDER BY from your query the results are ordered as follows
+---------+
| N |
+---------+
| 3 |
| 5 |
| 7 |
| 49 |
| 2401 |
| 5764801 |
| 25 |
| 625 |
| 390625 |
| 9 |
| 81 |
| 6561 |
+---------+
This is because the execution plan operates very similarly to the following C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
private static readonly Stack<Tuple<long, int>> StackSpool = new Stack<Tuple<long, int>>();
private static void Main(string args)
{
//temp table #NUMS
var nums = new { 3, 5, 7 };
//Anchor member
foreach (var number in nums)
AddToStackSpoolAndEmit(number, 0);
//Recursive part
ProcessStackSpool();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static void ProcessStackSpool()
{
//recursion base case
if (StackSpool.Count == 0)
return;
var row = StackSpool.Pop();
var thisLevel = row.Item2 + 1;
var thisNum = row.Item1 * row.Item1;
Debug.Assert(thisLevel <= 100, "max recursion level exceeded");
if (thisNum < 10000000)
AddToStackSpoolAndEmit(thisNum, thisLevel);
ProcessStackSpool();
}
private static void AddToStackSpoolAndEmit(long number, int recursionLevel)
{
StackSpool.Push(Tuple.Create(number, recursionLevel));
Console.WriteLine(number);
}
}
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
- Split the CTE expression into anchor and recursive members.
- Run the anchor member(s) creating the first invocation or base result set (T0).
- Run the recursive member(s) with Ti as an input and Ti+1 as an output.
- Repeat step 3 until an empty set is returned.
- Return the result set. This is a UNION ALL of T0 to Tn.
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here.
If you remove the ORDER BY from your query the results are ordered as follows
+---------+
| N |
+---------+
| 3 |
| 5 |
| 7 |
| 49 |
| 2401 |
| 5764801 |
| 25 |
| 625 |
| 390625 |
| 9 |
| 81 |
| 6561 |
+---------+
This is because the execution plan operates very similarly to the following C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
private static readonly Stack<Tuple<long, int>> StackSpool = new Stack<Tuple<long, int>>();
private static void Main(string args)
{
//temp table #NUMS
var nums = new { 3, 5, 7 };
//Anchor member
foreach (var number in nums)
AddToStackSpoolAndEmit(number, 0);
//Recursive part
ProcessStackSpool();
Console.WriteLine("Finished");
Console.ReadLine();
}
private static void ProcessStackSpool()
{
//recursion base case
if (StackSpool.Count == 0)
return;
var row = StackSpool.Pop();
var thisLevel = row.Item2 + 1;
var thisNum = row.Item1 * row.Item1;
Debug.Assert(thisLevel <= 100, "max recursion level exceeded");
if (thisNum < 10000000)
AddToStackSpoolAndEmit(thisNum, thisLevel);
ProcessStackSpool();
}
private static void AddToStackSpoolAndEmit(long number, int recursionLevel)
{
StackSpool.Push(Tuple.Create(number, recursionLevel));
Console.WriteLine(number);
}
}
answered 1 hour ago
Martin SmithMartin Smith
61.8k10166247
61.8k10166247
add a comment |
add a comment |
This is just a (semi) educated guess, and is probably completely wrong. Interesting question, by the way.
T-SQL is a declarative language; perhaps a recursive CTE is translated into a cursor-style operation where the results from the left side of the UNION ALL is appended into a temporary table, then the right side of the UNION ALL is applied to the values in the left side.
So, first we insert the output of the left side of the UNION ALL into the result set, then we insert the results of the right side of the UNION ALL applied to the left side, and insert that into the result set. The left side is then replaced with the output from the right side, and the right side is applied again to the "new" left side. Something like this:
- {3,5,7} -> result set
- recursive statements applied to {3,5,7}, which is {9,25,49}. {9,25,49} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {9,25,49}, which is {81,625,2401}. {81,625,2401} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {81,625,2401}, which is {6561,390625,5764801}. {6561,390625,5764801} is added to result set.
- Cursor is complete, since next iteration results in the WHERE clause returning false.
You can see this behavior in the execution plan for the recursive CTE:

This is step 1 above, where the left side of the UNION ALL is added to the output:

This is the right side of the UNION ALL where the output is concatenated to the result set:

add a comment |
This is just a (semi) educated guess, and is probably completely wrong. Interesting question, by the way.
T-SQL is a declarative language; perhaps a recursive CTE is translated into a cursor-style operation where the results from the left side of the UNION ALL is appended into a temporary table, then the right side of the UNION ALL is applied to the values in the left side.
So, first we insert the output of the left side of the UNION ALL into the result set, then we insert the results of the right side of the UNION ALL applied to the left side, and insert that into the result set. The left side is then replaced with the output from the right side, and the right side is applied again to the "new" left side. Something like this:
- {3,5,7} -> result set
- recursive statements applied to {3,5,7}, which is {9,25,49}. {9,25,49} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {9,25,49}, which is {81,625,2401}. {81,625,2401} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {81,625,2401}, which is {6561,390625,5764801}. {6561,390625,5764801} is added to result set.
- Cursor is complete, since next iteration results in the WHERE clause returning false.
You can see this behavior in the execution plan for the recursive CTE:

This is step 1 above, where the left side of the UNION ALL is added to the output:

This is the right side of the UNION ALL where the output is concatenated to the result set:

add a comment |
This is just a (semi) educated guess, and is probably completely wrong. Interesting question, by the way.
T-SQL is a declarative language; perhaps a recursive CTE is translated into a cursor-style operation where the results from the left side of the UNION ALL is appended into a temporary table, then the right side of the UNION ALL is applied to the values in the left side.
So, first we insert the output of the left side of the UNION ALL into the result set, then we insert the results of the right side of the UNION ALL applied to the left side, and insert that into the result set. The left side is then replaced with the output from the right side, and the right side is applied again to the "new" left side. Something like this:
- {3,5,7} -> result set
- recursive statements applied to {3,5,7}, which is {9,25,49}. {9,25,49} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {9,25,49}, which is {81,625,2401}. {81,625,2401} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {81,625,2401}, which is {6561,390625,5764801}. {6561,390625,5764801} is added to result set.
- Cursor is complete, since next iteration results in the WHERE clause returning false.
You can see this behavior in the execution plan for the recursive CTE:

This is step 1 above, where the left side of the UNION ALL is added to the output:

This is the right side of the UNION ALL where the output is concatenated to the result set:

This is just a (semi) educated guess, and is probably completely wrong. Interesting question, by the way.
T-SQL is a declarative language; perhaps a recursive CTE is translated into a cursor-style operation where the results from the left side of the UNION ALL is appended into a temporary table, then the right side of the UNION ALL is applied to the values in the left side.
So, first we insert the output of the left side of the UNION ALL into the result set, then we insert the results of the right side of the UNION ALL applied to the left side, and insert that into the result set. The left side is then replaced with the output from the right side, and the right side is applied again to the "new" left side. Something like this:
- {3,5,7} -> result set
- recursive statements applied to {3,5,7}, which is {9,25,49}. {9,25,49} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {9,25,49}, which is {81,625,2401}. {81,625,2401} is added to result set, and replaces the left side of the UNION ALL.
- recursive statements applied to {81,625,2401}, which is {6561,390625,5764801}. {6561,390625,5764801} is added to result set.
- Cursor is complete, since next iteration results in the WHERE clause returning false.
You can see this behavior in the execution plan for the recursive CTE:

This is step 1 above, where the left side of the UNION ALL is added to the output:

This is the right side of the UNION ALL where the output is concatenated to the result set:

edited 4 hours ago
answered 4 hours ago
Max VernonMax Vernon
49.9k13111220
49.9k13111220
add a comment |
add a comment |
My knowledge is in DB2 specifically but looking at the explain diagrams seems to be the same with SQL Server.
The plan comes from here:
See it on Paste the Plan

The optimizer does not literally run a union all for each recursive query. It takes the structure of the query and assigns the first part of the union all to an "anchor member" then it will run through the second half of the union all (called the "recursive member" recursively until it reaches the limitations defined. After the recursion is complete, the optimizer joins all records together.
The optimizer just takes it as a suggestion to do a pre-defined operation.
add a comment |
My knowledge is in DB2 specifically but looking at the explain diagrams seems to be the same with SQL Server.
The plan comes from here:
See it on Paste the Plan

The optimizer does not literally run a union all for each recursive query. It takes the structure of the query and assigns the first part of the union all to an "anchor member" then it will run through the second half of the union all (called the "recursive member" recursively until it reaches the limitations defined. After the recursion is complete, the optimizer joins all records together.
The optimizer just takes it as a suggestion to do a pre-defined operation.
add a comment |
My knowledge is in DB2 specifically but looking at the explain diagrams seems to be the same with SQL Server.
The plan comes from here:
See it on Paste the Plan

The optimizer does not literally run a union all for each recursive query. It takes the structure of the query and assigns the first part of the union all to an "anchor member" then it will run through the second half of the union all (called the "recursive member" recursively until it reaches the limitations defined. After the recursion is complete, the optimizer joins all records together.
The optimizer just takes it as a suggestion to do a pre-defined operation.
My knowledge is in DB2 specifically but looking at the explain diagrams seems to be the same with SQL Server.
The plan comes from here:
See it on Paste the Plan

The optimizer does not literally run a union all for each recursive query. It takes the structure of the query and assigns the first part of the union all to an "anchor member" then it will run through the second half of the union all (called the "recursive member" recursively until it reaches the limitations defined. After the recursion is complete, the optimizer joins all records together.
The optimizer just takes it as a suggestion to do a pre-defined operation.
edited 4 hours ago
answered 4 hours ago
Michael S.Michael S.
14
14
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226946%2fhow-does-sql-recursion-actually-work%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The execution plan lays it out, i just suck at reading it. Seems sql server calculates the predicate (after scanning your table of values) and determines what
WHERE N*N < 10000000would be, how many iterations / rows that is... then computesN * Nfor that many rows which is 9... joins it to the outer query (the R cte in this case),a nd appends to the original 3 values. Meh, someone smarter will answer i'm sure– scsimon
4 hours ago
Recursion in SQL is indeed rather counter-intuitive. Your second alternative R = { 9, 25, 49 } is likely the way it is implemented in all DBMS I've tried. I tend to think about it as if each iteration is used as a feed for the next one, and then added to the final result. Iteration stops once an empty set is the intermediate result.
– Lennart
3 hours ago
In regards to your last edit, SQL Server is designed for set based logic... so recursion, loops, RBAR methods are outside of it's wheel house. Granted you know that, but not surprised the way it would handle recursion with the same engine is blury.
– scsimon
2 hours ago
"If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}." I don't see how you lose {3,5,7} if it works like that.
– yper-crazyhat-cubeᵀᴹ
2 hours ago