Use-define chain
A Use-Definition Chain (UD Chain) is a data structure that consists of a use, U, of a variable, and all the definitions, D, of that variable that can reach that use without any other intervening definitions. A UD Chain generally means the assignment of some value to a variable.
A counterpart of a UD Chain is a Definition-Use Chain (DU Chain), which consists of a definition, D, of a variable and all the uses, U, reachable from that definition without any other intervening definitions.
Both UD and DU chains are created by using a form of static code analysis known as data flow analysis. Knowing the use-def and def-use chains for a program or subprogram is a prerequisite for many compiler optimizations, including constant propagation and common subexpression elimination.
Purpose
Making the use-define or define-use chains is a step in liveness analysis, so that logical representations of all the variables can be identified and tracked through the code.
Consider the following snippet of code:
int x = 0; /* A */
x = x + y; /* B */
/* 1, some uses of x */
x = 35; /* C */
/* 2, some more uses of x */
Notice that x
is assigned a value at three points (marked A, B, and C). However, at the point marked "1", the use-def chain for x
should indicate that its current value must have come from line B (and its value at line B must have come from line A). Contrariwise, at the point marked "2", the use-def chain for x
indicates that its current value must have come from line C. Since the value of the x
in block 2 does not depend on any definitions in block 1 or earlier, x
might as well be a different variable there; practically speaking, it is a different variable — call it x2
.
int x = 0; /* A */
x = x + y; /* B */
/* 1, some uses of x */
int x2 = 35; /* C */
/* 2, some uses of x2 */
The process of splitting x
into two separate variables is called live range splitting. See also static single assignment form.
Setup
The list of statements determines a strong order among statements.
- Statements are labeled using the following conventions: , where i is an integer in ; and n is the number of statements in the basic block
- Variables are identified in italic (e.g., v,u and t)
- Every variable is assumed to have a definition in the context or scope. (In static single assignment form, use-define chains are explicit because each chain contains a single element.)
For a variable, such as v, its declaration is identified as V (italic capital letter), and for short, its declaration is identified as . In general, a declaration of a variable can be in an outer scope (e.g., a global variable).
Definition of a Variable
When a variable, v, is on the LHS of an assignment statement, such as , then is a definition of v. Every variable (v) has at least one definition by its declaration (V) (or initialization).
Use of a Variable
If variable, v, is on the RHS of statement , there is a statement, with i < j and , that it is a definition of v and it has a use at (or, in short, when a variable, v, is on the RHS of a statement , then v has a use at statement ).
Execution
Consider the sequential execution of the list of statements, , and what can now be observed as the computation at statement, j:
- A definition at statement with i < j is alive at j, if it has a use at a statement with k ≥ j. The set of alive definitions at statement i is denoted as and the number of alive definitions as . ( is a simple but powerful concept: theoretical and practical results in space complexity theory, access complexity(I/O complexity), register allocation and cache locality exploitation are based on .)
- A definition at statement kills all previous definitions ( with k < i) for the same variables.
Execution example for def-use-chain
This example is based on a Java algorithm for finding the gcd. (It is not important to understand what this function does.)
1 /**
2 * @param(a, b) The values used to calculate the divisor.
3 * @return The greatest common divisor of a and b.
4 */
5 int gcd(int a, int b) {
6 int c = a;
7 int d = b;
8 if (c == 0)
9 return d;
10 while (d != 0) {
11 if (c > d)
12 c = c - d;
13 else
14 d = d - c;
15 }
16 return c;
17 }
To find out all def-use-chains for variable d, do the following steps:
- Search for the first time the variable is defined (write access).
- In this case it is "
d=b
" (l.7)
- In this case it is "
- Search for the first time the variable is read.
- In this case it is "
return d
"
- In this case it is "
- Write down this information in the following style: [name of the variable you are creating a def-use-chain for, the concrete write access, the concrete read access]
- In this case it is:
[d, d=b, return d]
- In this case it is:
Repeat these steps in the following style: combine each write access with each read access (but NOT the other way round).
The result should be:
1 [d, d=b, return d]
2 [d, d=b, while(d!=0)]
3 [d, d=b, if(c>d)]
4 [d, d=b, c=c-d]
5 [d, d=b, d=d-c]
6 [d, d=d-c, while(d!=0)]
7 [d, d=d-c, if(c>d)]
8 [d, d=d-c, c=c-d]
9 [d, d=d-c, d=d-c]
You have to take care, if the variable is changed by the time.
For example: From line 7 down to line 13 in the source code, d is not redefined / changed. At line 14, d could be redefined, this is, why you have to recombine this write access on d with all possible read access, which could be reached. In this case, only the code beyond line 10 is relevant. Line 7 for example cannot be reached again. For your understanding, you can imagine 2 different variables d:
1 [d1, d1=b, return d1]
2 [d1, d1=b, while(d1!=0)]
3 [d1, d1=b, if(c>d1)]
4 [d1, d1=b, c=c-d1]
5 [d1, d1=b, d1=d1-c]
6 [d2, d2=d2-c, while(d2!=0)]
7 [d2, d2=d2-c, if(c>d2)]
8 [d2, d2=d2-c, c=c-d2]
9 [d2, d2=d2-c, d2=d2-c]
As result you could get something like this. The variable d1 would be replaced by b
1 /**
2 * @param(a, b) The values used to calculate the divisor.
3 * @return The greatest common divisor of a and b.
4 **/
5 int gcd(int a, int b) {
6 int c = a;
7 int d;
8 if (c == 0)
9 return b;
10 if (b != 0) {
11 if (c > b) {
12 c = c - b;
13 d = b;
14 }
15 else
16 d = b - c;
17 while (d != 0) {
18 if (c > d)
19 c = c - d;
20 else
21 d = d - c;
22 }
23 }
24 return c;
25 }
Method of building a use-def (or ud) chain
- Set definitions in statement
- For each i in , find live definitions that have use in statement
- Make a link among definitions and uses
- Set the statement , as definition statement
- Kill previous definitions
With this algorithm, two things are accomplished:
- A directed acyclic graph (DAG) is created on the variable uses and definitions. The DAG specifies a data dependency among assignment statements, as well as a partial order (therefore parallelism among statements).
- When statement is reached, there is a list of live variable assignments. If only one assignment is live, for example, constant propagation might be used.