3.6 Control
Machine code provides two basic low-level mechanisms for implementing conditional behavior: it tests data values and then alters either the control flow or the data flow based on the results of these tests.
Data-dependent control flow is the more general and more common approach for implementing conditional behavior
[TOC]
3.6.1 Condition Codes
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
CF
: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.The carry flag is also set if the subtraction of two numbers requires a borrow into the most significant (leftmost) bits subtracted.
1
0000 - 0001 = 1111 (carry flag is turned on)
ZF
: Zero flag. The most recent operation yielded zero.SF
: Sign flag. The most recent operation yielded a negative value.OF
: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.
For example, suppose we used one of the add instructions to perform the equivalent of the C assignment t = a+b
, where variables a, b, and t are integers.
1 | CF (unsigned) t < (unsigned) a //Unsigned overflow |
The leaq
instruction does NOT alter any condition codes, since it is intended to be used in address computations.
Otherwise, ALL of the instructions listed in Figure 3.10 cause the condition codes to be set
- For the logical operations, such as
xor
, theCF
andOF
are set to zero - For the shift operations, the
CF
is set to the last bit shifted out, while theOF
is set to zero - For reasons that we will not delve into, the
inc
anddec
instructions set theOF
andZF
, but they leave theCF
unchanged.
cmp
classes
They behave in the same way as the sub
instructions, except that they set the condition codes without updating their destinations.
- These instructions set the zero flag if the two operands are equal.
- The other flags can be used to determine ordering relations between the two operands.
test
classes
The test instructions behave in the same manner as the and
instructions, except that they set the condition codes without altering their destinations.
- Typically, the same operand is repeated (e.g.,
testq %rax,%rax
to see whether%rax
is negative, zero, or positive) - Or one of the operands is a mask indicating which bits should be tested.
3.6.2 Accessing the Condition Codes
Rather than reading the condition codes directly, there are three common ways of using the condition codes:
- we can set a single byte to 0 or 1 depending on some combination of the condition codes
- We refer to this entire class of instructions as the
set
instructions - they differ from one another based on which combinations of condition codes they consider, as indicated by the different suffixes for the instruction names.
- For example, instructions
setl
andsetb
denote “set less” and “set below,” not “set long word” or “set byte.”
- We refer to this entire class of instructions as the
- we can conditionally jump to some other part of the program
- we can conditionally transfer data
set
classes
A set
instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination, setting this byte to either 0 or 1.
To generate a 32-bit or 64-bit result, we must also clear the high-order bits.
For example
1
2
3int comp(data_t a, data_t b){
return a < b;
}1
2
3
4
5
6# a in %rdi, b in %rsi
comp:
cmpq %rsi, %rdi # Compare a:b
setl %al # Set low-order byte of %eax to 0 or 1
movzbl %al, %eax # Clear rest of %eax (and rest of %rax)
retNote the comparison order of the cmpq instruction . Although the arguments are listed in the order
%rsi
(b), then%rdi
(a), the comparison is really between a and b(subtract b from a)For some of the instructions, there are multiple possible names, which listed as “synonyms.”
For example, both
setg
andsetnle
refer to the same machine instruction. Compilers and disassemblers make arbitrary choices of which names to use.Although all arithmetic and logical operations set the condition codes, the descriptions of the different set instructions apply to the case where a comparison instruction has been executed, setting the condition codes according to the computation
t = a-b
.
Machine codes do not distinguish signed and unsigned value because the arithmetic of them are the same at bit level. If distinguish do matter, it use different instructions, such as using different versions of right shifts, division and multiplication instructions, and different combinations of condition codes.
Practice Problem 3.13
The C code
1
2
3 int comp(data_t a, data_t b) {
return a COMP b;
}shows a general comparison between arguments a and b, where
data_t
, the data type of the arguments, is defined (viatypedef
) to be one of the integer data types listed in Figure 3.1 and either signed or unsigned. The comparison COMP is defined via#define
.Suppose a is in some portion of
%rdx
while b is in some portion of%rsi
. For each of the following instruction sequences, determine which data typesdata_t
and which comparisonsCOMP
could cause the compiler to generate this code. (There can be multiple correct answers; you should list them all.)
1
2
3
4
5
6
7
8
9
10
11
12 A.
cmpl %esi, %edi
setl %al
B.
cmpw %si, %di
setge %al
C.
cmpb %sil, %dil
setbe %al
D.
cmpq %rsi, %rdi
setne %al
My solution: :white_check_mark:
1 | //A |
Practice Problem 3.14
The C code
1
2
3 int test(data_t a) {
return a TEST 0;
}shows a general comparison between argument
a
and0
, where we can set the data type of the argument by declaringdata_t
with atypedef
, and the nature of the comparison by declaringTEST
with a#define
declaration. The following instruction sequences implement the comparison, where a is held in some portion of register%rdi
. For each sequence, determine which data typesdata_t
and which comparisonsTEST
could cause the compiler to generate this code. (There can be multiple correct answers; list all correct ones.)
1
2
3
4
5
6
7
8
9
10
11
12 A.
testq %rdi, %rdi
setge %al
B.
testw %di, %di
sete %al
C.
testb %dil, %dil
seta %al
D.
testl %edi, %edi
setle %al
My solution: :white_check_mark:
1 | //A |
3.6.3 Jump Instructions
A jump
instruction can cause the execution to switch to a completely new position in the program.
These jump destinations are generally indicated in assembly code by a Label
For example
1 | movq $0,%rax #Set %rax to 0 |
In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.
The
jmp
instruction jumps unconditionally.It can be either a direct jump, where the jump target is encoded as part of the instruction
- Direct jumps are written in assembly code by giving a label as the jump target, for example, the label
.L1
in the code shown.
- Direct jumps are written in assembly code by giving a label as the jump target, for example, the label
or an indirect jump, where the jump target is read from a register or a memory location
Indirect jumps are written using ‘*’ followed by an operand specifier, for exmple
1
2jmp *%rax # uses the value in register %rax as the jump target
jmp *(%rax)# reads the jump target from memory, using the value in %rax as the read address
The remaining jump instructions in the table are conditional—they either jump or continue executing at the next instruction in the code sequence, depending on some combination of the condition codes
- Conditional jumps can only be direct
3.6.4 Jump Instruction Encodings
understanding how the targets of jump instructions are encoded will become important when we study linking in Chapter 7
There are several different encodings for jumps, but some of the most commonly used ones are PC relative
- That is, they encode the difference between the address of the target instruction and the address of the instruction immediately following the jump.
- These offsets can be encoded using 1, 2, or 4 bytes.
A second encoding method is to give an “absolute” address, using 4 bytes to directly specify the target.
The assembler and linker select the appropriate encodings of the jump destinations.
Example of PC relative
1 | movq %rdi, %rax |
1 | 0: 48 89 f8 mov %rdi,%rax |
loop
is the name of this session, indicating the address of start point of these instructions<loop + x>
indicates the instructionx
bytes away from the start pointon line 3,
<loop + 8>
is encoded to03
calculated bytarget address - next instruction address = 0x8 - 0x5 = 0x3
on line b,
<loop + 0x5
is encoded tof8
calculated by0x5 - 0xd = -8 = 0b11111000 = 0xf8
1 | # after linking |
The instructions have been relocated to different addresses, but the encodings of the jump targets in lines 2 and 5 remain unchanged.
Advantages of PC relative encoding
- the instructions can be compactly encoded (requiring just 2 bytes)
- the object code can be shifted to different positions in memory without alteration.
Practice Problem 3.15
In the following excerpts from a disassembled binary, some of the information has been replaced by X’s. Answer the following questions about these instructions.
A. What is the target of the
je
instruction below? (You do not need to know anything about thecallq
instruction here
1
2 4003fa: 74 02 je XXXXXX
4003fc: ff d0 callq *%raxB. What is the target of the
je
instruction below?
1
2 40042f: 74 f4 je XXXXXX
400431: 5d pop %rbpC. What is the address of the
ja
andpop
instructions?
1
2 XXXXXX: 77 02 ja 400547
XXXXXX: 5d pop %rbpD. In the code that follows, the jump target is encoded in PC-relative form as a 4- byte two’s-complement number. The bytes are listed from least significant to most, reflecting the little-endian byte ordering of x86-64. What is the address of the jump target?
1
2 4005e8: e9 73 ff ff ff jmpq XXXXXXX
4005ed: 90 nop
My solution : :white_check_mark:
1 | # A |
3.6.5 Implementing Conditional Branches with Conditional Control
The most general way to translate conditional expressions and statements from C into machine code is to use combinations of conditional and unconditional jumps.
goto
statement in C is similar to unconditional jump in assembly. Using goto
statements is generally considered a bad programming style, since their use can make code very difficult to read and debug.
Assembly if-else
constructure in C style codes
1 | t = test-expr; |
Practice Problem 3.16
When given the C code
1
2
3
4
5 void cond(short a, short *p)
{
if (a && *p < a)
*p = a;
}gcc generates the following assembly code:
1
2
3
4
5
6
7
8
9
10 # void cond(short a, short *p)
# a in %rdi, p in %rsi
cond:
testw %di, %di
je .L1
cmpw %di, (%rsi)
jge .L1
movw %di, (%rsi)
.L1:
rep retA. Write a goto version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples.
B. Explain why the assembly code contains two conditional branches, even though the C code has only one if statement.
My solution : :white_check_mark:
1 | //A |
Practice Problem 3.17
An alternate rule for translating if statements into goto code is as follows:
1
2
3
4
5
6
7
8 t = test-expr;
if (t)
goto true;
else-statement
goto done;
true:
then-statement
done:A. Rewrite the goto version of absdiff_se based on this alternate rule.
B. Can you think of any reasons for choosing one rule over the other?
My solution : :warning:
1 | long lt_cnt = 0; |
Put else-statement
before then-statement
can drop a !
Instruction.
Solution on the book :
In most respects, the choice is arbitrary. But the original rule works better for the common case where there is no else statement
Practice Problem 3.18
Starting with C code of the form
1
2
3
4
5
6
7
8
9
10
11
12 short test(short x, short y, short z) {
short val = ________;
if (______) {
if (_______)
val = ________;
else
val = ________;
}
else if (________)
val = ________;
return val;
}gcc generates the following assembly code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29 #short test(short x, short y, short z)
#x in %rdi, y in %rsi, z in %rdx
test:
leal (%rdx,%rsi), %eax
subl %edi, %eax
cmpw $5, %dx
jle .L2
cmpw $2, %si
jle .L3
movswl %di, %eax
movswl %dx, %ecx
cltd
idivl %ecx
ret
.L3:
movswl %di, %eax
movswl %si, %esi
cltd
idivl %esi
ret
.L2:
cmpw $2, %dx
jg .L1
movswl %dx, %eax
movswl %si, %esi
cltd
idivl %esi
.L1:
rep retFill in the missing expressions in the C code.
My solution: :white_check_mark:
1 | short test(short x, short y, short z) { |
Verification
1 | $ diff -s my.s test.s |
3.6.6 Implementing Conditional Branches with Conditional Moves
The transfer of control is simple but also inefficient. Modern processors prefer a conditional move of data
- This approach computes both outcomes of a conditional operation and then selects one based on whether or not the condition holds.
- This strategy makes sense only in restricted cases
The key is that the single cmovge
instruction of the assembly code. It will transfer the data from the source register to the destination, only if the cmpq
instruction indicates that the condition holds (as indicated by the suffix)
Why data-transfer outperform control-transfer ?
Modern computers use pipline to execute instructions. To do this requires being able to determine the sequence of instructions to be executed well ahead of time in order to keep the pipeline full of instructions to be executed.
Data transfer do not have
jmp
When a
jmp
instruction appears, the processor do not know the following instruction sequenceAlthough the processor can guess the condition will hold or not via “branch prediction logic”, once the prediction is wrong, all previous work based on the prediction is wasted.
The flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full.
Practice Problem 3.19
Running on a new processor model, our code required around 45 cycles when the branching pattern was random, and around 25 cycles when the pattern was highly predictable.
A. What is the approximate miss penalty?
B. How many cycles would the function require when the branch is mispredicted?
My solution: :white_check_mark:
$$
Tavg(0.5) = Tok+0.5Tmp =45
$$
$$
Tok=25
$$
$$
\therefore Tmp = 40 (\text{clock cycle})
$$
A : approximate miss penalty is 40 clock
$$
Tran = Tavg(1) = Tok + Tmp = 65
$$
B : 65 clock cycles needed.
Conditional move instructions
The source value is read from either memory or the source register, but it is copied to the destination only if the specified condition holds.
The source and destination values can be 16, 32, or 64 bits long. Singlebyte conditional moves are not supported.
The processor can infer the data size from the destination, so the same instruction name can be used for all operand lengths.
Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test. The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.
Standard form
1 | v = test-expr ? then-expr : else-expr; |
will be converted to
1 | v = then-expr; |
The final statement in this sequence is implemented with a conditional move— value ve
is copied to v
only if test condition t does not hold.
Two disadvantages
data-transfer evalute both expressions regardless of the condition, if one of those two expressions could possibly generate an error condition or a side effect, this could lead to invalid behavior.
For example
1
2
3long cread(long *xp) {
return (xp ? *xp : 0);
}1
2
3
4
5
6
7# Invalid implementation of function cread !!!
cread:
movq (%rdi), %rax # v = *xp
testq %rdi, %rdi # Test x
movl $0, %edx # Set ve = 0
cmove %rdx, %rax # If x==0, v = ve
retActual
gcc
output1
2
3
4
5
6
7testq %rdi, %rdi
je .L3
movq (%rdi), %rax
ret
.L3:
movl $0, %eax
retData-transfer do not always improve code efficiency, if either the then-expr or the else-expr evaluation requires a significant computation, then this effort is wasted when the corresponding condition does not hold
gcc
only uses conditional moves when the two expressions can be computed very easily, for example, with single add instructions.
Practice Problem 3.20
In the following C function, we have left the definition of operation
OP
incomplete:
1
2
3
4
short arith(short x) {
return x OP 16;
}When compiled, gcc generates the following assembly code
1
2
3
4
5
6
7
8 # short arith(short x)
# x in %rdi
arith:
leaq 15(%rdi), %rbx
testq %rdi, %rdi
cmovns %rdi, %rbx
sarq $4, %rbx
retA. What operation is OP?
B. Annotate the code to explain how it works
My solution : :x:
1 | # short arith(short x) |
1 | short arith(short x){ |
1 |
Solution on the book:
The operator is ‘/’. The program creates a temporary value equal to x + 15, in anticipation of x being negative and therefore requiring biasing.
Practice Problem 3.21
Starting with C code of the form
1
2
3
4
5
6
7
8
9
10
11
12 short test(short x, short y) {
short val = _______;
if (_______) {
if (_______)
val = ________;
else
val = ________;
}
else if (________)
val = __________;
return val;
}gcc generates the following assembly code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 # short test(short x, short y)
# x in %rdi, y in %rsi
test:
leal 12(%rsi), %eax
testw %di, %di
js .L5
cmpw $10, %si
jle .L1
movswl %di, %eax
movswl %si, %esi
cltd
idivl %esi
.L1:
rep ret
.L5:
cmpw %di, %si
jle .L3
movl %edi, %eax
imull %esi, %eax
ret
.L3:
movl %esi, %eax
orl %edi, %eax
retFill in the missing expressions in the C code.
My solution : :white_check_mark:
1 | short test(short x, short y) { |
fuck you, global edition, but I am too poor to buy the NA version…
3.6.7 Loops
combinations of conditional tests and jumps are used to implement the effect of loops
Do-While Loops
1 | do{ |
Practice Problem 3.22 :white_check_mark:
A. Try to calculate 14! with a 32-bit int. Verify whether the computation of 14! overflows.
B. What if the computation is done with a 64-bit long int?
1 |
|
1 | $ ./test 14 |
1 | 14) realres = math.factorial( |
Overflow happens.
When using long
, the result is correct
1 | $ ./test 14 |
1 | 14) < 2**64 factorial( |
Practice Problem 3.23
For the C code
1
2
3
4
5
6
7
8
9
10
11 short dw_loop(short x) {
short y = x/9;
short *p = &x;
short n = 4*x;
do {
x += y;
(*p) += 5;
n -= 2;
} while (n > 0);
return x;
}gcc generates the following assembly code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16 dw_loop:
movl %edi, %eax
movswl %di, %ecx
imull $7282, %ecx, %ecx
shrl $16, %ecx
movl %edi, %edx
sarw $15, %dx
subl %edx, %ecx
sall $2, %edi
.L2:
leal 5(%rcx,%rax), %eax
leal -2(%rdi), %edx
movl %edx, %edi
testw %dx, %dx
jg .L2
rep retA. Which registers are used to hold program values
x
,y
, andn
?B. How has the compiler eliminated the need for pointer variable
p
and the pointer dereferencing implied by the expression(*p)+=5
?C. Add annotations to the assembly code describing the operation of the program, similar to those shown in Figure 3.19(c).
My solution: :white_check_mark:
1 | # short dw_loop(short x) |
A: As we can see that y = cot
, in %ecx
,n = d
in%edx
, and finally x
in %edi
B: Since the pointer holds the address of a local variable, there is no need to really access memory, so just another temporary variable saver is OK. In this case, x += y; (*p) += 5
is merged into x = x + y + 5
While Loops
There are a number of ways to translate a while loop into machine code, two of which are used in code generated by gcc. Same structure but differ in how to implement the initial test.
jump to middle
The first translation method, which we refer to as jump to middle, performs the initial test by performing an unconditional jump to the test at the end of the loop.
1 | goto test; |
Practice Problem 3.24
For C code having the general form
1
2
3
4
5
6
7
8
9 short loop_while(short a, short b)
{
short result = _________;
while (______) {
result = _________;
a = ___________;
}
return result;
}gcc, run with command-line option
-Og
, produces the following code
1
2
3
4
5
6
7
8
9
10
11
12 while_loop:
movl $0, %eax
jmp .L2
.L3:
movl %esi, %edx
imull %edi, %edx
addl %edx, %eax
subl $1, %edi
.L2:
cmpw %si, %di
jg .L3
rep retWe can see that the compiler used a jump-to-middle translation, using the
jmp
instruction to jump to the test starting with label .L2. Fill in the missing parts of the C code.
My solution : :white_check_mark:
1 | short loop_while(short a, short b) |
guarded do
first transforms the code into a do-while loop by using a conditional branch to skip over the loop if the initial test fails, used by higher optimization option
Using this implementation strategy, the compiler can often optimize the initial test, for example, determining that the test condition will always hold.
1 | t = test-expr; |
Practice Problem 3.25
For C code having the general form
1
2
3
4
5
6
7
8
9 long loop_while2(long a, long b)
{
long result = _______;
while (________) {
result = _________;
b = _______;
}
return result;
}gcc, run with command-line option
-O1
, produces the following code:
1
2
3
4
5
6
7
8
9
10
11
12
13 loop_while2:
testq %rsi, %rsi
jle .L4
movq %rsi, %rax
.L3:
imulq %rdi, %rax
subq %rdi, %rsi
testq %rsi, %rsi
jg .L3
rep ret
.L4:
movq %rsi, %rax
retFill in the missing parts of the C code. Note that the control structure in the assembly code does not exactly match what would be obtained by a direct translation of the C code according to our translation rules. However, you can fill out the missing portions of the C code in a way that it will have equivalent behavior to the assembly code.
My solution: :white_check_mark:
1 | long loop_while2(long a, long b) |
Practice Problem 3.26
A function
test_one
has the following overall structure:
1
2
3
4
5
6
7 short test_one(unsigned short x) {
short val = 1;
while ( ... ) {
...
}
return ...;
}The gcc C compiler generates the following assembly code:
1
2
3
4
5
6
7
8
9 test_one:
jmp .L2
.L3:
shrw %di
.L2:
testw %di, %di
jne .L3
movl $0, %eax
retReverse engineer the operation of this code and then do the following:
A. Determine what loop translation method was used.
B. Use the assembly-code version to fill in the missing parts of the C code.
C. Describe in English what this function computes
My solution : :white_check_mark:
1 | // shr with only one operand implies that you are shifting only one bit. |
A : jump-to-middle
For Loops
The C language standard states (with one exception, highlighted in Problem 3.29) that the behavior of such a loop is identical to the codes using a while loop
1 | for (init_expression ; test_expr ; update_expr){ |
The code generated by gcc
for a for loop then follows one of our two translation strategies for while
loops, depending on the optimization level.
1 | //jump to middle strategy |
Practice Problem 3.27
Write
goto
code for a function calledfibonacci
to print fibonacci numbers using a while loop. Apply the guarded-do transformation.
My solution : :white_check_mark:
1 | void fibonacci(int cot){ |
We see from this presentation that all three forms of loops in C—do-while, while, and for—can be translated by a simple strategy, generating code that contains one or more conditional branches.
Practice Problem 3.28
A function
test_two
has the following overall structure:
1
2
3
4
5
6
7
8 short test_two(unsigned short x) {
short val = 0;
short i;
for ( ... ; ... ; ... ) {
...
}
return val;
}The gcc C compiler generates the following assembly code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15 test_two:
movl $64, %edx
movl $0, %eax
jmp .L2
.L3:
addq %rax, %rax
movq %rdi, %rcx
andl $1, %ecx
orq %rcx, %rax
shrq %rdi
subq $1, %rdx
.L2:
testq %rdx, %rdx
jne .L3
rep retReverse engineer the operation of this code and then do the following:
A. Use the assembly-code version to fill in the missing parts of the C code.
B. Explain why there is neither an initial test before the loop nor an initial jump to the test portion of the loop.
C. Describe in English what this function computes.
My solution : :white_check_mark:
1 | short test_two(unsigned short x) { |
This function reverse the bits of x, which means 0b111001 -> 0b100111
Practice Problem 3.29
Executing a
continue
statement in C causes the program to jump to the end of the current loop iteration. The stated rule for translating afor
loop into awhile
loop needs some refinement when dealing withcontinue
statements. For example, consider the following code:
1
2
3
4
5
6
7
8
9 /* Example of for loop containing a continue statement */
/* Sum even numbers between 0 and 9 */
long sum = 0;
long i;
for (i = 0; i < 10; i++) {
if (i & 1)
continue;
sum += i;
}A. What would we get if we naively applied our rule for translating the for loop into a while loop? What would be wrong with this code?
B. How could you replace the continue statement with a goto statement to ensure that the while loop correctly duplicates the behavior of the for loop?
My solution : :white_check_mark:
1 | //naive translation |
1 | //My translation |
Solution on the book:
The general solution is to replace the continue
statement with a goto
statement that skips the rest of the loop body and goes directly to the update portion
3.6.8 Switch Statements
A switch statement provides a multiway branching capability based on the value of an integer index.
Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table
- A jump table is an array where entry
i
is the address of a code segment implementing the action the program should take when the switch index equalsi
. - The advantage of using a jump table over a long sequence of
if-else
statements is that the time taken to perform the switch is independent of the number of switch cases - in
gcc
jump tables are used when there are a number of cases (e.g., four or more) and they span a small range of values.
In making this extension, the authors of gcc created a new operator
&&
to create a pointer for a code location.The compiler first shifts the range to between 0 and 6 by subtracting 100 from n
It further simplifies the branching possibilities by treating index as an unsigned value, making use of the fact that negative numbers in a two’s-complement representation map to large positive numbers in an unsigned representation
This computed goto is supported by gcc as an extension to the C language in line 16
Observe that the jump table handles duplicate cases by simply having the same code label (
loc_D
) for entries 4 and 6, and it handles missing cases by using the label for the default case (loc_def
) as entries 1 and 5.
- These declarations state that within the segment of the object-code file called
.rodata
(for “read-only data”), there should be a sequence of seven “quad” (8- byte) words, where the value of each word is given by the instruction address associated with the indicated assembly-code labels (e.g., .L3)
Practice Problem 3.30
In the C function that follows, we have omitted the body of the switch statement. In the C code, the case labels did not span a contiguous range, and some cases had multiple labels.
1
2
3
4
5
6
7 void switch2(short x, short *dest) {
short val = 0;
switch (x) {
//Body of switch statement omitted
}
*dest = val;
}In compiling the function, gcc generates the assembly code that follows for the initial part of the procedure, with variable
x
in%rdi
:
1
2
3
4
5
6
7 #void switch2(short x, short *dest)
#x in %rdi
switch2:
addq $2, %rdi
cmpq $8, %rdi
ja .L2
jmp *.L4(,%rdi,8)It generates the following code for the jump table:
1
2
3
4
5
6
7
8
9
10 .L4:
.quad .L9
.quad .L5
.quad .L6
.quad .L7
.quad .L2
.quad .L7
.quad .L8
.quad .L2
.quad .L5Based on this information, answer the following questions:
A. What were the values of the case labels in the switch statement?
B. What cases had multiple labels in the C code?
My solution : :white_check_mark:
1 | #void switch2(short x, short *dest) |
1 | void switch2(short x, short *dest) { |
Practice Problem 3.31
For a C function switcher with the general structure
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22 void switcher(long a, long b, long c, long *dest)
{
long val;
switch(a) {
case : /* Case A */
c = ;
/* Fall through */
case : /* Case B */
val = ;
break;
case : /* Case C */
case : /* Case D */
val = ;
break;
case : /* Case E */
val = ;
break;
default:
val = ;
}
*dest = val;
}gcc generates the assembly code and jump table shown below. Fill in the missing parts of the C code. Except for the ordering of case labels C and D, there is only one way to fit the different cases into the template.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35 switcher:
cmpq $7, %rdi
ja .L2
leaq .L4(%rip), %r8
movslq (%r8,%rdi,4), %rax
addq %r8, %rax
jmp *%rax
.section .rodata
.align 4
.align 4
.L4:
.long .L3-.L4
.long .L2-.L4
.long .L5-.L4
.long .L2-.L4
.long .L6-.L4
.long .L7-.L4
.long .L2-.L4
.long .L5-.L4
.text
.L6:
movq %rdi, %rsi
jmp .L2
.L7:
movq %rsi, %rdx
xorq $15, %rdx
.L3:
leaq 112(%rdx), %rsi
.L2:
movq %rsi, (%rcx)
ret
.L5:
addq %rdx, %rsi
salq $2, %rsi
jmp .L2
My solution : :white_check_mark:
1 | cmpq $7, %rdi # if ( (unsigned long)a > 7 ) default |
1 | void switcher(long a, long b, long c, long *dest) |
Verification:
1 | $ diff -s my.s test.s |
Recapitulation
For the logical operations, such as
xor
, theCF
andOF
are set to zerolea
do NOT change the conditional flagsset
instruction only change 8 bits(can only have 8 bits operand), setting it either0x00
or0x01
often used like this
1
2setl %al # other bits in %rax remain unchanged
movzbl %al, %eaxWhen result of computation is 32 bits, the high order 32 bits of a register are set to 0.
for
loop is equivalent towhile
loop, you can make conversion between them- in
for
loop, the first test of condition almost always holds, so inO1
optimization, the compiler will figure this out and throw away the “guard part” of a “guarded-do” implementation continue
infor
loop will NOT ignore the update-expression
- in
Case code in
switch
do NOT need curly brackets to specify the regionHowever, the value to be switched must be numerical value.
Tips
- In IA32, all arguments are passed by stack
- In IA32,
%rbp
is considered to be the bottom of the current stack, but now it is optional in x86, it can be used as a common register(if the caller function do not have any local variable on stack)