Ch3PartII

3.6 Control

Machine code provides two basic low-level mechanisms for implementing conditional behavior: it tests data values and then alters either the control flow or the data flow based on the results of these tests.

Data-dependent control flow is the more general and more common approach for implementing conditional behavior

[TOC]

3.6.1 Condition Codes

In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

  • CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.

    The carry flag is also set if the subtraction of two numbers requires a borrow into the most significant (leftmost) bits subtracted.

    1
    0000 - 0001 = 1111 (carry flag is turned on)
  • ZF: Zero flag. The most recent operation yielded zero.

  • SF: Sign flag. The most recent operation yielded a negative value.

  • OF: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.

For example, suppose we used one of the add instructions to perform the equivalent of the C assignment t = a+b, where variables a, b, and t are integers.

1
2
3
4
CF (unsigned) t < (unsigned) a 			//Unsigned overflow
ZF (t == 0) //Zero
SF (t < 0) //Negative
OF (a < 0 == b < 0) && (t < 0 != a < 0) //Signed overflow

The leaq instruction does NOT alter any condition codes, since it is intended to be used in address computations.

Otherwise, ALL of the instructions listed in Figure 3.10 cause the condition codes to be set

  • For the logical operations, such as xor, the CF and OF are set to zero
  • For the shift operations, the CF is set to the last bit shifted out, while the OF is set to zero
  • For reasons that we will not delve into, the inc and dec instructions set the OF and ZF, but they leave the CF unchanged.
cmp classes

They behave in the same way as the sub instructions, except that they set the condition codes without updating their destinations.

  • These instructions set the zero flag if the two operands are equal.
  • The other flags can be used to determine ordering relations between the two operands.
test classes

The test instructions behave in the same manner as the and instructions, except that they set the condition codes without altering their destinations.

  • Typically, the same operand is repeated (e.g., testq %rax,%rax to see whether %rax is negative, zero, or positive)
  • Or one of the operands is a mask indicating which bits should be tested.

compare and test

3.6.2 Accessing the Condition Codes

Rather than reading the condition codes directly, there are three common ways of using the condition codes:

  • we can set a single byte to 0 or 1 depending on some combination of the condition codes
    • We refer to this entire class of instructions as the set instructions
    • they differ from one another based on which combinations of condition codes they consider, as indicated by the different suffixes for the instruction names.
    • For example, instructions setl and setb denote “set less” and “set below,” not “set long word” or “set byte.”
  • we can conditionally jump to some other part of the program
  • we can conditionally transfer data
set classes

set instructions

A set instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination, setting this byte to either 0 or 1.

  • To generate a 32-bit or 64-bit result, we must also clear the high-order bits.

    For example

    1
    2
    3
    int comp(data_t a, data_t b){
    return a < b;
    }
    1
    2
    3
    4
    5
    6
    # a in %rdi, b in %rsi
    comp:
    cmpq %rsi, %rdi # Compare a:b
    setl %al # Set low-order byte of %eax to 0 or 1
    movzbl %al, %eax # Clear rest of %eax (and rest of %rax)
    ret

    Note the comparison order of the cmpq instruction . Although the arguments are listed in the order %rsi (b), then %rdi (a), the comparison is really between a and b(subtract b from a)

  • For some of the instructions, there are multiple possible names, which listed as “synonyms.”

    For example, both setg and setnle refer to the same machine instruction. Compilers and disassemblers make arbitrary choices of which names to use.

  • Although all arithmetic and logical operations set the condition codes, the descriptions of the different set instructions apply to the case where a comparison instruction has been executed, setting the condition codes according to the computation t = a-b.

Machine codes do not distinguish signed and unsigned value because the arithmetic of them are the same at bit level. If distinguish do matter, it use different instructions, such as using different versions of right shifts, division and multiplication instructions, and different combinations of condition codes.

Practice Problem 3.13

The C code

1
2
3
int comp(data_t a, data_t b) {
return a COMP b;
}

shows a general comparison between arguments a and b, where data_t, the data type of the arguments, is defined (via typedef) to be one of the integer data types listed in Figure 3.1 and either signed or unsigned. The comparison COMP is defined via #define.

Suppose a is in some portion of %rdx while b is in some portion of %rsi. For each of the following instruction sequences, determine which data types data_t and which comparisons COMP could cause the compiler to generate this code. (There can be multiple correct answers; you should list them all.)

1
2
3
4
5
6
7
8
9
10
11
12
A. 
cmpl %esi, %edi
setl %al
B.
cmpw %si, %di
setge %al
C.
cmpb %sil, %dil
setbe %al
D.
cmpq %rsi, %rdi
setne %al

My solution: :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
//A
typedef data_t int
#define COMP <
//B
typedef data_t short
#define COMP >=
//C
typedef data_t unsigned char
#define COMP <=
//D
typedef data_t unsigned long //or long or pointer of any type
#define COMP !=

Practice Problem 3.14

The C code

1
2
3
int test(data_t a) {
return a TEST 0;
}

shows a general comparison between argument a and 0, where we can set the data type of the argument by declaring data_t with a typedef, and the nature of the comparison by declaring TEST with a #define declaration. The following instruction sequences implement the comparison, where a is held in some portion of register %rdi. For each sequence, determine which data types data_t and which comparisons TEST could cause the compiler to generate this code. (There can be multiple correct answers; list all correct ones.)

1
2
3
4
5
6
7
8
9
10
11
12
A. 
testq %rdi, %rdi
setge %al
B.
testw %di, %di
sete %al
C.
testb %dil, %dil
seta %al
D.
testl %edi, %edi
setle %al

My solution: :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
//A
typedef data_t long
#define TEST >=
//B
typedef data_t short // or unsigned short
#define TEST ==
//C
typedef data_t unsigned char
#define TEST >
//D
typedef data_t int
#define TEST <=

3.6.3 Jump Instructions

A jump instruction can cause the execution to switch to a completely new position in the program.

These jump destinations are generally indicated in assembly code by a Label

For example

1
2
3
4
5
	movq $0,%rax 	#Set %rax to 0
jmp .L1 #Goto .L1
movq (%rax),%rdx#Null pointer dereference (skipped)
.L1:
popq %rdx #Jump target

In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.

jump instructions

  • The jmp instruction jumps unconditionally.

    • It can be either a direct jump, where the jump target is encoded as part of the instruction

      • Direct jumps are written in assembly code by giving a label as the jump target, for example, the label .L1 in the code shown.
    • or an indirect jump, where the jump target is read from a register or a memory location

      • Indirect jumps are written using ‘*’ followed by an operand specifier, for exmple

        1
        2
        jmp *%rax  # uses the value in register %rax as the jump target
        jmp *(%rax)# reads the jump target from memory, using the value in %rax as the read address
  • The remaining jump instructions in the table are conditional—they either jump or continue executing at the next instruction in the code sequence, depending on some combination of the condition codes

    • Conditional jumps can only be direct

3.6.4 Jump Instruction Encodings

understanding how the targets of jump instructions are encoded will become important when we study linking in Chapter 7

There are several different encodings for jumps, but some of the most commonly used ones are PC relative

  • That is, they encode the difference between the address of the target instruction and the address of the instruction immediately following the jump.
  • These offsets can be encoded using 1, 2, or 4 bytes.

A second encoding method is to give an “absolute” address, using 4 bytes to directly specify the target.

The assembler and linker select the appropriate encodings of the jump destinations.

Example of PC relative
1
2
3
4
5
6
7
8
movq %rdi, %rax
jmp .L2
.L3:
sarq %rax
.L2:
testq %rax, %rax
jg .L3
rep; ret
1
2
3
4
5
6
0: 48 89 f8 	mov %rdi,%rax    
3: eb 03 jmp 8 <loop+0x8>
5: 48 d1 f8 sar %rax
8: 48 85 c0 test %rax,%rax
b: 7f f8 jg 5 <loop+0x5>
d: f3 c3 repz retq
  • loop is the name of this session, indicating the address of start point of these instructions

  • <loop + x> indicates the instruction x bytes away from the start point

  • on line 3, <loop + 8> is encoded to 03 calculated by target address - next instruction address = 0x8 - 0x5 = 0x3

    on line b, <loop + 0x5 is encoded to f8 calculated by 0x5 - 0xd = -8 = 0b11111000 = 0xf8

1
2
3
4
5
6
7
# after linking
4004d0: 48 89 f8 mov %rdi,%rax
4004d3: eb 03 jmp 4004d8 <loop+0x8>
4004d5: 48 d1 f8 sar %rax
4004d8: 48 85 c0 test %rax,%rax
4004db: 7f f8 jg 4004d5 <loop+0x5>
4004dd: f3 c3 repz retq

The instructions have been relocated to different addresses, but the encodings of the jump targets in lines 2 and 5 remain unchanged.

Advantages of PC relative encoding
  • the instructions can be compactly encoded (requiring just 2 bytes)
  • the object code can be shifted to different positions in memory without alteration.

Practice Problem 3.15

In the following excerpts from a disassembled binary, some of the information has been replaced by X’s. Answer the following questions about these instructions.

A. What is the target of the je instruction below? (You do not need to know anything about the callq instruction here

1
2
4003fa: 74 02 je XXXXXX
4003fc: ff d0 callq *%rax

B. What is the target of the je instruction below?

1
2
40042f: 74 f4 je XXXXXX
400431: 5d pop %rbp

C. What is the address of the ja and pop instructions?

1
2
XXXXXX: 77 02 ja 400547
XXXXXX: 5d pop %rbp

D. In the code that follows, the jump target is encoded in PC-relative form as a 4- byte two’s-complement number. The bytes are listed from least significant to most, reflecting the little-endian byte ordering of x86-64. What is the address of the jump target?

1
2
4005e8: e9 73 ff ff ff jmpq XXXXXXX
4005ed: 90 nop

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
# A
4003fa: 74 02 je 0x4003fe # 0x4003fc + 0x02 = 0x4003fe
4003fc: ff d0 callq *%rax
# B
40042f: 74 f4 je 0x400425 # 0x400431 + -0xc = 0x400425
400431: 5d pop %rbp
# C
400543: 77 02 ja 400547 # xxxxxx+0x02 = 0x400547
400545: 5d pop %rbp # 0x400545 - 0x2 (bytes) = 0x400543
# D
4005e8: e9 73 ff ff ff jmpq 0x400560# 0x4005ed + -0x8d = 0x400560
4005ed: 90 nop

3.6.5 Implementing Conditional Branches with Conditional Control

The most general way to translate conditional expressions and statements from C into machine code is to use combinations of conditional and unconditional jumps.

goto statement in C is similar to unconditional jump in assembly. Using goto statements is generally considered a bad programming style, since their use can make code very difficult to read and debug.

example codes

Assembly if-else constructure in C style codes

1
2
3
4
5
6
7
8
t = test-expr;
if (!t)
goto false;
then-statement
goto done;
false:
else-statement
done:

Practice Problem 3.16

When given the C code

1
2
3
4
5
void cond(short a, short *p)
{
if (a && *p < a)
*p = a;
}

gcc generates the following assembly code:

1
2
3
4
5
6
7
8
9
10
# void cond(short a, short *p)
# a in %rdi, p in %rsi
cond:
testw %di, %di
je .L1
cmpw %di, (%rsi)
jge .L1
movw %di, (%rsi)
.L1:
rep ret

A. Write a goto version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples.

B. Explain why the assembly code contains two conditional branches, even though the C code has only one if statement.

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
//A
void cond(short a, short *p){
if (a == 0) goto L1;
if (*p >= a ) goto L1;
else *p = a;
L1:
return;
}
//B
/* Because there are two conditions to be judged */

Practice Problem 3.17

An alternate rule for translating if statements into goto code is as follows:

1
2
3
4
5
6
7
8
t = test-expr;
if (t)
goto true;
else-statement
goto done;
true:
then-statement
done:

A. Rewrite the goto version of absdiff_se based on this alternate rule.

B. Can you think of any reasons for choosing one rule over the other?

My solution : :warning:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
long lt_cnt = 0;
long ge_cnt = 0;
long gotodiff_se(long x , long y ){
long result ;
if (x < y) goto True;
else {
ge_cnt++;
result = x - y;
}
goto Done;
True:
lt_cnt++;
result = y - x ;
Done:
return result ;
}

Put else-statement before then-statement can drop a ! Instruction.

Solution on the book :

In most respects, the choice is arbitrary. But the original rule works better for the common case where there is no else statement

Practice Problem 3.18

Starting with C code of the form

1
2
3
4
5
6
7
8
9
10
11
12
short test(short x, short y, short z) {
short val = ________;
if (______) {
if (_______)
val = ________;
else
val = ________;
}
else if (________)
val = ________;
return val;
}

gcc generates the following assembly code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#short test(short x, short y, short z)
#x in %rdi, y in %rsi, z in %rdx
test:
leal (%rdx,%rsi), %eax
subl %edi, %eax
cmpw $5, %dx
jle .L2
cmpw $2, %si
jle .L3
movswl %di, %eax
movswl %dx, %ecx
cltd
idivl %ecx
ret
.L3:
movswl %di, %eax
movswl %si, %esi
cltd
idivl %esi
ret
.L2:
cmpw $2, %dx
jg .L1
movswl %dx, %eax
movswl %si, %esi
cltd
idivl %esi
.L1:
rep ret

Fill in the missing expressions in the C code.

My solution: :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
short test(short x, short y, short z) {
short val = y + z - x;
if ( z > 5 ) {
if ( y > 2)
val = x / z;
else
val = x / y;
}
else if (z <= 2)
val = z / y;
return val;
}

Verification

1
2
3
4
5
$ diff -s my.s test.s
1c1
< .file "my.c"
---
> .file "test.c"

3.6.6 Implementing Conditional Branches with Conditional Moves

The transfer of control is simple but also inefficient. Modern processors prefer a conditional move of data

  • This approach computes both outcomes of a conditional operation and then selects one based on whether or not the condition holds.
  • This strategy makes sense only in restricted cases

transfer data codes

The key is that the single cmovge instruction of the assembly code. It will transfer the data from the source register to the destination, only if the cmpq instruction indicates that the condition holds (as indicated by the suffix)

Why data-transfer outperform control-transfer ?
  • Modern computers use pipline to execute instructions. To do this requires being able to determine the sequence of instructions to be executed well ahead of time in order to keep the pipeline full of instructions to be executed.

    Data transfer do not have jmp

  • When a jmp instruction appears, the processor do not know the following instruction sequence

  • Although the processor can guess the condition will hold or not via “branch prediction logic”, once the prediction is wrong, all previous work based on the prediction is wasted.

The flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full.

Practice Problem 3.19

Running on a new processor model, our code required around 45 cycles when the branching pattern was random, and around 25 cycles when the pattern was highly predictable.

A. What is the approximate miss penalty?

B. How many cycles would the function require when the branch is mispredicted?

aside

My solution: :white_check_mark:

$$
Tavg(0.5) = Tok+0.5Tmp =45
$$

$$
Tok=25
$$

$$
\therefore Tmp = 40 (\text{clock cycle})
$$

A : approximate miss penalty is 40 clock
$$
Tran = Tavg(1) = Tok + Tmp = 65
$$
B : 65 clock cycles needed.


Conditional move instructions

cmov

  • The source value is read from either memory or the source register, but it is copied to the destination only if the specified condition holds.

  • The source and destination values can be 16, 32, or 64 bits long. Singlebyte conditional moves are not supported.

    The processor can infer the data size from the destination, so the same instruction name can be used for all operand lengths.

  • Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test. The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same.

Standard form
1
v = test-expr ? then-expr : else-expr;

will be converted to

1
2
3
4
v = then-expr;
ve = else-expr;
t = test-expr;
if (!t) v = ve;

The final statement in this sequence is implemented with a conditional move— value ve is copied to v only if test condition t does not hold.

Two disadvantages
  • data-transfer evalute both expressions regardless of the condition, if one of those two expressions could possibly generate an error condition or a side effect, this could lead to invalid behavior.

    For example

    1
    2
    3
    long cread(long *xp) {
    return (xp ? *xp : 0);
    }
    1
    2
    3
    4
    5
    6
    7
    # Invalid implementation of function cread !!!
    cread:
    movq (%rdi), %rax # v = *xp
    testq %rdi, %rdi # Test x
    movl $0, %edx # Set ve = 0
    cmove %rdx, %rax # If x==0, v = ve
    ret

    Actual gcc output

    1
    2
    3
    4
    5
    6
    7
    	testq	%rdi, %rdi
    je .L3
    movq (%rdi), %rax
    ret
    .L3:
    movl $0, %eax
    ret
  • Data-transfer do not always improve code efficiency, if either the then-expr or the else-expr evaluation requires a significant computation, then this effort is wasted when the corresponding condition does not hold

gcc only uses conditional moves when the two expressions can be computed very easily, for example, with single add instructions.

Practice Problem 3.20

In the following C function, we have left the definition of operation OP incomplete:

1
2
3
4
#define OP _________ /* Unknown operator */
short arith(short x) {
return x OP 16;
}

When compiled, gcc generates the following assembly code

1
2
3
4
5
6
7
8
# short arith(short x)
# x in %rdi
arith:
leaq 15(%rdi), %rbx
testq %rdi, %rdi
cmovns %rdi, %rbx
sarq $4, %rbx
ret

A. What operation is OP?

B. Annotate the code to explain how it works

My solution : :x:

1
2
3
4
5
6
7
8
# short arith(short x)
# x in %rdi
arith:
leaq 15(%rdi), %rbx # %rbx = 15 + x
testq %rdi, %rdi # x & x
cmovns %rdi, %rbx # if (x >= 0) %rbx = x
sarq $4, %rbx # %rbx >> 4
ret
1
2
3
4
5
6
7
short arith(short x){
short res1 = (15 + x ) >> 4 ;
short res2 = (x) >> 4;
if (x >= 0 ) res1 = res2 ;
return res1;
}
//return x < 0 ? (15 + x) >> 4 : x/ 16;
1
#define OP <0?(15+x)>>4:x/

Solution on the book:

The operator is ‘/’. The program creates a temporary value equal to x + 15, in anticipation of x being negative and therefore requiring biasing.

Practice Problem 3.21

Starting with C code of the form

1
2
3
4
5
6
7
8
9
10
11
12
short test(short x, short y) {
short val = _______;
if (_______) {
if (_______)
val = ________;
else
val = ________;
}
else if (________)
val = __________;
return val;
}

gcc generates the following assembly code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# short test(short x, short y)
# x in %rdi, y in %rsi
test:
leal 12(%rsi), %eax
testw %di, %di
js .L5
cmpw $10, %si
jle .L1
movswl %di, %eax
movswl %si, %esi
cltd
idivl %esi
.L1:
rep ret
.L5:
cmpw %di, %si
jle .L3
movl %edi, %eax
imull %esi, %eax
ret
.L3:
movl %esi, %eax
orl %edi, %eax
ret

Fill in the missing expressions in the C code.

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
short test(short x, short y) {
short val = y + 12 ;
if ( x < 0 ) {
if ( x < y)
val = y * x;
else
val = x | y;
}
else if ( y > 10 )
val = x / y;
return val;
}

fuck you, global edition, but I am too poor to buy the NA version…

3.6.7 Loops

combinations of conditional tests and jumps are used to implement the effect of loops

Do-While Loops

1
2
3
4
5
6
7
8
9
10
do{
body-statement;
}
while (test-expr);
//equivalent to
loop:
body-statement
t = test-expr;
if (t)
goto loop;

dowhileloop

Practice Problem 3.22 :white_check_mark:

A. Try to calculate 14! with a 32-bit int. Verify whether the computation of 14! overflows.

B. What if the computation is done with a 64-bit long int?

1
2
3
4
5
6
7
8
9
10
11
12
13
#include <stdio.h>
#include <stdlib.h>
int factorial(int n){
int res = 1;
do{
res *= n;
n--;
}while(n > 1 );
return res;
}
int main(int argc , char * argv []){
printf("%d\n" , factorial(atoi(argv[1])));
}
1
2
3
4
$ ./test 14
1278945280
$ python3 -c "import math;print(math.factorial(14))"
87178291200
1
2
3
4
5
>>> realres = math.factorial(14)
>>> m = 2**32
>>> intres = realres % m
>>> intres
1278945280

Overflow happens.

When using long, the result is correct

1
2
$ ./test 14
87178291200
1
2
>>> factorial(14) < 2**64
True

Practice Problem 3.23

For the C code

1
2
3
4
5
6
7
8
9
10
11
short dw_loop(short x) {
short y = x/9;
short *p = &x;
short n = 4*x;
do {
x += y;
(*p) += 5;
n -= 2;
} while (n > 0);
return x;
}

gcc generates the following assembly code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
dw_loop:
movl %edi, %eax
movswl %di, %ecx
imull $7282, %ecx, %ecx
shrl $16, %ecx
movl %edi, %edx
sarw $15, %dx
subl %edx, %ecx
sall $2, %edi
.L2:
leal 5(%rcx,%rax), %eax
leal -2(%rdi), %edx
movl %edx, %edi
testw %dx, %dx
jg .L2
rep ret

A. Which registers are used to hold program values x, y, and n?

B. How has the compiler eliminated the need for pointer variable p and the pointer dereferencing implied by the expression (*p)+=5?

C. Add annotations to the assembly code describing the operation of the program, similar to those shown in Figure 3.19(c).

My solution: :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# short dw_loop(short x)
# x in %di
# The three operand form multiplies %ecx and 7282 together and stores the result in %ecx
# 7282 here is a magic number to replace division
# see more details at https://homepage.divms.uiowa.edu/~jones/bcd/divide.html
dw_loop:
movl %edi, %eax # int res = x;
movswl %di, %ecx # int cot = x;
imull $7282, %ecx, %ecx # cot *= 7282
shrl $16, %ecx # cot /= 65536
movl %edi, %edx # int d = x;
sarw $15, %dx # d /= 32768
subl %edx, %ecx # cot -= d
sall $2, %edi # x *= 4
.L2:
leal 5(%rcx,%rax), %eax # res = 5 + cot + res
leal -2(%rdi), %edx # d -= 2
movl %edx, %edi # x = d
testw %dx, %dx # if ( d > 0) goto L2;
jg .L2
rep ret

A: As we can see that y = cot, in %ecx,n = d in%edx, and finally x in %edi

B: Since the pointer holds the address of a local variable, there is no need to really access memory, so just another temporary variable saver is OK. In this case, x += y; (*p) += 5 is merged into x = x + y + 5

While Loops

There are a number of ways to translate a while loop into machine code, two of which are used in code generated by gcc. Same structure but differ in how to implement the initial test.

jump to middle

The first translation method, which we refer to as jump to middle, performs the initial test by performing an unconditional jump to the test at the end of the loop.

1
2
3
4
5
6
7
	goto test;
loop:
body-statement
test:
t = test-expr;
if (t)
goto loop;

while loop codes example

Practice Problem 3.24

For C code having the general form

1
2
3
4
5
6
7
8
9
short loop_while(short a, short b)
{
short result = _________;
while (______) {
result = _________;
a = ___________;
}
return result;
}

gcc, run with command-line option -Og, produces the following code

1
2
3
4
5
6
7
8
9
10
11
12
while_loop:
movl $0, %eax
jmp .L2
.L3:
movl %esi, %edx
imull %edi, %edx
addl %edx, %eax
subl $1, %edi
.L2:
cmpw %si, %di
jg .L3
rep ret

We can see that the compiler used a jump-to-middle translation, using the jmp instruction to jump to the test starting with label .L2. Fill in the missing parts of the C code.

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
short loop_while(short a, short b)
{
short result = 0;
while (a > b) {
result = result + a*b;
a = a - 1;
}
return result;
}

guarded do

first transforms the code into a do-while loop by using a conditional branch to skip over the loop if the initial test fails, used by higher optimization option

Using this implementation strategy, the compiler can often optimize the initial test, for example, determining that the test condition will always hold.

1
2
3
4
5
6
7
	t = test-expr;
if (!t)
goto done;
do
body-statement
while (test-expr);
done:

guardeddo

Practice Problem 3.25

For C code having the general form

1
2
3
4
5
6
7
8
9
long loop_while2(long a, long b)
{
long result = _______;
while (________) {
result = _________;
b = _______;
}
return result;
}

gcc, run with command-line option -O1, produces the following code:

1
2
3
4
5
6
7
8
9
10
11
12
13
loop_while2:
testq %rsi, %rsi
jle .L4
movq %rsi, %rax
.L3:
imulq %rdi, %rax
subq %rdi, %rsi
testq %rsi, %rsi
jg .L3
rep ret
.L4:
movq %rsi, %rax
ret

Fill in the missing parts of the C code. Note that the control structure in the assembly code does not exactly match what would be obtained by a direct translation of the C code according to our translation rules. However, you can fill out the missing portions of the C code in a way that it will have equivalent behavior to the assembly code.

My solution: :white_check_mark:

1
2
3
4
5
6
7
8
9
long loop_while2(long a, long b)
{
long result = b;
while (b > 0) {
result = result * a;
b = b - a;
}
return result;
}

Practice Problem 3.26

A function test_one has the following overall structure:

1
2
3
4
5
6
7
short test_one(unsigned short x) {
short val = 1;
while ( ... ) {
...
}
return ...;
}

The gcc C compiler generates the following assembly code:

1
2
3
4
5
6
7
8
9
test_one:
jmp .L2
.L3:
shrw %di
.L2:
testw %di, %di
jne .L3
movl $0, %eax
ret

Reverse engineer the operation of this code and then do the following:

A. Determine what loop translation method was used.

B. Use the assembly-code version to fill in the missing parts of the C code.

C. Describe in English what this function computes

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
// shr with only one operand implies that you are shifting only one bit.
short test_one(unsigned short x) {
short val = 1;
while ( x != 0 ) {
x >>= 1;
}
return 0;
}

A : jump-to-middle

For Loops

The C language standard states (with one exception, highlighted in Problem 3.29) that the behavior of such a loop is identical to the codes using a while loop

1
2
3
4
5
6
7
8
9
for (init_expression ; test_expr ; update_expr){
body_expressions ;
}
// equivalent to
init_expression ;
while(test_expr){
body_expression ;
update_expr;
}

The code generated by gcc for a for loop then follows one of our two translation strategies for while loops, depending on the optimization level.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
//jump to middle strategy
init-expr;
goto test;
loop:
body-statement
update-expr;
test:
t = test-expr;
if (t)
goto loop;
//guarded do strategy
init-expr;
t = test-expr;
if (!t)
goto done;
loop:
body-statement
update-expr;
t = test-expr;
if (t)
goto loop;
done:

Practice Problem 3.27

Write goto code for a function called fibonacci to print fibonacci numbers using a while loop. Apply the guarded-do transformation.

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
13
void fibonacci(int cot){
int one = 0 ;
int two = 1 ;
if (cot <= 0 ) goto Done ; //guard
Loop:
printf("%d\n" , one);
two += one ;
one = two - one ;
cot--;
if (cot != 0) goto Loop;
Done:
return ;
}

We see from this presentation that all three forms of loops in C—do-while, while, and for—can be translated by a simple strategy, generating code that contains one or more conditional branches.

Practice Problem 3.28

A function test_two has the following overall structure:

1
2
3
4
5
6
7
8
short test_two(unsigned short x) {
short val = 0;
short i;
for ( ... ; ... ; ... ) {
...
}
return val;
}

The gcc C compiler generates the following assembly code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
test_two:
movl $64, %edx
movl $0, %eax
jmp .L2
.L3:
addq %rax, %rax
movq %rdi, %rcx
andl $1, %ecx
orq %rcx, %rax
shrq %rdi
subq $1, %rdx
.L2:
testq %rdx, %rdx
jne .L3
rep ret

Reverse engineer the operation of this code and then do the following:

A. Use the assembly-code version to fill in the missing parts of the C code.

B. Explain why there is neither an initial test before the loop nor an initial jump to the test portion of the loop.

C. Describe in English what this function computes.

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
short test_two(unsigned short x) {
short val = 0;
short i;
for ( i = 64 ; i != 0 ; i-- ) {
val <<= 1 ; //val = val + val
val = (x & 1) | val;
x >>= 1;
}
return val;
}

This function reverse the bits of x, which means 0b111001 -> 0b100111

Practice Problem 3.29

Executing a continue statement in C causes the program to jump to the end of the current loop iteration. The stated rule for translating a for loop into a while loop needs some refinement when dealing with continue statements. For example, consider the following code:

1
2
3
4
5
6
7
8
9
/* Example of for loop containing a continue statement */
/* Sum even numbers between 0 and 9 */
long sum = 0;
long i;
for (i = 0; i < 10; i++) {
if (i & 1)
continue;
sum += i;
}

A. What would we get if we naively applied our rule for translating the for loop into a while loop? What would be wrong with this code?

B. How could you replace the continue statement with a goto statement to ensure that the while loop correctly duplicates the behavior of the for loop?

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
13
//naive translation
//The updata-expression won't be executed !
long sum = 0;
long i ;
i = 0;
goto Test;
Loop:
if (i & 1) goto Test;
sum += i;
i++;
Test:
if (i < 10) goto Loop;
return
1
2
3
4
5
6
7
8
9
10
11
//My translation
long sum = 0;
long i = 0;
if (i >= 10) goto Done;
Loop:
if (i & 1) goto Continue;
sum += i;
Continue:
i++;
Done:
return

Solution on the book:

The general solution is to replace the continue statement with a goto statement that skips the rest of the loop body and goes directly to the update portion


3.6.8 Switch Statements

A switch statement provides a multiway branching capability based on the value of an integer index.

Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table

  • A jump table is an array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i.
  • The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases
  • in gcc jump tables are used when there are a number of cases (e.g., four or more) and they span a small range of values.

switch case codes

  • In making this extension, the authors of gcc created a new operator && to create a pointer for a code location.

  • The compiler first shifts the range to between 0 and 6 by subtracting 100 from n

  • It further simplifies the branching possibilities by treating index as an unsigned value, making use of the fact that negative numbers in a two’s-complement representation map to large positive numbers in an unsigned representation

  • This computed goto is supported by gcc as an extension to the C language in line 16

  • Observe that the jump table handles duplicate cases by simply having the same code label (loc_D) for entries 4 and 6, and it handles missing cases by using the label for the default case (loc_def) as entries 1 and 5.

switch case assembly codes

jump table

  • These declarations state that within the segment of the object-code file called .rodata (for “read-only data”), there should be a sequence of seven “quad” (8- byte) words, where the value of each word is given by the instruction address associated with the indicated assembly-code labels (e.g., .L3)

Practice Problem 3.30

In the C function that follows, we have omitted the body of the switch statement. In the C code, the case labels did not span a contiguous range, and some cases had multiple labels.

1
2
3
4
5
6
7
void switch2(short x, short *dest) {
short val = 0;
switch (x) {
//Body of switch statement omitted
}
*dest = val;
}

In compiling the function, gcc generates the assembly code that follows for the initial part of the procedure, with variable x in %rdi:

1
2
3
4
5
6
7
#void switch2(short x, short *dest)
#x in %rdi
switch2:
addq $2, %rdi
cmpq $8, %rdi
ja .L2
jmp *.L4(,%rdi,8)

It generates the following code for the jump table:

1
2
3
4
5
6
7
8
9
10
.L4:
.quad .L9
.quad .L5
.quad .L6
.quad .L7
.quad .L2
.quad .L7
.quad .L8
.quad .L2
.quad .L5

Based on this information, answer the following questions:

A. What were the values of the case labels in the switch statement?

B. What cases had multiple labels in the C code?

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#void switch2(short x, short *dest)
#x in %rdi
switch2:
addq $2, %rdi # x += 2
cmpq $8, %rdi # if(x > 8) return
ja .L2
jmp *.L4(,%rdi,8) # goto (.L4 + 8 * x)

# x+2 == 0 : .L9
# x+2 == 1 : .L5
# x+2 == 2 : .L6
# x+2 == 3 : .L7
# x+2 == 4 : .L2
# x+2 == 5 : .L7
# x+2 == 6 : .L8
# x+2 == 7 : .L2
# x+2 == 8 : .L5
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
void switch2(short x, short *dest) {
short val = 0;
switch (x) {
case -2:{
....
}
case 6 :
case -1:{
...
}
case 0:{
....
}
case 3:
case 1:{
....
}
case 4:{
...
}
default:{
....
}
}
*dest = val;
}
//case 6 and -1
//case 3 and 1

Practice Problem 3.31

For a C function switcher with the general structure

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
void switcher(long a, long b, long c, long *dest)
{
long val;
switch(a) {
case : /* Case A */
c = ;
/* Fall through */
case : /* Case B */
val = ;
break;
case : /* Case C */
case : /* Case D */
val = ;
break;
case : /* Case E */
val = ;
break;
default:
val = ;
}
*dest = val;
}

gcc generates the assembly code and jump table shown below. Fill in the missing parts of the C code. Except for the ordering of case labels C and D, there is only one way to fit the different cases into the template.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
switcher:
cmpq $7, %rdi
ja .L2
leaq .L4(%rip), %r8
movslq (%r8,%rdi,4), %rax
addq %r8, %rax
jmp *%rax
.section .rodata
.align 4
.align 4
.L4:
.long .L3-.L4
.long .L2-.L4
.long .L5-.L4
.long .L2-.L4
.long .L6-.L4
.long .L7-.L4
.long .L2-.L4
.long .L5-.L4
.text
.L6:
movq %rdi, %rsi
jmp .L2
.L7:
movq %rsi, %rdx
xorq $15, %rdx
.L3:
leaq 112(%rdx), %rsi
.L2:
movq %rsi, (%rcx)
ret
.L5:
addq %rdx, %rsi
salq $2, %rsi
jmp .L2

My solution : :white_check_mark:

1
2
3
4
5
6
cmpq	$7, %rdi		# if ( (unsigned long)a > 7 ) default 
ja .L2
leaq .L4(%rip), %r8 # %r8 = .L4 + %rip
movslq (%r8,%rdi,4), %rax # %rax = .L4 + %rip + 4*a
addq %r8, %rax # %rax = (.L4 + %rip)*2 + 4*a
jmp *%rax
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
void switcher(long a, long b, long c, long *dest)
{
long val;
switch(a) {
case 5: /* Case A */
c = b ^ 15;
/* Fall through */
case 0: /* Case B */
val = 112 + c;
break;
case 2: /* Case C */
case 7: /* Case D */
val = (b+c) << 2;
break;
case 4: /* Case E */
val = a;
break;
default:
val = b;
}
*dest = val;
}

Verification:

1
2
3
4
5
$ diff -s my.s test.s
1c1
< .file "my.c"
---
> .file "test.c"

Recapitulation

  • For the logical operations, such as xor, the CF and OF are set to zero

  • lea do NOT change the conditional flags

  • set instruction only change 8 bits(can only have 8 bits operand), setting it either 0x00 or 0x01

    often used like this

    1
    2
    setl %al # other bits in %rax remain unchanged
    movzbl %al, %eax
  • When result of computation is 32 bits, the high order 32 bits of a register are set to 0.

  • for loop is equivalent to while loop, you can make conversion between them

    • in for loop, the first test of condition almost always holds, so in O1 optimization, the compiler will figure this out and throw away the “guard part” of a “guarded-do” implementation
    • continue in for loop will NOT ignore the update-expression
  • Case code in switch do NOT need curly brackets to specify the region

    However, the value to be switched must be numerical value.

Tips

  • In IA32, all arguments are passed by stack
  • In IA32,%rbp is considered to be the bottom of the current stack, but now it is optional in x86, it can be used as a common register(if the caller function do not have any local variable on stack)