Ch2

Representing and Manipulating Information

[TOC]

In isolation, a single bit is not very useful. When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, however, we can represent the elements of any finite set.

We will mainly focus on three most important representations of numbers:

  • Unsigned –> traditional binary notation
  • Signed –> 2’s complement
  • Floating-point –> base-2 version of scientific notation(IEEE)

Intro

just get some feeling of our dear computer

  • numbers in computer is finite, so sometimes result will overflow

    1
    2
    3
    4
    5
    6
    7
    #include <stdio.h>
    int main(){
    printf("200*300*400*500=%d\n" ,200*300*400*500 );
    printf("(3.14+1e20)-1e20=%f\n" ,(3.14+1e20)-1e20);
    printf("3.14+(1e20-1e20)=%f\n" ,3.14+(1e20-1e20));
    return 0;
    }
    1
    2
    3
    200*300*400*500=-884901888
    (3.14+1e20)-1e20=0.000000
    3.14+(1e20-1e20)=3.140000

    The different mathematical properties of integer versus floating-point arithmetic stem from the difference in how they handle the finiteness of their representations

    • integer representations can encode a comparatively small range of values, but do so precisely
    • floating-point representations can encode a wide range of values, but only approximately.
  • A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.

  • Computers use several different binary representations to encode numeric values. You will need to be familiar with these representations as you progress into machine-level programming

  • programmers need to have a clear understanding of how computer arithmetic relates to the more familiar integer and real arithmetic.

  • Everything said in this chapter about C also holds for C++. The Java language definition, on the other hand, created a new set of standards for numeric representations and operations.

2.1 Information Storage

Rather than modifying individual bit, most modern computers use blocks of 8 bits, or bytes, as the smallest addressable unit of memory

2.1.1 Hexadecimal Notation

hex

One simple trick for doing the conversion in your head is to memorize the decimal equivalents of hex digits A, C, and F. The hex values B, D, and E can be translated to decimal by computing their values relative to the first three.

If the total number of bits is not a multiple of 4, you should make the leftmost group be the one with fewer than 4 bits, effectively padding the number with leading zeros.


Practice Problem 2.1

Perform the following number conversions:

A. 0x25B9D2 to binary

B. binary 1010111001001001 to hexadecimal

C. 0xA8B3D to binary

D. binary 1100100010110110010110 to hexadecimal

My solution::white_check_mark:

$$
A:(25B9D2)_{16} = (0010 0101 1011 1001 1101 0010)_2
$$

$$
B : (1010 1110 0100 1001)2 = (AE49){16}
$$

$$
C: (A8B3D)_{16} = (1010 1000 1011 0011 1101)_2
$$

$$
D : (11 0010 0010 1101 1001 0110)_2 = (0011 0010 0010 1101 1001 0110)2 = (322D96){16}
$$

(solution on the book is wrong)

wrong


When a value x is a power of 2, that is, $x = 2^n$ for some nonnegative integer n, we can readily write x in hexadecimal form by remembering that the binary representation of x is simply 1 followed by n zeros.

The hexadecimal digit 0 represents 4 binary zeros.

1 hexadecimal 0 is equals to $2^4$

So, for n written in the form i + 4j , where 0 ≤ i ≤ 3, we can write x with a leading hex digit of 1 (i = 0), 2 (i = 1), 4 (i = 2), or 8 (i = 3), followed by j hexadecimal 0s.
$$
(2^i)\underbrace{000…000}_\text{j zeros}
$$


Practice Problem 2.2

Fill in the blank entries in the following table, giving the decimal and hexadecimal representations of different powers of 2:

n $2^n$(decimal) $2^n$(hexadecimal)
5 32 0x20
23
32768
0x2000
12
64
0x100

My solution: :white_check_mark:

$$
2^{23} = 2^3\times2^{10} \times 2^{10}= 8 \times 1048576 = 8388608
$$

$$
23 = 3 + 4 \times 5 , 2^{23} = (800000)_{16}
$$

$$
\log_2 (32768) = 15
$$

$$
15 = 3 + 3\times4 , 2^{15} = (8000)_{16}
$$

$$
(2000){16} = 2^{1+3\times4} = 2^{13} = 2^3 \times 2^{10} = (8192){10}
$$

$$
2^{12} = 4 \times 1024 = 4096
$$

$$
12 = 0 + 3\times 4, 2^{12} = (1000)_{16}
$$

$$
\log_2 {64} = 6
$$

$$
6 = 2+1\times 4 , 2^{6} = (40)_{16}
$$

$$
(100)_{16} = 2^{0+2\times4} = 256
$$

n $2^n$(decimal) $2^n$(hexadecimal)
5 32 0x20
23 8388608 0x800000
15 32768 0x8000
13 8192 0x2000
12 4096 0x1000
6 64 0x40
8 256 0x100

Converting between decimal and hexadecimal representations requires using multiplication or division to handle the general case.

(just the same method used when we convert between binary and decimal)


Practice Problem 2.3

A single byte can be represented by 2 hexadecimal digits.

Fill in the missing entries in the following table, giving the decimal, binary, and hexadecimal values of different byte patterns:

Decimal Binary Hexadecimal
0 0000 0000 0x00
158
76
145
1010 1110
0011 1100
1111 0001
0x75
0xBD
0xF5

My solution: :white_check_mark:

$$
158 = 9 \times 16 + 14(E)
$$

$$
9 = 0 \times 16 + 9 (9)
$$

$$
158 = (9E)_{16}
$$

Decimal Binary Hexadecimal
0 0000 0000 0x00
158 1001 1110 0x9E
76 0100 1100 0x4C
145 1001 0001 0x91
174 1010 1110 0xAE
60 0011 1100 0x3C
241 1111 0001 0xF1
117 0111 0101 0x75
189 1011 1101 0xBD
245 1111 0101 0xF5

Practice Problem 2.4

Without converting the numbers to decimal or binary, try to solve the following arithmetic problems, giving the answers in hexadecimal. Hint: Just modify the methods you use for performing decimal addition and subtraction to use base 16.

A. 0x605c + 0x5 =

B. 0x605c − 0x20 =

C. 0x605c + 32 =

D. 0x60fa − 0x605c =

My solution: :white_check_mark:

$$
A : (605C){16} + (5){16} = (6061)_{16}
$$

B. 0x605c − 0x20 = 0x603c

C. 0x605c + 32 = 0x605c + 0x20 = 0x607c
$$
+60FA\
-605C\
\rule{4cm}{0.4pt}\
+009E
$$
D. 0x60fa − 0x605c = 0x9E

just normal arithmetic with base 16


2.1.2 Data Sizes

  • Every computer has a word size, and a virtual address is encoded by such a word, so word size determine the maximum size of the virtual address space.

  • For a machine with a w-bit word size, the virtual addresses can range from 0 to $2^w − 1$, giving the program access to at most $2^w$ bytes.

  • We will therefore refer to programs as being either “32-bit programs” or “64-bit programs,” since the distinction lies in how a program is compiled, rather than the type of machine on which it runs.

    • 32-bits program : linux>gcc -m32 program.c
    • 64-bits program : linux>gcc -m64 program.c
    • Nowadays 32-bits seems to be deprecated and gcc default compile it as 64-bits, the -m32 parameter will result in missing file error on 64 bits machine.
  • To avoid the vagaries of relying on “typical” sizes and different compiler settings, ISO C99 introduced a class of data types where the data sizes are fixed regardless of compiler and machine settings : int32_t and int64_t,having exactly 4 and 8 bytes, respectively.

    Using fixed-size integer types is the best way for programmers to have close control over data representations.

    csize

  • One aspect of portability is to make the program insensitive to the exact sizes of the different data types.

    For example, many programmers historically assumed that an object declared as type int could be used to store a pointer. This works fine for most 32-bit programs, but it leads to problems for 64-bit programs.

    (the size of virtual address space differs!)

    (check size by using sizeof() before use it may be a good trick.)

2.1.3 Addressing and Byte Ordering

Byte ordering

  • where the least significant byte comes first—is referred to as little endian.

    where the most significant byte comes first—is referred to as big endian.

  • Example

    endian

  • Many recent microprocessor chips are bi-endian, meaning that they can be configured to operate as either little- or big-endian machines. In practice, however, byte ordering becomes fixed once a particular operating system is chosen.

    For example, ARM microprocessors, used in many cell phones, have hardware that can operate in either little- or big-endian mode, but the two most common operating systems for these chips—Android (from Google) and IOS (from Apple)—operate only in little-endian mode.

  • Just like the egg issue, there is no technological reason to choose one byte ordering convention over the other

Three case when byte ordering matters
  • code written for networking applications must follow established conventions for byte ordering to make sure the sending machine converts its internal representation to the network standard, while the receiving machine converts the network standard to its internal representation.

  • when inspecting machine-level programs, especially codes provided by a disassemble, the byte sequences representing integer data are related to endian.

    For example

    1
    4004d3: 01 05 43 0b 20 00 add %eax,0x200b43(%rip)

    little endian reads it as add eax,0x00200b43, while big endian will reads it as add eax,0x430b2000

  • when programs are written that circumvent the normal type system.

    In the C language, this can be done using a cast or a union to allow an object to be referenced according to a different data type from which it was created.

    Such coding tricks are strongly discouraged for most application programming, but they can be quite useful and even necessary for system-level programming.

    machines


Practice Problem 2.5

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
int i;
for (i = 0; i < len; i++)
printf(" %.2x", start[i]);
printf("\n");
}

void show_int(int x) {
show_bytes((byte_pointer) &x, sizeof(int));
}

void show_float(float x) {
show_bytes((byte_pointer) &x, sizeof(float));
}

void show_pointer(void *x) {
show_bytes((byte_pointer) &x, sizeof(void *));
}

Consider the following three calls to show_bytes:

1
2
3
4
5
int a = 0x12345678; 
byte_pointer ap = (byte_pointer) &a;
show_bytes(ap, 1); /* A. */
show_bytes(ap, 2); /* B. */
show_bytes(ap, 3); /* C. */

Indicate the values that will be printed by each call on a little-endian machine and on a big-endian machine:

A. Little endian: ? Big endian: ?

B. Little endian: ? Big endian: ?

C. Little endian: ? Big endian: ?

My solution: :white_check_mark:

A. Little endian: 78 Big endian: 12

B. Little endian: 78 56 Big endian: 12 34

C. Little endian: 78 56 34 Big endian: 12 34 56

Practice Problem 2.6

Using show_int and show_float, we determine that the integer 2607352 has hexadecimal representation 0x0027C8F8, while the floating-point number 3510593.0 has hexadecimal representation 0x4A1F23E0.

A. Write the binary representations of these two hexadecimal values.

B. Shift these two strings relative to one another to maximize the number of matching bits. How many bits match?

C. What parts of the strings do not match?

My solution: :white_check_mark:

A:
$$
(0027C8F8)_{16} = (0000 0000 0010 0111 1100 1000 1111 1000)_2
$$

$$
(4A1F23E0)_{16} = (0100 1010 0001 1111 0010 0011 1110 0000)_2
$$

B:

right shift the second one 2 bits

1
2
00000000001001111100100011111000
01001010000111110010001111100000

21 bits match

C:

(according to IEEE the signed bit , exponent bits and the hidden 1 in mantissa do not match)

2.1.4 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having value 0) character.

  • ASCII code for decimal digit $x$ happens to be 0x3$x$, and that the terminating byte has the hex representation 0x00
  • Text data are more platform independent than binary data. It has nothing to do with byte ordering. It is always a sequence.
  • It is someting can be viewed as big endian.

Practice Problem 2.7

What would be printed as a result of the following call to show_bytes?

1
const char *m = "mnopqr"; show_bytes((byte_pointer) m, strlen(m));

Note that letters ‘a’ through ‘z’ have ASCII codes 0x61 through 0x7A

My solution:

1
6d 6e 6f 70 71 72

Recall also that the library routine strlen does NOT count the terminating null character


2.1.5 Representing Code

Consider the following C function:

1
2
3
int sum(int x , int y){
return x + y ;
}

When compiled on our sample machines, we generate machine code having the following byte representations:

1
2
3
4
Linux32 	55 89 e5 8b 45 0c 03 45 08 c9 c3
Windows 55 89 e5 8b 45 0c 03 45 08 5d c3
Sun 81 c3 e0 08 90 02 00 09
Linux64 55 48 89 e5 89 7d fc 89 75 f8 03 45 fc c9 c3

Different machine types use different and incompatible instructions and encodings. Even identical processors running different operating systems have differences in their coding conventions and hence are not binary compatible.

A fundamental concept of computer systems is that a program, from the perspective of the machine, is simply a sequence of bytes.

2.1.6 Introduction to Boolean Algebra


Practice Problem 2.8

Fill in the following table showing the results of evaluating Boolean operations on bit vectors.

Operation Result
a [01001110]
b [11100001]
~a
~b
a&b
a|b
a^b

My solution: :white_check_mark:

Operation Result
a [01001110]
b [11100001]
~a [10110001]
~b [00011110]
a&b [01000000]
a|b [11101111]
a^b [10101111]

Practice Problem 2.9

Computers generate color pictures on a video screen or liquid crystal display by mixing three different colors of light: red, green, and blue. Imagine a simple scheme, with three different lights, each of which can be turned on or off, projecting onto a glass screen:

light

We can then create eight different colors based on the absence (0) or presence (1) of light sources R, G, and B:

Red Green Blue Color
0 0 0 Black
0 0 1 Blue
0 1 0 Green
0 1 1 Cyan
1 0 0 Red
1 0 1 Magenta
1 1 0 Yellow
1 1 1 White

Each of these colors can be represented as a bit vector of length 3, and we can apply Boolean operations to them.

A. The complement of a color is formed by turning off the lights that are on and turning on the lights that are off. What would be the complement of each of the eight colors listed above?

B. Describe the effect of applying Boolean operations on the following colors:

Blue | Green =

Yellow & Cyan =

Red ^ Magenta =

My solution: :white_check_mark:

A:

Red Green Blue Color Complement Color
0 0 0 Black White
0 0 1 Blue Yellow
0 1 0 Green Magenta
0 1 1 Cyan Red
1 0 0 Red Cyan
1 0 1 Magenta Green
1 1 0 Yellow Blue
1 1 1 White Black

B:

Blue | Green = 001 | 010 = 011 = Cyan

Yellow & Cyan = 110 & 011 = 010 = Green

Red ^ Magenta = 100 ^ 101 = 001 = Blue


2.1.7 Bit-Level Operations in C

1
2
3
4
5
void inplace_swap(int *x, int *y) {
*y = *x ^ *y; /* Step 1 */
*x = *x ^ *y; /* Step 2 */
*y = *x ^ *y; /* Step 3 */
}

There is NO performance advantage to this way of swapping; it is merely an intellectual amusement!!!

Step *x *y
Initial a b
Step 1 a a^b
Step 2 a^a^b=b a^b
Step 3 b b^a^b=a

Practice Problem 2.11

Armed with the function inplace_swap from Problem 2.10, you decide to write code that will reverse the elements of an array by swapping elements from opposite ends of the array, working toward the middle. You arrive at the following function:

1
2
3
4
5
void reverse_array(int a[], int cnt) {    
int first, last;
for (first = 0, last = cnt-1; first <= last;first++,last--)
inplace_swap(&a[first], &a[last]);
}

When you apply your function to an array containing elements 1, 2, 3, and 4, you find the array now has, as expected, elements 4, 3, 2, and 1. When you try it on an array with elements 1, 2, 3, 4, and 5, however, you are surprised to see that the array now has elements 5, 4, 0, 2, and 1. In fact, you discover that the code always works correctly on arrays of even length, but it sets the middle element to 0 whenever the array has odd length.

A. For an array of odd length cnt = 2k + 1, what are the values of variables first and last in the final iteration of function reverse_array?

B. Why does this call to function inplace_swap set the array element to 0?

C. What simple modification to the code for reverse_array would eliminate this problem?

My solution: :white_check_mark:

A: first = last = (2k+1-1)/2 = k

B: because inplace_swap use ^ to facilitate swapping rather than more reliable int temp variable, at the first step we set the value to a^a=0, and then we play with 0 all the time.

C: change first <= last to first < last


Mask

One common use of bit-level operations is to implement masking operations, where a mask is a bit pattern that indicates a selected set of bits within a word.

As an example, the mask 0xFF (having ones for the least significant 8 bits) indicates the low-order byte of a word. The bit-level operation x & 0xFF yields a value consisting of the least significant byte of x, but with all other bytes set to 0.

Practice Problem 2.12

Write C expressions, in terms of variable x, for the following values. Your code should work for any word size w ≥ 8. For reference, we show the result of evaluating the expressions for x = 0x87654321, with w = 32.

A. The least significant byte of x, with all other bits set to 0. [0x00000021]

B. All but the least significant byte of x complemented, with the least significant byte left unchanged. [0x789ABC21]

C. The least significant byte set to all ones, and all other bytes of x left unchanged. [0x876543FF]

My solution: :warning:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <stdint.h>    
int main(){
unsigned int x = 0x87654321 ; //w = 32
uint64_t y = 0xFEDCBA9876543210; // w = 64

printf("32 bits : 0x%x , 64 bits : 0x%llx\n" , x , y);

//A:
x &= 0xff; y &= 0xff;
printf("32 bits : 0x%x , 64 bits : 0x%llx\n" , x , y);
x = 0x87654321;
y = 0xFEDCBA9876543210;
//B:
x = ((x & 0xff) | ~0xff) & ~(x & ~0xff);
y = ((y & 0xff) | ~0xff) & ~(y & ~0xff);
printf("32 bits : 0x%x , 64 bits : 0x%llx\n" , x , y);
x = 0x87654321;
y = 0xFEDCBA9876543210;
//C:
x |= 0xff;
y |= 0xff;
printf("32 bits : 0x%x , 64 bits : 0x%llx\n" , x , y);

return 0;
}
1
2
3
4
32 bits : 0x87654321 , 64 bits : 0xfedcba9876543210
32 bits : 0x21 , 64 bits : 0x10
32 bits : 0x789abc21 , 64 bits : 0x123456789abcd10
32 bits : 0x876543ff , 64 bits : 0xfedcba98765432ff

solution B can be simplified
$$
f = ((x \cdot FF)+\overline{FF}) \cdot \overline{x \cdot \overline {FF}}
$$

$$
f = ((x+\overline{FF}) \cdot (FF + \overline{FF})) \cdot (\bar{x}+FF)
$$

$$
f = (x+\overline{FF})\cdot (\bar{x}+FF)
$$

$$
f = x\cdot \bar{x} + (x\cdot FF)+(\overline{FF}\cdot \bar{x})+FF\cdot\overline{FF}
$$

$$
f = (x\cdot FF)+(\bar{x}\cdot\overline{FF} )=\overline{x\oplus FF}
$$

solution to question B:

1
x ^ ~0xFF
1
2
a ^ 0 = a
a ^ 1 = ~a
  • The expression ~0 will yield a mask of all ones, regardless of the size of the data representation.
  • when your expression become very long, try to simplify it.

Practice Problem 2.13

The Digital Equipment VAX computer was a very popular machine from the late 1970s until the late 1980s. Rather than instructions for Boolean operations and and or, it had instructions bis (bit set) and bic (bit clear). Both instructions take a data word x and a mask word m. They generate a result z consisting of the bits of x modified according to the bits of m. With bis, the modification involves setting z to 1 at each bit position where m is 1. With bic, the modification involves setting z to 0 at each bit position where m is 1.

To see how these operations relate to the C bit-level operations, assume we have functions bis and bic implementing the bit set and bit clear operations, and that we want to use these to implement functions computing bitwise operations | and ^, without using any other C operations. Fill in the missing code below.

Hint: Write C expressions for the operations bis and bic.

1
2
3
4
5
6
7
8
9
10
11
12
13
/* Declarations of functions implementing operations bis and bic */
int bis(int x, int m);
int bic(int x, int m);
/* Compute x|y using only calls to functions bis and bic */
int bool_or(int x, int y) {
int result = ;
return result;
}
/* Compute x^y using only calls to functions bis and bic */
int bool_xor(int x, int y) {
int result = ;
return result;
}

My solution : :white_check_mark:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <stdio.h>
/* Declarations of functions implementing operations bis and bic */
int bis(int x, int m);
int bic(int x, int m);

/* Compute x|y using only calls to functions bis and bic */
int bool_or(int x, int y) {
int result = bis(x , y);
return result;
}

/* Compute x^y using only calls to functions bis and bic */
int bool_xor(int x, int y) {
int result = bis(bic(x, y) , bic(y, x));
return result;
}

int main(){
int a = 0 , b = 0 ;
printf("a|b test : %d\n" , (a|b) == bool_or(a, b) );
printf("a^b test : %d\n" , (a^b) == bool_xor(a, b) );
b = 1;
printf("a|b test : %d\n" , (a|b) == bool_or(a, b) );
printf("a^b test : %d\n" , (a^b) == bool_xor(a, b) );
a = 1;
printf("a|b test : %d\n" , (a|b) == bool_or(a, b) );
printf("a^b test : %d\n" , (a^b) == bool_xor(a, b) );
b = 0;
printf("a|b test : %d\n" , (a|b) == bool_or(a, b) );
printf("a^b test : %d\n" , (a^b) == bool_xor(a, b) );
}
int bis(int x , int m){
return x | m;
}
int bic(int x , int m){
return x & ~m;
}
1
2
3
4
5
6
7
8
a|b test : 1
a^b test : 1
a|b test : 1
a^b test : 1
a|b test : 1
a^b test : 1
a|b test : 1
a^b test : 1

when doing boolean algebra, truth table and sum-of-product expression are very useful!!!

also , don’t forget xor property : x ^ y = (x & ~y) | (~x & y)​


2.1.8 Logical Operations in C

  • The logical operations treat any nonzero argument as representing true and argument 0 as representing false
  • logical operators do not evaluate their second argument if the result of the expression can be determined by evaluating the first argument. Thus, for example, the expression a && 5/a will never cause a division by zero, and the expression p && *p++ will never cause the dereferencing of a null pointer.

Practice Problem 2.14

Suppose that a and b have byte values 0x55 and 0x46, respectively. Fill in the following table indicating the byte values of the different C expressions:

Expression Value
a & b
a | b
~a | ~b
a & !b
a && b
a || b
!a || !b
a && ~b

My solution : :white_check_mark:

Expression Value
a & b 0x44
a | b 0x57
~a | ~b 0xbb
a & !b 0x00
a && b 0x01
a || b 0x01
!a || !b 0x00
a && ~b 0x01

Practice Problem 2.15

Using only bit-level and logical operations, write a C expression that is equivalent to x == y. In other words, it will return 1 when x and y are equal and 0 otherwise.

My solution :

1
2
3
4
5
6
7
8
9
10
11
12
13
#include <stdio.h>
#include <stdbool.h>
bool equal(int a , int b){
return (bool)(!(a ^ b));
}
int main(){
int a = 3242542;
int b = 3213434;
int c = a;
printf("a == b test : %d %d\n" , a == b , equal(a, b));
printf("a == c test : %d %d\n" , a == c , equal(a, c));
return 0;
}

There is NO real reason to use this expression rather than simply writing x == y


2.1.9 Shift Operations in C

Left shift

Left shift is simple and stable.

That is x << k means x is shifted k bits to the left, dropping off the k most significant bits and filling the right end with k zeros.

Right shift
Logical Right shift

A logical right shift fills the left end with k zeros, giving a result $[0,…, 0, x_{w−1}, x_{w−2},…,x_k].$

Arithmetic left shift

An arithmetic right shift fills the left end with k repetitions of the most significant bit, giving a result$[x_{w−1},…,x_{w−1}, x_{w−1}, x_{w−2},…,x_k]$.

  • In C, almost all compiler/machine combinations use arithmetic right shifts for signed data, and many programmers assume this to be the case. For unsigned data, on the other hand, right shifts must be logical.
  • Java has a precise definition of how right shifts should be performed. The expression x >> k shifts x arithmetically by k positions, while x >>> k shifts it logically

Practice Problem 2.16

Fill in the table below showing the effects of the different shift operations on singlebyte quantities. The best way to think about shift operations is to work with binary representations. Convert the initial values to binary, perform the shifts, and then convert back to hexadecimal. Each of the answers should be 8 binary digits or 2 hexadecimal digits.

shift

My solution: :white_check_mark:

Hex Binary Binary Hex Binary Hex Binary Hex
0xd4 [11010100] [01010000] 0x50 [00011010] 0x1a [11111010] 0xfa
0x64 [01100100] [10010000] 0x90 [00001100] 0x0c [00001100] 0x0c
0x72 [01110010] [11001000] 0xc8 [00001110] 0x0e [00001110] 0x0e
0x44 [01000100] [00010000] 0x10 [00001000] 0x08 [00001000] 0x08

What if shift length k is larger than the word width w?

  • The C standards carefully avoid stating what should be done in such a case. On many machines, the shift amount is computed as $k \mod w$
  • Java, on the other hand, specifically requires that shift amounts should be computed in the modular fashion we have shown.

Programming tip : When in doubt about operator precedence, put in parentheses!

2.2 Integer Representations

terminology

2.2.1 Integral Data Types

intergral range

2.2.2 Unsigned Encodings

The most common way you do with binary to interger.
$$
x_{w-1}…x_1x_0 = x_{w-1}\cdot2^{w-1}+…+x_1\cdot2^1+x_0\cdot 2^0
$$

principle: Uniqueness of unsigned encoding

Function $B2U_w$ is a bijection.

The mathematical term bijection refers to a function f that goes two ways: it maps a value x to a value y where y = f (x), but it can also operate in reverse, since for every y, there is a unique value x such that f (x) = y.

2.2.3 Two’s-Complement Encodings

The most common computer representation of signed numbers is known as two’s-complement form. This is defined by interpreting the most significant bit of the word to have negative weight.

  • The most significant bit $x_{w−1}$ is also called the sign bit

  • Range of Two’s complement : $[100….00 , 011..11]$

  • the two’s-complement range is asymmetric: |TMin| = |TMax| + 1; that is, there is no positive counterpart to TMin.

principle: Uniqueness of two’s-complement encoding

Function $B2T_w$ is a bijection.


Practice Problem 2.17

Assuming w = 4, we can assign a numeric value to each possible hexadecimal digit, assuming either an unsigned or a two’s-complement interpretation. Fill in the following table according to these interpretations by writing out the nonzero powers of 2 in the summations shown in Equations 2.1 and 2.3:

Hexadecimal Binary B2U B2T
0xa [1010] $2^3+2^1=10$ $-2^{3}+2^1=-6$
0x1
0xb
0x2
0x7
0xc

My solution: :white_check_mark:

Hexadecimal Binary B2U B2T
0xa [1010] $2^3+2^1=10$ $-2^{3}+2^1=-6$
0x1 [0001] $2^0=1$ $2^0=1$
0xb [1011] $2^3+2^1+2^0=11$ $-2^3+2^1+2^0=-5$
0x2 [0010] $2^1=2$ $2^1=2$
0x7 [0111] $2^2+2^1+2^0=7$ $2^2+2^1+2^0=7$
0xc [1100] $2^3+2^2=12$ $-2^3+2^2=-4$

Programming tips:

  • use int32_t, uint64_t ,…., to get exact size you want.

  • Formatted printing with fixed-width types requires use of macros that expand into format strings in a system-dependent manner.

    So, for example, the values of variables x and y of type int32_t and uint64_t can be printed by the following call to printf: printf("x = %" PRId32 ", y = %" PRIu64 "\n", x, y);

    When compiled as a 64-bit program, macro PRId32 expands to the string “d”, while PRIu64 expands to the pair of strings “l” “u”. When the C preprocessor encounters a sequence of string constants separated only by spaces (or other whitespace characters), it concatenates them together.

    Thus, the above call to printf becomes printf("x = %d, y = %lu\n", x, y); Using the macros ensures that a correct format string will be generated regardless of how the code is compiled.

important numbers

  • The C standards do NOT require signed integers to be represented in two’scomplement form, but nearly all machines do so.

    So if you want max portability, do NOT assume any “typical” ranges beyond the minimum range and do NOT assume signed numbers are encoded by 2’s complement. Use <limit.h> which defines constants INT_MAX, INT_ MIN, and UINT_MAX describing the ranges of signed and unsigned integers.

  • The Java standard is quite specific about integer data type ranges and representations.These detailed requirements are intended to enable Java programs to behave identically regardless of the machines or operating systems running them.


Practice Problem 2.18

In Chapter 3, we will look at listings generated by a disassembler, a program that converts an executable program file back to a more readable ASCII form.

These files contain many hexadecimal numbers, typically representing values in two’scomplement form. Being able to recognize these numbers and understand their significance (for example, whether they are negative or positive) is an important skill.

For the lines labeled A–I (on the right) in the following listing, convert the hexadecimal values (in 32-bit two’s-complement form) shown to the right of the instruction names (sub, mov, and add) into their decimal equivalents:

1
2
3
4
5
6
7
8
9
10
11
12
4004d0: 48 81 ec e0 02 00 00 	sub $0x2e0,%rsp 			A.
4004d7: 48 8b 44 24 a8 mov -0x58(%rsp),%rax B.
4004dc: 48 03 47 28 add 0x28(%rdi),%rax C.
4004e0: 48 89 44 24 d0 mov %rax,-0x30(%rsp) D.
4004e5: 48 8b 44 24 78 mov 0x78(%rsp),%rax E.
4004ea: 48 89 87 88 00 00 00 mov %rax,0x88(%rdi) F.
4004f1: 48 8b 84 24 f8 01 00 mov 0x1f8(%rsp),%rax G.
4004f8: 00
4004f9: 48 03 44 24 08 add 0x8(%rsp),%rax
4004fe: 48 89 84 24 c0 00 00 mov %rax,0xc0(%rsp) H.
400505: 00
400506: 48 8b 44 d4 b8 mov -0x48(%rsp,%rdx,8),%rax I.

My solution: :white_check_mark:

Index hexadecimal decimal
A 0x2e0 736
B -0x58 -88
C 0x28 40
D -0x30 -48
E 0x78 120
F 0x88 136
G 0x1f8 504
H 0xc0 192
I -0x48 -72

2.2.4 Conversions between Signed and Unsigned

the effect of casting is to keep the bit values identical but change how these bits are interpreted.


Practice Problem 2.19

Using the table you filled in when solving Problem 2.17, fill in the following table describing the function $T2U_4$:

x $T2U_4(x)$
-1
-5
-6
-4
1
8

My solution: :white_check_mark:

x $T2U_4(x)$
-1 15
-5 11
-6 10
-4 12
1 1
8 ?
1
2
bitstring.CreationError: 8 is too large a signed integer for a bitstring of length 4. The
allowed range is [-8, 7].

when mapping a signed number to its unsigned counterpart, negative numbers are converted to large positive numbers, while nonnegative numbers remain unchanged.

principle: Conversion from two’s complement to unsigned
$$
\forall x , TMin\le x \le TMax,T2U_w(x) = \begin{cases}
x,&x\ge0\
x+2^w,&x<0
\end{cases}
$$
Derivation:
$$
B2U_w(\vec{x}) - B2T_w(\vec{x}) = x_{w-1}\cdot2^{w-1}-(-x_{w-1}\cdot2^{w-1}) = x_{w-1}\cdot 2^w
$$

$$
T2U_w(x) = B2U_w(T2B_w(x))=x+x_{w-1}\cdot 2^w
$$


Practice Problem 2.20

Explain how Equation 2.5 applies to the entries in the table you generated when solving Problem 2.19.

My solution : :white_check_mark:

if the value is nonnegative, leave it unchanged, otherwise add it with $2^4=16$


Going in the other direction:

principle: Unsigned to two’s-complement conversion
$$
\forall u , 0\le u \le UMax,U2T_w(u) = \begin{cases}
u , & u\le TMax\
u - 2^{w} , & u > TMax
\end{cases}
$$

Summary:

  • Numbers in this range have identical unsigned and two’s-complement representations.
  • For values outside of this range, the conversions either add or subtract $2^w$

2.2.5 Signed versus Unsigned in C

When an operation is performed where one operand is signed and the other is unsigned, C implicitly casts the signed argument to unsigned and performs the operations assuming the numbers are nonnegative.

versus

Practice Problem 2.21

Assuming the expressions are evaluated when executing a 32-bit program on a machine that uses two’s-complement arithmetic, fill in the following table describing the effect of casting and relational operations, in the style of Figure 2.19:

Expression Type Evaluation
-2147483647-1 == 2147483648U
-2147483647-1 < 2147483647
-2147483647-1U < 2147483647
-2147483647-1 < -2147483647
-2147483647-1U < -2147483647

My solution: :white_check_mark:

Expression Type Evaluation
-2147483647-1 == 2147483648U Unsigned 1
-2147483647-1 < 2147483647 Signed 1
-2147483647-1U < 2147483647 Unsigned 0
-2147483647-1 < -2147483647 Signed 1
-2147483647-1U < -2147483647 Unsigned 1

2.2.6 Expanding the Bit Representation of a Number

One common operation is to convert between integers having different word sizes while retaining the same numeric value.

  • When the destination data type is too small to represent the desired value, this may not be possible.
  • Converting from a smaller to a larger data type, however, should always be possible.
Convert an unsigned number to a larger data type:
  • simply add leading zeros to the representation
  • this operation is known as zero extension
Convert a two’s-complement number to a larger data type:
  • adding copies of the most significant bit to the representation
  • this operation is known as sign extension
Why sign extension pad 1 ? How this perseves the value ?

Derivation:

assume we expend by k bits
$$
\forall k \in Z^*,w’=w+k
$$
we only need to prove when k=1 the value holds, then all the situations are the same.
$$
B2T_{w+1}([x_{w−1}, x_{w−1}, x_{w−2},…,x_0]) = -x_{w−1}2^{w}+x_{w−1}2^{w-1}+…+x_02^0\
=-x_{w−1}2^{w-1}+…+x_0\
=B2T_{w}([ x_{w−1}, x_{w−2},…,x_0])
$$
The key property we exploit is that $2^w − 2^{w−1} = 2^{w−1}$


Practice Problem 2.22

Show that each of the following bit vectors is a two’s-complement representation of −4 by applying Equation 2.3:

A. [1100]

B. [11100]

C. [111100]

Observe that the second and third bit vectors can be derived from the first by sign extension

My solution : :white_check_mark:

A : $-2^3+2^2=-4$

B : $-2^4+2^3+2^2=-4$

C : $-2^5+2^4+2^3+2^2=-4$


When converting from short to unsigned, the program first changes the size and then the type. This convention is required by the C standards.


Practice Problem 2.23

Consider the following C functions:

1
2
3
4
5
6
int fun1(unsigned word) {
return (int) ((word << 24) >> 24);
}
int fun2(unsigned word) {
return ((int) word << 24) >> 24;
}

Assume these are executed as a 32-bit program on a machine that uses two’scomplement arithmetic. Assume also that right shifts of signed values are performed arithmetically, while right shifts of unsigned values are performed logically.

A. Fill in the following table showing the effect of these functions for several example arguments. You will find it more convenient to work with a hexadecimal representation. Just remember that hex digits 8 through F have their most significant bits equal to 1.

w fun1(w) fun2(2)
0x00000076
0x87654321
0x000000C9
0xEDCBA987

My solution: :white_check_mark:

w fun1(w) fun2(2)
0x00000076 0x00000076 0x00000076
0x87654321 0x00000021 0x00000021
0x000000C9 0x000000C9 0xFFFFFFC9
0xEDCBA987 0x00000087 0xFFFFFF87

2.2.7 Truncating Numbers

Truncate unsigned number

principle: Truncation of an unsigned number

Let x be the bit vector $[x_{w−1}, x_{w−2},…,x_0]$, and let $x’$  be the result of truncating it to k bits: $x’  = [x_{k−1}, x_{k−2},…,x_0]$. Let $x = B2U_w(\vec{x})$ and $x’ = B2U_k(\vec{x’} )$.

Then $x’ = x \mod 2^k$.

The intuition behind this principle is simply that all of the bits that were truncated have weights of the form $2^i$ , where i ≥ k, and therefore each of these weights reduces to zero under the modulus operation.

Truncate signed number

principle: Truncation of a two’s-complement number

Let x be the bit vector $[x_{w−1}, x_{w−2},…,x_0]$, and let  $x’$ be the result of truncating it to k bits: $x’  = [x_{k−1}, x_{k−2},…,x_0]$. Let $x = B2T_w(\vec{x})$ and $x’ = B2T_k(\vec{x ’})$.

Then $x’ = U2T_k(x \mod 2^k)$.

Practice Problem 2.24

Suppose we truncate a 4-bit value (represented by hex digits 0 through F) to a 3- bit value (represented as hex digits 0 through 7.) Fill in the table below showing the effect of this truncation for some cases, in terms of the unsigned and two’scomplement interpretations of those bit patterns

Unsigned:

Original Truncated
1
3
5
12
14

Two’s complement:

Original Truncated
1
3
5
-4
-2

My solution: :white_check_mark:

Unsigned:

Original Truncated
1 1
3 3
5 5
12 4
14 6

Two’s complement:

Original Truncated
1 1
3 3
5 -3
-4 -4
-2 -2
1
2
3
4
5
ls = [5 , -4 , -2]
for x in ls:
a = ba(int=x,length=4)
truncateda = ba(bin=a.bin[1:])
print("original:" , a.int, "after:" , truncateda.int)
1
2
3
original: 5 after: -3
original: -4 after: -4
original: -2 after: -2

2.2.8 Advice on Signed versus Unsigned

Since the casting takes place without any clear indication in the code, programmers often overlook its effects.Be careful!

Practice Problem 2.25

Consider the following code that attempts to sum the elements of an array a, where the number of elements is given by parameter length:

1
2
3
4
5
6
7
8
9
/* WARNING: This is buggy code */    
float sum_elements(float a[], unsigned length) {
int i;
float result = 0;

for (i = 0; i <= length-1; i++)
result += a[i];
return result;
}

When run with argument length equal to 0, this code should return 0.0. Instead, it encounters a memory error. Explain why this happens. Show how this code can be corrected.

My solution: :white_check_mark:

length is an unsigned number, while i is a signed one. When evaluate i<= length-1 , i will be converted to unsigned.

When length equals 0, length-1 gives the bit string 0b111111....111, then any value in unsigned is less than length-1

  • the comparison always holds

  • expression a[i] will access a forbidden memory area.

1
2
3
4
5
6
7
8
float sum_elements(float a[], unsigned length) {      
unsigned i;
float result = 0;

for (i = 0; i < length; i++)
result += a[i];
return result;
}

Practice Problem 2.26

You are given the assignment of writing a function that determines whether one string is longer than another. You decide to make use of the string library function strlen having the following declaration:

1
2
/* Prototype for library function strlen */
size_t strlen(const char *s);

Here is your first attempt at the function:

1
2
3
4
5
/* Determine whether string s is longer than string t */
/* WARNING: This function is buggy */
int strlonger(char *s, char *t) {
return strlen(s) - strlen(t) > 0;
}

When you test this on some sample data, things do not seem to work quite right. You investigate further and determine that, when compiled as a 32-bit program, data type size_t is defined (viatypedef) in header file stdio.h to be unsigned.

A. For what cases will this function produce an incorrect result?

B. Explain how this incorrect result comes about.

C. Show how to fix the code so that it will work reliably.

My solution: :white_check_mark:

A : if string s doesn’t have the same length with string t, the result will always be true

B : when compare unsigned with 0, the result will always be true

C:

1
2
3
int strlonger(char *s, char *t) {
return strlen(s) > strlen(t);
}

We have seen multiple ways in which the subtle features of unsigned arithmetic, and especially the implicit conversion of signed to unsigned, can lead to errors or vulnerabilities.

  • One way to avoid such bugs is to never use unsigned numbers. (In fact, few languages other than C support unsigned integers.)
  • Unsigned values are very useful when we want to think of words as just collections of bits with no numeric interpretation.
    • when packing a word with flags describing various Boolean conditions.
    • Addresses are naturally unsigned, so systems programmers find unsigned types to be helpful.
    • Unsigned values are also useful when implementing mathematical packages for modular arithmetic and for multiprecision arithmetic, in which numbers are represented by arrays of words.