Representing and Manipulating Information
[TOC]
In isolation, a single bit is not very useful. When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, however, we can represent the elements of any finite set.
We will mainly focus on three most important representations of numbers:
- Unsigned –> traditional binary notation
- Signed –> 2’s complement
- Floating-point –> base-2 version of scientific notation(IEEE)
Intro
just get some feeling of our dear computer
numbers in computer is finite, so sometimes result will overflow
1
2
3
4
5
6
7
int main(){
printf("200*300*400*500=%d\n" ,200*300*400*500 );
printf("(3.14+1e20)-1e20=%f\n" ,(3.14+1e20)-1e20);
printf("3.14+(1e20-1e20)=%f\n" ,3.14+(1e20-1e20));
return 0;
}1
2
3200*300*400*500=-884901888
(3.14+1e20)-1e20=0.000000
3.14+(1e20-1e20)=3.140000The different mathematical properties of integer versus floating-point arithmetic stem from the difference in how they handle the finiteness of their representations
- integer representations can encode a comparatively small range of values, but do so precisely
- floating-point representations can encode a wide range of values, but only approximately.
A number of computer security vulnerabilities have arisen due to some of the subtleties of computer arithmetic.
Computers use several different binary representations to encode numeric values. You will need to be familiar with these representations as you progress into machine-level programming
programmers need to have a clear understanding of how computer arithmetic relates to the more familiar integer and real arithmetic.
Everything said in this chapter about C also holds for C++. The Java language definition, on the other hand, created a new set of standards for numeric representations and operations.
2.1 Information Storage
Rather than modifying individual bit, most modern computers use blocks of 8 bits, or bytes, as the smallest addressable unit of memory
2.1.1 Hexadecimal Notation
One simple trick for doing the conversion in your head is to memorize the decimal equivalents of hex digits A, C, and F. The hex values B, D, and E can be translated to decimal by computing their values relative to the first three.
If the total number of bits is not a multiple of 4, you should make the leftmost group be the one with fewer than 4 bits, effectively padding the number with leading zeros.
Practice Problem 2.1
Perform the following number conversions:
A. 0x25B9D2 to binary
B. binary 1010111001001001 to hexadecimal
C. 0xA8B3D to binary
D. binary 1100100010110110010110 to hexadecimal
My solution::white_check_mark:
$$
A:(25B9D2)_{16} = (0010 0101 1011 1001 1101 0010)_2
$$
$$
B : (1010 1110 0100 1001)2 = (AE49){16}
$$
$$
C: (A8B3D)_{16} = (1010 1000 1011 0011 1101)_2
$$
$$
D : (11 0010 0010 1101 1001 0110)_2 = (0011 0010 0010 1101 1001 0110)2 = (322D96){16}
$$
(solution on the book is wrong)
When a value x is a power of 2, that is, $x = 2^n$ for some nonnegative integer n, we can readily write x in hexadecimal form by remembering that the binary representation of x is simply 1 followed by n zeros.
The hexadecimal digit 0 represents 4 binary zeros.
1 hexadecimal 0 is equals to $2^4$
So, for n written in the form i + 4j , where 0 ≤ i ≤ 3, we can write x with a leading hex digit of 1 (i = 0), 2 (i = 1), 4 (i = 2), or 8 (i = 3), followed by j hexadecimal 0s.
$$
(2^i)\underbrace{000…000}_\text{j zeros}
$$
Practice Problem 2.2
Fill in the blank entries in the following table, giving the decimal and hexadecimal representations of different powers of 2:
n $2^n$(decimal) $2^n$(hexadecimal) 5 32 0x20 23 32768 0x2000 12 64 0x100
My solution: :white_check_mark:
$$
2^{23} = 2^3\times2^{10} \times 2^{10}= 8 \times 1048576 = 8388608
$$
$$
23 = 3 + 4 \times 5 , 2^{23} = (800000)_{16}
$$
$$
\log_2 (32768) = 15
$$
$$
15 = 3 + 3\times4 , 2^{15} = (8000)_{16}
$$
$$
(2000){16} = 2^{1+3\times4} = 2^{13} = 2^3 \times 2^{10} = (8192){10}
$$
$$
2^{12} = 4 \times 1024 = 4096
$$
$$
12 = 0 + 3\times 4, 2^{12} = (1000)_{16}
$$
$$
\log_2 {64} = 6
$$
$$
6 = 2+1\times 4 , 2^{6} = (40)_{16}
$$
$$
(100)_{16} = 2^{0+2\times4} = 256
$$
n | $2^n$(decimal) | $2^n$(hexadecimal) |
---|---|---|
5 | 32 | 0x20 |
23 | 8388608 | 0x800000 |
15 | 32768 | 0x8000 |
13 | 8192 | 0x2000 |
12 | 4096 | 0x1000 |
6 | 64 | 0x40 |
8 | 256 | 0x100 |
Converting between decimal and hexadecimal representations requires using multiplication or division to handle the general case.
(just the same method used when we convert between binary and decimal)
Practice Problem 2.3
A single byte can be represented by 2 hexadecimal digits.
Fill in the missing entries in the following table, giving the decimal, binary, and hexadecimal values of different byte patterns:
Decimal Binary Hexadecimal 0 0000 0000 0x00 158 76 145 1010 1110 0011 1100 1111 0001 0x75 0xBD 0xF5
My solution: :white_check_mark:
$$
158 = 9 \times 16 + 14(E)
$$
$$
9 = 0 \times 16 + 9 (9)
$$
$$
158 = (9E)_{16}
$$
Decimal | Binary | Hexadecimal |
---|---|---|
0 | 0000 0000 | 0x00 |
158 | 1001 1110 | 0x9E |
76 | 0100 1100 | 0x4C |
145 | 1001 0001 | 0x91 |
174 | 1010 1110 | 0xAE |
60 | 0011 1100 | 0x3C |
241 | 1111 0001 | 0xF1 |
117 | 0111 0101 | 0x75 |
189 | 1011 1101 | 0xBD |
245 | 1111 0101 | 0xF5 |
Practice Problem 2.4
Without converting the numbers to decimal or binary, try to solve the following arithmetic problems, giving the answers in hexadecimal. Hint: Just modify the methods you use for performing decimal addition and subtraction to use base 16.
A. 0x605c + 0x5 =
B. 0x605c − 0x20 =
C. 0x605c + 32 =
D. 0x60fa − 0x605c =
My solution: :white_check_mark:
$$
A : (605C){16} + (5){16} = (6061)_{16}
$$
B. 0x605c − 0x20 = 0x603c
C. 0x605c + 32 = 0x605c + 0x20 = 0x607c
$$
+60FA\
-605C\
\rule{4cm}{0.4pt}\
+009E
$$
D. 0x60fa − 0x605c = 0x9E
just normal arithmetic with base 16
2.1.2 Data Sizes
Every computer has a word size, and a virtual address is encoded by such a word, so word size determine the maximum size of the virtual address space.
For a machine with a w-bit word size, the virtual addresses can range from 0 to $2^w − 1$, giving the program access to at most $2^w$ bytes.
We will therefore refer to programs as being either “32-bit programs” or “64-bit programs,” since the distinction lies in how a program is compiled, rather than the type of machine on which it runs.
- 32-bits program :
linux>gcc -m32 program.c
- 64-bits program :
linux>gcc -m64 program.c
- Nowadays
32-bits
seems to be deprecated and gcc default compile it as64-bits
, the-m32
parameter will result in missing file error on 64 bits machine.
- 32-bits program :
To avoid the vagaries of relying on “typical” sizes and different compiler settings, ISO C99 introduced a class of data types where the data sizes are fixed regardless of compiler and machine settings :
int32_t
andint64_t
,having exactly 4 and 8 bytes, respectively.Using fixed-size integer types is the best way for programmers to have close control over data representations.
One aspect of portability is to make the program insensitive to the exact sizes of the different data types.
For example, many programmers historically assumed that an object declared as type int could be used to store a pointer. This works fine for most 32-bit programs, but it leads to problems for 64-bit programs.
(the size of virtual address space differs!)
(check size by using
sizeof()
before use it may be a good trick.)
2.1.3 Addressing and Byte Ordering
Byte ordering
where the least significant byte comes first—is referred to as little endian.
where the most significant byte comes first—is referred to as big endian.
Example
Many recent microprocessor chips are bi-endian, meaning that they can be configured to operate as either little- or big-endian machines. In practice, however, byte ordering becomes fixed once a particular operating system is chosen.
For example, ARM microprocessors, used in many cell phones, have hardware that can operate in either little- or big-endian mode, but the two most common operating systems for these chips—Android (from Google) and IOS (from Apple)—operate only in little-endian mode.
Just like the egg issue, there is no technological reason to choose one byte ordering convention over the other
Three case when byte ordering matters
code written for networking applications must follow established conventions for byte ordering to make sure the sending machine converts its internal representation to the network standard, while the receiving machine converts the network standard to its internal representation.
when inspecting machine-level programs, especially codes provided by a disassemble, the byte sequences representing integer data are related to endian.
For example
1
4004d3: 01 05 43 0b 20 00 add %eax,0x200b43(%rip)
little endian reads it as
add eax,0x00200b43
, while big endian will reads it asadd eax,0x430b2000
when programs are written that circumvent the normal type system.
In the C language, this can be done using a cast or a union to allow an object to be referenced according to a different data type from which it was created.
Such coding tricks are strongly discouraged for most application programming, but they can be quite useful and even necessary for system-level programming.
Practice Problem 2.5
1 |
|
Consider the following three calls to show_bytes:
1
2
3
4
5 int a = 0x12345678;
byte_pointer ap = (byte_pointer) &a;
show_bytes(ap, 1); /* A. */
show_bytes(ap, 2); /* B. */
show_bytes(ap, 3); /* C. */Indicate the values that will be printed by each call on a little-endian machine and on a big-endian machine:
A. Little endian: ? Big endian: ?
B. Little endian: ? Big endian: ?
C. Little endian: ? Big endian: ?
My solution: :white_check_mark:
A. Little endian: 78 Big endian: 12
B. Little endian: 78 56 Big endian: 12 34
C. Little endian: 78 56 34 Big endian: 12 34 56
Practice Problem 2.6
Using
show_int
andshow_float
, we determine that the integer 2607352 has hexadecimal representation 0x0027C8F8, while the floating-point number 3510593.0 has hexadecimal representation 0x4A1F23E0.A. Write the binary representations of these two hexadecimal values.
B. Shift these two strings relative to one another to maximize the number of matching bits. How many bits match?
C. What parts of the strings do not match?
My solution: :white_check_mark:
A:
$$
(0027C8F8)_{16} = (0000 0000 0010 0111 1100 1000 1111 1000)_2
$$
$$
(4A1F23E0)_{16} = (0100 1010 0001 1111 0010 0011 1110 0000)_2
$$
B:
right shift the second one 2 bits
1 | 00000000001001111100100011111000 |
21 bits match
C:
(according to IEEE the signed bit , exponent bits and the hidden 1 in mantissa do not match)
2.1.4 Representing Strings
A string in C is encoded by an array of characters terminated by the null (having value 0) character.
- ASCII code for decimal digit $x$ happens to be 0x3$x$, and that the terminating byte has the hex representation 0x00
- Text data are more platform independent than binary data. It has nothing to do with byte ordering. It is always a sequence.
- It is someting can be viewed as big endian.
Practice Problem 2.7
What would be printed as a result of the following call to
show_bytes
?
1 const char *m = "mnopqr"; show_bytes((byte_pointer) m, strlen(m));Note that letters ‘a’ through ‘z’ have ASCII codes 0x61 through 0x7A
My solution:
1 | 6d 6e 6f 70 71 72 |
Recall also that the library routine strlen does NOT count the terminating null character
2.1.5 Representing Code
Consider the following C function:
1 | int sum(int x , int y){ |
When compiled on our sample machines, we generate machine code having the following byte representations:
1 | Linux32 55 89 e5 8b 45 0c 03 45 08 c9 c3 |
Different machine types use different and incompatible instructions and encodings. Even identical processors running different operating systems have differences in their coding conventions and hence are not binary compatible.
A fundamental concept of computer systems is that a program, from the perspective of the machine, is simply a sequence of bytes.
2.1.6 Introduction to Boolean Algebra
Practice Problem 2.8
Fill in the following table showing the results of evaluating Boolean operations on bit vectors.
Operation Result a [01001110] b [11100001] ~a ~b a&b a|b a^b
My solution: :white_check_mark:
Operation | Result |
---|---|
a | [01001110] |
b | [11100001] |
~a | [10110001] |
~b | [00011110] |
a&b | [01000000] |
a|b | [11101111] |
a^b | [10101111] |
Practice Problem 2.9
Computers generate color pictures on a video screen or liquid crystal display by mixing three different colors of light: red, green, and blue. Imagine a simple scheme, with three different lights, each of which can be turned on or off, projecting onto a glass screen:
We can then create eight different colors based on the absence (0) or presence (1) of light sources R, G, and B:
Red Green Blue Color 0 0 0 Black 0 0 1 Blue 0 1 0 Green 0 1 1 Cyan 1 0 0 Red 1 0 1 Magenta 1 1 0 Yellow 1 1 1 White Each of these colors can be represented as a bit vector of length 3, and we can apply Boolean operations to them.
A. The complement of a color is formed by turning off the lights that are on and turning on the lights that are off. What would be the complement of each of the eight colors listed above?
B. Describe the effect of applying Boolean operations on the following colors:
Blue | Green =
Yellow & Cyan =
Red ^ Magenta =
My solution: :white_check_mark:
A:
Red | Green | Blue | Color | Complement Color |
---|---|---|---|---|
0 | 0 | 0 | Black | White |
0 | 0 | 1 | Blue | Yellow |
0 | 1 | 0 | Green | Magenta |
0 | 1 | 1 | Cyan | Red |
1 | 0 | 0 | Red | Cyan |
1 | 0 | 1 | Magenta | Green |
1 | 1 | 0 | Yellow | Blue |
1 | 1 | 1 | White | Black |
B:
Blue | Green = 001 | 010 = 011 = Cyan
Yellow & Cyan = 110 & 011 = 010 = Green
Red ^ Magenta = 100 ^ 101 = 001 = Blue
2.1.7 Bit-Level Operations in C
1 | void inplace_swap(int *x, int *y) { |
There is NO performance advantage to this way of swapping; it is merely an intellectual amusement!!!
Step | *x | *y |
---|---|---|
Initial | a | b |
Step 1 | a | a^b |
Step 2 | a^a^b=b | a^b |
Step 3 | b | b^a^b=a |
Practice Problem 2.11
Armed with the function
inplace_swap
from Problem 2.10, you decide to write code that will reverse the elements of an array by swapping elements from opposite ends of the array, working toward the middle. You arrive at the following function:
1
2
3
4
5 void reverse_array(int a[], int cnt) {
int first, last;
for (first = 0, last = cnt-1; first <= last;first++,last--)
inplace_swap(&a[first], &a[last]);
}When you apply your function to an array containing elements 1, 2, 3, and 4, you find the array now has, as expected, elements 4, 3, 2, and 1. When you try it on an array with elements 1, 2, 3, 4, and 5, however, you are surprised to see that the array now has elements 5, 4, 0, 2, and 1. In fact, you discover that the code always works correctly on arrays of even length, but it sets the middle element to 0 whenever the array has odd length.
A. For an array of odd length cnt = 2k + 1, what are the values of variables first and last in the final iteration of function reverse_array?
B. Why does this call to function inplace_swap set the array element to 0?
C. What simple modification to the code for reverse_array would eliminate this problem?
My solution: :white_check_mark:
A: first = last = (2k+1-1)/2 = k
B: because inplace_swap
use ^
to facilitate swapping rather than more reliable int temp
variable, at the first step we set the value to a^a=0
, and then we play with 0
all the time.
C: change first <= last
to first < last
Mask
One common use of bit-level operations is to implement masking operations, where a mask is a bit pattern that indicates a selected set of bits within a word.
As an example, the mask 0xFF (having ones for the least significant 8 bits) indicates the low-order byte of a word. The bit-level operation x & 0xFF yields a value consisting of the least significant byte of x, but with all other bytes set to 0.
Practice Problem 2.12
Write C expressions, in terms of variable
x
, for the following values. Your code should work for any word size w ≥ 8. For reference, we show the result of evaluating the expressions for x = 0x87654321, with w = 32.A. The least significant byte of x, with all other bits set to 0. [0x00000021]
B. All but the least significant byte of x complemented, with the least significant byte left unchanged. [0x789ABC21]
C. The least significant byte set to all ones, and all other bytes of x left unchanged. [0x876543FF]
My solution: :warning:
1 |
|
1 | 32 bits : 0x87654321 , 64 bits : 0xfedcba9876543210 |
solution B can be simplified
$$
f = ((x \cdot FF)+\overline{FF}) \cdot \overline{x \cdot \overline {FF}}
$$
$$
f = ((x+\overline{FF}) \cdot (FF + \overline{FF})) \cdot (\bar{x}+FF)
$$
$$
f = (x+\overline{FF})\cdot (\bar{x}+FF)
$$
$$
f = x\cdot \bar{x} + (x\cdot FF)+(\overline{FF}\cdot \bar{x})+FF\cdot\overline{FF}
$$
$$
f = (x\cdot FF)+(\bar{x}\cdot\overline{FF} )=\overline{x\oplus FF}
$$
solution to question B:
1 | x ^ ~0xFF |
1 | a ^ 0 = a |
- The expression
~0
will yield a mask of all ones, regardless of the size of the data representation. - when your expression become very long, try to simplify it.
Practice Problem 2.13
The Digital Equipment VAX computer was a very popular machine from the late 1970s until the late 1980s. Rather than instructions for Boolean operations and and or, it had instructions bis (bit set) and bic (bit clear). Both instructions take a data word x and a mask word m. They generate a result z consisting of the bits of x modified according to the bits of m. With bis, the modification involves setting z to 1 at each bit position where m is 1. With bic, the modification involves setting z to 0 at each bit position where m is 1.
To see how these operations relate to the C bit-level operations, assume we have functions bis and bic implementing the bit set and bit clear operations, and that we want to use these to implement functions computing bitwise operations | and ^, without using any other C operations. Fill in the missing code below.
Hint: Write C expressions for the operations bis and bic.
1
2
3
4
5
6
7
8
9
10
11
12
13 /* Declarations of functions implementing operations bis and bic */
int bis(int x, int m);
int bic(int x, int m);
/* Compute x|y using only calls to functions bis and bic */
int bool_or(int x, int y) {
int result = ;
return result;
}
/* Compute x^y using only calls to functions bis and bic */
int bool_xor(int x, int y) {
int result = ;
return result;
}
My solution : :white_check_mark:
1 |
|
1 | a|b test : 1 |
when doing boolean algebra, truth table and sum-of-product expression are very useful!!!
also , don’t forget xor property : x ^ y = (x & ~y) | (~x & y)
2.1.8 Logical Operations in C
- The logical operations treat any nonzero argument as representing true and argument 0 as representing false
- logical operators do not evaluate their second argument if the result of the expression can be determined by evaluating the first argument. Thus, for example, the expression
a && 5/a
will never cause a division by zero, and the expressionp && *p++
will never cause the dereferencing of a null pointer.
Practice Problem 2.14
Suppose that a and b have byte values
0x55
and0x46
, respectively. Fill in the following table indicating the byte values of the different C expressions:
Expression Value a & b a | b ~a | ~b a & !b a && b a || b !a || !b a && ~b
My solution : :white_check_mark:
Expression | Value |
---|---|
a & b | 0x44 |
a | b | 0x57 |
~a | ~b | 0xbb |
a & !b | 0x00 |
a && b | 0x01 |
a || b | 0x01 |
!a || !b | 0x00 |
a && ~b | 0x01 |
Practice Problem 2.15
Using only bit-level and logical operations, write a C expression that is equivalent to
x == y
. In other words, it will return 1 when x and y are equal and 0 otherwise.
My solution :
1 |
|
There is NO real reason to use this expression rather than simply writing x == y
2.1.9 Shift Operations in C
Left shift
Left shift is simple and stable.
That is x << k
means x is shifted k bits to the left, dropping off the k most significant bits and filling the right end with k zeros.
Right shift
Logical Right shift
A logical right shift fills the left end with k zeros, giving a result $[0,…, 0, x_{w−1}, x_{w−2},…,x_k].$
Arithmetic left shift
An arithmetic right shift fills the left end with k repetitions of the most significant bit, giving a result$[x_{w−1},…,x_{w−1}, x_{w−1}, x_{w−2},…,x_k]$.
- In C, almost all compiler/machine combinations use arithmetic right shifts for signed data, and many programmers assume this to be the case. For unsigned data, on the other hand, right shifts must be logical.
- Java has a precise definition of how right shifts should be performed. The expression
x >> k
shifts x arithmetically by k positions, whilex >>> k
shifts it logically
Practice Problem 2.16
Fill in the table below showing the effects of the different shift operations on singlebyte quantities. The best way to think about shift operations is to work with binary representations. Convert the initial values to binary, perform the shifts, and then convert back to hexadecimal. Each of the answers should be 8 binary digits or 2 hexadecimal digits.
My solution: :white_check_mark:
Hex | Binary | Binary | Hex | Binary | Hex | Binary | Hex |
---|---|---|---|---|---|---|---|
0xd4 | [11010100] | [01010000] | 0x50 | [00011010] | 0x1a | [11111010] | 0xfa |
0x64 | [01100100] | [10010000] | 0x90 | [00001100] | 0x0c | [00001100] | 0x0c |
0x72 | [01110010] | [11001000] | 0xc8 | [00001110] | 0x0e | [00001110] | 0x0e |
0x44 | [01000100] | [00010000] | 0x10 | [00001000] | 0x08 | [00001000] | 0x08 |
What if shift length k is larger than the word width w?
- The C standards carefully avoid stating what should be done in such a case. On many machines, the shift amount is computed as $k \mod w$
- Java, on the other hand, specifically requires that shift amounts should be computed in the modular fashion we have shown.
Programming tip : When in doubt about operator precedence, put in parentheses!
2.2 Integer Representations
2.2.1 Integral Data Types
2.2.2 Unsigned Encodings
The most common way you do with binary to interger.
$$
x_{w-1}…x_1x_0 = x_{w-1}\cdot2^{w-1}+…+x_1\cdot2^1+x_0\cdot 2^0
$$
principle: Uniqueness of unsigned encoding
Function $B2U_w$ is a bijection.
The mathematical term bijection refers to a function f that goes two ways: it maps a value x to a value y where y = f (x), but it can also operate in reverse, since for every y, there is a unique value x such that f (x) = y.
2.2.3 Two’s-Complement Encodings
The most common computer representation of signed numbers is known as two’s-complement form. This is defined by interpreting the most significant bit of the word to have negative weight.
The most significant bit $x_{w−1}$ is also called the sign bit
Range of Two’s complement : $[100….00 , 011..11]$
the two’s-complement range is asymmetric: |TMin| = |TMax| + 1; that is, there is no positive counterpart to TMin.
principle: Uniqueness of two’s-complement encoding
Function $B2T_w$ is a bijection.
Practice Problem 2.17
Assuming w = 4, we can assign a numeric value to each possible hexadecimal digit, assuming either an unsigned or a two’s-complement interpretation. Fill in the following table according to these interpretations by writing out the nonzero powers of 2 in the summations shown in Equations 2.1 and 2.3:
Hexadecimal Binary B2U B2T 0xa [1010] $2^3+2^1=10$ $-2^{3}+2^1=-6$ 0x1 0xb 0x2 0x7 0xc
My solution: :white_check_mark:
Hexadecimal | Binary | B2U | B2T |
---|---|---|---|
0xa | [1010] | $2^3+2^1=10$ | $-2^{3}+2^1=-6$ |
0x1 | [0001] | $2^0=1$ | $2^0=1$ |
0xb | [1011] | $2^3+2^1+2^0=11$ | $-2^3+2^1+2^0=-5$ |
0x2 | [0010] | $2^1=2$ | $2^1=2$ |
0x7 | [0111] | $2^2+2^1+2^0=7$ | $2^2+2^1+2^0=7$ |
0xc | [1100] | $2^3+2^2=12$ | $-2^3+2^2=-4$ |
Programming tips:
use
int32_t
,uint64_t
,…., to get exact size you want.Formatted printing with fixed-width types requires use of macros that expand into format strings in a system-dependent manner.
So, for example, the values of variables x and y of type
int32_t
anduint64_t
can be printed by the following call to printf:printf("x = %" PRId32 ", y = %" PRIu64 "\n", x, y);
When compiled as a 64-bit program, macro PRId32 expands to the string “d”, while PRIu64 expands to the pair of strings “l” “u”. When the C preprocessor encounters a sequence of string constants separated only by spaces (or other whitespace characters), it concatenates them together.
Thus, the above call to printf becomes
printf("x = %d, y = %lu\n", x, y);
Using the macros ensures that a correct format string will be generated regardless of how the code is compiled.
The C standards do NOT require signed integers to be represented in two’scomplement form, but nearly all machines do so.
So if you want max portability, do NOT assume any “typical” ranges beyond the minimum range and do NOT assume signed numbers are encoded by 2’s complement. Use
<limit.h>
which defines constantsINT_MAX
,INT_ MIN
, andUINT_MAX
describing the ranges of signed and unsigned integers.The Java standard is quite specific about integer data type ranges and representations.These detailed requirements are intended to enable Java programs to behave identically regardless of the machines or operating systems running them.
Practice Problem 2.18
In Chapter 3, we will look at listings generated by a disassembler, a program that converts an executable program file back to a more readable ASCII form.
These files contain many hexadecimal numbers, typically representing values in two’scomplement form. Being able to recognize these numbers and understand their significance (for example, whether they are negative or positive) is an important skill.
For the lines labeled A–I (on the right) in the following listing, convert the hexadecimal values (in 32-bit two’s-complement form) shown to the right of the instruction names (sub, mov, and add) into their decimal equivalents:
1
2
3
4
5
6
7
8
9
10
11
12 4004d0: 48 81 ec e0 02 00 00 sub $0x2e0,%rsp A.
4004d7: 48 8b 44 24 a8 mov -0x58(%rsp),%rax B.
4004dc: 48 03 47 28 add 0x28(%rdi),%rax C.
4004e0: 48 89 44 24 d0 mov %rax,-0x30(%rsp) D.
4004e5: 48 8b 44 24 78 mov 0x78(%rsp),%rax E.
4004ea: 48 89 87 88 00 00 00 mov %rax,0x88(%rdi) F.
4004f1: 48 8b 84 24 f8 01 00 mov 0x1f8(%rsp),%rax G.
4004f8: 00
4004f9: 48 03 44 24 08 add 0x8(%rsp),%rax
4004fe: 48 89 84 24 c0 00 00 mov %rax,0xc0(%rsp) H.
400505: 00
400506: 48 8b 44 d4 b8 mov -0x48(%rsp,%rdx,8),%rax I.
My solution: :white_check_mark:
Index | hexadecimal | decimal |
---|---|---|
A | 0x2e0 | 736 |
B | -0x58 | -88 |
C | 0x28 | 40 |
D | -0x30 | -48 |
E | 0x78 | 120 |
F | 0x88 | 136 |
G | 0x1f8 | 504 |
H | 0xc0 | 192 |
I | -0x48 | -72 |
2.2.4 Conversions between Signed and Unsigned
the effect of casting is to keep the bit values identical but change how these bits are interpreted.
Practice Problem 2.19
Using the table you filled in when solving Problem 2.17, fill in the following table describing the function $T2U_4$:
x $T2U_4(x)$ -1 -5 -6 -4 1 8
My solution: :white_check_mark:
x | $T2U_4(x)$ |
---|---|
-1 | 15 |
-5 | 11 |
-6 | 10 |
-4 | 12 |
1 | 1 |
8 | ? |
1 | bitstring.CreationError: 8 is too large a signed integer for a bitstring of length 4. The |
when mapping a signed number to its unsigned counterpart, negative numbers are converted to large positive numbers, while nonnegative numbers remain unchanged.
principle: Conversion from two’s complement to unsigned
$$
\forall x , TMin\le x \le TMax,T2U_w(x) = \begin{cases}
x,&x\ge0\
x+2^w,&x<0
\end{cases}
$$
Derivation:
$$
B2U_w(\vec{x}) - B2T_w(\vec{x}) = x_{w-1}\cdot2^{w-1}-(-x_{w-1}\cdot2^{w-1}) = x_{w-1}\cdot 2^w
$$$$
T2U_w(x) = B2U_w(T2B_w(x))=x+x_{w-1}\cdot 2^w
$$
Practice Problem 2.20
Explain how Equation 2.5 applies to the entries in the table you generated when solving Problem 2.19.
My solution : :white_check_mark:
if the value is nonnegative, leave it unchanged, otherwise add it with $2^4=16$
Going in the other direction:
principle: Unsigned to two’s-complement conversion
$$
\forall u , 0\le u \le UMax,U2T_w(u) = \begin{cases}
u , & u\le TMax\
u - 2^{w} , & u > TMax
\end{cases}
$$
Summary:
- Numbers in this range have identical unsigned and two’s-complement representations.
- For values outside of this range, the conversions either add or subtract $2^w$
2.2.5 Signed versus Unsigned in C
When an operation is performed where one operand is signed and the other is unsigned, C implicitly casts the signed argument to unsigned and performs the operations assuming the numbers are nonnegative.
Practice Problem 2.21
Assuming the expressions are evaluated when executing a 32-bit program on a machine that uses two’s-complement arithmetic, fill in the following table describing the effect of casting and relational operations, in the style of Figure 2.19:
Expression Type Evaluation -2147483647-1 == 2147483648U -2147483647-1 < 2147483647 -2147483647-1U < 2147483647 -2147483647-1 < -2147483647 -2147483647-1U < -2147483647
My solution: :white_check_mark:
Expression | Type | Evaluation |
---|---|---|
-2147483647-1 == 2147483648U | Unsigned | 1 |
-2147483647-1 < 2147483647 | Signed | 1 |
-2147483647-1U < 2147483647 | Unsigned | 0 |
-2147483647-1 < -2147483647 | Signed | 1 |
-2147483647-1U < -2147483647 | Unsigned | 1 |
2.2.6 Expanding the Bit Representation of a Number
One common operation is to convert between integers having different word sizes while retaining the same numeric value.
- When the destination data type is too small to represent the desired value, this may not be possible.
- Converting from a smaller to a larger data type, however, should always be possible.
Convert an unsigned number to a larger data type:
- simply add leading zeros to the representation
- this operation is known as zero extension
Convert a two’s-complement number to a larger data type:
- adding copies of the most significant bit to the representation
- this operation is known as sign extension
Why sign extension pad 1 ? How this perseves the value ?
Derivation:
assume we expend by k bits
$$
\forall k \in Z^*,w’=w+k
$$
we only need to prove when k=1 the value holds, then all the situations are the same.
$$
B2T_{w+1}([x_{w−1}, x_{w−1}, x_{w−2},…,x_0]) = -x_{w−1}2^{w}+x_{w−1}2^{w-1}+…+x_02^0\
=-x_{w−1}2^{w-1}+…+x_0\
=B2T_{w}([ x_{w−1}, x_{w−2},…,x_0])
$$
The key property we exploit is that $2^w − 2^{w−1} = 2^{w−1}$
Practice Problem 2.22
Show that each of the following bit vectors is a two’s-complement representation of −4 by applying Equation 2.3:
A. [1100]
B. [11100]
C. [111100]
Observe that the second and third bit vectors can be derived from the first by sign extension
My solution : :white_check_mark:
A : $-2^3+2^2=-4$
B : $-2^4+2^3+2^2=-4$
C : $-2^5+2^4+2^3+2^2=-4$
When converting from short
to unsigned
, the program first changes the size and then the type. This convention is required by the C standards.
Practice Problem 2.23
Consider the following C functions:
1
2
3
4
5
6 int fun1(unsigned word) {
return (int) ((word << 24) >> 24);
}
int fun2(unsigned word) {
return ((int) word << 24) >> 24;
}Assume these are executed as a 32-bit program on a machine that uses two’scomplement arithmetic. Assume also that right shifts of signed values are performed arithmetically, while right shifts of unsigned values are performed logically.
A. Fill in the following table showing the effect of these functions for several example arguments. You will find it more convenient to work with a hexadecimal representation. Just remember that hex digits 8 through F have their most significant bits equal to 1.
w fun1(w) fun2(2) 0x00000076 0x87654321 0x000000C9 0xEDCBA987
My solution: :white_check_mark:
w | fun1(w) | fun2(2) |
---|---|---|
0x00000076 | 0x00000076 | 0x00000076 |
0x87654321 | 0x00000021 | 0x00000021 |
0x000000C9 | 0x000000C9 | 0xFFFFFFC9 |
0xEDCBA987 | 0x00000087 | 0xFFFFFF87 |
2.2.7 Truncating Numbers
Truncate unsigned number
principle: Truncation of an unsigned number
Let x be the bit vector $[x_{w−1}, x_{w−2},…,x_0]$, and let $x’$ be the result of truncating it to k bits: $x’ = [x_{k−1}, x_{k−2},…,x_0]$. Let $x = B2U_w(\vec{x})$ and $x’ = B2U_k(\vec{x’})$.
Then $x’ = x \mod 2^k$.
The intuition behind this principle is simply that all of the bits that were truncated have weights of the form $2^i$ , where i ≥ k, and therefore each of these weights reduces to zero under the modulus operation.
Truncate signed number
principle: Truncation of a two’s-complement number
Let x be the bit vector $[x_{w−1}, x_{w−2},…,x_0]$, and let $x’$ be the result of truncating it to k bits: $x’ = [x_{k−1}, x_{k−2},…,x_0]$. Let $x = B2T_w(\vec{x})$ and $x’ = B2T_k(\vec{x’})$.
Then $x’ = U2T_k(x \mod 2^k)$.
Practice Problem 2.24
Suppose we truncate a 4-bit value (represented by hex digits 0 through F) to a 3- bit value (represented as hex digits 0 through 7.) Fill in the table below showing the effect of this truncation for some cases, in terms of the unsigned and two’scomplement interpretations of those bit patterns
Unsigned:
Original Truncated 1 3 5 12 14 Two’s complement:
Original Truncated 1 3 5 -4 -2
My solution: :white_check_mark:
Unsigned:
Original | Truncated |
---|---|
1 | 1 |
3 | 3 |
5 | 5 |
12 | 4 |
14 | 6 |
Two’s complement:
Original | Truncated |
---|---|
1 | 1 |
3 | 3 |
5 | -3 |
-4 | -4 |
-2 | -2 |
1 | ls = [5 , -4 , -2] |
1 | original: 5 after: -3 |
2.2.8 Advice on Signed versus Unsigned
Since the casting takes place without any clear indication in the code, programmers often overlook its effects.Be careful!
Practice Problem 2.25
Consider the following code that attempts to sum the elements of an array
a
, where the number of elements is given by parameter length:
1
2
3
4
5
6
7
8
9 /* WARNING: This is buggy code */
float sum_elements(float a[], unsigned length) {
int i;
float result = 0;
for (i = 0; i <= length-1; i++)
result += a[i];
return result;
}When run with argument length equal to 0, this code should return 0.0. Instead, it encounters a memory error. Explain why this happens. Show how this code can be corrected.
My solution: :white_check_mark:
length
is an unsigned number, while i
is a signed one. When evaluate i<= length-1
, i
will be converted to unsigned
.
When length
equals 0, length-1
gives the bit string 0b111111....111
, then any value in unsigned
is less than length-1
the comparison always holds
expression
a[i]
will access a forbidden memory area.
1 | float sum_elements(float a[], unsigned length) { |
Practice Problem 2.26
You are given the assignment of writing a function that determines whether one string is longer than another. You decide to make use of the string library function
strlen
having the following declaration:
1
2 /* Prototype for library function strlen */
size_t strlen(const char *s);Here is your first attempt at the function:
1
2
3
4
5 /* Determine whether string s is longer than string t */
/* WARNING: This function is buggy */
int strlonger(char *s, char *t) {
return strlen(s) - strlen(t) > 0;
}When you test this on some sample data, things do not seem to work quite right. You investigate further and determine that, when compiled as a 32-bit program, data type
size_t
is defined (viatypedef
) in header filestdio.h
to beunsigned
.A. For what cases will this function produce an incorrect result?
B. Explain how this incorrect result comes about.
C. Show how to fix the code so that it will work reliably.
My solution: :white_check_mark:
A : if string s doesn’t have the same length with string t, the result will always be true
B : when compare unsigned
with 0
, the result will always be true
C:
1 | int strlonger(char *s, char *t) { |
We have seen multiple ways in which the subtle features of unsigned arithmetic, and especially the implicit conversion of signed to unsigned, can lead to errors or vulnerabilities.
- One way to avoid such bugs is to never use unsigned numbers. (In fact, few languages other than C support unsigned integers.)
- Unsigned values are very useful when we want to think of words as just collections of bits with no numeric interpretation.
- when packing a word with flags describing various Boolean conditions.
- Addresses are naturally unsigned, so systems programmers find unsigned types to be helpful.
- Unsigned values are also useful when implementing mathematical packages for modular arithmetic and for multiprecision arithmetic, in which numbers are represented by arrays of words.