
Chapter 1: A Tour of Computer Systems

Specific implementations of systems change over time, but the underlying concepts do not. All computer systems have similar hardware and software components that perform similar functions.

We begin our study of systems by tracing the lifetime of the hello program shown below.

//hello.c
#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}

1.1 Information Is Bits + Context

  • Our hello program begins life as a source program (or source file) that the programmer creates with an editor and saves in a text file called hello.c.
  • The source program is a sequence of bits, each with a value of 0 or 1, organized in 8-bit chunks called bytes. Each byte represents some text character in the program
  • Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.
$ hexdump -C hello.c
00000000  23 69 6e 63 6c 75 64 65  20 3c 73 74 64 69 6f 2e  |#include <stdio.|
00000010  68 3e 0a 0a 69 6e 74 20  6d 61 69 6e 28 29 7b 0a  |h>..int main(){.|
00000020  20 20 20 20 70 72 69 6e  74 66 28 22 68 65 6c 6c  |    printf("hell|
00000030  6f 2c 20 77 6f 72 6c 64  5c 6e 22 29 3b 0a 20 20  |o, world\n");.  |
00000040  20 20 72 65 74 75 72 6e  20 30 3b 0a 7d 0a        |  return 0;.}.|
# dump.py
# Print every byte of hello.c as decimal, hexadecimal, and its ASCII character.
from prettytable import PrettyTable

tb = PrettyTable()
file = open('hello.c', "rb")
binary = file.read()
tb.field_names = ['Decimal', 'Hexadecimal', 'ASCII decode']
for b in binary:
    if b == 10:                      # newline byte: show '.' instead
        tb.add_row([b, hex(b)[2:], '.'])
    else:
        tb.add_row([b, hex(b)[2:], chr(b)])
print(tb)
file.close()
$ py dump.py
+---------+-------------+--------------+
| Decimal | Hexadecimal | ASCII decode |
+---------+-------------+--------------+
|   35    |     23      |      #       |
|   105   |     69      |      i       |
|   110   |     6e      |      n       |
|   99    |     63      |      c       |
|   108   |     6c      |      l       |
|   117   |     75      |      u       |
|   100   |     64      |      d       |
|   101   |     65      |      e       |
|   32    |     20      |              |
|   60    |     3c      |      <       |
|   115   |     73      |      s       |
|   116   |     74      |      t       |
|   100   |     64      |      d       |
|   105   |     69      |      i       |
|   111   |     6f      |      o       |
|   46    |     2e      |      .       |
|   104   |     68      |      h       |
|   62    |     3e      |      >       |
|   10    |      a      |      .       |
|   10    |      a      |      .       |
|   105   |     69      |      i       |
|   110   |     6e      |      n       |
|   116   |     74      |      t       |
|   32    |     20      |              |
|   109   |     6d      |      m       |
|   97    |     61      |      a       |
|   105   |     69      |      i       |
|   110   |     6e      |      n       |
|   40    |     28      |      (       |
|   41    |     29      |      )       |
|   123   |     7b      |      {       |
|   10    |      a      |      .       |
|   32    |     20      |              |
|   32    |     20      |              |
|   32    |     20      |              |
|   32    |     20      |              |
|   112   |     70      |      p       |
|   114   |     72      |      r       |
|   105   |     69      |      i       |
|   110   |     6e      |      n       |
|   116   |     74      |      t       |
|   102   |     66      |      f       |
|   40    |     28      |      (       |
|   34    |     22      |      "       |
|   104   |     68      |      h       |
|   101   |     65      |      e       |
|   108   |     6c      |      l       |
|   108   |     6c      |      l       |
|   111   |     6f      |      o       |
|   44    |     2c      |      ,       |
|   32    |     20      |              |
|   119   |     77      |      w       |
|   111   |     6f      |      o       |
|   114   |     72      |      r       |
|   108   |     6c      |      l       |
|   100   |     64      |      d       |
|   92    |     5c      |      \       |
|   110   |     6e      |      n       |
|   34    |     22      |      "       |
|   41    |     29      |      )       |
|   59    |     3b      |      ;       |
|   10    |      a      |      .       |
|   32    |     20      |              |
|   32    |     20      |              |
|   32    |     20      |              |
|   32    |     20      |              |
|   114   |     72      |      r       |
|   101   |     65      |      e       |
|   116   |     74      |      t       |
|   117   |     75      |      u       |
|   114   |     72      |      r       |
|   110   |     6e      |      n       |
|   32    |     20      |              |
|   48    |     30      |      0       |
|   59    |     3b      |      ;       |
|   10    |      a      |      .       |
|   125   |     7d      |      }       |
|   10    |      a      |      .       |
+---------+-------------+--------------+

Everything is binary!

The representation of hello.c illustrates a fundamental idea: All information in a system—including disk files, programs stored in memory, user data stored in memory, and data transferred across a network—is represented as a bunch of bits

The only thing that distinguishes different data objects is the context in which we view them.

For example, in different contexts, the same sequence of bytes might represent an integer, floating-point number, character string, or machine instruction.
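
To make this concrete, here is a small C sketch (not from the book; the file name reinterpret.c is made up) that stores one particular 4-byte sequence and then views it in two different contexts. The exact integer value assumes a little-endian machine such as x86-64.

// reinterpret.c : the same bytes, viewed in different contexts
#include <stdio.h>
#include <string.h>

int main()
{
    unsigned char bytes[4] = {0x00, 0x00, 0x80, 0x3f};  /* one 4-byte sequence */
    int i;
    float f;

    memcpy(&i, bytes, sizeof(i));   /* context 1: a 32-bit integer */
    memcpy(&f, bytes, sizeof(f));   /* context 2: an IEEE 754 float */

    printf("as int:   %d\n", i);    /* 1065353216 on a little-endian machine */
    printf("as float: %f\n", f);    /* 1.000000 */
    return 0;
}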

Binary can never be concise.

As programmers, we need to understand machine representations of numbers because they are not the same as integers and real numbers. They are finite approximations that can behave in unexpected ways.

1.2 Programs Are Translated by Other Programs into Different Forms

In order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program and stored as a binary disk file. Object programs are also referred to as executable object files.

(Figure: the compilation system, consisting of the preprocessor, compiler, assembler, and linker)

Phase 1: Preprocessing phase

The C preprocessor expands macros, includes header files, and handles other preprocessor directives.

For example, the #include command in line 1 of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert it directly into the program text. The result is another C program, typically with the .i suffix.

$ man cpp
NAME
cpp - The C Preprocessor
...
$ cpp hello.c -o hello.i

# -E : Stop after the preprocessing stage; do not run the compiler proper.
$ gcc hello.c -E -o h.i

# -s, --report-identical-files
# report when two files are the same
$ diff -s hello.i h.i
Files h.i and hello.i are identical

Phase 2: Compilation phase

The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program

# Stop after the stage of compilation proper; do not assemble. 
$ gcc -S hello.i -o hello.s
# hello.s
    .file   "hello.c"
    .text
    .section    .rodata
.LC0:
    .string "hello, world"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rdi
    call    puts@PLT
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
    .section    .note.GNU-stack,"",@progbits

Phase 3: Assembly phase

Next, the assembler as translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o. This file is a binary file containing 17 bytes to encode the instructions for function main. If we were to view hello.o with a text editor, it would appear to be gibberish.

# gcc -o <target_name> <remainder of command>
# gcc -c <filename>
# -c Compile or assemble the source files, but do not link.
$ gcc -o h.o -c hello.s
$ as hello.s -o hello.o
$ diff -s h.o hello.o
Files hello.o and h.o are identical

Phase 4: Linking phase

Notice that our hello program calls the printf function, which is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file called printf.o, which must somehow be merged with our hello.o program. The linker ld handles this merging. The result is the hello file, which is an executable object file that is ready to be loaded into memory and executed by the system.

GCC does much more than most people realize on an Ubuntu system. For example, it typically invokes the linker (through its collect2 wrapper) with something like this:

/usr/lib/gcc/i486-linux-gnu/4.1.2/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -oS /usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crt1.o /usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.1.2/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.1.2 -L/usr/lib/gcc/i486-linux-gnu/4.1.2 -L/usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib S.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i486-linux-gnu/4.1.2/crtend.o /usr/lib/gcc/i486-linux-gnu/4.1.2/../../../../lib/crtn.o

You can always compile using the -v (verbose) flag to see everything gcc is doing to your files.

$ gcc -o hello hello.o
$ file hello
hello: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2104eb23ebe0786674d413318f5c8f37597cb913, not stripped

1.3 It Pays to Understand How Compilation Systems Work

For simple programs such as hello.c, we can rely on the compilation system to produce correct and efficient machine code.

So why do programmers need to understand how compilation systems work?

  • Optimizing program performance.

    In order to make good coding decisions in our C programs, we do need a basic understanding of machine-level code and how the compiler translates different C statements into machine code.

    • Is a switch statement always more efficient than a sequence of if-else statements?
    • How much overhead is incurred by a function call?
    • Is a while loop more efficient than a for loop?
    • Are pointer references more efficient than array indexes?
    • Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference?
    • ….
  • Understanding link-time errors.

    In our experience, some of the most perplexing programming errors are related to the operation of the linker, especially when you are trying to build large software.

    • What does it mean when the linker reports that it cannot resolve a reference?
    • What is the difference between a static variable and a global variable?
    • What happens if you define two global variables in different C files with the same name?
    • What is the difference between a static library and a dynamic library?
    • Why does it matter what order we list libraries on the command line?
    • Why do some linker-related errors not appear until run time?
    • …..
  • Avoiding security holes

    For many years, buffer overflow vulnerabilities have accounted for many of the security holes in network and Internet servers. These vulnerabilities exist because too few programmers understand the need to carefully restrict the quantity and forms of data they accept from untrusted sources. A first step in learning secure programming is to understand the consequences of the way data and control information are stored on the program stack.
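
As a concrete (hypothetical) illustration of the stack issue: the first function below copies untrusted input into a fixed-size stack buffer without checking its length, which is the classic buffer overflow mistake; the second restricts the quantity of data it accepts.

// overflow.c : sketch of the classic stack buffer overflow mistake
#include <stdio.h>
#include <string.h>

void greet(const char *name)
{
    char buf[16];
    strcpy(buf, name);                       /* unsafe: no bounds check on untrusted input */
    printf("hello, %s\n", buf);
}

void greet_safely(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof(buf), "%s", name);  /* restrict the quantity of data accepted */
    printf("hello, %s\n", buf);
}

int main(int argc, char *argv[])
{
    if (argc > 1)
        greet_safely(argv[1]);               /* calling greet(argv[1]) could smash the stack */
    return 0;
}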

1.4 Processors Read and Interpret Instructions Stored in Memory

Shell

  • The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command.

  • If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run.

  • While the loaded program runs, the shell waits for it to terminate. After the program terminates, the shell regains control, prints a prompt, and waits for the next input command line.

Hardware Organization of a System

(Figure: hardware organization of a typical system)

Bus

  • Running throughout the system is a collection of electrical conduits called buses that carry bytes of information back and forth between the components.

  • Buses are typically designed to transfer fixed-size chunks of bytes known as words. The number of bytes in a word (the word size) is a fundamental system parameter that varies across systems

  • Most machines today have word sizes of either 4 bytes (32 bits) or 8 bytes (64 bits).

I/O Devices

  • Input/output (I/O) devices are the system’s connection to the external world

  • Our example system has four I/O devices:

    • Keyboard for user input
    • Mouse for user input
    • Screen for user output
    • Disk for long-term storage
  • Each I/O device is connected to the I/O bus by either a controller or an adapter

    The purpose of each is to transfer information back and forth between the I/O bus and an I/O device

    The distinction between the two is mainly one of packaging.

    • Controllers are chip sets in the device itself or on the system’s main printed circuit board (often called the motherboard)
    • An adapter is a card that plugs into a slot on the motherboard.
  • The network is also an I/O device.

Main Memory

  • The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program
  • Physically, main memory consists of a collection of dynamic random access memory (DRAM) chips.
  • Logically, memory is organized as a linear array of bytes, each with its own unique address (array index) starting at zero.

Processor

  • The central processing unit (CPU), or simply processor, is the engine that interprets (or executes) instructions stored in main memory

  • At its core is a word-size storage device (or register) called the program counter (PC). At any point in time, the PC points at (contains the address of) some machine-language instruction in main memory.

    A processor repeatedly executes the instruction pointed at by the program counter and updates the program counter to point to the next instruction (a toy sketch of this loop appears after this list):

    • The processor reads the instruction from memory pointed at by the program counter (PC)
    • interprets the bits in the instruction
    • performs some simple operation dictated by the instruction
    • updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.
  • There are only a few of these simple operations, and they revolve around main memory, the register file, and the ALU:

    • The register file is a small storage device that consists of a collection of word-size registers, each with its own unique name.
    • The ALU computes new data and address values.
  • some instruction examples

    (Figure: examples of simple processor operations)

  • A processor appears to be a simple implementation of its instruction set architecture, but in fact modern processors use far more complex mechanisms to speed up program execution.

  • Thus, we can distinguish the processor’s instruction set architecture (describing the effect of each machine-code instruction) from its microarchitecture (describing how the processor is actually implemented).
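
The following toy sketch (not a real ISA; all opcodes are invented) mirrors the fetch, interpret, execute, update-PC loop described in the list above: instructions sit in an array standing in for main memory, and the PC selects which one runs next.

// toy_cpu.c : a toy model of the processor's fetch-execute loop
#include <stdio.h>

enum { LOAD, ADD, JUMP, HALT };          /* invented opcodes */

struct insn { int op; int arg; };

int main()
{
    struct insn memory[] = {             /* a tiny "program" in "main memory" */
        { LOAD, 5 },                     /* reg = 5       */
        { ADD,  3 },                     /* reg = reg + 3 */
        { HALT, 0 },
    };
    int pc = 0;                          /* program counter */
    int reg = 0;                         /* one register    */

    for (;;) {
        struct insn i = memory[pc];      /* fetch the instruction the PC points at  */
        pc = pc + 1;                     /* update the PC (a JUMP may overwrite it) */
        switch (i.op) {                  /* interpret the bits and execute          */
        case LOAD: reg = i.arg;       break;
        case ADD:  reg = reg + i.arg; break;
        case JUMP: pc = i.arg;        break;
        case HALT: printf("reg = %d\n", reg); return 0;
        }
    }
}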

Running the hello Program

We must omit a lot of details here that will be filled in later, but for now we will be content with the big picture

  1. Initially, the shell program is executing its instructions, waiting for us to type a command

  2. As we type the characters ./hello at the keyboard, the shell program reads each one into a register and then stores it in memory

    (Figure: reading the hello command from the keyboard)

  3. When we hit the enter key on the keyboard, the shell knows that we have finished typing the command.

  4. The shell then loads the executable hello file by executing a sequence of instructions that copies the code and data in the hello object file from disk to main memory.

    Using a technique known as direct memory access (DMA), the data travel directly from disk to main memory, without passing through the processor.

    (Figure: loading the executable from disk into main memory via DMA)

  5. Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program’s main routine.

  6. These instructions copy the bytes in the hello, world\n string from memory to the register file, and from there to the display device, where they are displayed on the screen

    (Figure: writing the output string from memory to the display)

1.5 Caches Matter

  • An important lesson from this simple example is that a system spends a lot of time moving information from one place to another

    Thus, a major goal for system designers is to make these copy operations run as fast as possible.

    (Can we just move the data directly instead of copying it? :confused:)

  • Because of physical laws, larger storage devices are slower than smaller storage devices. And faster devices are more expensive to build than their slower counterparts

  • Processor–memory gap: the processor can read data from the register file much faster than it can from memory.

  • To deal with the processor–memory gap, system designers include smaller, faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future.

    (Figure: cache memories between the processor and main memory)

  • An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file.

  • A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus.

  • The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM). Newer systems may also have an L3 cache.

  • The idea behind caching is that a system can get the effect of both a very large memory and a very fast one by exploiting locality, the tendency for programs to access data and code in localized regions. By setting up caches to hold data that are likely to be accessed often, we can perform most memory operations using the fast caches.

  • One of the most important lessons in this book is that application programmers who are aware of cache memories can exploit them to improve the performance of their programs by an order of magnitude.
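
A rough sketch of locality in action (file name and matrix size are arbitrary): both functions below compute the same sum, but the row-wise version touches memory in the order it is laid out, so it tends to hit in the caches, while the column-wise version jumps around. The actual timings depend on the machine, the compiler, and the optimization flags.

// locality.c : same work, different locality
#include <stdio.h>
#include <time.h>

#define N 2048
double a[N][N];

double sum_rowwise(void)
{
    double s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];              /* stride-1 access: cache friendly */
    return s;
}

double sum_colwise(void)
{
    double s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];              /* stride-N access: poor locality */
    return s;
}

int main()
{
    clock_t t0 = clock();
    double s1 = sum_rowwise();
    clock_t t1 = clock();
    double s2 = sum_colwise();
    clock_t t2 = clock();
    printf("row-wise:    %.2fs (sum %.0f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("column-wise: %.2fs (sum %.0f)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    return 0;
}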

1.6 Storage Devices Form a Hierarchy

This notion of inserting a smaller, faster storage device (e.g., cache memory) between the processor and a larger, slower device (e.g., main memory) turns out to be a general idea.

The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level.

(Figure: the memory hierarchy)

1.7 The Operating System Manages the Hardware

  • When we run an application program, it does not access the keyboard, display, disk, or main memory directly; instead, it relies on the services provided by the operating system.

    All attempts by an application program to manipulate the hardware must go through the operating system

  • We can think of the operating system as a layer of software interposed between the application program and the hardware

    (Figure: the operating system interposed between applications and the hardware)

  • The operating system has two primary purposes:

    • to protect the hardware from misuse by runaway applications
    • to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.
  • The operating system achieves both goals via the fundamental abstractions:

    • Files: abstraction for I/O devices.
    • Virtual memory: abstraction for the main memory and disk I/O devices.
    • Processes: abstractions for the processor, main memory, and I/O devices.

1.7.1 Processes

The notion of a process is one of the most important and successful ideas in computer science.

  • A process is the operating system’s abstraction for a running program

  • Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware.

  • By concurrently, we mean that the instructions of one process are interleaved with the instructions of another process

    In computing, interleaving of data refers to the interspersing of fields or channels of different meaning sequentially in memory, in processor registers, or in file formats. For example, for coordinate data, x0 y0 z0 w0 x1 y1 z1 w1 x2 y2 z2 w2 is interleaved whilst x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 w0 w1 w2 w3 is not.

  • In both uniprocessor and multiprocessor machines, a single CPU can appear to execute multiple processes concurrently by having the processor switch among them. The operating system performs this interleaving with a mechanism known as context switching.

  • The operating system keeps track of all the state information that the process needs in order to run.

    This state, which is known as the context, includes information such as the current values of the PC, the register file, and the contents of main memory

  • When the operating system decides to transfer control from the current process to some new process, it performs a context switch by saving the context of the current process, restoring the context of the new process, and then passing control to the new process. The new process picks up exactly where it left off

    (Figure: process context switching)

  • Example

    1. Initially, the shell process is running alone, waiting for input on the command line.
    2. When we ask it to run the hello program, the shell carries out our request by invoking a special function known as a system call that passes control to the operating system.
    3. The operating system saves the shell’s context, creates a new hello process and its context, and then passes control to the new hello process.
    4. After hello terminates, the operating system restores the context of the shell process and passes control back to it, where it waits for the next command-line input.
  • The kernel

    • The kernel is the portion of the operating system code that is always resident in memory

    • When an application program requires some action by the operating system, such as to read or write a file, it executes a special system call instruction, transferring control to the kernel.

      The kernel then performs the requested operation and returns back to the application program.

    • The kernel is not a separate process; rather, it is a collection of code and data structures that the system uses to manage all the processes.
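
A minimal sketch (POSIX only; the file name is invented) of steps 2-4 in the example above: the "shell" makes system calls to create a new process, load ./hello into it, and wait for it to terminate before taking back control.

// minishell.c : what the shell does to run ./hello (simplified)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main()
{
    char *argv[] = { "./hello", NULL };   /* the command we pretend was typed */

    pid_t pid = fork();                   /* system call: create a new process */
    if (pid == 0) {
        execv(argv[0], argv);             /* child: load and run the hello program */
        perror("execv");                  /* reached only if the load failed */
        exit(1);
    }
    waitpid(pid, NULL, 0);                /* shell: wait for hello to terminate */
    printf("hello terminated; back to the shell prompt\n");
    return 0;
}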

1.7.2 Threads

  • In modern systems a process can actually consist of multiple execution units, called threads, rather than a single control flow; each thread runs in the context of the process and shares the same code and global data.
  • It is easier to share data between multiple threads than between multiple processes (see the sketch after this list).
  • Threads are typically more efficient than processes.
  • Multi-threading is also one way to make programs run faster when multiple processors are available.
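
A small pthreads sketch of the sharing described above (the file name is made up; compile with gcc threads.c -o threads -lpthread): two threads run in the same process and update the same global counter.

// threads.c : two threads in one process share the same global data
#include <stdio.h>
#include <pthread.h>

long counter = 0;                                  /* global data shared by both threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                 /* shared data needs synchronization */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);            /* 200000 */
    return 0;
}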

1.7.3 Virtual Memory

  • Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory.

  • Each process has the same uniform view of memory, which is known as its virtual address space (that is, this view is identical for every process).

    (Figure: the process virtual address space)

  • The virtual address space seen by each process consists of a number of well-defined areas, each with a specific purpose

    (Figure: the areas of a process virtual address space)

  • The basic idea is to store the contents of a process’s virtual memory on disk and then use the main memory as a cache for the disk
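
A small POSIX sketch of this illusion (file name invented): after fork, the parent and child see the same virtual address for the global x, yet a write in the child is invisible to the parent, because each process has its own private virtual address space.

// vmdemo.c : same virtual address, different private contents
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int x = 1;                                           /* one global variable */

int main()
{
    if (fork() == 0) {                               /* child process */
        x = 42;
        printf("child:  &x = %p, x = %d\n", (void *)&x, x);
        return 0;
    }
    wait(NULL);                                      /* parent process */
    printf("parent: &x = %p, x = %d\n", (void *)&x, x);   /* still 1 */
    return 0;
}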

1.7.4 Files

A file is a sequence of bytes, nothing more and nothing less.

  • Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file.

  • This simple and elegant notion of a file is nonetheless very powerful because it provides applications with a uniform view of all the varied I/O devices that might be contained in the system.

    For example, application programmers who manipulate the contents of a disk file are blissfully unaware of the specific disk technology. Further, the same program will run on different systems that use different disk technologies
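
A small Linux sketch of that uniform view (file name invented): an ordinary disk file and a device are read through exactly the same open/read/close interface.

// readfile.c : a disk file and a device, read the same way
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

void read16(const char *path)
{
    unsigned char buf[16];
    int fd = open(path, O_RDONLY);               /* same call for file or device */
    if (fd < 0) { perror(path); return; }
    ssize_t n = read(fd, buf, sizeof(buf));      /* same call for file or device */
    printf("%s: read %zd bytes\n", path, n);
    close(fd);
}

int main()
{
    read16("hello.c");                           /* an ordinary disk file */
    read16("/dev/urandom");                      /* a device */
    return 0;
}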

1.8 Systems Communicate with Other Systems Using Networks

From the point of view of an individual system, the network can be viewed as just another I/O device

(Figure: the network is just another I/O device)

1.9 Important Themes

  • An important idea to take away from this discussion is that a system is more than just hardware. It is a collection of intertwined hardware and systems software that must cooperate in order to achieve the ultimate goal of running application programs.
  • To close out this chapter, we highlight several important concepts that cut across all aspects of computer systems

1.9.1 Amdahl’s Law

The main idea is that when we speed up one part of a system, the effect on the overall system performance depends on both how significant this part was and how much it sped up.

Derivation

Consider a system in which executing some application requires time $T_\text{old}$.

Suppose some part of the system requires a fraction $\alpha$ of this time.

Suppose we improve its performance by a factor of $k$.
$$
\therefore \text{the component originally required time} = \alpha T_\text{old}
$$

$$
\therefore \text{after improvement required time} = \frac{\alpha T_\text{old}}{k}
$$

$$
\therefore \text{the application now requires time: } T_\text{new} = (1-\alpha)T_\text{old} + \frac{\alpha T_\text{old}}{k} = \left(1-\alpha+\frac{\alpha}{k}\right) T_\text{old}
$$

$$
\therefore \frac{T_\text{new}}{T_\text{old}} =1-\alpha+\frac{\alpha}{k}
$$

We define the speedup as:
$$
S = \frac{T_\text{old}}{T_\text{new}} =\frac{1}{1-\alpha+\frac{\alpha}{k}}
$$
The major insight of Amdahl’s law: to significantly speed up the entire system, we must improve the speed of a very large fraction of the overall system.

Practice Problem

Suppose you work as a truck driver, and you have been hired to carry a load of potatoes from Boise, Idaho, to Minneapolis, Minnesota, a total distance of 2,500 kilometers. You estimate you can average 100 km/hr driving within the speed limits, requiring a total of 25 hours for the trip

A. You hear on the news that Montana has just abolished its speed limit, which constitutes 1,500 km of the trip. Your truck can travel at 150 km/hr. What will be your speedup for the trip?

B. You can buy a new turbocharger for your truck at www.fasttrucks.com. They stock a variety of models, but the faster you want to go, the more it will cost. How fast must you travel through Montana to get an overall speedup for your trip of 1.67?

My solution:

$$
T_\text{old} = 25, D = 2500
$$

Question A: :white_check_mark:

$$
\alpha = \frac{1500}{D} = \frac{3}{5} = 0.6
$$

$$
k = \frac{150}{100} = \frac{3}{2} = 1.5
$$

$$
S = \frac{1}{1-\alpha + \frac{\alpha}{k}} = \frac{1}{0.4 + 0.4} = 1.25
$$

Question B: :white_check_mark:

Assume I need to travel at speed $x$ km/hr through Montana.
$$
k = \frac{x}{100}
$$

$$
S = 1.67 = \frac{1}{0.4 +\frac{60}{x}}
$$

$$
x \approx 301.8
$$

(Taking the intended speedup as the exact value $S = 5/3$ rather than the rounded 1.67 gives $x = 300$ km/hr.)

A car manufacturing company has promised their customers that the next release of a new engine will show a 4× performance improvement. You have been assigned the task of delivering on that promise. You have determined that only 90% of the engine can be improved. How much (i.e., what value of k) would you need to improve this part to meet the overall performance target of the engine?

My solution: :white_check_mark:

Assume we need to improve this part by a factor of $k$:
$$
S = 4 = \frac{1}{0.1 + \frac{0.9}{k}}
$$

$$
k = 6
$$
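
A quick C sketch (file name invented) to check the three answers above numerically using the speedup formula:

// amdahl.c : numeric check of the answers above
#include <stdio.h>

double speedup(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);    /* S = 1 / ((1 - alpha) + alpha/k) */
}

int main()
{
    printf("A:      %.4f\n", speedup(0.6, 150.0 / 100.0));  /* 1.2500 */
    printf("B:      %.4f\n", speedup(0.6, 300.0 / 100.0));  /* 1.6667, i.e. 5/3 */
    printf("engine: %.4f\n", speedup(0.9, 6.0));            /* 4.0000 */
    return 0;
}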


One interesting special case of Amdahl’s law is to consider the effect of setting k to $\infty$.
$$
S = \frac{1}{1-\alpha}
$$
Therefore, even if we could make some part of our system take no time at all, we still would not get a satisfying overall speedup if α is small.

1.9.2 Concurrency and Parallelism

Throughout the history of digital computers, two demands have been constant forces in driving improvements: we want them to do more, and we want them to run faster.

  • We use the term concurrency to refer to the general concept of a system with multiple, simultaneous activities

    In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out of order or in partial order, without affecting the final outcome.

    Concurrency: Interruptability

  • We use the term parallelism to refer to the use of concurrency to make a system run faster.

    Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time.

    Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.

    Parallelism: Independentability

  • Parallelism can be exploited at multiple levels of abstraction in a computer system.

      Concurrency                    Parallelism
   (Single-Core CPU)              (Multi-Core CPU)
         CPU                       CPU1      CPU2
         ___                        ___       ___
        |th1|                      |th1|     |th2|
        |   |                      |   |     |   |
        |___|___                   |   |     |   |
            |th2|                  |   |     |   |
         ___|___|                  |   |     |   |
        |th1|                      |   |     |   |
        |___|___                   |   |     |   |
            |th2|                  |   |     |   |

Thread-Level Concurrency

  • With the process abstraction, we are able to devise systems where multiple programs execute at the same time, leading to concurrency
  • With threads, we can have multiple control flows executing within a single process

Uniprocessor System
  • Traditionally, this concurrent execution was only simulated, by having a single computer rapidly switch among its executing processes

    • time-sharing strategy

    • allows a single user to engage in multiple tasks concurrently, such as having a Web browser in one window, a word processor in another, and streaming music playing at the same time.

Multiprocessor System
  • When we construct a system consisting of multiple processors all under the control of a single operating system kernel, we have a multiprocessor system.

    • Multi-core processors have several CPUs (referred to as “cores”) integrated onto a single integrated-circuit chip.

    (Figure: organization of a multi-core processor)

    Each core has its own L1 and L2 caches, and shares the higher-level caches, such as the L3 cache, and main memory with the other cores.

Hyperthreading

Hyperthreading, sometimes called simultaneous multi-threading, is a technique that allows a single CPU to execute multiple flows of control.

  • It works by having multiple copies of some of the CPU hardware, such as the PC and registers, while having only single copies of other parts of the hardware, such as the ALU.

  • It enables the CPU to take better advantage of its processing resources.

    For example, if one thread must wait for some data to be loaded into a cache, the CPU can proceed with the execution of a different thread.

  • The Intel Core i7 processor can have each core executing two threads, so a four-core system can actually execute eight threads in parallel.

Advantages
  • It reduces the need to simulate concurrency when performing multiple tasks.
  • It can run a single application program faster, but only if that program is expressed in terms of multiple threads that can effectively execute in parallel.

Instruction-Level Parallelism

Modern processors can execute multiple instructions at one time, a property known as instruction-level parallelism.

  • Processors that can sustain execution rates faster than 1 instruction per cycle are known as superscalar processors

Single-Instruction, Multiple-Data (SIMD) Parallelism

At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel, a mode known as single-instruction, multiple-data (SIMD) parallelism.

  • For example, recent generations of Intel and AMD processors have instructions that can add 8 pairs of single-precision floating-point numbers (C data type float) in parallel
  • These SIMD instructions are provided mostly to speed up applications that process image, sound, and video data.
  • Although some compilers attempt to automatically extract SIMD parallelism from C programs, a more reliable method is to write programs using special vector data types supported in compilers such as gcc.
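
A minimal sketch of what such vector data types look like (this uses GCC's non-standard vector extensions, also accepted by clang; the file name is made up): eight float additions are expressed as a single vector operation, which the compiler can map onto SIMD instructions.

// vec_add.c : eight float additions expressed as one vector operation
#include <stdio.h>

typedef float v8sf __attribute__((vector_size(32)));   /* 8 packed floats */

int main()
{
    v8sf a = {1, 2, 3, 4, 5, 6, 7, 8};
    v8sf b = {10, 20, 30, 40, 50, 60, 70, 80};
    v8sf c = a + b;                       /* one vector add = 8 scalar adds */

    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");                         /* 11 22 33 44 55 66 77 88 */
    return 0;
}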

1.9.3 The Importance of Abstractions in Computer Systems

The use of abstractions is one of the most important concepts in computer science.

  • For example, one aspect of good programming practice is to formulate a simple application program interface (API) for a set of functions that allow programmers to use the code without having to delve into its inner workings

  • The instruction set architecture provides an abstraction of the actual processor hardware.

    • With this abstraction, a machine-code program behaves as if it were executed on a processor that performs just one instruction at a time. The underlying hardware is far more elaborate, executing multiple instructions in parallel, but always in a way that is consistent with the simple, sequential model.
    • By keeping the same execution model, different processor implementations can execute the same machine code while offering a range of cost and performance.
  • The virtual machine provides an abstraction of the entire computer, including the operating system, the processor, and the programs.

    (Figure: the virtual machine abstraction)

1.10 Summary

Bibliographic Notes