2021-09-19

csapp

Ch7

Chapter 7 Linking

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.

Linking can be performed

At compile time, when the source code is translated into machine code
At load time, when the program is loaded into memory and executed by the loader
At run time, by application programs

Linkers play a crucial role in software development because they enable separate compilation. We can separate a huge program into different modules. When we change one of these modules, we simply recompile it and relink the application, without having to recompile the other files.

Why bother learning about linking?

Understanding linkers will help you build large programs
Understanding linkers will help you avoid dangerous programming errors
Understanding linking will help you understand how language scoping rules are implemented.
Understanding linking will help you understand other important systems concepts
Understanding linking will enable you to exploit shared libraries

This chapter provides a thorough discussion of all aspects of linking, from traditional static linking, to dynamic linking of shared libraries at load time, to dynamic linking of shared libraries at run time.

7.1 Compiler Drivers

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed on behalf of the user

Let’s look at a concrete example

//source of main.c
int sum(int *a, int n);

int array[2] = {1, 2};

int main()
{
    int val = sum(array, 2);
    return val;
}

//source of sum.c
int sum(int *a, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

To compile these two C source files into an executable file, we simply invoke gcc like this:

$ gcc -Og -o program main.c sum.c
$ file program
program: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=dd27fca66b73407650207acb585c98bb58ea4ed2, not stripped

In fact, a lot of things happened silently. Let’s dig deeper.

$ gcc -Og -o program main.c sum.c --verbose 2>compilelog
$ less compilelog

static link

(In some versions of gcc, the preprocessor cpp is integrated into the compiler proper, cc1, so no separate cpp step appears in the log.)

$ /usr/lib/gcc/x86_64-linux-gnu/7/cc1 -quiet -v -imultiarch x86_64-linux-gnu main.c -quiet -dumpbase main.c     -mtune=generic -march=x86-64 -auxbase main -Og -version -fstack-protector-strong -Wformat -Wformat-security -o /tmp/cc4yMXEf.s
# compile main.c to /tmp/cc4yMXEf.s

$ as -v --64 -o /tmp/ccJm9qgp.o /tmp/cc4yMXEf.s
# assemble /tmp/cc4yMXEf.s to /tmp/ccJm9qgp.o

$ /usr/lib/gcc/x86_64-linux-gnu/7/cc1 -quiet -v -imultiarch x86_64-linux-gnu sum.c -quiet -dumpbase sum.c -mtune=generic -march=x86-64 -auxbase sum -Og -version -fstack-protector-strong -Wformat -Wformat-security  -o /tmp/cc4yMXEf.s
# compile sum.c to /tmp/cc4yMXEf.s

$ as -v --64 -o /tmp/ccCtbZSy.o /tmp/cc4yMXEf.s
# assemble /tmp/cc4yMXEf.s to /tmp/ccCtbZSy.o

$ /usr/lib/gcc/x86_64-linux-gnu/7/collect2 
-plugin /usr/lib/gcc/x86_64-linux-gnu/7/liblto_plugin.so 
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
-plugin-opt=-fresolution=/tmp/ccFLjPvI.res 
-plugin-opt=-pass-through=-lgcc 
-plugin-opt=-pass-through=-lgcc_s 
-plugin-opt=-pass-through=-lc 
-plugin-opt=-pass-through=-lgcc 
-plugin-opt=-pass-through=-lgcc_s 
--build-id 
--eh-frame-hdr 
-m elf_x86_64 
--hash-style=gnu 
--as-needed 
-dynamic-linker /lib64/ld-linux-x86-64.so.2 
-pie 
-z now 
-z relro 
-o program 
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o 
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o 
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o 
-L/usr/lib/gcc/x86_64-linux-gnu/7 
-L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu 
-L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib 
-L/lib/x86_64-linux-gnu 
-L/lib/../lib 
-L/usr/lib/x86_64-linux-gnu 
-L/usr/lib/../lib 
-L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. 
/tmp/ccJm9qgp.o 
/tmp/ccCtbZSy.o 
-lgcc 
--push-state 
--as-needed 
-lgcc_s 
--pop-state 
-lc 
-lgcc 
--push-state 
--as-needed 
-lgcc_s 
--pop-state 
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o 
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
# link all the stuff

collect2 error

what is collect2

7.2 Static Linking

Static linkers such as the Linux ld program take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run.

To build the executable, the linker must perform two main tasks:

Symbol resolution

Object files define and reference symbols, where each symbol corresponds to a function, a global variable, or a static variable.

The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition
Relocation

Compilers and assemblers generate code and data sections that start at address 0

The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.

The linker blindly performs these relocations using detailed instructions, generated by the assembler, called relocation entries.

7.3 Object Files

Object files come in three forms:

Relocatable object file.

Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file

Contains binary code and data in a form that can be copied directly into memory and executed
Shared object file

A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time

Compilers and assemblers generate relocatable object files

Linkers generate executable object files.

Object files are organized according to specific object file formats, which vary from system to system.

The first Unix systems from Bell Labs used the a.out format.
Windows uses the Portable Executable PE format
Mac OS-X uses the Mach-O format.
Modern x86-64 Linux and Unix systems use Executable and Linkable Format ELF.

7.4 Relocatable Object Files

elf format

The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file

# -h stands for --file-header
$ readelf -h sum.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          528 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         11
  Section header string table index: 10

The rest of the ELF header contains information that allows a linker to parse and interpret the object file.

size of the ELF header
the object file type (e.g., relocatable, executable, or shared)
the machine type (e.g., x86-64)
….
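
To make the role of those first 16 bytes concrete, here is a minimal C sketch that interprets them. The names `ElfIdent`, `describe_ident` and `sum_o_ident` are made up for this note. The byte positions (index 4 for the class, index 5 for the data encoding) are the ones documented in man elf, and the sample bytes are the Magic line printed by readelf above.

```c
#include <string.h>

/* Hypothetical helper type for this note, not part of any ELF API. */
struct ElfIdent {
    int is_elf;        /* 1 if the magic number matches */
    int bits;          /* 32 or 64, 0 if unknown */
    int little_endian; /* 1 little, 0 big, -1 unknown */
};

/* Interpret the 16-byte identification array at the start of an ELF file. */
struct ElfIdent describe_ident(const unsigned char ident[16])
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    struct ElfIdent r = {0, 0, -1};

    if (memcmp(ident, magic, sizeof magic) != 0)
        return r;                      /* not an ELF file */
    r.is_elf = 1;

    if (ident[4] == 1)                 /* class byte: 1 means 32-bit */
        r.bits = 32;
    else if (ident[4] == 2)            /* 2 means 64-bit */
        r.bits = 64;

    if (ident[5] == 1)                 /* data byte: 1 means little endian */
        r.little_endian = 1;
    else if (ident[5] == 2)            /* 2 means big endian */
        r.little_endian = 0;
    return r;
}

/* The Magic bytes that readelf -h printed for sum.o. */
const unsigned char sum_o_ident[16] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
    0, 0, 0, 0, 0, 0, 0, 0
};
```

Calling describe_ident(sum_o_ident) reports a 64-bit little-endian ELF file, which matches the Class and Data lines of the readelf output.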

The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.

# -S stands for --section-headers
$ readelf -S sum.o
There are 11 section headers, starting at offset 0x210:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000001b  0000000000000000  AX       0     0     1
  [ 2] .data             PROGBITS         0000000000000000  0000005b
       0000000000000000  0000000000000000  WA       0     0     1
  [ 3] .bss              NOBITS           0000000000000000  0000005b
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .comment          PROGBITS         0000000000000000  0000005b
       000000000000002a  0000000000000001  MS       0     0     1
  [ 5] .note.GNU-stack   PROGBITS         0000000000000000  00000085
       0000000000000000  0000000000000000           0     0     1
  [ 6] .eh_frame         PROGBITS         0000000000000000  00000088
       0000000000000030  0000000000000000   A       0     0     8
  [ 7] .rela.eh_frame    RELA             0000000000000000  000001a0
       0000000000000018  0000000000000018   I       8     6     8
  [ 8] .symtab           SYMTAB           0000000000000000  000000b8
       00000000000000d8  0000000000000018           9     8     8
  [ 9] .strtab           STRTAB           0000000000000000  00000190
       000000000000000b  0000000000000000           0     0     1
  [10] .shstrtab         STRTAB           0000000000000000  000001b8
       0000000000000054  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

A typical ELF relocatable object file contains the following sections:

.text The machine code of the compiled program.
.rodata Read-only data such as the format strings in printf statements, and jump tables for switch statements.
.data Initialized global and static C variables. Local C variables are maintained at run time on the stack and do not appear in either the .data or .bss sections.
.bss Uninitialized global and static C variables, along with any global or static variables that are initialized to zero

This section occupies no actual space in the object file, because uninitialized variables do not need to take up any disk space. At run time, these variables are allocated in memory with an initial value of zero.

.symtab A symbol table with information about functions, global variables, and static local variables that are defined and referenced in the program. Nonstatic local variables have no entries.

Every relocatable object file has a symbol table in .symtab, unless the programmer has specifically removed it with the strip command.

# -s stands for --symbols
$ readelf -s sum.o

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS sum.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     8: 0000000000000000    27 FUNC    GLOBAL DEFAULT    1 sum

# -R stands for --remove-section
$ strip -R symtab  -o stripped_sum.o sum.o
$ readelf -s stripped_sum.o
$ readelf -S stripped_sum.o
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000001b  0000000000000000  AX       0     0     1
  [ 2] .data             PROGBITS         0000000000000000  0000005b
       0000000000000000  0000000000000000  WA       0     0     1
  [ 3] .bss              NOBITS           0000000000000000  0000005b
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .comment          PROGBITS         0000000000000000  0000005b
       000000000000002a  0000000000000001  MS       0     0     1
  [ 5] .note.GNU-stack   PROGBITS         0000000000000000  00000085
       0000000000000000  0000000000000000           0     0     1
  [ 6] .eh_frame         PROGBITS         0000000000000000  00000088
       0000000000000030  0000000000000000   A       0     0     8
  [ 7] .shstrtab         STRTAB           0000000000000000  000000b8
       000000000000003f  0000000000000000           0     0     1

.rel.text A list of locations in the .text section that will need to be modified when the linker combines this object file with others.

In general, any instruction that calls an external function or references a global variable will need to be modified.

# -r stands for --relocs
$ readelf -r sum.o

Relocation section '.rela.eh_frame' at offset 0x1a0 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0

.rel.data Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.
.debug A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file.

.line A mapping between line numbers in the original C source program and machine code instructions in the .text section

They are only present if the compiler driver is invoked with the -g option.
.strtab This section holds strings, most commonly the strings that represent the names associated with symbol table entries. A string table is a sequence of null-terminated character strings.

If you want to know more about ELF : man elf

The use of the term .bss to denote uninitialized data is universal. It was originally an acronym for the “block started by symbol” directive from the IBM 704 assembly language (circa 1957) and the acronym has stuck. A simple way to remember the difference between the .data and .bss sections is to think of “bss” as an abbreviation for “Better Save Space!”

7.5 Symbols and Symbol Tables

Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m.

In the context of a linker, there are three different kinds of symbols:

Global symbols that are defined by module m and that can be referenced by other modules.

Global linker symbols correspond to nonstatic C functions and global variables
Global symbols that are referenced by module m but defined by some other module.

Such symbols are called externals and correspond to nonstatic C functions and global variables that are defined in other modules.
Local symbols that are defined and referenced exclusively by module m. These correspond to static C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules.

For example

//source of fun.c
int globalvariable = 100;

void f(){
    static int x = 10;
    static int staticlocal = 5;
    int nonstaticlocalinf = 50;
}

void g(){
    static int x = 90;
    int nonstaticlocaling = 40;
}

$ readelf -s fun.o

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS fun.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     5: 0000000000000004     4 OBJECT  LOCAL  DEFAULT    2 staticlocal.1796
     6: 0000000000000008     4 OBJECT  LOCAL  DEFAULT    2 x.1795
     7: 000000000000000c     4 OBJECT  LOCAL  DEFAULT    2 x.1800
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
    11: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 globalvariable
    12: 0000000000000000    14 FUNC    GLOBAL DEFAULT    1 f
    13: 000000000000000e    14 FUNC    GLOBAL DEFAULT    1 g

Notice that each static variable is given a unique name, such as x.1795 and x.1800. They are different variables!
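
A quick way to convince yourself is to let each function modify its own x. The sketch below is a variation on fun.c written for this note (the functions return the counter, which the original does not do), so treat it as an illustration rather than the file that produced the listing.

```c
/* Each function owns a private static counter that happens to share a name. */
int f(void)
{
    static int x = 10;   /* one linker symbol, like x.1795 in the listing */
    return ++x;
}

int g(void)
{
    static int x = 90;   /* a second, unrelated symbol, like x.1800 */
    return ++x;
}
```

Interleaving calls to f and g shows two independent sequences (11, 12, 13, ... and 91, 92, ...), because the two x variables occupy different locations in .data.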

C programmers use the static attribute to hide variable and function declarations inside modules, much as you would use public and private declarations in Java and C++. In C, source files play the role of modules. Any global variable or function declared with the static attribute is private to that module. Similarly, any global variable or function declared without the static attribute is public and can be accessed by any other module. It is good programming practice to protect your variables and functions with the static attribute wherever possible.

An ELF symbol table is contained in the .symtab section. It contains an array of entries

typedef struct {
int name; /* String table offset */
char type:4, /* Function or data (4 bits) */
binding:4; /* Local or global (4 bits) */
char reserved; /* Unused */
short section; /* Section header index */
long value; /* Section offset or absolute address, the run-time address for every reference */
long size; /* Object size in bytes */
} Elf64_Symbol;

The name is a byte offset into the string table that points to the null-terminated string name of the symbol.

(For relocatable modules, the value is an offset from the beginning of the section where the object is defined. )

$ readelf -p .strtab fun.o

String dump of section '.strtab':
  [     1]  fun.c
  [     7]  staticlocal.1796
  [    18]  x.1795
  [    1f]  x.1800
  [    26]  globalvariable
  [    35]  f
  [    37]  g
$ readelf -x .symtab fun.o
.....
0x00000130 0e000000 00000000 37000000 12000100 
0x00000140 0e000000 00000000 0e000000 00000000 
.....
# beware that the data is stored in little-endian order
# int name = 0x00000037; (offset of "g" in .strtab)
# char typebinding = 0b00010010; (FUNC GLOBAL)
# char reserved = 0x00;
# short section = 0x0001; (in .text section)
# long value = 0x000000000000000e; (offset 0xe within .text)
# long size = 0x000000000000000e; (14 bytes)
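
The same decoding can be done in code. The following is a small sketch, assuming the 24-byte little-endian layout of the Elf64_Symbol struct above. `SymFields`, `read_le`, `decode_symbol` and `g_entry` are names invented for this note, and the sample bytes are the entry for g copied from the hex dump (it starts at offset 0x138).

```c
#include <stdint.h>

/* Decoded view of one symbol table entry; mirrors Elf64_Symbol. */
struct SymFields {
    uint32_t name;     /* string table offset */
    unsigned type;     /* low 4 bits of the info byte */
    unsigned binding;  /* high 4 bits of the info byte */
    uint16_t section;  /* section header index */
    uint64_t value;    /* section offset in a relocatable file */
    uint64_t size;     /* object size in bytes */
};

/* Read an n-byte little-endian unsigned integer. */
uint64_t read_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one 24-byte symbol table entry. */
struct SymFields decode_symbol(const unsigned char e[24])
{
    struct SymFields s;
    s.name    = (uint32_t)read_le(e, 4);
    s.type    = e[4] & 0xf;
    s.binding = e[4] >> 4;
    /* e[5] is the reserved byte and is skipped */
    s.section = (uint16_t)read_le(e + 6, 2);
    s.value   = read_le(e + 8, 8);
    s.size    = read_le(e + 16, 8);
    return s;
}

/* The entry for g, taken from the readelf -x .symtab dump. */
const unsigned char g_entry[24] = {
    0x37, 0x00, 0x00, 0x00, 0x12, 0x00, 0x01, 0x00,
    0x0e, 0, 0, 0, 0, 0, 0, 0,
    0x0e, 0, 0, 0, 0, 0, 0, 0
};
```

decode_symbol(g_entry) yields name 0x37, type 2 (FUNC), binding 1 (GLOBAL), section 1, value 0xe and size 14, the same fields that readelf -s printed for g.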

Each symbol is assigned to some section of the object file, denoted by the section field, which is an index into the section header table

There are three special pseudosections that don’t have entries in the section header table:

ABS(absolute) is for symbols that should not be relocated.
UNDEF is for undefined symbols—that is, symbols that are referenced in this object module but defined elsewhere.
COMMON is for uninitialized data objects that are not yet allocated

For COMMON symbols, the value field gives the alignment requirement, and size gives the minimum size.

Note that these pseudosections exist only in relocatable object files; they do not exist in executable object files.

The distinction between COMMON and .bss is subtle. Modern versions of gcc assign symbols in relocatable object files to COMMON and .bss using the following convention:

COMMON Uninitialized global variables
.bss Uninitialized static variables and global or static variables that are initialized to zero

The distinction stems from the way the linker performs symbol resolution
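
As a concrete reading of that convention, here is a tiny C file with one variable per case. The variable names are hypothetical, and the section named in each comment is what the convention above predicts, not something this note verified with readelf. With -fno-common, the first variable is expected to land in .bss as well.

```c
int counter_common;        /* uninitialized global: COMMON under the convention above */
int counter_zero = 0;      /* global initialized to zero: .bss */
static int counter_static; /* uninitialized static: .bss */
int counter_data = 7;      /* initialized global: .data */

/* All three zero-valued variables read as 0 at run time,
   whichever bucket the object file put them in. */
int sum_of_zeroed(void)
{
    return counter_common + counter_zero + counter_static;
}
```

Compiling this with gcc -c and running readelf -s on the result is a quick way to check the Ndx column for each symbol on your own toolchain.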

Practice Problem 7.1

This problem concerns the m.o and swap.o modules from the following code.

For each symbol that is defined or referenced in swap.o, indicate whether or not it will have a symbol table entry in the .symtab section in module swap.o.

If so, indicate the module that defines the symbol (swap.o or m.o), the symbol type (local, global, or extern), and the section (.text, .data, .bss, or COMMON) it is assigned to in the module.

Symbol	.symtab entry?	Symbol type	Module where defined	Section
buf				
bufp0				
bufp1				
swap				
temp				
//source of swap.c
extern int buf[];

int *bufp0 = &buf[0];
int *bufp1;

void swap()
{
    int temp;

    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}
//source of m.c
void swap();

int buf[2] = {1, 2};

int main()
{
    swap();
    return 0;
}

My solution : :white_check_mark:

Symbol	.symtab entry?	Symbol type	Module where defined	Section
buf	YES	Extern	m.o	UNDEF
bufp0	YES	Global	swap.o	.data
bufp1	YES	Global	swap.o	COMMON
swap	YES	Global	swap.o	.text
temp	NO	local	swap.o	NO

Verification:

$ readelf -s swap.o
Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     9: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    5 bufp0
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND buf
    11: 0000000000000008     8 OBJECT  GLOBAL DEFAULT  COM bufp1
    12: 0000000000000000    63 FUNC    GLOBAL DEFAULT    1 swap

7.6 Symbol Resolution

The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files

The compiler allows only one definition of each local symbol per module.
The compiler also ensures that static local variables, which get local linker symbols, have unique names (x.1780 and x.1775, for example).
When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle.

If the linker is unable to find a definition for the referenced symbol in any of its input modules, it prints an error message and terminates

For instance
void fun(void);
int main(){
    fun();
    return 0;
}
$ gcc -o error error.c
/tmp/cc10TnoD.o: In function `main':
error.c:(.text+0x5): undefined reference to `fun'
collect2: error: ld returned 1 exit status
Multiple object modules might define global symbols with the same name. In that case, the linker must either report an error or choose one of the definitions and discard the rest.

7.6.1 How Linkers Resolve Duplicate Symbol Names

Here is the approach that Linux compilation systems use.

At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file.

Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.
Linux linkers use the following rules for dealing with duplicate symbol names:
- Rule 1. Multiple strong symbols with the same name are not allowed
- Rule 2. Given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.
- Rule 3. Given multiple weak symbols with the same name, choose any of the weak symbols. (In effect, this can be viewed as undefined behavior!)
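
The three rules can be written down as a short function. This is only a model made up for this note, not how ld is implemented: `Def`, `resolve` and the two error codes are hypothetical names, and for rule 3 the sketch simply takes the first weak definition it sees.

```c
#include <stddef.h>

enum Strength { WEAK, STRONG };

/* One definition of a given name, as seen in some module. */
struct Def {
    int module;              /* which module defines it */
    enum Strength strength;
};

#define RESOLVE_ERROR (-1)   /* rule 1: more than one strong definition */
#define RESOLVE_NONE  (-2)   /* no definition at all */

/* Return the index of the chosen definition among defs[0..n-1]. */
int resolve(const struct Def *defs, size_t n)
{
    int chosen = RESOLVE_NONE;
    int strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (strong_seen)
                return RESOLVE_ERROR;  /* rule 1 */
            strong_seen = 1;
            chosen = (int)i;           /* rule 2: strong beats weak */
        } else if (chosen == RESOLVE_NONE) {
            chosen = (int)i;           /* rule 3: any weak one will do */
        }
    }
    return chosen;
}
```

With a strong definition in module 1 and a weak one in module 2, resolve picks module 1. With two strong definitions it returns RESOLVE_ERROR.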
Many bugs can arise when global variables are shared between modules, which is why some people consider global variables a bad idea. Be careful when you use them! Compile with the following options:
- -fno-common : report an error on multiply-defined global symbols (this is the default since GCC 10)
- -Werror : turns all warnings into errors.
- -Wall : enables all the warnings

This is also the reason for the distinction between COMMON and .bss:

When the compiler encounters an uninitialized static variable, it can confidently say that this is the definition, so the symbol goes in .bss.
When the compiler encounters a global symbol that is initialized to 0, it knows the symbol is strong and puts it in .bss.
Only when the compiler encounters an uninitialized global symbol can it not guarantee that this is the definition. Because the symbol is weak, the compiler puts it in COMMON and leaves the decision to the linker.

Practice Problem 7.2

In this problem, let REF(x.i) -> DEF(x.k) denote that the linker will associate an arbitrary reference to symbol x in module i to the definition of x in module k.

For each example that follows, use this notation to indicate how the linker would resolve references to the multiply-defined symbol in each module.

If there is a link-time error (rule 1), write “error”. If the linker arbitrarily chooses one of the definitions (rule 3), write “unknown”.
//A
//Module 1
int main(){}
//Module 2
int main;
int p2(){}
/*
(a)REF(main.1) -> DEF(______)
(b)REF(main.2) -> DEF(______)
*/
//B
//Module 1
void main(){}
//Module 2
int main = 1;
int p2(){}
/*
(a)REF(main.1) -> DEF(_____)
(b)REF(main.2) -> DEF(_____) 
*/
//C
//Module 1
int x;
void main(){}
//Module 2
double x = 1.0;
int p2(){}
/*
(a)REF(x.1) -> DEF(_____)
(b)REF(x.2) -> DEF(_____)
*/

My solution : :white_check_mark:

A:
(a)REF(main.1) -> DEF(main.1)
(b)REF(main.2) -> DEF(main.1)
B:
(a)REF(main.1) -> DEF(ERROR)
(b)REF(main.2) -> DEF(ERROR)
C:
(a)REF(x.1) -> DEF(x.2)
(b)REF(x.2) -> DEF(x.2)

7.6.2 Linking with Static Libraries

In practice, all compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker

When it builds the output executable, the linker copies only the object modules in the library that are referenced by the application program.

Library functions can be compiled into separate object modules and then packaged in a single static library file.
Application programs can then use any of the functions defined in the library by specifying a single filename on the command line
# link the program against the C standard library
$ gcc main.c /usr/lib/libc.a
At link time, the linker will only copy the object modules that are referenced by the program, which reduces the size of the executable on disk and in memory.

On Linux systems, static libraries are stored on disk in a particular file format known as an archive.

An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file.
Archive filenames are denoted with the .a suffix.

Let’s make our own little static library for illustration

//source of addvec.c
int addcnt = 0;
void addvec(int *x, int *y,
            int *z, int n)
{
    int i;
    addcnt++;
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

//source of mulvec.c
int multcnt = 0;
void multvec(int *x, int *y,
             int *z, int n)
{
    int i;
    multcnt++;
    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

$ gcc -c addvec.c mulvec.c
$ ar rcs libvec.a addvec.o mulvec.o 
# rcs: r inserts the members, c creates the archive, s writes a symbol index (see `man ar`)

//source of vector.h
void addvec(int *x, int *y, int *z, int n);
void multvec(int *x, int *y, int *z, int n);

//source of main.c
#include <stdio.h>
#include "vector.h"

int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];

int main()
{
    addvec(x, y, z, 2);
    printf("z = [%d %d]\n", z[0], z[1]);
    return 0;
}

$ gcc -c main.c
$ gcc -o prog main.c -static -L. -lvec
$ file prog
prog: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=4def2b1996af12eace08bd1e58e7b08332933dc9, not stripped
$ ./prog
z = [4 6]
# -L specifies a directory to search for libraries
# . means the current directory
# -lvec is shorthand for libvec.a
# in general, with -static, -lNAME resolves to libNAME.a
# -lc (the C library) is passed by gcc by default
# -static in the manual: On systems that support dynamic linking, this prevents linking with the shared libraries.  On other systems, this option has no effect.
# statically linked means all the code is stored in our ELF file; no load-time or run-time linking is needed.

link static library

The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time
Since the program doesn’t reference any symbols defined by mulvec.o, the linker does not copy this module into the executable
The linker also copies the printf.o module from libc.a, along with a number of other modules from the C run-time system.

7.6.3 How Linkers Use Static Libraries to Resolve References

During the symbol resolution phase, the linker scans the relocatable object files and archives left to right in the same sequential order that they appear on the compiler driver’s command line.

During this scan, the linker maintains

a set E of relocatable object files that will be merged to form the executable,
a set U of unresolved symbols (i.e., symbols referred to but not yet defined),
a set D of symbols that have been defined in previous input files.
Initially, E, U, and D are empty.

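# scan process (Python-style pseudocode)
# each module has .defined and .undefined, two sets of symbol names
import sys

E = []        # object files that will be merged into the executable
U = set()     # symbols referenced but not yet defined
D = set()     # symbols defined by the files seen so far

def add_object(obj):
    E.append(obj)
    D.update(obj.defined)
    U.update(obj.undefined)
    U.difference_update(D)      # anything now defined is no longer unresolved

for file in command_line_arguments:
    if file.is_object:
        add_object(file)
    elif file.is_archive:
        changed = True
        while changed:          # repeat until U and D stop changing
            changed = False
            for m in file.members:
                if m not in E and U & m.defined:
                    add_object(m)
                    changed = True
        # members that were not added to E are discarded

if U:
    sys.exit("undefined reference to: " + ", ".join(sorted(U)))
else:
    build(E)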

This algorithm can result in some baffling link-time errors because the ordering of libraries and object files on the command line is significant.

If the library that defines a symbol appears on the command line before the object file that references that symbol, then the reference will not be resolved and linking will fail. (Libraries should always come after the object files that use them.)

For example

$ gcc -static -o prog ./libvec.a main.o 
main.o: In function `main':
main.c:(.text+0x1f): undefined reference to `addvec'
collect2: error: ld returned 1 exit status
# U is initially empty, so if the library comes first, no member of the archive is added to E.
# The reference to `addvec` in main.o therefore can never be resolved.

Libraries can be repeated on the command line if necessary to satisfy the dependence requirements.

(Object files do not need to be repeated since they are added to E immediately after appearance)
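
The effect of command-line order can be reproduced with a toy model of the scan. The sketch below is an assumption-laden simplification written for this note: symbols are single bits so that U and D fit in one word, an input holds at most 32 members, and the names `Module`, `Input`, `scan`, `main_o` and `libvec` are made up. It only mimics the bookkeeping described above, not the real ld.

```c
#include <stddef.h>

/* Symbols are modelled as bits so that the sets U and D fit in one word. */
enum { SYM_MAIN = 1 << 0, SYM_ADDVEC = 1 << 1, SYM_MULTVEC = 1 << 2 };

struct Module {            /* one relocatable object file */
    unsigned defined;      /* symbols it defines */
    unsigned referenced;   /* symbols it references */
};

struct Input {             /* one command-line argument */
    int is_archive;
    const struct Module *members;  /* a plain object file has one member */
    size_t count;                  /* at most 32 in this sketch */
};

/* Scan inputs left to right and return the set U of undefined symbols.
   *merged receives the number of modules that ended up in E. */
unsigned scan(const struct Input *in, size_t n, int *merged)
{
    unsigned U = 0, D = 0;
    *merged = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned taken = 0;        /* members of this input already in E */
        int changed = 1;
        while (changed) {          /* repeat until U and D stop changing */
            changed = 0;
            for (size_t m = 0; m < in[i].count; m++) {
                const struct Module *mod = &in[i].members[m];
                /* object files are always wanted, archive members only
                   if they define something currently in U */
                int wanted = !in[i].is_archive || (mod->defined & U) != 0;
                if (((taken >> m) & 1u) || !wanted)
                    continue;
                taken |= 1u << m;
                (*merged)++;
                D |= mod->defined;
                U = (U | mod->referenced) & ~D;
                changed = 1;
            }
        }
    }
    return U;
}

/* main.o references addvec; libvec.a holds addvec.o and mulvec.o. */
const struct Module main_o[] = {{SYM_MAIN, SYM_ADDVEC}};
const struct Module libvec[] = {{SYM_ADDVEC, 0}, {SYM_MULTVEC, 0}};
```

Scanning main.o and then libvec.a leaves U empty and merges two modules (main.o and addvec.o). Scanning libvec.a first leaves addvec unresolved, which is the failure shown in the gcc output above.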

Practice Problem 7.3

Let a and b denote object modules or static libraries in the current directory, and let a→b denote that a depends on b, in the sense that b defines a symbol that is referenced by a.

For each of the following scenarios, show the minimal command line (i.e., one with the least number of object file and library arguments) that will allow the static linker to resolve all symbol references.
A. p.o → libx.a
B. p.o → libx.a → liby.a
C. p.o → libx.a → liby.a and liby.a → libx.a → p.o

My solution : :white_check_mark:

# A
$ gcc p.o ./libx.a
# B
$ gcc p.o ./libx.a ./liby.a
# C
$ gcc p.o ./libx.a ./liby.a ./libx.a

7.7 Relocation

After symbol resolution, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, where it merges the input modules and assigns run-time addresses to each symbol

Relocation consists of two steps

Relocating sections and symbol definitions

The linker merges all sections of the same type into a new aggregate section of the same type
The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules

Relocating symbol references within sections

The linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses.
To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries, which we describe next.

7.7.1 Relocation Entries

Whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable

Relocation entries for code are placed in .rel.text. Relocation entries for data are placed in .rel.data.

typedef struct {
long offset; /* Offset of the reference to relocate */
long type:32, /* Relocation type */
symbol:32; /* Symbol table index */
long addend; /* Constant part of relocation expression */
} Elf64_Rela;
//ELF relocation entry. Each entry identifies a reference that must be relocated and specifies how to compute the modified reference.

$ readelf -r main.o

Relocation section '.rela.text' at offset 0x2c8 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000b00000002 R_X86_64_PC32     0000000000000008 z - 4
000000000013  000a00000002 R_X86_64_PC32     0000000000000008 y - 4
00000000001a  000900000002 R_X86_64_PC32     0000000000000000 x - 4
00000000001f  000e00000004 R_X86_64_PLT32    0000000000000000 addvec - 4
000000000025  000b00000002 R_X86_64_PC32     0000000000000008 z + 0
00000000002b  000b00000002 R_X86_64_PC32     0000000000000008 z - 4
000000000034  000500000002 R_X86_64_PC32     0000000000000000 .rodata - 4
00000000003e  000f00000004 R_X86_64_PLT32    0000000000000000 printf - 4

Relocation section '.rela.eh_frame' at offset 0x388 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
$ readelf -x .rela.text  main.o
Hex dump of section '.rela.text':
  0x00000000 0c000000 00000000 02000000 0b000000 
  0x00000010 fcffffff ffffffff 13000000 00000000

The offset is the section offset of the reference that will need to be modified.
The symbol identifies the symbol that the modified reference should point to.
The type tells the linker how to modify the new reference.
The addend is a signed constant that is used by some types of relocations to bias the value of the modified reference.

ELF defines 32 different relocation types. We will look at the two most basic ones:

R_X86_64_PC32
- Relocate a reference that uses a 32-bit PC-relative address
R_X86_64_32
- Relocate a reference that uses a 32-bit absolute address

7.7.2 Relocating Symbol References

Relocation algorithm

foreach section s {
    foreach relocation entry r {
        refptr = s + r.offset; /* ptr to reference to be relocated */

        /* ADDR(s) means the run-time address of section s */
        /* ADDR(r.symbol) means the run-time address of r's symbol (where it actually is when the program runs, the 'real address') */

        /* Relocate a PC-relative reference */
        if (r.type == R_X86_64_PC32) {
            refaddr = ADDR(s) + r.offset; /* ref's run-time address */
            *refptr = (unsigned) (ADDR(r.symbol) + r.addend - refaddr);
        }

        /* Relocate an absolute reference */
        if (r.type == R_X86_64_32)
            *refptr = (unsigned) (ADDR(r.symbol) + r.addend);
    }
}

Relocating PC-Relative References

# sum here is a reference
e: e8 00 00 00 00 		callq 13 <main+0x13> sum()
		f: R_X86_64_PC32 sum-0x4 Relocation entry

The assembler generates a relocation entry for sum in the .rela.text section.

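Later, when the linker executes the relocation algorithm, it does the following:

Compute the location of the reference to patch (this is the 4-byte operand of the call instruction, not the address of sum itself)

refptr = s + r.offset;

Overwrite it with the PC-relative displacement

refaddr = ADDR(s) + r.offset;
*refptr = (unsigned) (ADDR(r.symbol) + r.addend - refaddr); // displacement = target address + addend - run-time address of the reference

ADDR(r.symbol) is the run-time address that the linker assigned to sum in the first relocation step. In main.o the symbol is still undefined, so its Elf64_Symbol.value is 0:

Symbol table '.symtab' contains 12 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND sum
# In the linked executable, Value becomes sum's run-time address; the PC-relative displacement is written into the call instruction, not into the symbol table

At run time, the call instruction adds this displacement to the PC (the address of the next instruction) to reach sum. The symbol table is not consulted.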

Relocating Absolute References

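Almost the same as the PC-relative case, except that the reference's own address is not subtracted: the linker writes ADDR(r.symbol) + r.addend directly into the reference (the bytes at refptr, not the Elf64_Symbol.value field)

*refptr = (unsigned) (ADDR(r.symbol) + r.addend); // actual address + addend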
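A worked example makes the two formulas concrete. The sketch below is an illustration, not linker code: the function names are hypothetical, and the addresses used in the note after it (ADDR(.text) = 0x4004d0, ADDR(sum) = 0x4004e8, ADDR(array) = 0x601018) are assumed values chosen for the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_PC32: the 32-bit value stored at the reference. */
uint32_t reloc_pc32(uint64_t sym_addr, int64_t addend, uint64_t refaddr)
{
    return (uint32_t)(sym_addr + (uint64_t)addend - refaddr);
}

/* R_X86_64_32: the 32-bit value stored at the reference. */
uint32_t reloc_abs32(uint64_t sym_addr, int64_t addend)
{
    uint64_t value = sym_addr + (uint64_t)addend;
    assert(value <= UINT32_MAX); /* an absolute 32-bit reference must fit */
    return (uint32_t)value;
}

/* Where a call lands at run time: the CPU adds the signed displacement
 * to the address of the next instruction, which is 4 bytes past the
 * start of the 4-byte reference. This is why the addend is -4. */
uint64_t call_target(uint64_t refaddr, uint32_t disp)
{
    return refaddr + 4 + (uint64_t)(int64_t)(int32_t)disp;
}
```

With the assumed addresses and the entry from the listing above (offset 0xf, addend -4), refaddr is 0x4004d0 + 0xf = 0x4004df and reloc_pc32(0x4004e8, -4, 0x4004df) yields 0x5. At run time call_target(0x4004df, 0x5) gives back 0x4004e8, the assumed address of sum. For the absolute case, reloc_abs32(0x601018, 0) is simply 0x601018.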

7.8 Executable Object Files

Our example C program, which began life as a collection of ASCII text files, has been transformed into a single binary file that contains all of the information needed to load the program into memory and run it.

elf structure

The format of an executable object file is similar to that of a relocatable object file (but NOT identical!).
It includes the program’s entry point, which is the address of the first instruction to execute when the program runs
The .init section defines a small function, called _init, that will be called by the program’s initialization code
Since the executable is fully linked (relocated), it needs no .rel sections.

ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments.

program header table

This mapping is described by the program header table.
For any segment s in the object file, the linker must choose a starting address, vaddr, such that
vaddr % align == off % align
- off is the offset of the segment's first section in the object file
- align is the alignment specified in the program header ($2^{21}$ = 0x200000)
This alignment requirement is an optimization that enables segments in the object file to be transferred efficiently to memory when the program executes.
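The congruence can be checked mechanically. The following is a minimal sketch with hypothetical helper names (placement_ok, pick_vaddr); the numbers in the note after it are assumed example values, not output from the program above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if vaddr and off are congruent modulo align, else 0. */
int placement_ok(uint64_t vaddr, uint64_t off, uint64_t align)
{
    assert(align != 0);
    return vaddr % align == off % align;
}

/* Lowest address >= base that satisfies the congruence for this offset. */
uint64_t pick_vaddr(uint64_t base, uint64_t off, uint64_t align)
{
    assert(align != 0);
    uint64_t candidate = base - base % align + off % align;
    if (candidate < base)
        candidate += align;
    return candidate;
}
```

For example, with align = 0x200000, a segment at file offset 0xdf8 may start at 0x600df8 but not at 0x600000, and pick_vaddr(0x600000, 0xdf8, 0x200000) returns 0x600df8.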

Both readelf and objdump are useful for inspecting the program header table

# -l stands for --program-headers
$ readelf -l program 
$ objdump -x program

7.9 Loading Executable Object Files

To run a program that is not a shell built-in, you type its path (e.g. /path/program).

The shell runs it for us by invoking some memory-resident operating system code known as the loader.

Any Linux program can invoke the loader by calling the execve function
The loader copies the code and data in the executable object file from disk into memory and then runs the program by jumping to its first instruction, or entry point.
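As a rough illustration of invoking the loader from a program, the sketch below forks a child and has it call execve, roughly what a shell does. The helper name run_program is hypothetical, and the usage note assumes /bin/sh exists on the system.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork a child, let it replace itself with the program at path via
 * execve, and return the child's exit status (or -1 on failure). */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ); /* returns only if loading failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("/bin/sh", args) with args = {"sh", "-c", "exit 7", NULL} should return 7: the loader maps /bin/sh into the child, jumps to its entry point, and the parent collects the exit status.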

run time image

When the loader runs, it creates a memory image
Guided by the program header table, it copies chunks of the executable object file into the code and data segments.
Next, the loader jumps to the program’s entry point, which is always the address of the _start function
- This function is defined in the system object file crt1.o and is the same for all C programs.
- The _start function calls the system startup function, __libc_start_main, which is defined in libc.so.
- It initializes the execution environment, calls the user-level main function, handles its return value, and if necessary returns control to the kernel.

7.10 Dynamic Linking with Shared Libraries

Static libraries still have some significant disadvantages

Static libraries, like all software, need to be maintained and updated periodically (every update requires relinking)
Almost every C program uses standard I/O functions. At run time, the code for these functions is duplicated in the text segment of each running process

Shared libraries are modern innovations that address the disadvantages of static libraries.

A shared library is an object module that, at either run time or load time, can be loaded at an arbitrary memory address and linked with a program in memory.

This process is known as dynamic linking and is performed by a program called a dynamic linker
Shared libraries are also referred to as shared objects, and on Linux systems they are indicated by the .so suffix

Shared libraries are “shared” in two different ways.

In any given file system, there is exactly one .so file for a particular library.
- The code and data in this .so file are shared by all of the executable object files that reference the library
- This is opposed to the contents of static libraries, which are copied and embedded in the executables that reference them
A single copy of the .text section of a shared library in memory can be shared by different running processes

A concrete example:

shared library dynamic link

$ gcc -shared -fpic -o libvector.so addvec.c mulvec.c

The -fpic flag directs the compiler to generate position-independent code
The -shared flag directs the linker to create a shared object file

$ gcc -o prog2l main2.c ./libvector.so

The basic idea is to do some of the linking statically when the executable file is created, and then complete the linking process dynamically when the program is loaded.
The linker ONLY copies some relocation and symbol table information that will allow references to code and data in libvector.so to be resolved at load time

Execution

When the loader loads and runs the executable prog2l, it loads the partially linked executable
Next, it notices that prog2l contains a .interp section, which contains the path name of the dynamic linker (e.g., ld-linux.so on Linux systems)
Instead of passing control to the application, the loader loads and runs the dynamic linker
The dynamic linker then finishes the linking task by performing the following relocations:
- Relocating the text and data of libc.so into some memory segment
- Relocating the text and data of libvector.so into another memory segment
- Relocating any references in prog2l to symbols defined by libc.so and libvector.so
Finally, the dynamic linker passes control to the application

7.11 Loading and Linking Shared Libraries from Applications

It is also possible for an application to request the dynamic linker to load and link arbitrary shared libraries while the application is running.

Linux systems provide a simple interface to the dynamic linker that allows application programs to load and link shared libraries at run time.

#include <dlfcn.h>
void *dlopen(const char *filename, int flag);
//Returns: pointer to handle if OK. NULL on error.
// A handle is something like a `FILE*`

The dlopen function loads and links the shared library filename.
The external symbols in filename are resolved using libraries previously opened with the RTLD_GLOBAL flag.

If the current executable was compiled with the -rdynamic flag, then its global symbols are also available for symbol resolution.
The flag argument must include either RTLD_NOW, which tells the linker to resolve references to external symbols immediately, or the RTLD_LAZY flag, which instructs the linker to defer symbol resolution until code from the library is executed.

Either of these values can be ORed (|) with the RTLD_GLOBAL flag.

#include <dlfcn.h>
void *dlsym(void *handle, char *symbol);
//Returns: pointer to symbol if OK. NULL on error

The dlsym function takes a handle to a previously opened shared library and a symbol name and returns the address of the symbol, if it exists, or NULL otherwise.

#include <dlfcn.h>
int dlclose (void *handle);
//Returns: 0 if OK, −1 on error

The dlclose function unloads the shared library if no other shared libraries are still using it.

#include <dlfcn.h>
const char *dlerror(void);
//Returns: error message if previous call to dlopen, dlsym, or dlclose failed;
//NULL if previous call was OK

An example that uses this interface to dynamically link the libvector.so shared library at run time and then invoke its addvec routine:

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];

int main()
{
    void *handle;
    void (*addvec)(int *, int *, int *, int); // a function pointer to the function we want
    char *error;

    /* Dynamically load the shared library containing addvec() */
    handle = dlopen("./libvector.so", RTLD_LAZY);

    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }

    /* Get a pointer to the addvec() function we just loaded */
    addvec = dlsym(handle, "addvec");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    /* Now we can call addvec() just like any other function */
    addvec(x, y, z, 2);
    printf("z = [%d %d]\n", z[0], z[1]);

    /* Unload the shared library */
    if (dlclose(handle) < 0) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }
    return 0;
}

$ gcc -rdynamic -o prog2r dll.c -ldl
# -ldl links against libdl (libdl.so or libdl.a)
# It is needed if we want to use the functions declared in <dlfcn.h>
$ locate libdl.a
/usr/lib/x86_64-linux-gnu/libdl.a

7.12 Position-Independent Code (PIC)

A key purpose of shared libraries is to allow multiple running processes to share the same library code in memory and thus save precious memory resources

Modern systems compile the code segments of shared modules so that they can be loaded anywhere in memory without having to be modified by the linker.
Users direct GNU compilation systems to generate PIC code with the -fpic option to gcc. Shared libraries must always be compiled with this option.

PIC Data References

Important fact : no matter where we load an object module (including shared object modules) in memory, the data segment is always the same distance from the code segment.

Compilers that want to generate PIC references to global variables exploit this fact by creating a table called the global offset table (GOT) at the beginning of the data segment.
The GOT contains an 8-byte entry for each global data object (procedure or global variable) that is referenced by the object module.
The compiler also generates a relocation record for each entry in the GOT.
At load time, the dynamic linker relocates each GOT entry so that it contains the absolute address of the object

got

PIC Function Calls

The compiler has no way of predicting the run-time address of the function, since the shared module could be loaded anywhere at run time.

GNU compilation systems solve this problem using an interesting technique, called lazy binding, that defers the binding of each procedure address until the first time the procedure is called.

Lazy binding is implemented with a compact yet somewhat complex interaction between two data structures: the GOT and the procedure linkage table (PLT).

The GOT is part of the data segment. The PLT is part of the code segment.
The PLT is an array of 16-byte code entries.
- PLT[0] is a special entry that jumps into the dynamic linker
- PLT[1] invokes the system startup function (__libc_start_main), which initializes the execution environment, calls the main function, and handles its return value.
- Entries starting at PLT[2] invoke functions called by the user code.
When used in conjunction with the PLT, GOT[0] and GOT[1] contain information that the dynamic linker uses when it resolves function addresses
- GOT[2] is the entry point for the dynamic linker in the ld-linux.so module
- Each of the remaining entries corresponds to a called function whose address needs to be resolved at run time. Each has a matching PLT entry.
- Initially, each GOT entry points to the second instruction in the corresponding PLT entry.

got and plt

Instead of directly calling addvec, the program calls into PLT[2], which is the PLT entry for addvec.
The first PLT instruction does an indirect jump through GOT[4]. Since each GOT entry initially points to the second instruction in its corresponding PLT entry, the indirect jump simply transfers control back to the next instruction in PLT[2].
After pushing an ID for addvec (0x1) onto the stack, PLT[2] jumps to PLT[0].
PLT[0] pushes an argument for the dynamic linker indirectly through GOT[1] and then jumps into the dynamic linker indirectly through GOT[2].
The dynamic linker uses the two stack entries to determine the runtime location of addvec, overwrites GOT[4] with this address, and passes control to addvec.

Most of the work is done by the dynamic linker at run time, including filling in the GOT.
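The control flow of lazy binding can be imitated in plain C with a function pointer standing in for a GOT entry. This is only an analogy under stated assumptions: got_add, add_stub, resolve_add, and real_add are hypothetical names, and the real mechanism works on machine-code PLT entries rather than C functions.

```c
#include <assert.h>

typedef int (*binop_fn)(int, int);

/* Stand-in for a function that lives in a shared library. */
static int real_add(int a, int b) { return a + b; }

static int add_stub(int a, int b);  /* the rest of the "PLT entry" */
static binop_fn got_add = add_stub; /* "GOT entry": starts out pointing back into the PLT */
static int resolver_calls = 0;      /* times the "dynamic linker" ran */

/* Stand-in for the dynamic linker: find the target and patch the GOT. */
static binop_fn resolve_add(void)
{
    assert(got_add == add_stub); /* only reached while the entry is unpatched */
    resolver_calls++;
    got_add = real_add;
    return got_add;
}

/* Reached only on the first call: resolve, then pass control to the target. */
static int add_stub(int a, int b)
{
    return resolve_add()(a, b);
}

/* What the calling code does every time: an indirect jump through the GOT. */
int call_add(int a, int b)
{
    return got_add(a, b);
}

/* How many times the resolver has run so far. */
int times_resolved(void)
{
    return resolver_calls;
}
```

The first call_add goes through the stub and the resolver; every later call jumps straight to real_add because the "GOT entry" has been overwritten, so times_resolved() stays at 1.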

7.13 Library Interpositioning

Linux linkers support a powerful technique, called library interpositioning, that allows you to intercept calls to shared library functions and execute your own code instead.

Using interpositioning, you could trace the number of times a particular library function is called, validate and trace its input and output values, or even replace it with a completely different implementation.
Basic idea: Given some target function to be interposed on, you create a wrapper function whose prototype is identical to the target function.

Using some particular interpositioning mechanism, you then trick the system into calling the wrapper function instead of the target function. The wrapper function typically executes its own logic, then calls the target function and passes its return value back to the caller.
Interpositioning can occur at compile time, link time, or run time as the program is being loaded and executed.

Example program

//int.c
#include <stdio.h>
#include <malloc.h>

int main()
{
    int *p = malloc(32);
    free(p);
    return 0;
}

//malloc.h
#define malloc(size) mymalloc(size)
#define free(ptr) myfree(ptr)

void *mymalloc(size_t size);
void myfree(void *ptr);

//mymalloc.c
#ifdef COMPILETIME
#include <stdio.h>
#include <malloc.h>

/* malloc wrapper function */
void *mymalloc(size_t size)
{
    void *ptr = malloc(size);
    printf("malloc(%d)=%p\n", (int)size, ptr);
    return ptr;
}

/* free wrapper function */
void myfree(void *ptr)
{
    free(ptr);
    printf("free(%p)\n", ptr);
}
#endif

#ifdef LINKTIME
#include <stdio.h>

void *__real_malloc(size_t size);
void __real_free(void *ptr);

/* malloc wrapper function */
void *__wrap_malloc(size_t size)
{
    void *ptr = __real_malloc(size); /* Call libc malloc */
    printf("malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}

/* free wrapper function */
void __wrap_free(void *ptr)
{
    __real_free(ptr); /* Call libc free */
    printf("free(%p)\n", ptr);
}
#endif

7.13.1 Compile-Time Interpositioning

We can use the C preprocessor to interpose at compile time.

The local malloc.h header file instructs the preprocessor to replace each call to a target function with a call to its wrapper.

$ gcc -DCOMPILETIME -c mymalloc.c
# -D name means predefine name as a macro
$ gcc -I. -o intc int.c mymalloc.o
# -I tells the C preprocessor to look for malloc.h in the current directory before looking in the usual system directories
$ ./intc
malloc(32)=0x55999c4fb260
free(0x55999c4fb260)

Notice that the wrappers in mymalloc.c are compiled with the standard malloc.h header file (no -I option specified).
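The same trick can be condensed into a single file to see why the order matters. This is a sketch of the idea rather than the three-file build above: the wrappers are defined before the macros, so inside them malloc and free still mean the libc functions. The counters and the client function are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int malloc_calls = 0; /* wrapped allocations seen so far */
static int free_calls = 0;   /* wrapped releases seen so far */

/* Wrappers come first: here malloc/free are still the libc versions. */
void *mymalloc(size_t size)
{
    void *ptr = malloc(size);
    malloc_calls++;
    fprintf(stderr, "malloc(%zu)=%p\n", size, ptr);
    return ptr;
}

void myfree(void *ptr)
{
    fprintf(stderr, "free(%p)\n", ptr);
    free_calls++;
    free(ptr);
}

/* Plays the role of the local malloc.h: from here on, calls are redirected. */
#define malloc(size) mymalloc(size)
#define free(ptr) myfree(ptr)

/* Plays the role of int.c: it looks like ordinary malloc/free code. */
int client(void)
{
    int *p = malloc(32);
    if (p == NULL)
        return -1;
    free(p);
    assert(malloc_calls == free_calls); /* every wrapped allocation was released */
    return 0;
}
```

After one call to client(), both counters are 1, even though client never mentions mymalloc or myfree by name.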

7.13.2 Link-Time Interpositioning

The Linux static linker supports link-time interpositioning with the --wrap f flag

This flag tells the linker to resolve references to symbol f as __wrap_f, and to resolve references to symbol __real_f as f.

$ gcc -D LINKTIME -c mymalloc.c
$ gcc -c int.c
$ gcc -Wl,--wrap,malloc -Wl,--wrap,free -o intl int.o mymalloc.o

The -Wl,option flag passes option to the linker.
Each comma in option will be replaced with a space.
So -Wl,--wrap,malloc passes --wrap malloc to the linker

7.13.3 Run-Time Interpositioning

Compile-time interpositioning requires access to a program’s source files. Link-time interpositioning requires access to its relocatable object files.

However, there is a mechanism for interpositioning at run time that requires access only to the executable object file

This fascinating mechanism is based on the dynamic linker’s LD_PRELOAD environment variable.

If the LD_PRELOAD environment variable is set to a list of shared library pathnames (separated by spaces or colons), then when you load and execute a program, the dynamic linker (ld-linux.so) will search the LD_PRELOAD libraries first, before any other shared libraries, when it resolves undefined references.

//mymalloc.c
#ifdef RUNTIME
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

/* malloc wrapper function */
void *malloc(size_t size)
{
    void *(*mallocp)(size_t size);
    char *error;

    /* Get address of libc malloc */
    /* RTLD_NEXT tells dlsym to search the libraries that come after this one in the search order, so it finds the libc version rather than our own wrapper */
    mallocp = dlsym(RTLD_NEXT, "malloc");
    if ((error = dlerror()) != NULL) {
        fputs(error, stderr);
        exit(1);
    }
    void *ptr = mallocp(size); /* Call libc malloc */
    //printf("malloc(%d) = %p\n", (int)size, ptr);
    fprintf(stderr, "malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}

/* free wrapper function */
void free(void *ptr)
{
    void (*freep)(void *) = NULL;
    char *error;

    if (!ptr)
        return;

    freep = dlsym(RTLD_NEXT, "free"); /* Get address of libc free */
    if ((error = dlerror()) != NULL) {
        fputs(error, stderr);
        exit(1);
    }
    freep(ptr); /* Call libc free */
    //printf("free(%p)\n", ptr);
    fprintf(stderr, "free(%p)\n", ptr);
}
#endif
/*
The code in the book has a subtle bug: printf writes to stdout and may itself call malloc (for example, to allocate its buffer). Since malloc is now our wrapper, the wrapper ends up calling itself recursively and the program crashes.
Switching to fprintf on stderr avoided the problem, presumably because stderr is unbuffered.
*/

$ gcc -D RUNTIME -shared -fpic -o mymalloc.so mymalloc.c -ldl
$ gcc -o intr int.c
$ LD_PRELOAD="./mymalloc.so" ./intr

Notice that you can use LD_PRELOAD to interpose on the library calls of any executable program!

$ LD_PRELOAD="./mymalloc.so" /usr/bin/uptime
$ LD_PRELOAD="./mymalloc.so" ls

7.14 Tools for Manipulating Object Files

ar: Creates static libraries, and inserts, deletes, lists, and extracts members.
strings: Lists all of the printable strings contained in an object file.
strip: Deletes symbol table information from an object file.
nm: Lists the symbols defined in the symbol table of an object file.
size: Lists the names and sizes of the sections in an object file.
readelf: Displays the complete structure of an object file, including all of the information encoded in the ELF header. Subsumes the functionality of size and nm.
objdump: The mother of all binary tools. Can display all of the information in an object file.
ldd: Lists the shared libraries that an executable needs at run time.

7.15 Summary

Library interposition is very powerful

hook

GOT modification