StaticLink

Part 2 : Static Linkage

Simple compilation theory

[figure: compilation]

# scanner
$ lex
# parser (yet another compiler compiler)
$ yacc

Question: what is the address of a symbol?

extern int array[10];   // defined in another module
void fun(void);         // defined in another module

void use(void) {
    array[0] = 100;     // What is the address of array?
    fun();              // Where should the CPU jump to reach function `fun`?
}

In short, the problem is how to access symbols defined outside the local module, whose addresses are unknown at compile time.

Solution : Linker

The main jobs of a linker:

  • address and memory allocation

  • symbol resolution

  • relocation

Object File format

[figure: object file format]

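(.bss is only a placeholder: it takes no space in the file on disk and is allocated, zero-filled, when the program is loaded. In the objdump listing below, .bss and .rodata share the file offset 0xa0 for this reason.)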

int printf(const char *format, ...);
int global_init_var = 84;
int global_uninit_var;

void func1(int i) {
    printf("%d\n", i);
}

int main(void) {
    static int static_var = 85;
    static int static_var2;
    int a = 1;
    int b;

    func1(static_var + static_var2 + a + b);
    return a;

}

$ objdump -h main.o # list the section headers (a table of contents of the sections)
# Only show some important sections, ignoring ancillary sections

main.o: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000057 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 0000000000000000 0000000000000000 00000098 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000004 0000000000000000 0000000000000000 000000a0 2**2
ALLOC
3 .rodata 00000004 0000000000000000 0000000000000000 000000a0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 0000002a 0000000000000000 0000000000000000 000000a4 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 0000000000000000 0000000000000000 000000ce 2**0
CONTENTS, READONLY
6 .eh_frame 00000058 0000000000000000 0000000000000000 000000d0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

$ size main.o # section sizes
text data bss dec hex filename
179 8 4 191 bf main.o

Extract one section

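# .text starts at file offset 0x40 (64) and is 0x57 (87) bytes long
$ dd if=main.o of=code.o bs=1 count=87 skip=64
$ objdump -D -b binary -m i386:x86-64 code.o > codedump
$ objdump -d main.o > maindump
$ diff maindump codedump
# The instruction bytes are identical; mainly the headers and symbol annotations differ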

View hexdump

$ objdump -s main.o

main.o: file format elf64-x86-64

Contents of section .text:
0000 554889e5 4883ec10 897dfc8b 45fc89c6 UH..H....}..E...
0010 488d3d00 000000b8 00000000 e8000000 H.=.............
0020 0090c9c3 554889e5 4883ec10 c745f801 ....UH..H....E..
0030 0000008b 15000000 008b0500 00000001 ................
0040 c28b45f8 01c28b45 fc01d089 c7e80000 ..E....E........
0050 00008b45 f8c9c3 ...E...

Contents of section .data:
0000 54000000 55000000 T...U...
# 0x00000054 = 84, int global_init_var = 84;
# 0x00000055 = 85, static int static_var = 85;

Contents of section .rodata:
0000 25640a00 %d..
"%d\n"

Contents of section .comment:
0000 00474343 3a202855 62756e74 7520372e .GCC: (Ubuntu 7.
0010 352e302d 33756275 6e747531 7e31382e 5.0-3ubuntu1~18.
0020 30342920 372e352e 3000 04) 7.5.0.

Contents of section .eh_frame:
0000 14000000 00000000 017a5200 01781001 .........zR..x..
0010 1b0c0708 90010000 1c000000 1c000000 ................
0020 00000000 24000000 00410e10 8602430d ....$....A....C.
0030 065f0c07 08000000 1c000000 3c000000 ._..........<...
0040 00000000 33000000 00410e10 8602430d ....3....A....C.
0050 066e0c07 08000000 .n......

Tell gcc to put a variable or function in a specified section

__attribute__((section("foobar"))) int var = 99;

$ readelf -x foobar test.o

Hex dump of section 'foobar':
0x00000000 63000000 c...

ELF file format

[figure: ELF file layout]

$ readelf -S main.o # show all the sections
There are 13 section headers, starting at offset 0x448:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000057 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000338
0000000000000078 0000000000000018 I 10 1 8
[ 3] .data PROGBITS 0000000000000000 00000098
0000000000000008 0000000000000000 WA 0 0 4
[ 4] .bss NOBITS 0000000000000000 000000a0
0000000000000004 0000000000000000 WA 0 0 4
[ 5] .rodata PROGBITS 0000000000000000 000000a0
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 000000a4
000000000000002a 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 000000ce
0000000000000000 0000000000000000 0 0 1
[ 8] .eh_frame PROGBITS 0000000000000000 000000d0
0000000000000058 0000000000000000 A 0 0 8
[ 9] .rela.eh_frame RELA 0000000000000000 000003b0
0000000000000030 0000000000000018 I 10 8 8
[10] .symtab SYMTAB 0000000000000000 00000128
0000000000000198 0000000000000018 11 11 8
[11] .strtab STRTAB 0000000000000000 000002c0
0000000000000073 0000000000000000 0 0 1
[12] .shstrtab STRTAB 0000000000000000 000003e0
0000000000000061 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)

.rela.name # relocation entries that the linker uses to relocate section `name`

# .strtab stores the names of symbols (string literals such as "%d\n" live in .rodata)
$ readelf -x .strtab main.o

Hex dump of section '.strtab':
0x00000000 006d6169 6e2e6300 73746174 69635f76 .main.c.static_v
0x00000010 61722e31 38303200 73746174 69635f76 ar.1802.static_v
0x00000020 6172322e 31383033 00676c6f 62616c5f ar2.1803.global_
0x00000030 696e6974 5f766172 00676c6f 62616c5f init_var.global_
0x00000040 756e696e 69745f76 61720066 756e6331 uninit_var.func1
0x00000050 005f474c 4f42414c 5f4f4646 5345545f ._GLOBAL_OFFSET_
0x00000060 5441424c 455f0070 72696e74 66006d61 TABLE_.printf.ma
0x00000070 696e00 in.

# .shstrtab stores the section names referenced by the section headers
$ readelf -x .shstrtab main.o

Hex dump of section '.shstrtab':
0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
0x00000010 002e7368 73747274 6162002e 72656c61 ..shstrtab..rela
0x00000020 2e746578 74002e64 61746100 2e627373 .text..data..bss
0x00000030 002e726f 64617461 002e636f 6d6d656e ..rodata..commen
0x00000040 74002e6e 6f74652e 474e552d 73746163 t..note.GNU-stac
0x00000050 6b002e72 656c612e 65685f66 72616d65 k..rela.eh_frame
0x00000060 00 .

Symbols

$ nm main.o
U _GLOBAL_OFFSET_TABLE_
0000000000000000 T func1
0000000000000000 D global_init_var
0000000000000004 C global_uninit_var
0000000000000024 T main
U printf
0000000000000004 d static_var.1802
0000000000000000 b static_var2.1803

$ readelf -s main.o

Symbol table '.symtab' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000004 4 OBJECT LOCAL DEFAULT 3 static_var.1802
7: 0000000000000000 4 OBJECT LOCAL DEFAULT 4 static_var2.1803
8: 0000000000000000 0 SECTION LOCAL DEFAULT 7
9: 0000000000000000 0 SECTION LOCAL DEFAULT 8
10: 0000000000000000 0 SECTION LOCAL DEFAULT 6
11: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 global_init_var
12: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_uninit_var
13: 0000000000000000 36 FUNC GLOBAL DEFAULT 1 func1
14: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
16: 0000000000000024 51 FUNC GLOBAL DEFAULT 1 main

$ objdump --syms main.o

main.o: file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 main.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000004 l O .data 0000000000000004 static_var.1802
0000000000000000 l O .bss 0000000000000004 static_var2.1803
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g O .data 0000000000000004 global_init_var
0000000000000004 O *COM* 0000000000000004 global_uninit_var
0000000000000000 g F .text 0000000000000024 func1
0000000000000000 *UND* 0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000 *UND* 0000000000000000 printf
0000000000000024 g F .text 0000000000000033 main

Symbol decoration (name mangling)

How does C++ support function overloading and namespaces?

The compiler encodes a symbol's namespace, name and parameter types into a decorated name, so that each overload gets a unique symbol.

G++

[figure: G++ name decoration]

Visual C++

[figure: Visual C++ name decoration]

Because different compilers use different decoration schemes, it is difficult to link object files produced by different compilers.

$ cat test.cpp
int func(int) {}
double func(double){}
$ g++ -c test.cpp -o cpp.o
$ readelf -s cpp.o

Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 4
8: 0000000000000000 10 FUNC GLOBAL DEFAULT 1 _Z4funci
9: 000000000000000a 12 FUNC GLOBAL DEFAULT 1 _Z4funcd

C compilers used to decorate symbols with a leading underscore, but on Linux they no longer do, so a C symbol is just its plain name. We can therefore prevent C++ symbol decoration by declaring the code as C with extern "C".

$ cat test.cpp
int func(int) {}
double func(double){}
extern "C" {
void func(void){}
}
$ g++ -c test.cpp -o cpp.o
$ readelf -s cpp.o

Symbol table '.symtab' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 4
8: 0000000000000000 10 FUNC GLOBAL DEFAULT 1 _Z4funci
9: 000000000000000a 12 FUNC GLOBAL DEFAULT 1 _Z4funcd
10: 0000000000000016 7 FUNC GLOBAL DEFAULT 1 func

Reverse C++ name decoration

$ c++filt  _Z4funci
func(int)

System programming idiom

A C++ compiler automatically defines the macro __cplusplus when compiling.

#include <stddef.h> /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

void *memset(void *s, int c, size_t n);

#ifdef __cplusplus
}
#endif

  • When compiled by a C compiler, __cplusplus is undefined, so the preprocessor removes the extern "C" { } wrapper

  • When compiled by a C++ compiler, memset is declared with C linkage and its name is not decorated, so it links correctly against the C library function

We can see this in <stdio.h> after preprocessing with a C++ compiler:

# 34 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 2 3 4
# 28 "/usr/include/stdio.h" 2 3 4


# 29 "/usr/include/stdio.h" 3 4
extern "C" {



# 1 "/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h" 1 3 4
# 216 "/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h" 3 4

Strong symbol and weak symbol

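Strong: function definitions and initialized global variables

Weak: uninitialized global variables (emitted as COMMON symbols, which GCC did by default before version 10; newer versions default to -fno-common)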

// We can manually define weak or strong
__attribute__((weak)) int var = 3 ;

Strong reference and weak reference

Strong: the definition of the referenced symbol must be found, otherwise the linker reports an error

Weak: if no definition is found, the linker does not report an error and resolves the reference to a default value (usually 0)

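// We can manually mark a reference as weak
__attribute__((weak)) int printf(const char *, ...);

This can be used to test whether a symbol was defined at link time

#include <stdio.h>

__attribute__((weak)) void fun(void); // weak reference: resolves to 0 if undefined

int main(void) {
    if (fun) {
        printf("fun is defined\n");
        fun();
    } else {
        printf("fun not found\n");
    }
    return 0;
}

We can exploit this feature to determine whether some module or library is linked.

Note that weakref is not the attribute to use here. According to the GCC manual, a weakref declaration must be static and must name its target, either as an argument or through an accompanying alias attribute, so it cannot be applied to a plain extern declaration. Declaring the function with the weak attribute gives the weak reference described above.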

On the usage of weakref

gcc attribute

Debug info

DWARF: Debugging With Arbitrary Record Formats

$ strip program # remove symbols and debug info for release