Some notes about reading the 《K&C The C Programming language》
Preface
C is not a big language, and it is not well served by a big book.
C is not a very high level
language, nor a big
one, and is not specialized to any particular area of application.
Besides showing how to make effective use of the language, we have also tried where possible to illustrate useful algorithms and principles of good style and sound design.
CH1 A Tutorial Introduction
The only way to learn a new programming language is by writing programs in it.
1 | /* integer division truncates: any fractional part is discarded. */ |
Magic Number
they convey little information to someone who might have to read the program later, and they are hard to change in a systematic way.
One way to deal with magic numbers is to give them meaningful names. A #define
line defines a symbolic name or symbolic constant to be a particular string of characters
Text
A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character. It is the responsibility of the library to make each input or output stream confirm this model
1 |
|
When you need precise control over the character flow, use getchar
instead of scanf
!
Step to write a program :
Define the problem
questions like “What is a ‘word’”,”What is a ‘shortest path’”,etcCompose test cases
Try to include all boundary conditions, special cases and large/tiny scale of the problem(
len(input) == 0 or len(input) >= 2 ** 32
). And you can make the definition more conciseCome up with the ‘right’ algorithm and data structure, solve the problem on paper/iPad first.
You show always spend most of your time planning and solving problem on paper. After you get the right idea, writing code will not cost much time!!!
Write the code
Variables
local variables are also called automatic variables, as they born and recollected automatically, in contrary to static
and global/extern
variables
Before a function can use an external variable, the name of the variable must be made known to the function; the declaration is the same as before
except for the added keywordextern
.
1 | int returnglobal(void){ |
In certain circumstances, the extern declaration can be omitted. If the definition of the external variable occurs in the source file before its use in a particular function, then there is no need for an extern declaration in the function.
CH2 Types, Operators and Objects
Variable name
Names are made up of letters and digits; the first character must be a letter.
library routines often use name begin with a
_
Upper and lower case letters are distinct
lower for variable name
upper for symbolic constants(the macro)
Variable size
1 |
Constants
1 | 1234 // int |
1 | 0123 // oct number |
A constant expression is an expression that involves only constants. Such expressions may be evaluated at during compilation rather than run-time, and accordingly may be used in any place that a constant can occur, for example, array definition
The standard library function strlen(s)
returns the length of its character string argument s
, excluding the terminal '\0'
.
1 | 'a' // integer value 97 |
enumeration constant
An enumeration is a list of constant integer values.
Enumerations provide a convenient way to associate constant values with names, an alternative to #define
with the advantage that the values can be generated for you.
1 | enum name { var1 , var2 , var3 }; //definition |
1 | // Make a Python-like boolean type when <stdbool.h> is missing |
Arithmetic
The %
operator cannot be applied to a float or double.
1 | if (!valid) // read nicely by "If not valid" |
Characters
1 |
|
For portability, specify signed
or unsigned
if non-
character data is to be stored in char variables.
Operators
1 | ++c ; // increments c before its value is used |
1 | c += 1 + 2 // c += (1+2) |
1 | condition ? res1 : res2 ; // this is an expression |
1 | //write succinct code |
Precedence
Operators on the same line have the same precedence;
rows are in order of decreasing precedence
CH3 Control Flow
Loops
1 | for ( expr1 ; expr2 ; expr3){ |
Comma
A pair of expressions separated by a comma is evaluated left to right, and the type and value of the result are the type and value of the right operand.
The commas that separate function arguments, variables in declarations, etc., are not comma operators, and do not guarantee left to right evaluation.
Continue
expr3
will be executed after continue
1 | for (int i = 0 ; i < N ; i++){ |
Goto
Use it rarely, or try to avoid using it at all.
1 | for ( ... ){ |
CH4 Functions and Program Structure
C does not allow functions to be defined inside other functions(Not like Python), but declaration is OK.
1 |
|
Static
static
serve asprivate
in C, limit access to the source file contains it.static
variable is permanent and private in a function
Register Variables
A register declaration advises the compiler that the variable in question will be heavily used. The idea is that register variables are to be placed in machine registers, which may result in smaller and faster programs. But compilers are free to ignore the advice.
The register declaration can only be applied to automatic variables and to the formal parameters of a function.
it is not possible to take the address of a register variable
Block
1 | { |
An automatic variable declared and initialized in a block is initialized each time the block is entered
hide external variables and functions of the same name.
As a matter of style, it’s best to avoid variable names that conceal names in an outer scope; the potential for confusion and error is too great.
Recursion
A way to print an integer digit by digit other than using an array
1 | void printn(int n){ |
Macro substitution
Substitutions are made only for tokens, and do not take place within quoted strings. For example, if YES
is a defined name, there would be no substitution in printf("YES")
or in YESMAN
.
1 |
|
Names may be undefined with #undef, usually to ensure that a routine is really a function, not a macro
1 |
|
A parameter name is preceded by a #
in the replacement text, the combination will be expanded into a quoted string with the parameter replaced by the actual argument.
1 |
|
You can see the substitution result by preprocess the source file
1 | $ gcc -E source.c -o afterpre.c |
The preprocessor operator ##
provides a way to concatenate names during macro expansion.
1 |
|
Conditional Inclusion
The #if
line evaluates a constant integer expression. If the expression is non-zero, subsequent lines until an #endif
or #elif
or #else
are included.
The expression defined(name)
in a #if
is 1 if the name has been defined, and 0 otherwise.
1 | //check if header has already been included |
1 | //tests the name SYSTEM to decide which version of a header to include: |
CH5 Array and Pointer
Any pointer can be cast to void *
and back again without loss of information
1 | void * p ; //a generic pointer |
When an array name is passed to a function, what is passed is the address of the initial element. Within the called function, this argument is a local variable, and so an array name parameter is a pointer, that is, a variable containing an address, which can be changed.
1 | int array [N] ; |
String Pointers
C does not provide any operators for processing an entire string of characters as a unit. It is always an array
1 | char amessage[] = "now is the time"; /* an array */ |
1 | //strcpy, implementation friendly version |
1 | *p++ = val; /* push val onto stack */ |
Want to return a string in a function ?
We can only return pointer, so let’s have some private and permanent space
1 | char* duplicate(const char * string){ |
Array pointer
1 | char stringarray [10][100] ; |
The important advantage of the pointer array(int* pointers [10]
) over multi-dimension array(int array[10][20]
)is that the rows of the array may be of different lengths(not 20
).
Pointers to Functions
In C, a function itself is not a variable, but it is possible to define pointers to functions, which
can be assigned, placed in arrays, passed to functions, returned by functions, and so on.(Since it is just the entry address of a section of code!)
1 |
|
You can use it just like an array name
1 | //Define |
1 | void (*fp)(void); // fp is a pointer, pointing to a function |
1 | //used in sorting function |
How to read a C declaration ?
1 | //recall operator precedence |
1 | char (*(*x())[])() ; |
The book also provide a ‘recursion descend’ program to understand the declaration.
CH6 Structure
A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling.
1 | //struct name is optional |
Structures can be copied as a whole entity, passing to or returning by a function.
In
x86_64
architecture, according to my observation, bigstruct
is passing and returning via using stack space,rax or rdx
serve as the pointer pointing to returned space
1 |
|
We all know that struct
will be padding if they are not aligned. How can we tell the exact size ? We use sizeof
1 | struct { |
Structures are useful when implementing hash table, since pair structure is natural for it.
1 | /* simple hash table */ |
Mimic class
via combining struct
and function pointer
1 |
|
Union
In effect, a union is a structure in which all members have offset zero from the base, the structure is big enough to hold the ``widest’’ member, and the alignment is appropriate for all
of the types in the union.
As a result, the syntax is almost the same
A union may only be initialized with a value of the type of its first member
a union can be used to force a variable to be
aligned on a particular kind of storage boundary
1 |
|
Bit-fields
Alternative to bitwise operation
1 | struct { |
This defines a variable table called flags that contains three 1-bit fields
The number following the colon represents the field width in bits.
The fields are declared unsigned int to ensure that they are unsigned quantities.
Almost everything about fields is implementation-dependent.
They are not arrays and they do not have addresses, so the
&
operator cannot be applied on them.Fields may be declared only as ints; for portability, specify signed or unsigned explicitly
CH 7 Input and Output
printf
1 | :%s: :hello, world: |
Variable-length Argument Lists
the declaration ...
means that the number and types of these arguments may vary. The declaration ...
can only appear at the end of an argument list.
1 | void myprintf(char * fmt , ...); |
- The standard header
<stdarg.h>
contains a set of macro definitions that define how to step through an argument list.
The type
va_list
is used to declare a variable that will refer to each argument in turn;The macro
va_start
initializesva_list
to point to the first unnamed argument. It must be called once beforeva_list
is used. There must be at least one named argument (fmt
)Each call of
va_arg
returns one argument and stepsva_list
to the next;va_arg
uses a type name
to determine what type to return and how big a step to take.Finally,
va_end
does whatever cleanup is necessary. It must be called before the program returns
1 |
|
printf
instdio
has a similar logic here, and the printing loop is not controlled byn
but by the format stringfmt
So we can have the format string exploit
1 |
|
1 | target = 0x555555558044 |
1 | // printf.c |
Then in __vfprintf_internal
there are thousands lines of code, and there is a function called __find_specwc
to find the first “argument” to print, so just don’t dig in too deep.
By the way, if you want to know How is “__builtin_va_list” implemented?, you may need to study the source code of your compiler rather than C stdlibs.
File
When a C program is started, the operating system environment is responsible for opening three files(stdin,stdout,stderr
) and providing pointers for them.
1 | //getchar and putchar can be defined in terms of getc, putc, stdin, and stdout as follows: |
CH8 The UNIX System Interface
A directory is a file that contains a list of filenames and some indication of where they are located. The “location’’ is an index into another table called the inode
list. The inode
for a file is where all information about the file except its name is kept. A directory entry generally consists of only two items, the filename and an inode
number.
The whole point with “everything is a file” is not that you have some random filename (indeed, sockets and pipes show that “file” and “filename” have nothing to do with each other), but the fact that you can use common tools to operate on different things.
Is directory and regular file the same?
The answer is : Yes, but also NO
How files are organized is highly depend on the file system, and there are tons of file system
Let’s take EXT2
as an example here
How they can be the same?
They are both byte stream.
We have a directory organization like this in ext2
file system
1 | # left side |
1 | # right side |
We can see that:
1 | getcontent(file4) |
Content of regular file can be anything, characters, video data, binary instructions,…
Content of directory have fixed meaning, since they are created by file system, rather than arbitrary data generated by user. In ext2
, they are
1 | struct{ |
1 | # left |
1 | # right |
How they can be different ?
It is nonsense to read the content of a directory as raw bytes and people always want the files in the directory.
In modern times there are many kinds of filesystems that can be mounted by Linux and other unix-like systems (ext4, ZFS, NTFS!), some of which have complex directory formats. You can’t do anything sensible with the raw bytes of an arbitrary directory. So the kernel has taken on the responsibility of providing a generic interface to directories as abstract objects.
readdir
is the central piece of that interface.
1 |
|
1 | $ ./a.out |
However, we can still operate on the uniform interface provided by the standard library
1 |
|
1 | $ ./a.out |
More about file system on Linux (ext2
as example)
According to Linux documentation project, file system
We can hide files in a directory by mounting a file system at the directory, the files will be recovered after umounting.
The real file systems were separated from the operating system and system services by an interface layer known as the Virtual File system, or VFS.
All of the details of the Linux file systems are translated by software so that all file systems appear identical to the rest of the Linux kernel and to programs running in the system. Linux’s Virtual File system layer allows you to transparently mount the many different file systems at the same time.
Physical Layout of the EXT2 File system
In the EXT2 file system, the
inode
is the basic building block; every file and directory in the file system is described by one and only oneinode
.Inode
structure1
2
3
4
5
6
7class Inode():
def __init__(self):
self.mode
self.ownerInfo
self.size
self.timestamps
self.datablocksSuper block
The Super block contains a description of the basic size and shape of this file system. The information within it allows the file system manager to use and maintain the file system. Usually only the Super block in Block Group 0 is read when the file system is mounted but each Block Group contains a duplicate copy in case of file system corruption.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53$ apt install sleuthkit
$ fsstat file_s.img
FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: Ext2
Volume Name:
Volume ID: bf800aaad8ff14851b4bf9289b25ea9d
Last Written at: 2022-06-29 02:28:03 (EDT)
Last Checked at: 2022-06-29 00:43:35 (EDT)
Last Mounted at: 2022-06-29 02:28:03 (EDT)
Unmounted Improperly
Last mounted on: /root/.cache/test/mountpoint
Source OS: Linux
Dynamic Structure
Compat Features: Ext Attributes, Resize Inode, Dir Index
InCompat Features: Filetype,
Read Only Compat Features: Sparse Super, Large File,
METADATA INFORMATION
--------------------------------------------
Inode Range: 1 - 17
Root Directory: 2
Free Inodes: 0
CONTENT INFORMATION
--------------------------------------------
Block Range: 0 - 99
Block Size: 1024
Reserved Blocks Before Block Groups: 1
Free Blocks: 75
BLOCK GROUP INFORMATION
--------------------------------------------
Number of Block Groups: 1
Inodes per group: 16
Blocks per group: 8192
Group: 0:
Inode Range: 1 - 16
Block Range: 1 - 99
Layout:
Super Block: 1 - 1
Group Descriptor Table: 2 - 2
Data bitmap: 3 - 3
Inode bitmap: 4 - 4
Inode Table: 5 - 6
Data Blocks: 7 - 99
Free Inodes: 0 (0%)
Free Blocks: 75 (75%)
Total Directories: 4Finding a file in
ext2
The first
inode
we need is theinode
for the root of the file system and we find its number in the file system’s super block. (To read anEXT2 inode
we must look for it in theinode
table of the appropriate Block Group)use this
inode
to find the data block which hold subdirectories’ names andinode
numbersuse subdirectory’s
inode
number to find the data block where the file’s name andinode
is holden.use
inode
of the file to find the actual data block
the VFS describes the system’s files in terms of super blocks and
inodes
in much the same way as the EXT2 file system uses super blocks andinodes
.How it related to
file descriptor
As a conclusion, I think file system is just like
device driver
Read more
use debugfs
to recover deleted file
copy inode
metadata with cp -a
Do experiment yourself
1 | mkdir mountpoint |