cachelab

Cache lab how to

Part A Build a Cache Simulator

It is not an acutal cache, it doesn’t store any data, it only simulate the working flow of the real cache

  • use getopt() in unistd.h to get parse arguments
  • fscanf is also very useful in this part
  • use malloc and free to manage memory

Part B Matrix Transpose

Use blocking techique, dividing matrix into submatrix

blocking works or not has a lot to do with the cache size !!!

The cache block size will decide the block size

Transpose example

blocking example

Multiplication example

Bad case(no blocking):

multiplication example

secong iter

Read the writeup

1
$ tar xvf cachelab-handout.tar

You will be modifying two files: csim.c and trans.c.

Reference Trace Files

The traces subdirectory of the handout directory contains a collection of reference trace files that we will use to evaluate the correctness of the cache simulator you write in Part A.

The trace files are generated by a Linux program called valgrind

Output format : [space]operation address,size

The operation field denotes the type of memory access: I denotes an instruction load, L a data load, S a data store, and M a data modify (i.e., a data load followed by a data store). There is never a space before each I. There is always a space before each M, L, and S

Part A: Writing a Cache Simulator

In Part A you will write a cache simulator in csim.c that takes a valgrind memory trace as input, simulates the hit/miss behavior of a cache memory on this trace, and outputs the total number of hits, misses, and evictions.

  • Your job for Part A is to fill in the csim.c file so that it takes the same command line arguments and produces the identical output as the reference simulator. Notice that this file is almost completely empty. You’ll need to write it from scratch.

  • For this lab, we are interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with “I”). Recall that valgrind always puts “I” in the first column (with no preceding space), and “M”, “L”, and “S” in the second column (with a preceding space). This may help you parse the trace.

  • To receive credit for Part A, you must call the function printSummary, with the total number of hits, misses, and evictions, at the end of your main function:

    printSummary(hit_count, miss_count, eviction_count);

  • Your csim.c file must compile without warnings in order to receive credit.

    Your simulator must work correctly for arbitrary s, E, and b. This means that you will need to allocate storage for your simulator’s data structures using the malloc function. Type “man malloc” for information about this function.

  • For this lab, we are interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with “I”). Recall that valgrind always puts “I” in the first column (with no preceding space), and “M”, “L”, and “S” in the second column (with a preceding space). This may help you parse the trace.

  • To receive credit for Part A, you must call the function printSummary with the total number of hits, misses, and evictions, at the end of your main function:printSummary(hit_count, miss_count, eviction_count);

  • For this this lab, you should assume that memory accesses are aligned properly, such that a single memory access never crosses block boundaries. By making this assumption, you can ignore the request sizes in the valgrind traces.

Part B: Optimizing Matrix Transpose

In Part B you will write a transpose function in trans.c that causes as few cache misses as possible.

  • You are allowed to define at most 12 local variables of type int per transpose function.

  • Your code only needs to be correct for these three cases(32×32,64×64,61×67) and you can optimize it specifically for these three cases. In particular, it is perfectly OK for your function to explicitly check for the input sizes and implement separate code optimized for each case.

  • The test-trans program saves the trace for function i in file trace.fi.They are invaluable debugging tools that can help you understand exactly where the hits and misses for each transpose function are coming from. To debug a particular function, simply run its trace through the reference simulator with the verbose option:

    ./csim-ref -v -s 5 -E 1 -b 5 -t trace.f0

  • Since your transpose function is being evaluated on a direct-mapped cache, conflict misses are a potential problem

Evaluation

Part A

1
2
3
4
5
6
7
8
$ ./csim -s 1 -E 1 -b 1 -t traces/yi2.trace
$ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
$ ./csim -s 2 -E 1 -b 4 -t traces/dave.trace
$ ./csim -s 2 -E 1 -b 3 -t traces/trans.trace
$ ./csim -s 2 -E 2 -b 3 -t traces/trans.trace
$ ./csim -s 2 -E 4 -b 3 -t traces/trans.trace
$ ./csim -s 5 -E 1 -b 5 -t traces/trans.trace
$ ./csim -s 5 -E 1 -b 5 -t traces/long.trace
1
2
$ make
$ ./test-csim

Part B

1
2
3
4
5
6
/* Header comment */
char trans_simple_desc[] = "A simple transpose";
void trans_simple(int M, int N, int A[N][M], int B[M][N]) {
/* your transpose code here */ }
registerTransFunction(trans_simple, trans_simple_desc);
registerTransFunction(transpose_submit, transpose_submit_desc);
1
2
$ make
$ ./test-trans -M 32 -N 32

All in one

1
$ ./driver.py

Tips

1
2
3
4
//headers for getopt
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

Since your transpose function is being evaluated on a direct-mapped cache, conflict misses are a potential problem. Think about the potential for conflict misses in your code, especially along the diagonal. Try to think of access patterns that will decrease the number of these conflict misses.

My solution

Part A

When manipulating values in memory,one should Never create local variables !!! Create Pointers !!!

Debug python script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
from termcolor import cprint
s = 2
E = 4
b = 3

T = 64-S-B

addresses = [0x00600aa0, 0x7ff000398, 0x7ff000390, 0x7ff000378, 0x7ff000370, 0x7ff000384, 0x7ff000384, 0x7ff000388, 0x7ff000388, 0x7ff000384 ]

def printf(address):
string = '{:064b}'.format(address)
cprint(string[:T] , 'blue' , end='')
cprint(string[T:64-b] , 'red' , end='')
cprint(string[64-b:], 'green')

for a in addresses:
printf(a)

If source code is avaliable and you are debugging programing works, vanilla gdb is better than pwndbg

1
$ p *arrayname@arraylen

Source code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
#include "cachelab.h"

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <unistd.h>
#include <stdbool.h>

typedef struct Line {
bool HaveValue ;
long Tbits ;
int age ;
} line ;

void helpinfo();
void init(line* sets, int num);
bool evictReplace(line* set , long tbits, int num);


static bool verbose = false;

int main(int argc , char * argv [])
{
int s = -1;
int E = -1;
int b = -1;
FILE * t = NULL ;
int opt ;

int hits = 0;
int misses = 0;
int evictions = 0;

/* parse command line */
while( (opt = getopt(argc , argv, "s:E:b:t:hv")) != -1){
switch(opt){
case 'h': {helpinfo();exit(0);}
case 'v': { verbose = true ; break;}
case 's': { int res = atoi(optarg) ; if(res) s = res ; break ;}
case 'E': { int res = atoi(optarg) ; if(res) E = res ; break ;}
case 'b': { int res = atoi(optarg) ; if(res) b = res ; break ;}
case 't': { t = fopen(optarg, "rt"); if(t == NULL){fprintf(stderr, "Cannot open file.\n");exit(0);}}
}
}

if (s == -1 || E == -1 || b == -1 || t == NULL){
fprintf(stderr, "%s: Missing required command line argument\n" , argv[0]);
helpinfo();
exit(0);
}

/* malloc memory space */
int linenumber = (1<<s) * E ;
line * sets = (line*)malloc(sizeof(line) * linenumber);
init(sets , linenumber);

/* process input */

int tbitsnum = 64 - s - b;
long setmask = (long)((unsigned long)((1l<<63)>> (s-1)) >> tbitsnum);
long tmask = (1l<<63) >> (tbitsnum - 1);

int size = -1;
char op = -1;
long address = -1;

while( fscanf(t, " %c %lx,%d" , &op , &address , &size) != EOF){
if(op == 'I') continue; //skip instructions

/* cache match */
long offset = ( address & setmask ) >> b ;
long currentTbits = (address & tmask) >> (b+s);
line* set = sets + offset * E;


//age increase
for (int i = 0; i < E ; i++){ if (set[i].HaveValue) set[i].age++; }

// real cache do comparison simultaneously instead of looping
bool flag = false;

for (int i = 0; i < E ; i++){
line * target = set+i;
if (target->HaveValue == true && target->Tbits == currentTbits){
flag = true;
target->age = 0;
break;
}
}
if (verbose) printf("%c %lx,%d" , op , address , size);

/* match result handling */
if(flag){
switch(op){
case 'L':
case 'S':{ hits++; if (verbose){ printf(" hit\n");}break; }
case 'M':{ hits+=2 ; if (verbose){ printf(" hit hit\n");}}
}
}
else{
misses++;
if (verbose) printf(" miss");
bool evicted = evictReplace(set , currentTbits , E);
if (evicted){ evictions++;}
switch(op){
case 'L':
case 'S':{if (verbose){ printf("\n");}break; }
case 'M':{hits++; if (verbose){ printf(" hit\n"); }}
}
}

}
//end of loop

/* tidy up and print the result */
fclose(t);
free(sets);
printSummary(hits, misses, evictions);
return 0;
}

void helpinfo(){
printf("Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n");
printf("Options:\n");
printf(" -h Print this help message.\n");
printf(" -v Optional verbose flag.\n");
printf(" -s <num> Number of set index bits.\n");
printf(" -E <num> Number of lines per set.\n");
printf(" -b <num> Number of block offset bits.\n");
printf(" -t <file> Trace file.\n");
printf("\n");
printf("Examples:\n");
printf(" linux> ./csim -s 4 -E 1 -b 4 -t traces/yi.trace\n");
printf(" linux> ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace\n");
}

void init(line * sets , int num){
for (int i = 0; i < num ; i++){
sets[i].HaveValue = false ;
sets[i].age = 0;
}
}


bool evictReplace(line * set , long tbits , int num){
bool flag = true;
int oldestage = -1;
line* LRUp = NULL ;
for (int i = 0; i < num ; i++){
line* this = set+i;
if (this->HaveValue == false){
this->Tbits = tbits;
this->HaveValue = true;
flag = false;
break;
}

if (oldestage < this->age ){
oldestage = this->age;
LRUp = this;
}
}

if(flag){
LRUp->Tbits = tbits;
LRUp->age = 0;
if (verbose) printf(" eviction");
}

return flag;
}

Evaluation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ make clean ; make ;
$ ./test-csim
Your simulator Reference simulator
Points (s,E,b) Hits Misses Evicts Hits Misses Evicts
3 (1,1,1) 9 8 6 9 8 6 traces/yi2.trace
3 (4,2,4) 4 5 2 4 5 2 traces/yi.trace
3 (2,1,4) 2 3 1 2 3 1 traces/dave.trace
3 (2,1,3) 167 71 67 167 71 67 traces/trans.trace
3 (2,2,3) 201 37 29 201 37 29 traces/trans.trace
3 (2,4,3) 212 26 10 212 26 10 traces/trans.trace
3 (5,1,5) 231 7 0 231 7 0 traces/trans.trace
6 (5,1,5) 265189 21775 21743 265189 21775 21743 traces/long.trace
27

TEST_CSIM_RESULTS=27