2021-09-29

csapp

cachelab

Cache lab how to

Part A Build a Cache Simulator

It is not an acutal cache, it doesn’t store any data, it only simulate the working flow of the real cache

use getopt() in unistd.h to get parse arguments
fscanf is also very useful in this part
use malloc and free to manage memory

Part B Matrix Transpose

Use blocking techique, dividing matrix into submatrix

blocking works or not has a lot to do with the cache size !!!

The cache block size will decide the block size

Transpose example

blocking example

Multiplication example

Bad case(no blocking):

multiplication example

secong iter

Read the writeup

1	$ tar xvf cachelab-handout.tar

You will be modifying two files: csim.c and trans.c.

Reference Trace Files

The traces subdirectory of the handout directory contains a collection of reference trace files that we will use to evaluate the correctness of the cache simulator you write in Part A.

The trace files are generated by a Linux program called valgrind

Output format : [space]operation address,size

The operation field denotes the type of memory access: I denotes an instruction load, L a data load, S a data store, and M a data modify (i.e., a data load followed by a data store). There is never a space before each I. There is always a space before each M, L, and S

Part A: Writing a Cache Simulator

In Part A you will write a cache simulator in csim.c that takes a valgrind memory trace as input, simulates the hit/miss behavior of a cache memory on this trace, and outputs the total number of hits, misses, and evictions.

Your job for Part A is to fill in the csim.c file so that it takes the same command line arguments and produces the identical output as the reference simulator. Notice that this file is almost completely empty. You’ll need to write it from scratch.
For this lab, we are interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with “I”). Recall that valgrind always puts “I” in the first column (with no preceding space), and “M”, “L”, and “S” in the second column (with a preceding space). This may help you parse the trace.
To receive credit for Part A, you must call the function printSummary, with the total number of hits, misses, and evictions, at the end of your main function:

printSummary(hit_count, miss_count, eviction_count);
Your csim.c file must compile without warnings in order to receive credit.

Your simulator must work correctly for arbitrary s, E, and b. This means that you will need to allocate storage for your simulator’s data structures using the malloc function. Type “man malloc” for information about this function.
For this lab, we are interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with “I”). Recall that valgrind always puts “I” in the first column (with no preceding space), and “M”, “L”, and “S” in the second column (with a preceding space). This may help you parse the trace.
To receive credit for Part A, you must call the function printSummary with the total number of hits, misses, and evictions, at the end of your main function:printSummary(hit_count, miss_count, eviction_count);
For this this lab, you should assume that memory accesses are aligned properly, such that a single memory access never crosses block boundaries. By making this assumption, you can ignore the request sizes in the valgrind traces.

Part B: Optimizing Matrix Transpose

In Part B you will write a transpose function in trans.c that causes as few cache misses as possible.

You are allowed to define at most 12 local variables of type int per transpose function.
Your code only needs to be correct for these three cases(32×32,64×64,61×67) and you can optimize it specifically for these three cases. In particular, it is perfectly OK for your function to explicitly check for the input sizes and implement separate code optimized for each case.
The test-trans program saves the trace for function i in file trace.fi.They are invaluable debugging tools that can help you understand exactly where the hits and misses for each transpose function are coming from. To debug a particular function, simply run its trace through the reference simulator with the verbose option:

./csim-ref -v -s 5 -E 1 -b 5 -t trace.f0
Since your transpose function is being evaluated on a direct-mapped cache, conflict misses are a potential problem

Evaluation

Part A

$ ./csim -s 1 -E 1 -b 1 -t traces/yi2.trace
$ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
$ ./csim -s 2 -E 1 -b 4 -t traces/dave.trace
$ ./csim -s 2 -E 1 -b 3 -t traces/trans.trace
$ ./csim -s 2 -E 2 -b 3 -t traces/trans.trace
$ ./csim -s 2 -E 4 -b 3 -t traces/trans.trace 
$ ./csim -s 5 -E 1 -b 5 -t traces/trans.trace
$ ./csim -s 5 -E 1 -b 5 -t traces/long.trace

1 2	$ make $ ./test-csim

Part B

/* Header comment */
char trans_simple_desc[] = "A simple transpose";
void trans_simple(int M, int N, int A[N][M], int B[M][N]) {
/* your transpose code here */ }
registerTransFunction(trans_simple, trans_simple_desc);
registerTransFunction(transpose_submit, transpose_submit_desc);

1 2	$ make $ ./test-trans -M 32 -N 32

All in one

1	$ ./driver.py

Tips

//headers for getopt
#include <getopt.h> 
#include <stdlib.h> 
#include <unistd.h>

Since your transpose function is being evaluated on a direct-mapped cache, conflict misses are a potential problem. Think about the potential for conflict misses in your code, especially along the diagonal. Try to think of access patterns that will decrease the number of these conflict misses.

My solution

Part A

When manipulating values in memory,one should Never create local variables !!! Create Pointers !!!

Debug python script

from termcolor import cprint
s = 2
E = 4
b = 3

T = 64-S-B

addresses = [0x00600aa0,   0x7ff000398,   0x7ff000390,  0x7ff000378,  0x7ff000370,  0x7ff000384,  0x7ff000384,  0x7ff000388,  0x7ff000388,  0x7ff000384 ]

def printf(address):
    string = '{:064b}'.format(address)
    cprint(string[:T] , 'blue' , end='')
    cprint(string[T:64-b] , 'red' , end='')
    cprint(string[64-b:], 'green')

for a in addresses:
    printf(a)

Print array in `gdb`

If source code is avaliable and you are debugging programing works, vanilla gdb is better than pwndbg

1	$ p *arrayname@arraylen

Source code

#include "cachelab.h"

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>
#include <unistd.h>
#include <stdbool.h>

typedef struct Line {
    bool HaveValue ; 
    long Tbits ;
    int age ;
} line ;

void helpinfo();
void init(line* sets, int num);
bool evictReplace(line* set , long tbits, int num);


static bool verbose = false;

int main(int argc , char * argv [])
{
    int s = -1;
    int E = -1;
    int b = -1;
    FILE * t = NULL ;
    int opt ;

    int hits = 0;
    int misses = 0;
    int evictions = 0;

    /* parse command line */
    while( (opt = getopt(argc , argv, "s:E:b:t:hv")) != -1){
        switch(opt){
        case 'h': {helpinfo();exit(0);}
        case 'v': { verbose = true ; break;}
        case 's': { int res = atoi(optarg) ; if(res) s = res ; break ;}
        case 'E': { int res = atoi(optarg) ; if(res) E = res ; break ;}
        case 'b': { int res = atoi(optarg) ; if(res) b = res ; break ;}
        case 't': { t = fopen(optarg, "rt"); if(t == NULL){fprintf(stderr, "Cannot open file.\n");exit(0);}}
        }
    }

    if (s == -1 || E == -1 || b == -1 || t == NULL){
        fprintf(stderr, "%s: Missing required command line argument\n" , argv[0]);
        helpinfo();
        exit(0);
    }

    /* malloc memory space */
    int linenumber = (1<<s) * E ; 
    line * sets = (line*)malloc(sizeof(line) * linenumber);
    init(sets , linenumber);

    /* process input */

    int tbitsnum =  64 - s - b;
    long setmask = (long)((unsigned long)((1l<<63)>> (s-1)) >> tbitsnum);
    long tmask = (1l<<63) >> (tbitsnum - 1);

    int size = -1; 
    char op  = -1;
    long address = -1;

    while( fscanf(t, " %c %lx,%d" , &op , &address , &size) != EOF){
        if(op == 'I') continue; //skip instructions
    
        /* cache match */
            long offset = ( address & setmask ) >> b ;
            long currentTbits = (address & tmask) >> (b+s);
            line* set = sets + offset * E;


            //age increase
            for (int i = 0; i < E ; i++){ if (set[i].HaveValue) set[i].age++; }

            // real cache do comparison simultaneously instead of looping
            bool flag = false;

            for (int i = 0; i < E ; i++){
                line * target = set+i;
                if (target->HaveValue == true && target->Tbits == currentTbits){
                    flag = true;
                    target->age = 0;
                    break;
                }
            }
        if (verbose) printf("%c %lx,%d" , op , address , size);

        /* match result handling */
            if(flag){
                switch(op){
                    case 'L':
                    case 'S':{ hits++; if (verbose){ printf(" hit\n");}break; }
                    case 'M':{ hits+=2 ; if (verbose){ printf(" hit hit\n");}}
                }
            }
            else{
                misses++;
                if (verbose) printf(" miss");
                bool evicted = evictReplace(set , currentTbits , E);
                if (evicted){ evictions++;}
                switch(op){
                    case 'L':
                    case 'S':{if (verbose){ printf("\n");}break; }
                    case 'M':{hits++; if (verbose){ printf(" hit\n"); }}
                }
            }

    }
    //end of loop

    /* tidy up and print the result */
    fclose(t);
    free(sets);
    printSummary(hits, misses, evictions);
    return 0;
}

void helpinfo(){
    printf("Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n");
    printf("Options:\n");
    printf("  -h         Print this help message.\n");
    printf("  -v         Optional verbose flag.\n");
    printf("  -s <num>   Number of set index bits.\n");
    printf("  -E <num>   Number of lines per set.\n");
    printf("  -b <num>   Number of block offset bits.\n");
    printf("  -t <file>  Trace file.\n");
    printf("\n");
    printf("Examples:\n");
    printf("  linux>  ./csim -s 4 -E 1 -b 4 -t traces/yi.trace\n");
    printf("  linux>  ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace\n");
}

void init(line * sets , int num){
   for (int i = 0; i < num ; i++){
        sets[i].HaveValue = false ;
        sets[i].age = 0;
   }
}


bool evictReplace(line * set , long tbits , int num){
    bool flag = true;
    int oldestage = -1;
    line* LRUp  = NULL ;
    for (int i = 0; i < num ; i++){
        line* this = set+i;
        if (this->HaveValue == false){
            this->Tbits = tbits;
            this->HaveValue = true;
            flag = false;
            break;
        }

        if (oldestage < this->age ){
            oldestage = this->age;
            LRUp = this;
        }
    }

    if(flag){
        LRUp->Tbits = tbits;
        LRUp->age = 0;
        if (verbose) printf(" eviction");
    }

    return flag;
}

Evaluation

$ make clean ; make ;
$ ./test-csim 
                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
    27

TEST_CSIM_RESULTS=27

Cache lab how to

Part A Build a Cache Simulator

Part B Matrix Transpose

Transpose example

Multiplication example

Read the writeup

Reference Trace Files

Part A: Writing a Cache Simulator

Part B: Optimizing Matrix Transpose

Evaluation

Part A

Part B

All in one

Tips

My solution

Part A

Debug python script

Print array in gdb

Source code

Evaluation

Print array in `gdb`