Study Guide 6

Exams are closed book. One written cheat sheet.

Topics:

Everything from Study Guide 5, plus

  • Intermediate threads

  • Memory hierarchy, primary and secondary storage, caches, temporal and spatial locality

  • Advanced pointers, pointer arithmetic, void*, function pointers, pass by pointer

  • Implementations of malloc and free. sbrk()

  • Code optimization best practices

Practice questions

Threads

1) Short Answers

  • What is a mutex? When do we need to use mutex?

  • What is a barrier? When do we need to use barrier?

  • What is a race condition?

  • What is deadlock?

  • What does it mean for a function to be thread safe?

2) The following multi-threaded program simulates two bank accounts. But it’s not working correctly. What is going on and how can we fix it?

struct account {
  pthread_mutex_t lock;
  int balance;
};

void *Transfer(void *args){
  struct account** accounts = (struct account**) args;
  struct account* fromAcct = accounts[FROM];
  struct account* toAcct = accounts[TO];

  for (int i = 0; i < 10000; i++) {
    pthread_mutex_lock(&fromAcct->lock);
    pthread_mutex_lock(&toAcct->lock);

    fromAcct->balance -= 25;
    toAcct->balance += 25;

    pthread_mutex_unlock(&fromAcct->lock);
    pthread_mutex_unlock(&toAcct->lock);
  }

  return NULL;
}

int main() {

  // create accounts for A and B
  struct account accountA;
  struct account accountB;
  accountA.balance = 100;
  accountB.balance = 50;
  pthread_mutex_init(&accountA.lock, NULL);
  pthread_mutex_init(&accountB.lock, NULL);
  printf("Starting balances:\n");
  printf("\tA: %d\n", accountA.balance);
  printf("\tB: %d\n", accountB.balance);

  // create threads
  struct account* transformA2B[2];
  transformA2B[0] = &accountA;
  transformA2B[1] = &accountB;

  struct account* transformB2A[2];
  transformB2A[0] = &accountB;
  transformB2A[1] = &accountA;

  pthread_t A2B, B2A;
  pthread_create(&A2B, NULL, Transfer, &transformA2B);
  pthread_create(&B2A, NULL, Transfer, &transformB2A);

  // wait for threads to finish
  pthread_join(A2B, NULL);
  pthread_join(B2A, NULL);

  printf("Ending balances:\n");
  printf("\tA: %d\n", accountA.balance);
  printf("\tB: %d\n", accountB.balance);

  // cleanup
  pthread_mutex_destroy(&accountA.lock);
  pthread_mutex_destroy(&accountB.lock);

  return 0;
}

3) The following multi-threaded program attempts to count the numbers in a list.

  • Which data is shared between threads?

  • Which data is only accessed by a single thread?

  • Which data is shared only between main and a single thread?

  • When the list size is greater than 1000 the program isn’t working correctly. What is going on and how can we fix it?

#define SIZE 100000

static unsigned long long count = 0;
struct thread_data {
  int start_index;
  int end_index;
  int* array;
};

void *countOver1000(void *userdata) {
  struct thread_data *data = (struct thread_data *) userdata;

  for (int i = data->start_index; i < data->end_index; i++) {
    if (data->array[i] > 1000) {
      count++;
    }
  }
  return (void*) NULL;
}

int main(int argc, char *argv[]) {
  srand(time(0));

  int values[SIZE];
  unsigned long long test = 0;
  for (int i = 0; i < SIZE; i++) {
    values[i] = rand() % SIZE;
    if (values[i] > 1000) test++;
  }

  printf("Test with 4 threads\n");
  pthread_t threads[4];
  struct thread_data data[4];
  int subsize = SIZE/4; // assume multiple of 4
  for (int i = 0; i < 4; i++) {
    data[i].array = values;
    data[i].start_index = subsize*i;
    data[i].end_index = subsize*(i+1);
    pthread_create(&threads[i], NULL, countOver1000, (void*) &data[i]);
  }

  for (int i = 0; i < 4; i++) {
    pthread_join(threads[i], NULL);
  }

  printf("Answer with threads: %llu\n", count);
  printf("Correct answer: %llu\n", test);

  return 0;
}

4) Suppose a serial program takes 10 seconds to run but only 2 seconds with 4 cores. What is the speed up?

 

5) Suppose a serial program takes 10 seconds to run but only 2 seconds with 4 cores. What is the efficiency?

 

6) Ahmdel’s Law. Suppose a program is 90% parallelizable and the serial version of the program (e.g. the one with one thread) takes 10 seconds to run.

  • What is the speedup if we have 2 cores?

 

  • What is the speedup if we have 5 cores?

 

  • What is the theoretical best case speedup if we have infinitely many cores?

 

Advanced pointers

1) Draw a stack diagram for the following program. Show all intermediate values.

#include <stdio.h>

int main() {
  int vals[4] = {1,2,3,4};

  printf("-------\n");
  for (int* ptr = vals; ptr < vals+4; ptr++) {
    printf("%p %d\n", ptr, *ptr);
  }

  return 0;
}

2) Fix the error in the following program. Then draw the function stack and heap for the working program.

#include <stdio.h>

int main() {
  void *gen_ptr;
  int x = 3;
  char ch = 'a';

  gen_ptr = &x;  // gen_ptr can be assigned the address of an int
  gen_ptr = &ch; // or the address of a char (or the address of any type)

  int* int_ptr;
  int_ptr = &x;

  printf("The int value is %d\n", *int_ptr);
  printf("The char value is %c\n", *gen_ptr);
  return 0;
}

3) Fix the error in the following program. Then draw the function stack and heap for the working program.

#include <stdio.h>

int main() {
  int* value = NULL;
  printf("value is %d\n", *value);

  int a = 4;
  value = &a;
  printf("value is %d\n", *value);
}

4) Draw the stack and heap diagram for the following program.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct headerT {
  int a;
  int b;
};

struct studentT {
  char name[64];
  int age;
  float gpa;
};

int main() {

  void* data = malloc(sizeof(struct headerT) + 2 * sizeof(struct studentT));
  struct headerT* header = (struct headerT*) data;
  header->a = 5;
  header->b = 7;

  struct studentT* student = (struct studentT*) (header + 1);
  strncpy(student->name, "Bill", 4);
  student->age = 18;
  student->gpa = 3.3;

  student++;
  strncpy(student->name, "Carol", 5);
  student->age = 22;
  student->gpa = 3.6;

  free(data);
  // draw the stack and heap here
  return 0;
}

Memory

1) Short answers

  • What is the memory hierarchy?

  • What are some examples of primary storage?

  • What are some examples of secondary storage?

  • What is locality and what impact does it have on performance?

  • What is sbrk()? Why do we use functions such as malloc and free, rather than call sbrk() directly from our programs?

2) In the following code, which variables have temporal locality? spatial locality?

#include <stdio.h>

int main() {
    int i, size = 0;

    // declare array of 10 ints
    int my_arr[10];

    // set the value of each array element
    for (i = 0; i < 10; i++) {
        my_arr[i] = i;
        size++;
    }

    // set value at position 3 to 100
    my_arr[3] = 100;

    // print the number of array elements
    printf("array of %d items:\n", size);

    // print each element of the array
    for (i = 0; i < 10; i++) {
        printf("%d\n", my_arr[i]);
    }

    return 0;
}

3) Is the following program guaranteed to have good spatial locality? Why or why not?

#include <stdio.h>
#include <stdlib.h>

void init_matrix(int** m, int rows, int cols) {
  int i, j;
  for (i = 0; i < rows; i++) {  // for each row i
    for (j = 0; j < cols; j++) { // iterate over each column in row i
        m[i][j] = 0;
    }
  }
}

int main() {
  int nrows = 50;
  int ncols = 100;

  int** matrix = malloc(sizeof(int*) * nrows);
  for (int i = 0; i < nrows; i++) {
    matrix[i] = malloc(sizeof(int) * ncols);
  }

  init_matrix(matrix, nrows, ncols);

  for (int i = 0; i < nrows; i++) {
    free(matrix[i]);
  }
  free(matrix);
  matrix = NULL;
  return 0;
}

4) Does the following code efficiently access memory? Why or why not?

int array[N][N];
...
int sum = 0;
for (int i = 0; i < N; i++) {
  for (int j = 0; j < N; j++) {
    sum += array[j][i];
  }
}

Code optimization

Short answers

  • In what ways can a program be optimized?

  • What is constant folding?

  • What is function inlining?

  • What is constant propagation?

  • What is dead code elimination?

  • What is loop unrolling?

  • What is memory aliasing?

1) How the following code would change after constant folding?

#define N 5
int test1 = N - 4;
int test2 = N + 4;

2) Can the compiler optimize the following code? If no, how can we make the following code more efficient?

char text[64];
scanf(" %s", text);
int sum = 0;
for (int i = 0; i < strlen(text); i++) {
  sum += text[i];
}

3) How would the following code change after constant propagation and dead code removed?

int debug = 0;

//sums up all the elements in an array
int doubleSum(int *array, int length){
    int i, total = 0;
    for (i = 0; i < length; i++){
        total += array[i];
        if (debug) {
            printf("array[%d] is: %d\n", i, array[i]);
        }
    }
    return 2 * total;
}

4) How would the following code change after unrolling the loop?

void loopDeLoo(float* pos, float scale){
    int i;
    for (i = 0; i < 3; i++){
      pos[i] = scale * pos[i];
    }
}

5) Why can’t the following code be optimized by the compiler?

void shiftAdd(int *a, int *b){
  *a = *a * 10; //multiply by 10
  *a += *b; //add b
}

6) What can’t the following code be optimized by the compiler?

int main() {
  char test[32];
  strcpy(test, "hello");
  for (int i = 0; i < strlen(test); i++) {
    printf("%c\n", test[i]);
    if (i == 0) {
      strcpy(test, "hello!");
    }
  }
}