Rutgers University CS 505 Spring 2006
Computer Structures
Second Examination

Read this question paper carefully. Write your answers in the bluebooks provided. Make sure to write your name on your bluebook. Do not write anything you want graded on the question paper. If you need another bluebook or if you have a question, please come to the front of the classroom. You may keep this question paper when you leave.

  1. Consider the following C code:
    #define N 1000
    void makex (int x[]) {
            int     i, j, t;
    
            for (i=0; i<N; i++)
                    x[i] = i;
            for (i=0; i<N; i++) {
                    j = rand () % N;
                    t = x[i];
                    x[i] = x[j];
                    x[j] = t;
            }
    }
    
    double foo (double A[], double B[]) {
            int     i;
            int     x[N];
    
            makex (x);
            for (i=0; i<N; i++)
                    A[x[i]] += B[x[i]];
    }
    
    Assume that the compiler can prove that the A and B arrays do not overlap in memory in any calling context for foo. Recall that rand() is a C function that returns a pseudo-random integer between 0 and 231-1.
  2. Consider the following C program:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <time.h>
    
    #define LOTS	1000000	/* a large number of items */
    #define N	10	/* a number of bits */
    #define DOMAIN	(1<<N)	/* 2 to the Nth power possible integers in N bits */
    
    char A[DOMAIN]; /* A[i] is initially -1, then it is the N-bit parity of i */
    int stuff[LOTS]; /* lots of stuff goes into this array */
    
    /* compute the N-bit parity of x */
    int f2 (int x) {
    	int	i, count = 0;
    	for (i=0; i<N; i++) if (x & (1<<i)) count++;
    	return count & 1;
    }
    /* return the N-bit parity of x either by computing or remembering it */
    int f (int x) {
    	char *p = &A[x];
    	if (*p == -1) *p = f2 (x);
    	return *p;
    }
    int main () {
    	int i, j, t0, t1, sum;
    
    	/* initialize each element of A to -1 */
    	for (i=0; i<DOMAIN; i++) A[i] = -1;
    
    	/* fill the stuff array with LOTS of random N-bit numbers */
    	for (i=0; i<LOTS; i++) stuff[i] = rand () % DOMAIN;
    
    	sum = 0;
    	for (j=0; j<10; j++) 
    		for (i=0; i<LOTS; i++) 
    			sum += f (stuff[i]);
    	printf ("sum = %d\n", sum);
    	exit (0);
    }
    
    This program fills the stuff array with many random values between 0 and 2N-1, then goes through the array 10 times computing the N-bit parity of each of these integers using the f function. Each time it finds the parity of a number, the program adds it to a running total.

    The f function uses the A array to remember previously computed parity values; the first time the parity of an integer x is requested, it is computed with the f2 function and placed into the A array at the xth position. Each subsequent time the parity of x is requested, the value is simply read from the array rather than being computed again. (This is a common technique called memoization.) Let's call this program f.

    An alternate way of running the program to get the same output is to not do the memoization at all, and change the body of the inner loop to sum += f2 (stuff[i]);. This will cause the parity to be recomputed every time, so there will be 10 times as many parity computations compared with the original program. Let's call this new modified program f2.

    We can recompile the program with different values of N to get different behaviors. The following graph is a plot of the amount of time taken by f and f2 for different values of N on a real computer:

  3. Consider an SMT system capable of running 4 threads simultaneously. This system has a single branch predictor that is shared among all the threads. A parallel program runs 2 threads on this system. It has a certain misprediction rate in the branch predictor, i.e., a certain percentage of the conditional branches are predicted incorrectly. Now suppose the program runs the same algorithm but instead uses 4 threads.