Lecture 8: Parallel Computers

So far, we have considered only single-processor systems. To the user, it appears that only one instruction is executed at a time, although we know that the microarchitecture can exploit some instruction-level parallelism to speed things up.

However, there is another model of computation that has been around for almost as long as computing itself. Parallel computers allow the user to explicitly manage data and instructions on multiple processing elements. Many workloads exhibit large amounts of parallelism at granularities coarser than the instruction level.

Today, we will see: why we should pursue parallel computing rather than just better single-processor systems; the two main parallel software models, message passing and shared memory; the cache coherence problem; the granularity and cost-effectiveness of parallel computers; and the future of multiprocessing.

Parallel Computing

Why should we consider parallel computing as opposed to building better single-processor systems?

Parallel Software Models

There are two main parallel software models, i.e., two ways the multiple processing elements (nodes) are presented to the programmer. The models differ in how nodes communicate data and how they synchronize. Synchronization is how nodes communicate control flow, for example ensuring that one node does not read a value before another node has produced it.
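
To make synchronization concrete, here is a minimal sketch, assuming C11 atomics and POSIX threads (the names and the flag-based scheme are illustrative choices, not from the lecture): one thread writes a value and then raises a flag, and another thread spins on the flag before reading the value.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    int shared_data;       /* the data being communicated             */
    atomic_int ready = 0;  /* the flag that communicates control flow */

    void *producer(void *arg) {
        shared_data = 42;  /* produce the data first...               */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    void *consumer(void *arg) {
        /* spin until the producer signals: the control-flow handoff */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;
        printf("got %d\n", shared_data);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }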

Message Passing Multicomputers and Cluster Computers

This kind of architecture is a collection of computers (nodes) connected by a network. The nodes may all sit on the same motherboard, each on its own motherboard connected by some communication technology, or a mix of the two.
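
As a concrete sketch of this style, here is a small MPI program (MPI is an illustrative choice; the lecture does not prescribe a particular message-passing library, and the tag and payload are made up). Node 0 explicitly sends an integer to node 1; the matching receive both transfers the data and synchronizes the two nodes.

    /* run with, e.g., mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* node 0 explicitly ships the data to node 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* node 1 explicitly receives it; no shared memory involved */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("node 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }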

Shared-Memory Multiprocessors

These systems have many processors but present a single, coherent address space to the threads.
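
For contrast, a minimal shared-memory sketch, assuming POSIX threads (again an illustrative choice, not prescribed by the lecture): every thread reads and writes the same counter through the single address space, and a lock provides the synchronization.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    long counter = 0;  /* visible to all threads via the shared address space */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *work(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* synchronization            */
            counter++;                    /* communication via a shared */
            pthread_mutex_unlock(&lock);  /* memory location            */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);  /* prints 400000 */
        return 0;
    }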

Cache Coherence

Cache coherence is an issue with shared-memory multiprocessors. Although the system conceptually uses a large shared memory, we know what is really going on behind the scenes: each processor has its own cache. Two processors can therefore hold copies of the same memory block, and when one of them writes it, the other's copy becomes stale; the hardware must keep the copies consistent or the single-address-space illusion breaks.
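
Coherence traffic is easy to provoke from software. The sketch below (the 64-byte line size, iteration count, and names are assumptions for illustration) demonstrates false sharing: two threads update different counters that happen to sit in the same cache line, so exclusive ownership of the line ping-pongs between the two caches. Uncommenting the padding puts each counter in its own line and makes the loops run markedly faster.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 100000000L

    struct {
        volatile long a;     /* volatile forces a memory access per    */
        /* char pad[56]; */  /* iteration; uncomment the padding to    */
        volatile long b;     /* give each counter its own cache line   */
    } counters;

    void *bump_a(void *arg) {
        for (long i = 0; i < ITERS; i++) counters.a++;
        return NULL;
    }

    void *bump_b(void *arg) {
        for (long i = 0; i < ITERS; i++) counters.b++;
        return NULL;
    }

    int main(void) {
        struct timespec t0, t1;
        pthread_t ta, tb;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&ta, NULL, bump_a, NULL);
        pthread_create(&tb, NULL, bump_b, NULL);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("%.2f s\n", (t1.tv_sec - t0.tv_sec)
                         + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        return 0;
    }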

Granularity and Cost-Effectiveness of Parallel Computers

Future of Multiprocessing

Moore's Law gives us billions of transistors on a die, but with relatively slow wires. How can we build a computer out of these components?

Single-chip Multiprocessors

A quote from Andy Glew, a noted microarchitect:
"It seems to me that CPU groups fall back to explicit parallelism when they have run out of ideas for improving uniprocessor performance. If your workload has parallelism, great; even if it doesn't currently have parallelism, sometimes occasionally it is easy to write multithreaded code than single threaded code. But, if your workload doesn't have enough natural parallelism, it is far too easy to persuade yourself that software should be rewritten to expose more parallelism... because explicit parallelism is easy to microarchitect for."

Processors with DRAM (PIM)

Reconfigurable Processors