There seems to be a consensus that we are at an important inflection point in computing, driven by the emerging shift to multicore chips in the near term and manycore chips in the not-so-distant future. If you write software, you need a solid understanding of what this means, or you’ll be left behind.
If you are a SQL Server user, the bulk of the burden falls on the Microsoft SQL Server product team to take full advantage of multicore and manycore processors, and you can more or less sit back and enjoy the results. But I’d contend that it would still pay off handsomely to develop your own understanding of these changes and get yourself oriented.
One way to do that without getting too deep into the details is to look at how the conventional wisdom is being overturned, as so often happens in computing. Fortunately, a group of prominent researchers at UC Berkeley has already compiled a fairly comprehensive list in their report “The Landscape of Parallel Computing Research: A View from Berkeley.” I encourage you to read the full report, but for your convenience, I’ve reproduced these conventional wisdoms below.
The Berkeley group presents the affected conventional wisdoms in pairs, each old conventional wisdom followed by the new one that replaces it, to illustrate how profound these changes are.
- Old: Power is free, but transistors are expensive.
New (the “Power Wall”): Power is expensive, but transistors are “free.” That is, we can put more transistors on a chip than we have the power to turn on.
- Old: If you worry about power, the only concern is dynamic power.
New: For desktops and servers, static power due to leakage can be 40% of total power.
- Old: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
New: As chips drop below 65 nm feature sizes, they will have high soft and hard error rates.
- Old: By building upon prior successes, we can continue to raise the level of abstraction and hence the size of hardware designs.
New: Wire delay, noise, cross coupling (capacitive and inductive), manufacturing variability, reliability (see above), clock jitter, design validation, and so on conspire to stretch the development time and cost of large designs at 65 nm or smaller feature sizes.
- Old: Researchers demonstrate new architecture ideas by building chips.
New: The cost of masks at 65 nm feature size, the cost of Electronic Computer-Aided Design (ECAD) software to design such chips, and the cost of design for GHz clock rates mean that researchers can no longer build believable prototypes. Thus, an alternative approach to evaluating architectures must be developed.
- Old: Performance improvements yield both lower latency and higher bandwidth.
New: Across many technologies, bandwidth improves by at least the square of the improvement in latency.
- Old: Multiply is slow, but load and store is fast.
New (the “Memory Wall”): Load and store is slow, but multiply is fast. Modern microprocessors can take 200 clock cycles to access Dynamic Random Access Memory (DRAM), whereas even floating-point multiplies may take only four clock cycles.
- Old: We can reveal more instruction-level parallelism (ILP) via compilers and architecture innovation. Examples from the past include branch prediction, out-of-order execution, speculation, and Very Long Instruction Word systems.
New (the “ILP Wall”): There are diminishing returns on finding more ILP.
- Old: Uniprocessor performance doubles every 18 months.
New: Power Wall + Memory Wall + ILP Wall = Brick Wall. In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years.
- Old: Don’t bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
New: It will be a very long wait for a faster sequential computer.
- Old: Increasing clock frequency is the primary method of improving processor performance.
New: Increasing parallelism is the primary method of improving processor performance.
- Old: Less than linear scaling for a multiprocessor application is failure.
New: Given the switch to parallel computing, any speedup via parallelism is a success.
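The last pair is easier to appreciate with a little arithmetic. The Berkeley list does not spell this out, but a common way to see why linear scaling is so hard to reach is Amdahl's law: if a fraction p of the work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p / n), ignoring overheads such as synchronization and memory contention. The following is a minimal sketch in Python, not anything taken from the report or from SQL Server; the function names and the 95% parallel fraction are hypothetical, chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup (Amdahl's law) when only part of the work parallelizes.

    Assumes a fixed workload and zero parallel overhead, so real systems
    will typically do worse than this bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores we add;
    # only the parallel part is divided among the cores.
    return 1.0 / (serial + parallel_fraction / cores)


def efficiency(parallel_fraction: float, cores: int) -> float:
    """Speedup per core; 1.0 would mean perfectly linear scaling."""
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    # Hypothetical workload: 95% of the work can run in parallel.
    p = 0.95
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>3} cores: speedup {amdahl_speedup(p, n):5.2f}x, "
              f"efficiency {efficiency(p, n):6.1%}")
```

With 95% of the work parallelized, 64 cores give a speedup of only about 15x, and no number of cores can exceed 1 / 0.05 = 20x. By the old yardstick that is a failure, but by the new one it is a clear success.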