High-Performance Asynchronous Pipelines

One of the foundations of high-performance digital system design is the use of pipelining. In synchronous systems, for several decades, pipelining has been the fundamental technique used to increase parallelism and hence boost system throughput, whether for high-performance processors, multimedia and graphics units, or signal processors. This article provides an overview of pipelining in asynchronous, or clockless, digital systems. We do not attempt an exhaustive coverage, but rather introduce the basics of several leading representative styles. These pipelines naturally fall into two classes: those that use static logic versus those that use dynamic logic for the data path. Each class tends to use a distinct approach for its control and data storage. For static logic, we introduce the classic micropipeline of Sutherland,1 along with two high-performance variants: Mousetrap2 (which uses a standard cell design) and GasP3 (which uses a custom design). For dynamic logic, we present the classic PS0 pipeline of Williams and Horowitz,4,5 along with two high-performance variants: the precharge half-buffer (PCHB) pipeline6 (which provides greater timing robustness) and the high-capacity (HC) pipeline7 (which provides double the storage capacity).

We also briefly discuss design tradeoffs, performance evaluation, system-level analysis and optimization techniques, CAD tool support, testing, and recent industrial and academic applications.

Applications of pipelining in asynchronous systems
For synchronous systems, pipelining is a straightforward technique: complex function blocks are subdivided into smaller blocks, registers are inserted to separate them, and the global clock is applied to all registers. In contrast, for asynchronous systems, there is no global clock. Therefore, a protocol for the interaction of neighboring stages must be defined, as well as choices of data encoding and storage elements. In addition, an explicit distributed control structure must be designed. Together, this ensemble constitutes a template or skeleton for coordinating the blocks of a pipelined asynchronous system.
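To make the idea of locally coordinated stages concrete, here is a minimal behavioral sketch in Python. It is an illustration under stated assumptions, not a model of any specific pipeline style covered in this article: the names `Stage` and `run_pipeline` are hypothetical, each stage is assumed to hold at most one data item, and the request/acknowledge exchange is abstracted into a single "transfer when the successor is empty" event. Events are fired in a shuffled order to mimic the absence of a global clock.

```python
import random
from typing import Callable, List


class Stage:
    """Hypothetical stage: a function block plus a one-item storage element."""

    def __init__(self, func: Callable[[int], int]) -> None:
        self.func = func
        self.data = None
        self.full = False  # True while holding an item the successor has not taken

    def accept(self, value: int) -> None:
        # Models a "request" from the predecessor: compute and store the result.
        self.data = self.func(value)
        self.full = True

    def release(self) -> int:
        # Models an "acknowledge" from the successor: the stage becomes empty.
        value, self.data, self.full = self.data, None, False
        return value


def run_pipeline(funcs: List[Callable[[int], int]],
                 inputs: List[int], seed: int = 0) -> List[int]:
    """Push inputs through the stages using only local, neighbor-to-neighbor events."""
    rng = random.Random(seed)
    stages = [Stage(f) for f in funcs]
    pending = list(inputs)
    outputs: List[int] = []
    n = len(stages)
    while pending or any(s.full for s in stages):
        # Event 0: environment -> stage 0; event i: stage i-1 -> stage i;
        # event n: last stage -> environment. Order is arbitrary (no clock).
        order = list(range(n + 1))
        rng.shuffle(order)
        for i in order:
            if i == 0:
                if pending and not stages[0].full:
                    stages[0].accept(pending.pop(0))
            elif i == n:
                if stages[-1].full:
                    outputs.append(stages[-1].release())
            elif stages[i - 1].full and not stages[i].full:
                stages[i].accept(stages[i - 1].release())
    return outputs


if __name__ == "__main__":
    funcs = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(funcs, [1, 2, 3, 4]))
```

Whatever order the local events fire in, the results leave the pipeline in input order, because an item can advance only when the neighboring stage is empty. Real designs must additionally fix the signaling protocol, the data encoding, and the storage elements, which this sketch deliberately leaves out.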

Several leading processors from the 1950s and 1960s used asynchronous circuits extensively, including the Illiac and Illiac II (University of Illinois), the Atlas and MU-5 (University of Manchester), and designs from the Macromodules project (Washington University, St. Louis).

The basic concept and design of an asynchronous pipeline were presented by David Muller in his seminal paper from 1963.8 Since then, asynchronous pipelines have had broad application, from the early commercial graphics and flight simulation systems of Evans & Sutherland, whose LDS-1 (Line Drawing System-1) was first shipped to Bolt, Beranek and Newman (BBN) in 1969, to the foundational approaches of Chuck Seitz.9 More recently, high-performance asynchronous pipelines have been used commercially in

  • Sun’s UltraSparc IIIi computers for fast memory access;
  • the Speedster FPGAs of Achronix Semiconductor (http://www.achronix.com), which, at a peak performance of 1.5 GHz, are currently claimed as the world’s fastest;10 and
  • the Nexus Ethernet switch chips of Fulcrum Microsystems, an asynchronous start-up company recently acquired by Intel11 (http://www.fulcrummicro.com).
