Seminar report on "3D INTEGRATION"



This paper surveys recent publications on the new class of layered circuit integration techniques termed 3D integration. We describe both the potential benefits and the major pitfalls of 3D integration. Several competing layering approaches are described and compared. We also forecast the impact that a move to 3D integration would have on CAD tools and circuit design flows. Sequential fabrication of CMOS devices one above the other on isolated crystalline templates could enable the manufacture of 3D integrated circuits with more than a million inter-layer wires per square millimeter. 3D IC technologies can help to improve circuit performance and lower power consumption by reducing wire length. 3D IC technology can also be used to realize heterogeneous system-on-chip designs by integrating different modules with less interference between them. Through strategic modification of architectures to take advantage of 3D, significant improvements in speed and reductions in power consumption can be achieved.

INTRODUCTION:
One of the main factors that limit the performance of today’s integrated circuits (ICs) is their architecture. In today’s ICs, the building blocks, the transistors, are laid out like single-storied buildings spread over large areas on the top surface of the silicon wafer. Imagine how compact these circuits could be made if the transistors were stacked one above the other, like the tall skyscrapers of downtown Manhattan. Bringing the components closer together in this manner could make these systems not only faster, owing to a reduction in the average length of the interconnect wires, but also more versatile, because more transistors could be packed into a relatively small area. The ‘3D-ness’ of an IC can be assessed by comparing the density of vertical interconnect wires running between the different device levels to the number of wires (vias) per unit area of a conventional IC. 3D-ICs promise multiple advantages over conventional 2D-ICs, including alleviating the communication bottleneck, integrating heterogeneous materials, and enabling novel architectures. However, 3D-ICs present challenges on all fronts of technology and design. If a 3D-IC is simply a stack of 2D circuit blocks with no significant modification in architecture, the gain in performance will be very limited, if any. A strategy for architecture and function partitioning across layers must be developed to take advantage of the third dimension while managing the overall complexity. The performance advantages of 3D architectures will be illustrated with two examples: 3D-FPGA and 3D-SRAM.

The first and most obvious potential motivation is miniaturization. However, through-silicon 3D integration is rarely justified by the desire for miniaturization alone. In most circumstances, if volume reduction is the only goal, it is much more cost-effective to stack dies and wire-bond them. This technology is already in widespread use in cell phones, and its use continues. One exception that is being widely explored is memories. Wire bonding cannot easily be used to stack identical memory chips, as they are all the same size. In addition, there are system advantages to thinning and stacking multiple memory dies such that the aggregate memory has the same end form factor as one memory package. For example, this technology could enable a credit-card-sized video storage and viewing device containing hundreds of hours of video.
In this section we motivate (A) higher levels of integration, (B) the shortening of interconnect, (C) heterogeneous integration, (D) assembly from commodity dies, and (E) fine-grained testing. 3D integration facilitates each of these.

A. Systems on Chip
The System on Chip (SoC) has become an attractive option due to technology scaling. An SoC is a complete electronic system, including digital logic, memory, and analog circuitry, on a single chip. The reasons for this development are clear. In the fabrication of integrated circuits, yield drops off dramatically with increased die area. For this reason, die areas have increased only slowly over the years. Transistor density, on the other hand, has maintained an exponential growth trajectory for decades. As VLSI transistor sizes decrease, more functionality can be integrated onto a given die area. At the same time, pads and PCB traces outside a chip become larger relative to the shrinking transistors within the chip, so both the power and the latency of off-chip communication increase relative to on-chip communication. In addition, growth in the number of pins available on chip packages is relatively slow, which limits off-chip communication bandwidth. Therefore, there is both the capability and the motivation to integrate more system components onto a single die.

B. Interconnect shortening
With increasing per-die circuit size, both in SoCs and in complex monolithic instruction processors, interconnect delay has become the dominant factor in circuit performance. Larger circuit sizes mean that global wires connecting opposite ends of the die are longer relative to transistors; their delay now dominates gate delay. The dominance of interconnect delay has also created a timing closure problem for CAD tools. Physical layout and routing decisions must be fed back into the tools that synthesize logic and size gates in order to verify that timing deadlines are indeed met. As interconnect delay becomes more dominant, it becomes more difficult to predict path delays in the early stages, and the chances of meeting deadlines after physical design are lowered. Endless iteration between high-level and low-level design can result. Repeater insertion is another headache for hierarchical design flows. Delay along global wires can be reduced by breaking them up with repeaters. However, these repeaters must be squeezed into valuable silicon area underneath the wire. Routing is usually done after placement, so we have a feedback situation akin to timing closure.
It would therefore be highly advantageous to reduce the delay penalties and design complexities due to long interconnects. Simple geometry shows that 3D integration does this. A given square area A has maximum Manhattan wire length 2√A. The same area split into two layers reduces the maximum wire length to √2·√A + l_v, where l_v is the length of a via between layers. In general, n layers give a maximum Manhattan wire length of
2√(A/n) + (n − 1)·l_v.
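
The geometric argument above can be sketched in a few lines of Python. This is only an illustration of the formula in the text; the function name, the choice of millimeters as the unit, and the sample values are assumptions, not figures from the surveyed papers.

```python
import math


def max_manhattan_wire_length(area, layers, via_length):
    """Worst-case Manhattan wire length, 2*sqrt(A/n) + (n - 1)*l_v.

    area       -- total circuit area A (assumed here to be in mm^2)
    layers     -- number of stacked layers n (n = 1 is the planar case)
    via_length -- length l_v of one inter-layer via (assumed in mm)
    """
    if layers < 1:
        raise ValueError("layers must be at least 1")
    side = math.sqrt(area / layers)  # side of one square layer
    return 2 * side + (layers - 1) * via_length


# Hypothetical example: a 100 mm^2 design with 0.01 mm vias.
for n in (1, 2, 4):
    print(n, max_manhattan_wire_length(100.0, n, 0.01))
```

As long as the via length is small compared with the die side, the square-root term dominates and the worst-case wire shrinks roughly as 1/√n.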

C. Heterogeneous integration
In a mixed design, like a SoC, we prefer to fabricate each type of circuit in its own ideal technology; a 3D system allows layers built with different processes to be combined into a single chip. It is possible to fabricate digital logic, memories, DSPs, analog and RF devices on a single die using one technology but this is suboptimal in terms of performance, area, and power. Putting the components on different dies also allows us to better isolate sensitive analog circuitry. Even within the same process, it may be desirable to have layers with different voltage and performance requirements or clocking domains. Looking ahead, heterogeneous layering also would allow the upper layer to be dedicated to optical I/O and low-skew optical clock distribution.

D. Commodity dies
For all but the largest production runs, tremendous cost savings might be realized by assembling systems from a collection of commodity “prefab” dies rather than creating a new mask set. Mask prices for cutting-edge processes have been increasing steadily, so mask reuse is critical. Prefabbed dies for certain components, especially analog devices, could be used for many years while timing-critical digital logic dies are continually updated for newer processes.

E. Component testing and replacement
Yield would be significantly improved if chip components could be individually tested and repaired. Yield is poor in large monolithic 2D ICs because a single fault anywhere in that large chip area dooms the entire chip. A 3D IC, on the other hand, might be tested in parts. Chips would be built only from Known Good Dies. Yield would increase dramatically, since each fault causes only a fraction of the entire system to be discarded. Increased yield would translate into lower cost and a larger feasible circuit area.
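
A rough feel for this argument can be had from a minimal sketch. It assumes the simple Poisson yield model Y = exp(−D·A), which is a common textbook approximation and not something taken from the surveyed papers, and it assumes that stacking and bonding never damage a tested die. The function names and the numbers are hypothetical.

```python
import math


def poisson_yield(defect_density, area):
    """Fraction of dies with no defects under the assumed Poisson model."""
    return math.exp(-defect_density * area)


def silicon_per_good_system(defect_density, total_area, n_dies):
    """Expected silicon area fabricated per working system.

    The system of area total_area is split into n_dies equal dies that are
    tested individually, and only good dies are assembled. Assembly itself
    is assumed to be lossless, which is optimistic.
    """
    die_area = total_area / n_dies
    return total_area / poisson_yield(defect_density, die_area)


# Hypothetical example: 0.5 defects per cm^2, 4 cm^2 of circuitry.
print(silicon_per_good_system(0.5, 4.0, 1))  # monolithic 2D chip
print(silicon_per_good_system(0.5, 4.0, 4))  # four known-good dies
```

Under these assumptions the silicon spent per working system falls as the system is split into more individually tested dies. A real comparison would also have to charge for test cost and bonding yield.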

3D TECHNOLOGIES:
Several options have been proposed for 3D integration. Only die stacking has yet reached production.
A. Die stacking

The simplest option for 3D integration is the stacking of successively smaller dies, as shown in figure 1. The product is called a Multi-Chip Module (MCM). In this approach, die alignment requirements are not very strict; wire bonding, tape automated bonding, and controlled-collapse chip connections (C4 solder) are all possibilities for connections between layers. Die-stacked chips with wire bonding are already on the market. While die-stacked chips are simple to produce, their benefits are limited. Compared to the discrete chip case, inter-die connections are lower in impedance, but they are still limited in number.

B. Wafer bonding
Wafer bonding is the process of joining two or more wafers prior to dicing and packaging. Such bonding must create interlayer vias while isolating the transistors from adjoining layers. Figure 2 illustrates several of the proposed processes. In all three processes, through-silicon vias are sunk deep into the substrate and later revealed by temporarily attaching the wafer to a glass “handle” and thinning its back side. In the face-to-back processes (figure 2 a. and c.), thinning the upper layer exposes vias that contact the lower layer. In the face-to-face process (figure 2 b.), thinning the upper layer exposes wire-bonding pads for packaging. The SOI process scales better to many layers since each layer is very thin, which aids in dissipating heat from the lower layers; we will see that heat is an important consideration.
However, SOI processes carry their own disadvantages. Wafer bonding requires very precise alignment of the wafers during bonding. Current alignment precision is limited to about ±2 µm. Through-silicon vias would be relatively easy to drive.

C. Silicon growth
There is also a class of 3D integration proposals based on growing substrate layers above complete, metalized wafers. These techniques include Beam Recrystallization, Epitaxial Growth, and Solid Phase Crystallization. One drawback of these techniques is that layers must be homogeneous since layers are created in immediate succession on the same equipment. Another problem is that the underlying copper metallization layers are somewhat sensitive; they must be kept below 450 degrees Celsius while the higher layers are created.


ILLUSTRATIONS:
3D-FPGA:

The design and prototyping costs of cell-based ASICs have become prohibitive, making FPGAs increasingly popular. However, FPGAs, when compared with cell-based ASICs, have 10−40 times lower logic density, 3−4 times higher delay, and 5−12 times higher dynamic power dissipation. 3D integration can help close this performance gap in several ways.

a) Heterogeneous Stacking
A significant fraction of the area in a modern FPGA is occupied by hard IPs, such as memory blocks, microprocessors, and DSP blocks. Using wafer stacking, these IPs can be stacked on top of the FPGA fabric. This reduces the FPGA footprint, resulting in shorter interconnects, lower delay, and lower power. The performance benefits of this approach, however, are limited by the fraction of FPGA area occupied by IPs.

b) Homogeneous Stacking
In this approach, multiple identical FPGA fabrics are stacked using wafer stacking, and their switch boxes are vertically connected using through-silicon vias (TSVs). Simple analysis of this scenario shows that with a TSV pitch of 3−5 times that of an inter-metal via in the base CMOS technology, 4−6 fabric layers can be stacked with small area overhead. However, power dissipation in the intermediate layers and TSV parasitics may limit the potential performance benefits of this approach. This approach also entails significant modifications to the FPGA routing architecture and the associated CAD tools.

c) Programming Overhead Stacking
FPGAs have lower performance than cell-based ASICs because around 80% of the FPGA area is occupied by programming overhead, including the configuration memory and the interconnect multiplexers and switches. Stacking this overhead on top of the logic does not require many layers, but it can result in a significant reduction in FPGA footprint. This approach requires monolithic 3D integration because of the high density of inter-layer vias required. A study has shown that such stacking can achieve 3.2 times higher logic density, 1.7 times lower delay, and 1.7 times lower dynamic power consumption than a baseline 2D-FPGA implemented in 65nm CMOS technology. These improvements are achieved with appropriate optimization of buffer and transistor sizes, but without any change to the FPGA architecture. In a subsequent study, we explored architectural modifications that take advantage of 3D. In particular, by merging the connection boxes and switch boxes into a switch layer on top of the logic boxes, as illustrated in Fig. 1, the routing fabric can be further simplified. The memory layer is split into two layers to provide better local vertical connectivity and to relax the requirement on memory cell size. As shown in Fig. 2, the benchmark circuits achieve an improvement of 1.7 to 2.9 times in critical path delay and a reduction of 2.5 to 3.2 times in dynamic power.
The stacking approaches discussed above can be combined to achieve performance that approaches that of 2D cell-based ASICs. The idea is to use monolithic stacking to reduce the programming overhead, and to use homogeneous and heterogeneous wafer stacking to achieve further reduction in interconnect length. Additional benefits can be obtained by changing the basic fabric architecture to take full advantage of 3D.

3D-SRAM:
The bit-line delay usually constitutes the majority of the total access time of an SRAM. A major portion of the active power dissipation is also associated with the bit-lines, because a large number of bit-lines discharge every time a word-line is asserted. A hierarchical bit-line architecture reduces the total bit-line capacitance by isolating the cell junction capacitances from the global bit-lines. However, the overall reduction in bit-line capacitance is limited because a larger portion of the bit-line capacitance comes from metal coupling, which is proportional to the length and hence is virtually unchanged for the same number of cells per bit-line. In our proposed 3D-SRAM architecture, shown in Fig. 3, the local bit-lines extend upward through an inter-layer via that connects SRAM cells vertically. The local bit-line connects through a select transistor to the global bit-line routed on the bottom layer. The overall bit-line capacitance can be reduced significantly because the length of the global bit-line is reduced by a factor equal to the number of layers. Thus this SRAM array will be denser, faster, and lower in power than a conventional design.
The SRAM cell is similar to a conventional 2D-SRAM cell and hence does not require sophisticated 3D technology. The select transistors that connect the vertical local bit-line to the global bit-line are located on the first layer, between SRAM cells. If the inter-layer vias are assumed to be similar in size to inter-metal vias, the area overhead will be only 18% per cell. Area efficiency is maximized by reusing this area for inter-layer vias in the upper layers.

THERMAL CHALLENGES:
Aside from costs and technological hurdles, the main challenge facing 3D integration is heat dissipation. IC cooling has traditionally been handled by a heat sink attached to the top surface of the chip package. In this model, we can consider the chip to be a one-dimensional thermal system; heat flows straight up from the chip to the heat sink. A move to 3D integration decreases the chip footprint and therefore increases power density at the heat sink interface. Another problem is that upper layers insulate lower layers from the heat sink. Silicon has high thermal resistance, so we expect a sharp vertical temperature gradient to develop in the chip.
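
The one-dimensional picture can be turned into a small calculation. The sketch below treats the stack as a chain of thermal resistances with the heat sink at the top, so that heat generated in a lower layer must cross every layer above it. The function name, the layer ordering, and all numerical values are assumptions made for illustration; a real design would need a proper thermal simulation.

```python
def layer_temperatures(powers, resistances, ambient):
    """Steady-state layer temperatures in a 1D series thermal model.

    powers[i]      -- power dissipated in layer i (W); layer 0 is the
                      layer closest to the heat sink
    resistances[i] -- thermal resistance (K/W) between layer i and the next
                      element toward the heat sink; resistances[0] covers
                      the path from layer 0 through the heat sink to ambient
    ambient        -- ambient temperature (degrees Celsius)
    """
    if len(powers) != len(resistances):
        raise ValueError("powers and resistances must have the same length")
    temperatures = []
    temperature = ambient
    heat_flow = sum(powers)  # all heat crosses the heat sink interface
    for power, resistance in zip(powers, resistances):
        temperature += heat_flow * resistance
        temperatures.append(temperature)
        heat_flow -= power  # deeper interfaces carry only the deeper layers' heat
    return temperatures


# Hypothetical two-layer stack, 10 W per layer.
print(layer_temperatures([10.0, 10.0], [1.0, 0.5], 25.0))
```

In this model every added layer raises the heat flow through the heat sink interface and places one more resistance between the deepest layer and the heat sink, which is the vertical gradient described above.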

CONCLUSION:
3D integration was considered as far back as 1979. However, there was little reason to pursue it while 2D technology was scaling so well. It is natural that it is being reconsidered now, given the myriad problems in deep-submicron, billion-transistor circuits. In general, 3D is justified in designs where interconnect resources dominate performance. By permitting a reduction in wire lengths or an increase in bandwidth, 3D integration can, in many cases, improve performance and/or power consumption by 20% or more. This is equivalent to about two technology nodes of scaling. However, 3D does complicate design. A 3D-specific design flow is needed, and thermal design, test, and yield management all require careful attention.
