Pierce I-Jen Chuang
High-Performance, Energy-Efficient CMOS Arithmetic Circuits
Manoj Sachdev and Vincent Gaudet
In a modern microprocessor, data-path/arithmetic circuit has always been an important building block in delivering high-performance, energy-efficient computing, because arithmetic operations such as addition and binary number comparison are two of the most commonly used computing instructions. In this thesis, a constant-delay (CD) logic style is proposed targeting full-custom high-speed applications. The constant delay characteristic of this logic style (regardless of the logic type) makes it suitable to implement complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature enables performance advantage over static and dynamic domino logic styles in a single cycle, multi-stage circuit block. Several design considerations including timing window width adjustment and clock distribution are discussed. Using 65-nm general-purpose CMOS technology, the proposed logic style demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different logic gates. Simulation results of 8-bit ripple carry adders conclude that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64% (22%) energy-delay product reduction from static logic at 100% (10%) data activity in 32-bit carry lookahead adders. To confirm CD logic's potential, a 148 ps, single-cycle 64-bit Ling adder with CD logic implemented in the critical path is fabricated in a 65-nm, 1-V CMOS process. A new 64-bit Ling adder architecture, which utilizes both inversion and absorption properties to minimize the number of CD logic and the number of logic stage in the critical path, is also proposed. At 1-V supply, this adder's measured worst-case power and leakage power are 135 mW and 0.22 mW, respectively. A single-cycle 64-bit binary comparator utilizing a radix-2 tree structure is also proposed. This comparator architecture is specifically designed for static logic to achieve both low-power and high-performance operation, especially in low input data activity environments. At 65-nm technology with 25% (10%) data activity, the proposed design demonstrates 2.3X (3.5X) and 3.7X(5.8X) power and energy-delay product efficiency, respectively. This comparator is also 2.7X faster at iso-energy (80 fJ) or 3.3X more energy efficient at iso-delay (200 ps) than existing designs. An improved comparator, where CD logic is utilized in the critical path to achieve high performance without sacrificing the overall energy efficiency, is also realized in a 65-nm 1-V CMOS process. At 1-V supply, the proposed comparator's measured delay is 167 ps, and has an average power and a leakage power of 2.34 mW and 0.06 mW, respectively. At 0.3-pJ iso-energy or 250-ps iso-delay budget, the proposed comparator with CD logic is 20% faster or 17% more energy-efficient compared to a comparator implemented with just the static logic.