Adam John Neale
Design and Analysis of an Adjacent Multi-bit Error Correcting Code for Nanoscale SRAMs
Increasing SRAM bit-cell density is a major driving force for semiconductor technology scaling. The industry-standard 2x reduction in SRAM bit-cell area per technology node has led to a proliferation of memory-intensive applications, as larger memory systems can be realized per unit area. Coupled with this increasing capacity is an exponentially increasing SRAM system-level soft error rate (SER). Soft errors, caused by galactic radiation and radioactive chip packaging material, corrupt a bit-cell’s data state and are a potential cause of catastrophic system failures. Further, reductions in device geometries, design rules, and sensitive node capacitances increase the probability of multiple adjacent bit-cells being upset per particle strike, with such upsets accounting for over 30% of the total SER below the 45nm node. Traditionally, these upsets have been addressed using a simple error correction code (ECC) combined with word interleaving. With continued scaling, however, errors beyond the reach of this setup begin to emerge. Although more powerful ECCs exist, they come at an increased overhead in terms of area and latency. Additionally, interleaving adds complexity to the system and may not always be feasible for the given architecture.
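To illustrate why word interleaving helps and where it stops helping, the following minimal Python sketch counts how many bits of each logical word are hit by a burst of adjacent physical upsets. The column mapping (physical column modulo the interleave factor selects the word) and the function names are assumptions chosen for illustration, not the memory organization used in this work.

```python
def physical_to_logical(col, interleave):
    """Map a physical column to (word index, bit index).

    Assumed mapping: adjacent physical columns belong to different
    logical words, cycling through `interleave` words.
    """
    return col % interleave, col // interleave


def errors_per_word(start_col, burst_width, interleave):
    """Count upset bits per logical word for an adjacent burst."""
    counts = {}
    for col in range(start_col, start_col + burst_width):
        word, _ = physical_to_logical(col, interleave)
        counts[word] = counts.get(word, 0) + 1
    return counts


# A 3-bit adjacent upset with 4-way interleaving: one error per word,
# so a single-error-correcting code per word is sufficient.
narrow = errors_per_word(start_col=10, burst_width=3, interleave=4)

# A 5-bit adjacent upset with the same interleaving: one word receives
# two errors, which a single-error-correcting code cannot correct.
wide = errors_per_word(start_col=10, burst_width=5, interleave=4)
```

Under this assumed mapping, a burst is spread to at most one error per word only while its width does not exceed the interleave factor, which is the limitation the paragraph above describes.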
In this work, a new class of ECC targeted toward adjacent multi-bit upsets (MBUs) is proposed and analyzed. These codes present a tradeoff between the currently popular single-error-correcting, double-error-detecting (SEC-DED) ECCs used in SRAMs (which are unable to correct MBUs) and the more robust multi-bit ECC schemes used for MBU reliability. The proposed codes are evaluated and compared against other ECCs using a custom test suite developed in Matlab, together with Verilog HDL implementations synthesized using Synopsys Design Compiler and a commercial 65nm bulk CMOS standard cell library. Simulation results show a 2.35x improvement in corrected-SER over a Bose-Chaudhuri-Hocquenghem (BCH) double error correcting (DEC) code while requiring 3 fewer check-bits, 85% less ECC circuit area, and 15% less error correction delay. Further, an alternative 2-bit adjacent error correcting implementation provides a corrected-SER approximately equal to that of the BCH DEC code, with the same check-bit overhead as a conventional SEC-DED code in the same error channel.
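The statement that SEC-DED codes cannot correct MBUs can be made concrete with a small example. The sketch below uses a textbook extended Hamming (8,4) code, chosen here only for brevity; it is not one of the codes proposed or evaluated in this work, and the function names are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into an extended Hamming (8,4) codeword.

    Index 0 holds the overall parity bit; indices 1, 2, 4 hold the
    Hamming parity bits; indices 3, 5, 6, 7 hold the data bits.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c


def decode(codeword):
    """Return (status, data) where status is 'ok', 'corrected' or 'detected'."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    overall = 0
    for bit in c:
        overall ^= bit               # parity over all 8 bits
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1             # single error; syndrome 0 means bit 0
        status = "corrected"
    else:
        status = "detected"          # even number of errors: uncorrectable
    return status, [c[3], c[5], c[6], c[7]]


word = encode([1, 0, 1, 1])

single = list(word)
single[5] ^= 1                       # one upset bit
single_result = decode(single)

adjacent = list(word)
adjacent[3] ^= 1                     # two adjacent upset bits
adjacent[4] ^= 1
adjacent_result = decode(adjacent)
```

Here `single_result` reports a corrected word with the original data, while `adjacent_result` reports only a detected error: the decoder knows the word is corrupt but cannot recover it, which is the gap the adjacent multi-bit codes described above are meant to address.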
For further verification, a 0.4V 75kb single-cycle SRAM macro protected with a programmable, up-to-3-adjacent-bit-correcting implementation of the proposed multi-bit ECC has been fabricated in a commercial 28nm bulk CMOS process. The circuit has undergone neutron radiation testing at the TRIUMF Neutron Irradiation Facility in Vancouver, Canada. Measurement results show a 189x improvement in SER over an unprotected memory with no ECC enabled and a 5x improvement over a traditional single-error-correction (SEC) code at 0.5V for the same number of check-bits. Measurement results also confirm an average active energy of 0.015fJ/bit at 0.4V and an average 80mV reduction in VDDMIN across eight packaged chips when the ECC is enabled. Both the memory array and ECC circuit were designed for low-voltage applications using a full-custom design flow.