Addressing multi-bit errors in DRAM/memory subsystem

Date
2020-01-01
Authors
Yeleswarapu, Ravikiran
Major Professor
Advisor
Arun K. Somani
Abstract

As DRAM technology evolves towards smaller feature sizes and higher densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use Chipkill-based schemes, which tolerate up to one or two symbol errors per DRAM beat. Such schemes may fail to detect multi-symbol errors that arise from faults in multiple data buses and/or chips. In this work, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error-handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme combines a hash with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). SSCMSD also improves the detection of errors in address bits.
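The detection idea can be illustrated with a minimal software sketch. It assumes the hash is computed over the address followed by the data; the abstract does not say how the address is folded in. It uses Python's `zlib.crc32` as a stand-in for the hardware hash, and the helper names `line_hash` and `check_read` are hypothetical. The ECC correction step is not modelled here.

```python
import zlib

def line_hash(address: int, data: bytes) -> int:
    """Hypothetical hash: CRC-32 over the 8-byte address followed by the data.

    Assumption: mixing the address into the hash is one possible way to make
    address-bit errors visible; it is not necessarily the dissertation's method.
    """
    return zlib.crc32(address.to_bytes(8, "little") + data)

def check_read(address: int, data: bytes, stored_hash: int) -> bool:
    """Return True if the data read back matches the hash stored at write time."""
    return line_hash(address, data) == stored_hash

# Write path: hash a 64-byte cache line together with its address.
data = bytes(range(64))
stored = line_hash(0x1000, data)

# Clean read: the hash matches.
clean_ok = check_read(0x1000, data, stored)

# Multi-symbol data error: two different bytes corrupted within a short burst.
corrupted = bytearray(data)
corrupted[3] ^= 0xFF
corrupted[5] ^= 0x0F
multi_ok = check_read(0x1000, bytes(corrupted), stored)

# Address error: correct data, but returned for a different address.
addr_ok = check_read(0x1040, data, stored)

print(clean_ok, multi_ok, addr_ok)
```

Both injected errors are confined to fewer than 32 contiguous bits, a burst length that a 32-bit CRC always detects. Wider error patterns can escape detection only when they alias to the same hash value.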

We develop a novel scheme that combines a 32-bit CRC with a Reed-Solomon code to implement SSCMSD for an x4-based DDRx system. Simulation-based experiments show that our scheme effectively prevents SDCs in the presence of multi-symbol data errors as well as address-bit errors, limited only by the aliasing probability of the hash. Our design achieves this without introducing additional READ latency. The scheme requires 19 chips per rank (a storage overhead of 18.75 percent), 76 data bus lines, and additional hash logic at the memory controller.
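The overhead figures can be reproduced with a short calculation. This sketch assumes a conventional 64-bit data bus served by 16 x4 data chips per rank; the abstract states only the totals of 19 chips and 76 bus lines. All constant names are illustrative.

```python
from fractions import Fraction

BITS_PER_CHIP = 4      # x4 devices contribute 4 data lines each
TOTAL_CHIPS = 19       # chips per rank, as stated in the abstract
DATA_CHIPS = 16        # assumption: 64-bit data bus / 4 bits per chip

redundant_chips = TOTAL_CHIPS - DATA_CHIPS          # chips holding check bits
overhead = Fraction(redundant_chips, DATA_CHIPS)    # redundancy relative to data
bus_lines = TOTAL_CHIPS * BITS_PER_CHIP             # total data bus lines

overhead_percent = float(overhead) * 100
print(f"redundant chips: {redundant_chips}")
print(f"storage overhead: {overhead_percent} percent")
print(f"data bus lines: {bus_lines}")
```

Under that assumption, three redundant chips over sixteen data chips give 18.75 percent, and 19 chips at 4 lines each give 76 bus lines, both matching the stated figures.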

Type
dissertation
Copyright
2020