# Flipping Bits in Memory Without Accessing Them

Yoongu Kim\*, Ross Daly, Jeremie Kim\*, Chris Fallin, Ji Hye Lee\*, Donghyuk Lee\*, Chris Wilkerson (Intel Labs), Konrad Lai, Onur Mutlu\* (CMU\*)

#### **OVERVIEW**

- Disturbance Error: When an access to one memory address corrupts data stored in some other memory address(es)
- Exposition: Commodity DRAM modules from recent years exhibit disturbance errors

#### **CHARACTERIZATION METHODOLOGY**

**[ISCA '14]** 

- 8 FPGAs programmed with customized test engine
- 129 DDR3 DRAM modules (972 DRAM chips)
- Heat chamber regulated to 50 ± 2°C
- Row is opened/closed once every 55ns for 128ms
- Pathology: Repeatedly "opening" and "closing" a DRAM row causes cells in nearby rows to lose charge
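As a back-of-the-envelope check on the test parameters above, the sketch below counts how many open/close cycles a single row undergoes in one test. It assumes the 55 ns period is held constant for the full 128 ms; the function name is illustrative and not taken from the test engine.

```python
def activations_per_test(period_ns: int, duration_ms: int) -> int:
    """Number of complete open/close cycles that fit in the test duration."""
    duration_ns = duration_ms * 1_000_000
    return duration_ns // period_ns


count = activations_per_test(period_ns=55, duration_ms=128)
print(f"{count:,} open/close cycles")  # 2,327,272 open/close cycles
```

Under these assumptions, each test opens and closes the row roughly 2.3 million times.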

### **DISTURBANCE ERRORS ARE WIDESPREAD**

- 110 out of 129 modules are affected

[Figure: rows of DRAM cells, with an Aggressor Row between two Victim Rows]

### **REAL SYSTEM DEMONSTRATION**

| `disturb:`      | `no-disturb:`    |
|-----------------|------------------|
| `mov (X), %eax` | `mov (X), %eax`  |
| `mov (Y), %ebx` | `clflush (X)`    |
| `clflush (X)`   |                  |
| `clflush (Y)`   |                  |
| `mfence`        | `mfence`         |
| `jmp disturb`   | `jmp no-disturb` |

- Addresses X and Y must map to *different* rows in the same bank
- Alternating accesses to different rows cause both of them to be opened and closed
- Number of errors induced using a 2GB module:

| Bit-Flips | Intel Sandy Bridge | Intel Ivy Bridge | Intel Haswell | AMD Piledriver |
|-----------|--------------------|------------------|---------------|----------------|
| '0' → '1' | 7,992              | 10,273           | 11,404        | 47             |
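The requirement that X and Y map to different rows in the same bank can be illustrated with a toy row-buffer model. This is a minimal sketch, not the memory controller's actual behavior: it assumes an open-page policy (a row stays open until a different row in the same bank is accessed) and that every access reaches DRAM, which is what the `clflush` instructions are there to ensure. The class and function names are hypothetical.

```python
class Bank:
    """Toy DRAM bank with a single row buffer (open-page policy assumed)."""

    def __init__(self) -> None:
        self.open_row = None   # row currently held in the row buffer
        self.activations = {}  # row -> number of times it was opened

    def access(self, row: int) -> None:
        if row != self.open_row:
            # Row-buffer miss: the open row is closed and the target row opened.
            self.open_row = row
            self.activations[row] = self.activations.get(row, 0) + 1
        # Row-buffer hit: served from the open row, no new activation.


def run_disturb(iterations: int, row_x: int, row_y: int) -> dict:
    """Alternate between two rows of one bank, as in the disturb loop."""
    bank = Bank()
    for _ in range(iterations):
        bank.access(row_x)  # mov (X), %eax
        bank.access(row_y)  # mov (Y), %ebx
    return bank.activations


def run_no_disturb(iterations: int, row_x: int) -> dict:
    """Access a single row repeatedly, as in the no-disturb loop."""
    bank = Bank()
    for _ in range(iterations):
        bank.access(row_x)  # mov (X), %eax
    return bank.activations


print(run_disturb(1000, row_x=10, row_y=500))  # {10: 1000, 500: 1000}
print(run_no_disturb(1000, row_x=10))          # {10: 1}
```

In this model the alternating loop reopens each row on every iteration, while the single-address loop opens its row once and then only hits the row buffer.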

- As many as 1 out of every 1.7K cells is affected
- As few as 139K accesses induce an error
- Eliminating all errors requires a ~8x higher refresh rate
- As many as 4 errors per 64-bit word
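To put the access count and the refresh figure on the same scale, the sketch below converts 139K accesses into elapsed time and compares it with a refresh window shortened by 8x. It rests on assumptions not stated in the bullets above: that accesses are issued at the 55 ns period from the characterization methodology, that the baseline refresh window is 64 ms (typical for DDR3), and that "~8x" is taken as exactly 8. All names are illustrative.

```python
ACCESS_PERIOD_NS = 55             # from the characterization methodology
MIN_ACCESSES_FOR_ERROR = 139_000  # fewest accesses observed to induce an error
BASELINE_REFRESH_MS = 64          # assumption: typical DDR3 refresh window
REFRESH_MULTIPLIER = 8            # assumption: "~8x" treated as exactly 8


def hammer_time_ms(accesses: int, period_ns: int) -> float:
    """Elapsed time, in milliseconds, to issue `accesses` at a fixed period."""
    return accesses * period_ns / 1_000_000


time_to_error_ms = hammer_time_ms(MIN_ACCESSES_FOR_ERROR, ACCESS_PERIOD_NS)
scaled_refresh_ms = BASELINE_REFRESH_MS / REFRESH_MULTIPLIER

print(f"time to reach 139K accesses: {time_to_error_ms:.3f} ms")  # 7.645 ms
print(f"refresh window at 8x rate:   {scaled_refresh_ms:.1f} ms")  # 8.0 ms
```

Under these assumptions the two values land within a few percent of each other. Since the refresh figure is itself approximate, this is only an order-of-magnitude sanity check.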

## **SUMMARY OF OTHER FINDINGS IN PAPER**

- Most aggressors induce errors in two or fewer rows
- Almost all errors are attributed to charge loss
- The RowStripe data pattern induces 100x more errors than the Solid pattern
- For a given victim cell, errors are highly repeatable
- Victim cells ≠ Weak cells
- Errors are not strongly affected by temperature

## **OUR SOLUTION**

- Every time a row is closed, refresh its neighbors with some low probability
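The probabilistic refresh idea can be sketched as follows. This is a minimal software model under stated assumptions, not the hardware mechanism: the probability `p`, the function names, and the choice to refresh both neighbors together are hypothetical, and rows at the edge of the array are not handled.

```python
import random


def close_row(row: int, p: float, rng: random.Random) -> list:
    """Return the neighbor rows refreshed when `row` is closed.

    With probability p both adjacent rows are refreshed; otherwise none are.
    """
    if rng.random() < p:
        return [row - 1, row + 1]
    return []


def prob_never_refreshed(p: float, closes: int) -> float:
    """Probability that `closes` consecutive closes trigger no neighbor refresh."""
    return (1.0 - p) ** closes


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.001               # hypothetical value for "some low probability"
refreshes = sum(1 for _ in range(139_000) if close_row(100, p, rng))
print(f"neighbor refreshes over 139,000 closes: {refreshes}")
print(f"chance of zero refreshes: {prob_never_refreshed(p, 139_000):.1e}")
```

The point of the design is that no per-row counters are needed: even a small `p` makes it extremely unlikely that an aggressor is closed enough times to induce an error without its neighbors being refreshed along the way.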

