1 (edited by job 2015-05-17 07:09:13)

Topic: DDR fatal error on boot due to SPD flash corruption

Hi,

Today I had a failure on boot:

U-Boot SPL 2014.10-rc3-00039-gc5efead (Oct 17 2014 - 19:41:27)
Unsupported device width, fatal.
Fatal DDR error 0x03

It turns out the SPD EEPROM on my SO-DIMM is corrupt (you can only see this after recovering the system):

/usr/sbin/i2cdump -y 0 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 00 03 04 21 00 00 00 00 00 00 00 00 00 00    ...??!..........
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???

It should look like this:

$ /usr/sbin/i2cdump -y 0 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00    ?????!???????.?.
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???

The normal U-Boot SPL parses this SPD content and fails fatally if it is corrupt.
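To make "parses this SPD content" concrete, here is a minimal Python sketch that decodes a few fields from the first 16 SPD bytes. It follows the JEDEC DDR3 SPD layout as commonly documented (byte 0: size and CRC coverage, byte 2: DRAM type, byte 3: module type, byte 7: module organization, byte 8: bus width). decode_spd() and its field names are illustrative assumptions, not the U-Boot SPL code, and reserved encodings are not handled.

```python
# Sketch: decode selected DDR3 SPD fields (JEDEC layout as commonly documented).
# decode_spd() and the field names are hypothetical, not U-Boot's code.

def decode_spd(spd):
    """Decode a few fields from the first 9 bytes of a DDR3 SPD image."""
    b0, b2, b3, b7, b8 = spd[0], spd[2], spd[3], spd[7], spd[8]
    return {
        # Byte 0, bit 7: CRC covers bytes 0-116 (1) or bytes 0-125 (0)
        "crc_covers_to": 116 if b0 & 0x80 else 125,
        # Byte 2: fundamental memory type, 0x0B means DDR3 SDRAM
        "is_ddr3": b2 == 0x0B,
        # Byte 3, bits 3:0: module type, 0x3 means SO-DIMM
        "is_sodimm": (b3 & 0x0F) == 0x03,
        # Byte 7, bits 2:0: SDRAM device width code (0 = x4, 1 = x8, ...)
        "device_width": 4 << (b7 & 0x07),
        # Byte 7, bits 5:3: number of ranks minus one
        "ranks": ((b7 >> 3) & 0x07) + 1,
        # Byte 8, bits 2:0: primary bus width code (3 = 64 bits)
        "bus_width": 8 << (b8 & 0x07),
    }

# Row 00 of the good and the corrupted dumps from this post.
good = bytes.fromhex("92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00")
bad = bytes.fromhex("00 00 00 03 04 21 00 00 00 00 00 00 00 00 00 00")

for name, spd in (("good", good), ("corrupt", bad)):
    print(name, decode_spd(spd))
```

For the good row this yields x8 devices, one rank and a 64-bit bus; with byte 7 zeroed the device width decodes as x4 and byte 2 no longer identifies DDR3. That is at least consistent with the "Unsupported device width" message, although this has not been checked against the SPL source.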
A special SPL that can be booted over USB is available here (for now):
https://nas.xobs.io/novena/debugspl/

Compile the loader on your PC, plug a micro-USB cable into the side connector (USB OTG), and boot the Novena with P_USB shorted:

gcc imx-usb-loader.c -o imx-usb-loader `pkg-config libusb-1.0 --cflags --libs`
<Boot Novena here with micro USB attached ...>
./imx-usb-loader -v SPL
found i.MX6q USB device [15a2:0054]
main dcd length 8
sub dcd length 4
loading binary file(SPL) to 00907400, skip=0x0, fsize=48128 type=170...
binary file successfully loaded
jumping to 0x00907400

You can now either install the file "SPL" on the Novena's micro-SD card or fix the SPD flash.
To install the SPL:

# novena-install-spl -s ./SPL
Successfully wrote ./SPL to /dev/disk/by-path/platform-2198000.usdhc

To fix the SPD flash, I wrote the correct first 16 bytes back. This can be done one byte at a time with i2cset from the i2c-tools package, or all at once with 'eeprog' from http://www.codesink.org/eeprog.html

i2cset 0 0x50 0x0 0x92
i2cset 0 0x50 0x1 0x13
i2cset 0 0x50 0x2 0x0b
...
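Typing all 16 i2cset calls by hand is error-prone, so here is a small Python sketch that prints them from row 00 of the known-good dump above. It assumes bus 0 and address 0x50 as used in this thread and adds -y to skip i2cset's confirmation prompt. The byte values come from the module in this post; compare them with a known-good dump of your own SO-DIMM model before writing anything. i2cset_commands() is a hypothetical helper name.

```python
# Sketch: print the i2cset commands that restore SPD bytes 0x00-0x0f.
# REFERENCE_ROW is row 00 of the known-good dump above (assumed to match
# your module); bus 0 and address 0x50 are the values used in this thread.
REFERENCE_ROW = "92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00"

def i2cset_commands(row, bus=0, addr=0x50):
    """Return one i2cset command line per byte of a 16-byte dump row."""
    return [
        f"i2cset -y {bus} 0x{addr:02x} 0x{offset:02x} 0x{value:02x}"
        for offset, value in enumerate(bytes.fromhex(row))
    ]

for cmd in i2cset_commands(REFERENCE_ROW):
    print(cmd)
```

Review the printed lines first, then run them as root.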

With eeprog you need the data in binary form:

// To read
./eeprog -x -f -8 -r 0:16 /dev/i2c-0 0x50
// To write
./eeprog -8 -f /dev/i2c-0 0x50 -w 0 < spd.bin
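Because the write command feeds spd.bin to eeprog on standard input, the file must contain the 16 raw bytes, not their hex text. A minimal Python sketch for creating it from the known-good row 00 above (the byte values are from the module in this post; the file name matches the eeprog command):

```python
# Sketch: write the 16 reference SPD bytes as raw binary to spd.bin,
# for use with: ./eeprog -8 -f /dev/i2c-0 0x50 -w 0 < spd.bin
REFERENCE_ROW = "92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00"

data = bytes.fromhex(REFERENCE_ROW)
assert len(data) == 16, "expected exactly one 16-byte dump row"

with open("spd.bin", "wb") as f:
    f.write(data)
```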

After this, my system was back to normal.
It may be worth checking the values in your own SPD with /usr/sbin/i2cdump -y 0 0x50:
GregRob noticed some zeroes in his first line that should not be there.
My current contents look like this:

i2cdump -y 0 0x050
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00    ?????!???????.?.
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???
20: 00 00 00 00 00 00 00 00 00 86 00 00 00 00 00 00    .........?......
30: 00 00 00 00 00 00 00 00 00 00 00 00 2f 11 01 00    ............/??.
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 01 98 01 14 41 c0 0e 31 8b 7d 53    .....????A??1?}S
80: 39 39 30 35 34 36 39 2d 31 34 35 2e 41 30 30 4c    9905469-145.A00L
90: 46 20 00 00 80 2c 00 00 00 00 00 00 00 00 00 00    F ..?,..........
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a    ...............Z

2 (edited by GregRob 2015-05-16 21:36:31)

Re: DDR fatal error on boot due to SPD flash corruption

FYI, here is my current i2cdump output:

$ /usr/sbin/i2cdump -y 0 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 00 03 04 21 02 01 03 11 01 08 0a 00 fe 00    ...??!???????.?.
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???
20: 00 00 00 00 00 00 00 00 00 86 00 00 00 00 00 00    .........?......
30: 00 00 00 00 00 00 00 00 00 00 00 00 2f 11 01 00    ............/??.
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 01 98 01 14 41 f7 0e 5e 64 7d 53    .....????A??^d}S
80: 39 39 30 35 34 36 39 2d 31 34 35 2e 41 30 30 4c    9905469-145.A00L
90: 46 20 00 00 80 2c 00 00 00 00 00 00 00 00 00 00    F ..?,..........
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a    ...............Z

Re: DDR fatal error on boot due to SPD flash corruption

The pre-release boards we were using had non-writable SPD EEPROMs, but it looks like the modules included with the released boards are writable, and some errant process is zeroing out the first few bytes.

To compensate, I've updated the U-Boot SPL to verify the SPD checksum and, if the check fails, fall back to a hardcoded SPD.

Given that this has become a problem recently, I think this is a good stopgap. It's less intrusive than permanently write-protecting the SPD. I haven't found out what causes the corruption, but in practice anyone could trigger it accidentally with an errant i2cset.
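The SPL change itself is C inside U-Boot; the Python sketch below only illustrates the check-then-fall-back logic. It assumes the checksum is the JEDEC DDR3 SPD CRC-16 (polynomial 0x1021, initial value 0, MSB first) stored least-significant byte first in bytes 126-127, with bit 7 of byte 0 selecting whether it covers bytes 0-116 or 0-125. crc16_spd(), spd_crc_ok() and select_spd() are hypothetical names, not U-Boot symbols.

```python
# Sketch of "check the SPD CRC, fall back to a hardcoded SPD on failure".
# Assumes the JEDEC DDR3 SPD CRC-16 (poly 0x1021, init 0x0000, MSB first)
# stored LSB-first at bytes 126-127. All names here are hypothetical.

def crc16_spd(data):
    """CRC-16 with polynomial 0x1021 and initial value 0 (XMODEM style)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def spd_crc_ok(spd):
    """Compare the stored CRC with one computed over the covered range."""
    last = 116 if spd[0] & 0x80 else 125  # byte 0, bit 7: CRC coverage
    stored = spd[126] | (spd[127] << 8)
    return crc16_spd(spd[: last + 1]) == stored

def select_spd(spd, fallback):
    """Use the EEPROM contents if the CRC matches, else the built-in copy."""
    return spd if spd_crc_ok(spd) else fallback
```

A CRC rather than a plain sum is a good fit here: any change confined to a single byte of the covered range is guaranteed to be detected, so a zeroed byte 0-116 always triggers the fallback.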

Re: DDR fatal error on boot due to SPD flash corruption

I also ran into the boot problem with DDR error 0x03. As requested on IRC, here's the i2cdump:

root@jens-novena:~# /usr/sbin/i2cdump -y 0 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 00 00 04 21 00 00 00 00 00 00 00 00 00 00    ....?!..........
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???
20: 00 00 00 00 00 00 00 00 00 86 00 00 00 00 00 00    .........?......
30: 00 00 00 00 00 00 00 00 00 00 00 00 2f 11 01 00    ............/??.
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 01 98 01 14 41 f8 0e a7 64 7d 53    .....????A???d}S
80: 39 39 30 35 34 36 39 2d 31 34 35 2e 41 30 30 4c    9905469-145.A00L
90: 46 20 00 00 80 2c 00 00 00 00 00 00 00 00 00 00    F ..?,..........
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a    ...............Z

Re: DDR fatal error on boot due to SPD flash corruption

Here's a script you can run to check and fix your SPD:

#!/usr/bin/python

import subprocess, re, sys  # sys is needed for sys.exit() below

correct_dump = '''
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00    ?????!???????.?.
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???
20: 00 00 00 00 00 00 00 00 00 86 00 00 00 00 00 00    .........?......
30: 00 00 00 00 00 00 00 00 00 00 00 00 2f 11 01 00    ............/??.
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 01 98 01 14 41 c0 0e 31 8b 7d 53    .....????A??1?}S
80: 39 39 30 35 34 36 39 2d 31 34 35 2e 41 30 30 4c    9905469-145.A00L
90: 46 20 00 00 80 2c 00 00 00 00 00 00 00 00 00 00    F ..?,..........
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 5a    ...............Z
'''

offset_re = re.compile('^[0-9a-f][0-9a-f]: .*')
hex_re = re.compile('^[0-9a-f][0-9a-f]$')

sdp_bus = '0'
sdp_addr = '0x50'

def dump_to_bytelist(s) :
    last_offset = None
    byte_list = []
    for l in s.split('\n') :
        m = offset_re.match(l)
        if m is not None :
            this_offset = int(l[0:2], 16)
            assert(last_offset is None or this_offset == last_offset + 16)
            arr = l.split()
            assert(len(arr) > 17)
            for i in arr[1:17] :
                assert(hex_re.match(i))
                byte_list.append(int(i, 16))
            last_offset = this_offset
    assert(len(byte_list) == 256)
    return byte_list

def dump_sdp() :
    p = subprocess.Popen(['/usr/sbin/i2cdump', '-y', sdp_bus, sdp_addr], stdout=subprocess.PIPE)
    dump_text = p.stdout.read()
    ret_code = p.wait()
    if ret_code != 0 :
        print 'i2cdump failed'
        sys.exit(1)
    return dump_to_bytelist(dump_text)

def set_sdp_bytes(byte_list) :
    for idx, byte_val in byte_list :
        cmd = ['/usr/sbin/i2cset', '-y', sdp_bus, sdp_addr, str(idx), str(byte_val)]
        #print cmd
        #ret_code = 0
        ret_code = subprocess.call(cmd)
        if ret_code != 0 :
            print 'i2cset failed'
            sys.exit(1)

def compare(correct_bytes, current_bytes) :
    diff_list = []
    for idx in range(0, len(correct_bytes)) :
        if current_bytes[idx] != correct_bytes[idx] :
            diff_list.append( (idx, current_bytes[idx], correct_bytes[idx]) )
    return diff_list

def show_bytes(byte_list) :
    line_offset = 0
    byte_dict = dict(byte_list)
    while line_offset < 256 :
        line = ('%02x: '%line_offset) + ' '.join('%02x'%byte_dict[v] if v in byte_dict else '__' for v in range(line_offset, line_offset+16))
        print line
        line_offset += 16

correct_bytes = dump_to_bytelist(correct_dump)
current_bytes = dump_sdp()

diff_list = compare(correct_bytes, current_bytes)

if len(diff_list) == 0 :
    print 'SDP is correct'
else :
    print 'SDP differs, current bytes incorrect'
    show_bytes([(a[0], a[1]) for a in diff_list])
    print 'correct bytes'
    show_bytes([(a[0], a[2]) for a in diff_list])
    y_n = raw_input('Fix? ')
    print y_n
    if y_n.lower() == 'y' :
        set_sdp_bytes([(a[0], a[2]) for a in diff_list])

6 (edited by GregRob 2015-05-20 23:32:44)

Re: DDR fatal error on boot due to SPD flash corruption

jbj1, some of the bytes that your Python script reports as wrong simply differ from module to module. For example, bytes 0x7a–0x7d are the module serial number.

In his post, job only rewrote the first 16 bytes of the DDR module's SPD EEPROM.
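Following that approach, the comparison in the script above can be limited to the bytes that were actually corrupted, so per-module fields such as the serial number are ignored. A minimal sketch working on plain byte lists like the ones dump_to_bytelist() returns; compare_range() is a hypothetical helper, and the default range 0x00-0x0f mirrors job's fix rather than any rule from the SPD specification.

```python
# Sketch: report differences only within [start, end), so per-module fields
# (e.g. the serial number at 0x7a-0x7d) are never flagged or rewritten.
def compare_range(correct_bytes, current_bytes, start=0x00, end=0x10):
    """Return (offset, current, correct) tuples for differing bytes."""
    return [
        (idx, current_bytes[idx], correct_bytes[idx])
        for idx in range(start, end)
        if current_bytes[idx] != correct_bytes[idx]
    ]

# Row 00 of the known-good dump and of GregRob's dump.
correct = list(bytes.fromhex("92 13 0b 03 04 21 02 01 03 11 01 08 0a 00 fe 00"))
current = list(bytes.fromhex("00 00 00 03 04 21 02 01 03 11 01 08 0a 00 fe 00"))

for idx, cur, ok in compare_range(correct, current):
    print(f"0x{idx:02x}: 0x{cur:02x} -> 0x{ok:02x}")
```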

This Wikipedia section has a good description of what the different bytes mean: http://en.wikipedia.org/wiki/Serial_pre … DDR3_SDRAM

Re: DDR fatal error on boot due to SPD flash corruption

I triggered another one.

While doing reboot tests to debug the LCD init problem in U-Boot, I hit a failed resume from hibernation.
After a hard reboot:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 92 13 0b 03 04 21 00 00 00 00 00 00 00 00 00 00    ?????!..........
10: 69 78 69 30 69 11 18 81 20 08 3c 3c 00 f0 83 01    ixi0i??? ?<<.???

I'll see whether hanging a Bus Pirate on the bus helps, but I'm not too hopeful.
If the cause is corrupted communication on the bus, I have no idea what the Bus Pirate will make of it.