Troubleshooting ECC errors


    Hello everyone,

    I put together a new build a few months ago, and ever since I got it running (long story) I have been getting ECC errors. I have already replaced the cheap components, and I want to check my remaining suspects before replacing the expensive ones.

    The build:
    AMD Opteron 2435 x2
    SuperMicro H8DAE-2
    XFX AMD Radeon HD 6750
    DDR2-400 RAM - 12 DIMMs, various (see below)
    Enermax NAXN 750AWT
    OCZ Petrol SSD
    Slackware64 13.37

    The errors - I get thousands of these three-line corrected errors, with 0 uncorrectable errors (UECs):
    Code:
    [47175.704033] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [47175.704046] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [47175.704051] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    Only two things change between errors:

    "Northbridge Error, node 0"
    alternates with
    "Northbridge Error, node 1".
    The switch sometimes happens immediately and sometimes minutes later.

    "Participating Processor: SRC"
    is sometimes
    "Participating Processor: RES";
    this seems to be random.
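
    To put numbers on the node and SRC/RES variation described above, the kernel log could be tallied with something like the following. This is a minimal sketch, not a tested tool: the function name tally_ecc_errors and the SAMPLE list are made up for illustration, and the patterns only match the message wording shown in this post.

```python
import re
from collections import Counter

# Patterns match only the message wording quoted in this post.
NODE_RE = re.compile(r"Northbridge Error, node (\d+)")
PP_RE = re.compile(r"Participating Processor: (\w+)")


def tally_ecc_errors(lines):
    """Count Northbridge reports per node and per Participating Processor value."""
    nodes = Counter()
    participants = Counter()
    for line in lines:
        node_match = NODE_RE.search(line)
        if node_match:
            nodes[int(node_match.group(1))] += 1
        pp_match = PP_RE.search(line)
        if pp_match:
            participants[pp_match.group(1)] += 1
    return nodes, participants


# Illustrative input: the three log lines quoted above.
SAMPLE = [
    "[47175.704033] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error",
    "[47175.704046] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.",
    "[47175.704051] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC",
]

nodes, participants = tally_ecc_errors(SAMPLE)
print("errors per node:", dict(nodes))
print("participating processor:", dict(participants))
```

    In practice the lines would come from the saved dmesg output rather than a hard-coded list. A strongly lopsided per-node count would point at the DIMMs attached to that CPU; an even split would not rule anything out.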

    The RAM:
    8x HP 1GiB - Cheap eBay RAM
    PC2-3200R-333-10
    4x Hynix HYMP351R72AMP4-E3 4GiB - from the SuperMicro tested RAM list
    PC2-3200R-333-12

    What I have done:
    Replaced the cheap eBay RAM with the SuperMicro-recommended RAM.
    Changed the BIOS settings for CPU cache scrubbing.

    What I haven't done yet but plan to do (I will update this thread with the results):
    Test DIMM by DIMM in a single-CPU configuration, testing each DIMM twice, once on each CPU. I already did this with the HP RAM on one CPU, and none seemed bad until I installed more than 2 DIMMs.
    Disable ECC and run memtest86 overnight.

    I'm looking for other ECC diagnostic tests, since I have already done the "try each DIMM" test on one CPU with the HP RAM.
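
    One diagnostic that might narrow this down without pulling DIMMs is the per-chip-select corrected-error counters that the Linux EDAC subsystem exposes in sysfs. The sketch below assumes the amd64_edac driver is loaded and that the kernel exposes the csrow layout under /sys/devices/system/edac/mc; neither is confirmed for this box. The function names are made up for illustration, and mapping a csrow back to a physical slot depends on the board and is not covered here.

```python
from pathlib import Path


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): corrected error count} read from EDAC sysfs.

    Returns an empty dict when the directory is missing, for example when
    no EDAC driver is loaded.
    """
    counts = {}
    for ce_file in sorted(Path(edac_root).glob("mc*/csrow*/ce_count")):
        csrow = ce_file.parent.name
        controller = ce_file.parent.parent.name
        try:
            counts[(controller, csrow)] = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or parsed.
            continue
    return counts


def report(counts):
    """Format the counts as text lines, highest error count first."""
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    return [f"{mc}/{csrow}: {n} corrected errors" for (mc, csrow), n in ordered]


if __name__ == "__main__":
    for line in report(read_ce_counts()):
        print(line)
```

    If one csrow accumulates nearly all of the errors, the DIMM behind it is the first one worth swapping; errors spread across every csrow on both controllers would point away from a single bad DIMM.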

    I only recently installed the second CPU, so the system ran in a single-CPU configuration for a few months.

    System Temps:
    Code:
    sensors
    w83793-i2c-1-2c
    Adapter: SMBus nForce2 adapter at 2e00
    VcoreA:      +1.22 V  (min =  +1.08 V, max =  +1.62 V)   
    VcoreB:      +1.24 V  (min =  +1.08 V, max =  +1.62 V)   
    in2:         +1.09 V  (min =  +1.08 V, max =  +1.33 V)   
    in3:         +0.86 V  (min =  +0.00 V, max =  +4.08 V)   
    in4:         +0.85 V  (min =  +0.00 V, max =  +4.08 V)   
    in5:         +1.81 V  (min =  +1.62 V, max =  +1.98 V)   
    in6:         +1.82 V  (min =  +1.62 V, max =  +1.98 V)   
    +5V:         +5.19 V  (min =  +4.64 V, max =  +5.65 V)   
    5VSB:        +5.09 V  (min =  +4.64 V, max =  +5.65 V)   
    Vbat:        +3.06 V  (min =  +2.96 V, max =  +3.63 V)   
    fan1:          0 RPM  (min =  712 RPM)  ALARM
    fan2:          0 RPM  (min =  712 RPM)  ALARM
    fan3:          0 RPM  (min =  712 RPM)  ALARM
    fan4:          0 RPM  (min =  712 RPM)  ALARM
    fan5:          0 RPM  (min =  712 RPM)  ALARM
    fan6:          0 RPM  (min =  712 RPM)  ALARM
    fan7:       2005 RPM  (min =  712 RPM)
    fan8:       1867 RPM  (min =  712 RPM)
    temp1:       +13.0°C  (high = +65.0°C, hyst = +60.0°C)  sensor = thermal diode
    temp2:       +13.0°C  (high = +65.0°C, hyst = +60.0°C)  sensor = thermal diode
    beep_enable:disabled
    
    w83627hf-isa-0290
    Adapter: ISA adapter
    in0:         +1.49 V  (min =  +1.34 V, max =  +1.65 V)   
    in1:         +1.39 V  (min =  +1.25 V, max =  +1.54 V)   
    in2:         +3.39 V  (min =  +2.96 V, max =  +3.62 V)   
    in3:         +3.06 V  (min =  +4.08 V, max =  +2.03 V)   ALARM
    in4:         +3.18 V  (min =  +2.83 V, max =  +3.47 V)   
    in5:         +0.59 V  (min =  +0.42 V, max =  +0.88 V)   
    in6:         +0.75 V  (min =  +4.06 V, max =  +2.93 V)   ALARM
    in7:         +3.31 V  (min =  +2.98 V, max =  +3.63 V)   
    in8:         +3.09 V  (min =  +2.96 V, max =  +3.62 V)   
    fan1:          0 RPM  (min = 3040 RPM, div = 2)  ALARM
    fan2:          0 RPM  (min =    0 RPM, div = 2)
    fan3:          0 RPM  (min = 11842 RPM, div = 2)  ALARM
    temp1:       +42.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
    temp2:       +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
    temp3:       +32.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
    cpu0_vid:   +1.550 V
    beep_enable:enabled
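
    Some of the voltage ALARM flags in the sensors output above sit on lines whose min limit is higher than the max limit (in3 and in6 on the w83627hf), which probably means those limits were never configured rather than that the rail is out of spec. A minimal sketch for separating the two cases follows; the function name and status labels are made up for illustration, and the pattern only handles voltage lines formatted like the ones in this post.

```python
import re

NUM = r"[+-]?\d+(?:\.\d+)?"
# Matches voltage lines such as:
#   in3:   +3.06 V  (min =  +4.08 V, max =  +2.03 V)   ALARM
VOLT_RE = re.compile(
    rf"^\s*(?P<name>[^:]+):\s+(?P<value>{NUM}) V\s+"
    rf"\(min =\s+(?P<min>{NUM}) V, max =\s+(?P<max>{NUM}) V\)"
)


def classify_voltage_lines(lines):
    """Return (name, status) pairs for every voltage line that carries limits."""
    results = []
    for line in lines:
        match = VOLT_RE.match(line)
        if not match:
            continue  # fans, temperatures, headers
        value = float(match.group("value"))
        low = float(match.group("min"))
        high = float(match.group("max"))
        if low > high:
            status = "limits inverted"  # alarm is probably not meaningful
        elif low <= value <= high:
            status = "ok"
        else:
            status = "out of range"
        results.append((match.group("name").strip(), status))
    return results


# Illustrative input: three lines copied from the w83627hf section above.
SAMPLE = [
    "in2:         +3.39 V  (min =  +2.96 V, max =  +3.62 V)   ",
    "in3:         +3.06 V  (min =  +4.08 V, max =  +2.03 V)   ALARM",
    "in6:         +0.75 V  (min =  +4.06 V, max =  +2.93 V)   ALARM",
]

for name, status in classify_voltage_lines(SAMPLE):
    print(name, status)
```

    Any line reported as "out of range" with sane limits would be worth a closer look, since a marginal supply rail is one possible cause of memory errors.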
    Attached file: mcelog.txt
    Last edited by WrinkledCheese; 11-27-2012 at 11:53 PM. Reason: Added MCELOG
