Bad RAM or bad CPU + random freezing



    Hey everyone,

    I have two issues that I believe are unrelated, and I'm hoping to get both of them resolved in this thread.

    My first issue is more a lack of skill on my part. I've been Googling around and looking through my logs, but nothing I've found is helping me track down the cause.

    I have a system with 2x Opteron 2435 and 8x1GB of DDR2 400 registered RAM. I am getting a LOT of ECC correction messages. My first thought was to take all the memory out and try the system with each DIMM individually. I did that and thought I had found the 2 culprit DIMMs. I don't think that is the case anymore. Here is what I noticed.

    When I boot the system everything usually goes fine; I have seen an ECC correction message during boot only once. Once the machine is up, I start X and browse the web for a bit. The ECC messages usually start within the first 5 minutes. For each DIMM, I spent an hour or more browsing the net. With 2 of the DIMMs, I get ECC messages within 5 minutes, and about 10 minutes after the first message there are about 50 corrections. The other 6 DIMMs seem fine.

    So I put the 6 good DIMMs in and went about my merry way. I booted the system and within 5 minutes, again, ECC correction messages. So I tried 1 CPU and then tested the 6 good DIMMs again. They all tested fine. Then I tested them in pairs, and that seemed fine too. Once I go beyond 2 DIMMs, the ECC messages start popping up.
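    The DIMM-by-DIMM testing described above could be made a little less manual by reading the kernel's EDAC counters before and after each browsing session. The sketch below is only an illustration: it assumes the amd64_edac driver is loaded and that it exposes the older csrow layout under /sys/devices/system/edac/mc (newer kernels may lay this out differently), and the function names read_ce_counts and new_errors are made up for this example.

```python
from pathlib import Path


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(mc, csrow): corrected_error_count} read from EDAC sysfs.

    Assumes the legacy mcN/csrowM/ce_count layout; returns an empty dict
    when the directory is missing (for example, no EDAC driver loaded).
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for ce_file in sorted(root.glob("mc*/csrow*/ce_count")):
        csrow = ce_file.parent.name
        mc = ce_file.parent.parent.name
        try:
            counts[(mc, csrow)] = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or malformed counters
    return counts


def new_errors(before, after):
    """Return only the rows whose corrected-error count grew between snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] > before.get(key, 0)
    }


if __name__ == "__main__":
    # Print the current counters; run again later and compare by hand,
    # or keep a snapshot and pass both to new_errors().
    for (mc, csrow), count in sorted(read_ce_counts().items()):
        print(f"{mc}/{csrow}: {count} corrected errors")
```

    If the counters keep climbing on the same memory controller and chip-select row no matter which DIMMs are installed, that would point more toward the slot, board or CPU than toward a particular stick. How csrows map to physical slots depends on the board, so that mapping would have to be worked out separately.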

    Here are 5 minutes of errors. The first error is logged at about 300.7 seconds of uptime, less than 1 second past the 5 minute mark; as far as I can tell that is just a coincidence. You will notice that sometimes it's only one message and sometimes it's many messages.

    Code:
    [  300.704037] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  300.704044] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  300.704048] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  300.704052] [Hardware Error]: Machine check events logged
    [  450.704032] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  450.704040] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  450.704044] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  450.704047] [Hardware Error]: Machine check events logged
    [  525.704033] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  525.704040] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  525.704045] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  525.704048] [Hardware Error]: Machine check events logged
    [  563.204023] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  563.204031] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  563.204035] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  581.954032] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  581.954040] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  581.954044] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  591.329039] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  591.329047] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  591.329051] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  591.329056] [Hardware Error]: Machine check events logged
    [  596.016008] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  596.016016] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  596.016021] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  596.016024] [Hardware Error]: Machine check events logged
    [  598.359023] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  598.359032] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  598.359036] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  599.530020] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  599.530028] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  599.530033] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: RES
    [  600.115020] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.115031] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.115036] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.407041] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.407065] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.407085] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.553030] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.553037] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.553041] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.626036] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.626044] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.626048] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.663010] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.663030] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.663036] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.682002] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.682005] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.682013] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.692002] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.692005] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.692007] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.702006] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.702021] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.702029] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.712041] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.712049] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.712054] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.722044] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.722052] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.722056] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.732028] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.732035] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.732039] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: RES
    [  600.742025] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.742033] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.742037] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.752046] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.752052] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.752056] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.762514] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.762522] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.762526] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.772038] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.772046] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.772051] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.782034] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.782041] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.782045] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.792048] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.792055] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.792059] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.802024] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.802032] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.802036] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.813021] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.813029] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.813034] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.823020] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.823026] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.823029] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.833008] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.833022] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.833026] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.843040] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.843050] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.843055] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: RES
    [  600.853016] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.853024] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.853029] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: RES
    [  600.863039] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.863046] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.863051] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.883023] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.883043] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.883048] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    [  600.923004] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  600.923024] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  600.923027] [Hardware Error]: Transaction: RD (MEM), no timeout, Cache Level: L3/GEN, Participating Processor: SRC
    How can I test the whole system to find out whether it's really bad RAM, or possibly the motherboard or a CPU? I will be running memtest86 on this tonight, but other than that I'm out of ideas.
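    One thing that may be worth checking in the log above is the spacing between the MC4_STATUS lines. The sketch below pulls the timestamps out of a few lines copied from that log and prints the gaps between them. The names SAMPLE, mc4_timestamps, gaps and ratios are made up for this example. The interpretation is an assumption to verify against your own kernel's source: as far as I know, the kernel's machine-check poller starts at a 5 minute interval and halves it each time it finds an error, which would explain the pattern better than the errors themselves arriving on that schedule.

```python
import re

# Lines copied from the log in this post (6 MC4_STATUS lines plus one
# Northbridge line to show that other lines are ignored).
SAMPLE = """\
    [  300.704037] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  300.704044] [Hardware Error]: Northbridge Error, node 0: DRAM ECC error detected on the NB.
    [  450.704032] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  525.704033] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  563.204023] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  581.954032] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
    [  591.329039] [Hardware Error]: MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
"""

MC4_RE = re.compile(
    r"^\s*\[\s*(\d+\.\d+)\]\s+\[Hardware Error\]: MC4_STATUS", re.MULTILINE
)


def mc4_timestamps(text):
    """Return the uptime (in seconds) of every MC4_STATUS line in text."""
    return [float(m.group(1)) for m in MC4_RE.finditer(text)]


def gaps(timestamps):
    """Return the time between each pair of consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def ratios(values):
    """Return each value divided by the one before it."""
    return [b / a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    g = gaps(mc4_timestamps(SAMPLE))
    print("gaps (s):", [round(x, 3) for x in g])
    print("ratios:  ", [round(x, 3) for x in ratios(g)])
```

    For the sample lines the gaps come out at roughly 150, 75, 37.5, 18.75 and 9.375 seconds, so each one is half the previous. If the polling assumption holds, the timestamps show when the kernel looked rather than when each error happened, and together with "other errors lost: yes" that suggests the real error rate is higher than the number of log lines.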

    OS: Slackware 13.37, nearly fresh install. Installed GConf, ORBit2 and the other dependencies required for Google Chrome, all from SlackBuilds.org downloads.
    Motherboard: New SuperMicro H8DAE-2
    Processor: 2 x New Opteron 2435 (hex-core Istanbul)
    Memory: 8x Used 1GB with HP stickers, different actual brands. PC2-3200R-333. Some are labeled CL3 and some are not, but they all have the same HP part numbers.
    Graphics: New XFX ATI HD 6750 1GB w/ closed driver
    Hard Drive: OCZ Petrol 128GB SSD
    Power Supply: New Enermax NAXN 750W Modular Power Supply.
    Keyboard/Mouse: New Logitech USB keyboard and mouse

    My second issue is random freezing. I was experiencing this on another system: http://forums.justlinux.com/showthre...eezing-no-logs
    I now have a completely new system, except for the hard drive. EVERYTHING else is new. The old system was as follows:

    OS: Slackware 13.37 fresh install. I had lost a hard drive and this issue started about a week after getting the new one.
    Motherboard: Asus A8V-Deluxe
    Processor: AMD Athlon 64 3#00+ (not exactly sure which model; I could check the basement if someone wants me to).
    Memory: 4x 1GB DDR 400 random brands
    Graphics: nVidia Geforce E6200 w/ closed driver
    Hard Drive: OCZ Petrol 128GB SSD
    Power Supply: Thermaltake 420W
    Keyboard/Mouse: Future Shop brand Dynex keyboard and Logitech mouse.

    Only two things remain the same. The Operating System, and the Hard Drive.

    I would love to get to the bottom of the hard drive issue, but the memory diagnostics would be of more interest since those messages are more annoying...although random freezing seems to be quite annoying as well.

    [EDIT]
    Random freezing still occurred while I only had 2 DIMMs and no ECC messages.
    Last edited by WrinkledCheese; 09-16-2012 at 11:17 PM.
