[Linux-aus] pcie errors

Russell Coker russell at coker.com.au
Sat Oct 11 14:29:17 AEDT 2025


On Saturday, 11 October 2025 07:37:54 AEDT Adam Nielsen via linux-aus wrote:
> > What I would like from the experts here is any suggestions about things I
> > may have missed or misunderstood.  Am I right in interpreting this as a
> > PCIe error related to the CPU root port?  Is reseating the CPU the thing
> > to do for that? Am I right in thinking that the change of kernel version
> > is extremely unlikely to be connected to the problem?
> 
> It's unlikely to be the kernel version given that the PCIe code doesn't
> often change that significantly, and that physical actions (like the
> cleaning you mentioned, or just moving a computer in a vehicle) are
> notorious for causing these kinds of problems.

OK.
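
One way to see which device is actually reporting the PCIe errors is to read
the per-device AER counters that recent kernels expose in sysfs.  The
following is only a rough sketch: it assumes the kernel publishes the
aer_dev_correctable, aer_dev_nonfatal and aer_dev_fatal files under
/sys/bus/pci/devices, which is not the case on every kernel or platform.  The
function name and the way results are keyed are my own choices for
illustration.

```python
from pathlib import Path

# Per-device AER statistics files (assumed present; older kernels and
# devices without AER support will simply not have them).
AER_FILES = ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal")


def read_aer_counters(root="/sys/bus/pci/devices"):
    """Return {pci_address: {"file:counter": count}} for nonzero counters."""
    result = {}
    for dev in sorted(Path(root).glob("*")):
        nonzero = {}
        for fname in AER_FILES:
            try:
                text = (dev / fname).read_text()
            except OSError:
                continue  # this device does not expose the file
            for line in text.splitlines():
                parts = line.split()
                # Each line is "<name> <count>"; skip the TOTAL_* summary rows.
                if len(parts) != 2 or parts[0].startswith("TOTAL_"):
                    continue
                try:
                    count = int(parts[1])
                except ValueError:
                    continue
                if count:
                    nonzero[f"{fname}:{parts[0]}"] = count
        if nonzero:
            result[dev.name] = nonzero
    return result


if __name__ == "__main__":
    for address, counters in read_aer_counters().items():
        print(address, counters)
```

If only the root port shows nonzero counters, that would be consistent with a
problem at the CPU end; if an endpoint shows them too, the card or its slot
is worth a look first.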

> In my experience the majority of these issues are caused by the RAM
> making poor electrical contact in its socket.  Often reseating the
> memory DIMMs helps as they are very sensitive to less than perfect
> electrical connections to the socket.

I doubt that memory errors are happening, as the system uses ECC RAM: any
1-bit error will be corrected and logged, and every 2-bit error will be logged
and flagged.  As I have an odd number of DIMMs installed, the advanced ECC
mode isn't available, so errors of 3 or 4 bits (e.g. one bad chip on a DIMM)
could get past.
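
Whether ECC errors are being logged can be checked from the EDAC counters in
sysfs.  This is a minimal sketch, assuming an EDAC driver for the memory
controller is loaded and exposes ce_count (corrected) and ue_count
(uncorrected) under /sys/devices/system/edac/mc; the function name is just
for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # controller without readable counters
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found (is an EDAC driver loaded?)")
    for name, (corrected, uncorrected) in counts.items():
        print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

Counters that stay at zero across the PCIe errors would support the view that
RAM isn't the cause, with the caveat above about multi-bit errors slipping
through.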

> If you have some electrical contact cleaner (available from automotive
> shops and some hardware stores), spraying it liberally into the memory
> socket and across the pins on the stick of memory itself will help wash
> off any stubborn dust and oils from the contacts that could be
> preventing a clean electrical connection.

https://www.bunnings.com.au/wd-40-290g-specialist-fast-drying-contact-cleaner_p6100409

Is hardware store stuff good enough for cleaning DIMM contacts?  A quick
search turned up WD-40 at Bunnings.  It seems that WD-40 is the brand and
they sell a range of different cleaning sprays in the same type of can, which
is annoying.

Al Maclang suggested isopropyl alcohol.  Would that be the main ingredient in 
those contact cleaners?

> The second most common cause of strange issues is a deteriorating power
> supply not able to maintain stable enough voltages, so if cleaning the
> memory doesn't help, trying an alternate power supply is often a good
> idea, although I think with all the voltage regulation on modern
> motherboards this is probably less of an issue than it once was.

The z640 systems I have don't have hot-swap PSUs, so I'm not enthusiastic
about swapping them; it's a significant pain to take them out.  Has anyone
made a device for monitoring PC power?  It would be nice to have a PCIe card
that connects to a SATA power cable and tells you the voltage on each line.
I did a quick check on AliExpress and couldn't find any such device.
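
In the absence of such a card, the motherboard's own sensors might give a
rough view of the rails.  The sketch below reads the standard hwmon voltage
files (in*_input, in millivolts) from /sys/class/hwmon.  It assumes a hwmon
driver for the board's sensor chip is loaded, and I don't know whether the
z640 exposes the PSU rails this way.  The raw values can also need per-board
scaling, so treat the numbers as indicative only.  The function name is
hypothetical.

```python
from pathlib import Path


def read_voltages(root="/sys/class/hwmon"):
    """Return a list of (chip, label, volts) from hwmon in*_input files."""
    readings = []
    for chip in sorted(Path(root).glob("hwmon*")):
        try:
            chip_name = (chip / "name").read_text().strip()
        except OSError:
            chip_name = chip.name  # fall back to the directory name
        for value_file in sorted(chip.glob("in*_input")):
            channel = value_file.name[: -len("_input")]
            try:
                millivolts = int(value_file.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable channel
            try:
                label = (chip / f"{channel}_label").read_text().strip()
            except OSError:
                label = channel  # labels are optional
            readings.append((chip_name, label, millivolts / 1000.0))
    return readings


if __name__ == "__main__":
    for chip_name, label, volts in read_voltages():
        print(f"{chip_name} {label}: {volts:.3f} V")
```

Logging these values periodically and comparing them around the time of an
error would at least show whether a rail visibly sags, though it samples far
too slowly to catch short transients.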

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/




