[Linux-aus] pcie errors

Russell Coker russell at coker.com.au
Fri Oct 10 23:35:08 AEDT 2025


00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)

I had been getting the below errors on my PC (HP z640 with E5-2696v3 CPU).  
Above is the lspci line that matches.  I have had another problem with that PC 
in that the 3rd DIMM slot (identified as CPU0-DIMM6) doesn't work (5 beeps 
from the BIOS on boot if a DIMM is installed).  The errors started when I 
upgraded from linux-image-6.12.27-amd64 (Debian/Testing) to 
linux-image-6.12.48+deb13-amd64 (latest Debian/Trixie update) and also blasted 
the inside of the PC with compressed air to get 5 months of dust and fluff out 
of it.

Some of the errors seemed to have no effect.  But one time kwin_wayland (the 
graphics program) hung and I needed to use loginctl terminate-session, and 
another time I was playing a movie with mpv and the screen repeatedly got 
corrupted in a way that appeared to be hardware decoding with missing data.

My Google searches for this only returned results on how to make the kernel 
stop displaying such warnings when the error is harmless.  But for me it is 
a problem and the Google hits weren't helpful.

I took the CPU out and reseated it with new heatsink paste.  This has been 
reported as a fix for E5-26xx CPUs where some banks of RAM don't work.  It 
did not change the RAM issue, so the DIMM socket could be damaged - I bought 
the system cheap in "unknown condition" and the previous owner 
could have damaged it.  The same CPU had previously worked correctly in a HP 
ML-110 Gen9 with 8 DIMMs installed.  I can't rule out the possibility that I 
damaged the CPU when transferring it from the ML-110 to the z640 in a way that 
caused the issue with one DIMM socket.

The system is now working again, for the moment at least.  I finished 
watching the movie in question without screen corruption.


What I would like from the experts here is any suggestions about things I may 
have missed or misunderstood.  Am I right in interpreting this as a PCIe error 
related to the CPU root port?  Is reseating the CPU the thing to do for that?  
Am I right in thinking that the change of kernel version is extremely unlikely 
to be connected to the problem?

If the problem comes back is it likely to be caused by the CPU or the 
motherboard?

If the problem comes back would it be a possible solution to never use the 
PCIe slot that the GPU is currently in?
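
If the errors do come back, one way to see which end of the link is logging 
them is to compare per-device AER counters for the root port and the GPU.  
Below is a rough Python sketch.  It assumes the kernel exposes an 
aer_dev_correctable file under /sys/bus/pci/devices/<device>/ with one 
"name count" pair per line; whether that file exists depends on the kernel 
version and on AER being enabled, and the exact counter names may differ.  The 
helper names are hypothetical.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'name count' lines into a dict; names may contain spaces."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(maxsplit=1)
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_aer_correctable(device):
    """Return correctable-error counters for a PCI device, or None.

    The sysfs path is an assumption about the running kernel; None is
    returned if the file is missing or unreadable.
    """
    path = Path("/sys/bus/pci/devices") / device / "aer_dev_correctable"
    try:
        return parse_aer_counters(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # Device addresses taken from the log in this message.
    for device in ("0000:00:02.0", "0000:02:00.0", "0000:02:00.1"):
        print(device, read_aer_correctable(device))
```

Running it before and after a test (for example playing a movie with mpv) and 
comparing the numbers would show whether the counts are still rising.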

If anyone here has had such a situation before please let me know how it went.


As an aside, before cleaning out the dust the system was annoyingly loud when 
BOINC was using 9 CPU cores.  Now I can hardly hear it when BOINC is using all 
18 cores and my head is within 1 meter of it.

Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:   device [8086:2f04] error status/mask=00001040/00002000
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:    [ 6] BadTLP
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:    [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER:   Error of this Agent is reported first
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0:   device [1002:6987] error status/mask=00001000/00002000
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0:    [12] Timeout
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1:   device [1002:aae0] error status/mask=00001000/00002000
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1:    [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:   device [8086:2f04] error status/mask=00001040/00002000
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:    [ 6] BadTLP
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0:    [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER:   Error of this Agent is reported first
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0:   device [1002:6987] error status/mask=00001100/00002000
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0:    [ 8] Rollover
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0:    [12] Timeout
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1:   device [1002:aae0] error status/mask=00001100/00002000
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1:    [ 8] Rollover
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1:    [12] Timeout
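
For reference, the status/mask values in the log above can be decoded by hand, 
and the result matches the bit names the kernel prints under each one.  Below 
is a minimal Python sketch.  It assumes the standard bit layout of the PCIe AER 
Correctable Error Status register and uses the short names seen in the kernel 
output; the function names are made up for illustration.

```python
# Bit positions in the AER Correctable Error Status/Mask registers,
# labelled with the short names the Linux kernel prints.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_correctable(value_hex):
    """Return the names of the bits set in a hex register value."""
    value = int(value_hex, 16)
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


def decode_status_mask(field):
    """Split a 'status/mask' field and decode both halves."""
    status, mask = field.split("/")
    return decode_aer_correctable(status), decode_aer_correctable(mask)


if __name__ == "__main__":
    # The three distinct status/mask values from the log above.
    for field in ("00001040/00002000", "00001000/00002000", "00001100/00002000"):
        status, mask = decode_status_mask(field)
        print(field, "status:", status, "masked:", mask)
```

With these values the status decodes to BadTLP plus Timeout on the root port, 
and Timeout (later with Rollover) on the GPU and its audio function.  The mask 
value 00002000 only covers the Advisory Non-Fatal bit.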

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/




