[Linux-aus] pcie errors
Russell Coker
russell at coker.com.au
Fri Oct 10 23:35:08 AEDT 2025
00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
I had been getting the errors below on my PC (HP z640 with an E5-2696v3 CPU).
The lspci line above is for the root port named in those errors (0000:00:02.0).
I have had another problem with that PC: the 3rd DIMM slot (identified as
CPU0-DIMM6) doesn't work (the BIOS gives 5 beeps on boot if a DIMM is
installed). The errors started after I both upgraded from
linux-image-6.12.27-amd64 (Debian/Testing) to linux-image-6.12.48+deb13-amd64
(latest Debian/Trixie update) and blasted the inside of the PC with compressed
air to get 5 months of dust and fluff out of it.
Some of the errors seemed to have no effect. But one time kwin_wayland (the
KDE Wayland compositor) hung and I needed to use loginctl terminate-session,
and another time, while I was playing a movie with mpv, the screen repeatedly
got corrupted in a way that looked like hardware video decoding with missing
data.
My Google searches for this only returned results on how to stop the kernel
displaying such warnings when the error is harmless. For me it is causing real
problems, so those results weren't helpful.
I took the CPU out and reseated it with new heatsink paste, which has been
reported to fix E5-26xx CPUs that fail to use some banks of RAM. It did not
fix the RAM issue. The DIMM socket could be damaged, as I bought the system
cheap in "unknown condition" and the previous owner could have damaged it. The
same CPU had previously worked correctly in an HP ML-110 Gen9 with 8 DIMMs
installed. I can't rule out the possibility that I damaged the CPU when
transferring it from the ML-110 to the z640 in a way that caused the issue
with one DIMM socket.
The system is now working again, for the moment at least. I finished watching
the movie in question without any screen corruption.
What I would like from the experts here is suggestions about anything I may
have missed or misunderstood. Am I right in interpreting this as a PCIe error
related to the CPU root port? Is reseating the CPU the right fix for that?
Am I right in thinking that the change of kernel version is extremely unlikely
to be connected to the problem?
If the problem comes back, is it more likely to be caused by the CPU or the
motherboard?
If the problem comes back, would never using the PCIe slot that the GPU is
currently in be a possible solution?
If anyone here has had a situation like this before, please let me know how it
went.
As an aside, before I cleaned out the dust the system was annoyingly loud when
BOINC was using 9 CPU cores. Now I can hardly hear it when BOINC is using all
18 cores, even with my head within 1 meter of it.
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:36 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001040/00002000
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: [ 6] BadTLP
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Error of this Agent is reported first
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: device [1002:6987] error status/mask=00001000/00002000
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: [12] Timeout
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: device [1002:aae0] error status/mask=00001000/00002000
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: found no error details for 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Multiple Correctable error message received from 0000:00:02.0
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001040/00002000
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: [ 6] BadTLP
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: [12] Timeout
Oct 10 20:46:37 xev kernel: pcieport 0000:00:02.0: AER: Error of this Agent is reported first
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: device [1002:6987] error status/mask=00001100/00002000
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: [ 8] Rollover
Oct 10 20:46:37 xev kernel: amdgpu 0000:02:00.0: [12] Timeout
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: device [1002:aae0] error status/mask=00001100/00002000
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: [ 8] Rollover
Oct 10 20:46:37 xev kernel: snd_hda_intel 0000:02:00.1: [12] Timeout
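
For decoding the "error status/mask" lines: as I understand it, the first
number is the device's AER Correctable Error Status register and the second is
its Correctable Error Mask, and the kernel lists the bits that are set and not
masked. Below is a minimal Python sketch, not part of any existing tool, that
decodes those pairs and groups them by PCI address. It assumes the unwrapped
lines as journalctl -k prints them; decode, summarise and SAMPLE are
hypothetical names, and the bit names other than BadTLP, Rollover and Timeout
(which appear in the log) are my reading of the kernel's AER driver and should
be checked against the PCIe spec.

```python
import re

# Correctable Error Status register bit -> short name as printed by the
# kernel's AER driver. BadTLP, Rollover and Timeout appear in the log; the
# other names are assumed from the kernel source / PCIe spec.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error (Physical Layer)
    6: "BadTLP",        # Bad TLP (Data Link Layer)
    7: "BadDLLP",       # Bad DLLP (Data Link Layer)
    8: "Rollover",      # REPLAY_NUM Rollover (Data Link Layer)
    12: "Timeout",      # Replay Timer Timeout (Data Link Layer)
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

# Matches an unwrapped kernel line such as
# "amdgpu 0000:02:00.0: device [1002:6987] error status/mask=00001100/00002000"
DEVICE_LINE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"device \[[0-9a-f]{4}:[0-9a-f]{4}\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode(status_hex: str, mask_hex: str) -> list[str]:
    """Return names of bits set in the status register and not masked."""
    active = int(status_hex, 16) & ~int(mask_hex, 16)
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if active & (1 << bit)]


def summarise(lines) -> dict[str, set[str]]:
    """Map each PCI address to the set of unmasked error names seen for it."""
    seen: dict[str, set[str]] = {}
    for line in lines:
        m = DEVICE_LINE.search(line)
        if m:
            seen.setdefault(m["bdf"], set()).update(
                decode(m["status"], m["mask"]))
    return seen


# The three "device [...]" lines from the second burst of the log above.
SAMPLE = [
    "pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001040/00002000",
    "amdgpu 0000:02:00.0: device [1002:6987] error status/mask=00001100/00002000",
    "snd_hda_intel 0000:02:00.1: device [1002:aae0] error status/mask=00001100/00002000",
]

if __name__ == "__main__":
    for bdf, names in sorted(summarise(SAMPLE).items()):
        print(bdf, ", ".join(sorted(names)))
```

Run as is, it reports BadTLP and Timeout for the root port (0000:00:02.0) and
Rollover and Timeout for both GPU functions (0000:02:00.0 and 0000:02:00.1).
All of the unmasked bits in the log are Data Link Layer errors, reported both
by the root port and by the GPU downstream of it, and bit 13 (Advisory
Non-Fatal) is masked on all three devices. To use it on live logs, pass
sys.stdin to summarise() and pipe journalctl -k into the script.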
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/