[Linux-aus] CPU errors
Steven Ellis
steven.ellis at gmail.com
Tue Feb 4 11:57:45 AEDT 2025
I've had a bunch of older AM2 processors fail over time.
I had a couple of Asus M2NPV-VM boards from my old MythTV development days.
One machine was upgraded from an X2 3600+ to an Athlon X2 5050e. After about
10 years of constant use it developed serious stability issues, and CPU
stress tests confirmed it was the CPU. After swapping back to the original
3600+ I was able to get a good couple of extra years' usage out of the
motherboard and RAM before the motherboard finally gave out.
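
As an illustration only (this is not the stress test referred to above), the
basic idea behind a CPU consistency check can be sketched in a few lines of
Python: run a deterministic workload repeatedly and flag any run whose result
differs from the first. The function names `workload` and `count_mismatches`
are made up for this sketch, and it assumes that a single-threaded integer and
hashing loop is enough to show the idea. Real stress tools exercise far more
of the chip (all cores, floating point, vector units, thermal load), so a
clean result here proves very little.

```python
import hashlib


def workload(seed: int, rounds: int = 2000) -> str:
    """Deterministic integer-and-hash workload: same seed, same digest."""
    h = hashlib.sha256()
    x = seed
    for _ in range(rounds):
        # 64-bit linear congruential step, used only to generate varied input
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        h.update(x.to_bytes(8, "little"))
    return h.hexdigest()


def count_mismatches(seed: int, repeats: int) -> int:
    """Re-run the workload and count runs that disagree with the first run."""
    reference = workload(seed)
    return sum(1 for _ in range(repeats) if workload(seed) != reference)


if __name__ == "__main__":
    bad = count_mismatches(seed=42, repeats=100)
    print(f"{bad} mismatching runs out of 100")
```

On healthy hardware the mismatch count should be zero. Any non-zero count
means the same computation gave different answers on the same machine, which
is the kind of symptom worth chasing with proper tools.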
On Tue, Feb 4, 2025 at 1:51 PM Dan Kortschak via linux-aus <
linux-aus at lists.linux.org.au> wrote:
> On Tue, 2025-02-04 at 11:12 +1100, Russell Coker via linux-aus wrote:
> > https://www.theregister.com/2021/06/04/google_chip_flaws/
> >
> > I've been considering this issue of flaws in CPUs ever since it first
> > was reported 4 years ago.
> >
> > https://arxiv.org/pdf/2102.11245
> >
> > There is no good data published about how common such problems are
> > but Facebook states "hundreds" of CPUs out of "hundreds of thousands"
> > of systems which implies something like 1/1000.
> >
> > Over the years the number of machines I've run (with actual root
> > access - not counting cloud VMs) adds up to more than 1000. I expect
> > that there are people on this list who have run 1000+ machines at one
> > time.
> >
> > If something has an incidence of 1/1000 there's a good chance that it
> > has happened to a system that I run, and the probability that none of
> > the systems run by people on this list have had the problem would be
> > very low.
> >
> > Has anyone seen such things and known it? If not does that imply that
> > some of us are just losing data for ourselves and our clients without
> > knowing it?
> >
> > The nearest I've come to seeing this is a Pentium D system that I got
> > from corporate rubbish back when AMD64 systems were still new and
> > rare. I tried to install Debian on it and it got SEGVs on
> > uncompressing packages. I replaced the RAM with no change (desktop
> > system without ECC support) and then just sent it to e-waste without
> > any further thought. In retrospect I should have done more research
> > on that system to find out what was wrong, but at the time I was more
> > focussed on getting working systems than on studying computer
> > engineering.
> >
> > Modern CPUs have caches that are bigger than the hard drives in early
> > Linux systems. The PC I'm using to write this message has 46M of CPU
> > cache which is larger than the storage of the iPaQs I was running
> > Linux on 20 years ago. Has anyone written a recovery image that will
> > lock itself into the cache, verify checksums, and then test things
> > like RAM for errors?
>
> I have an old i5 laptop that can stay up for about 2-4 hours and then
> will black screen with a consistent kernel panic, suggesting a specific
> logic flaw that has arisen in the CPU. It's used for playing DVDs now
> since that uptime is reasonable in this use. It worked fine for many
> years, so this is wear degradation, but the linked article also
> referred to those.
>
> Dan
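
The 1/1000 reasoning in the quoted message can be made concrete with a short
calculation. This is a rough sketch: it assumes every machine is affected
independently with the same probability, and the function name
`prob_at_least_one` is made up for the example. The 1/1000 figure itself is
only the estimate inferred above from Facebook's "hundreds" out of "hundreds
of thousands".

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent machines is affected."""
    # P(none affected) = (1 - p) ** n, so P(at least one) is its complement.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 1 / 1000  # assumed per-machine incidence
    for n in (100, 1000, 5000):
        print(f"{n:>5} machines: {prob_at_least_one(p, n):.1%}")
```

Under those assumptions, someone who has run 1000 machines has roughly a 63%
chance of having had at least one affected CPU, and across several such
people the chance that nobody has seen one becomes small quickly.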