[LCP]register question

Tue Nov 20 02:06:09 UTC 2001

----- Original Message -----
From: Joachim Bauernberger <bj at gmx.net>
> Hi,
> I am just experimenting with some code to find out what the benefits
> of declaring a variable as register int are.
>   ............
> Some of the register types will be used and some not, depending on
> how many registers are available in the CPU at the time..."
>
> My question: when do I know that using register will gain me some
> processing speed (apart from actually running the program and checking
> with time)?
>
> Is there really any benefit from declaring an int with register on
> modern Pentium-class CPUs?
> Or is the compiler already smart enough to know when it should
> speed things up internally, so it wouldn't really be necessary with gcc?
>
>
Joachim,
    Whether specifying "register" will speed up processing
depends upon the CPU architecture and the quality of compiler
optimization. You should consider using "register" only when:
1) The CPU has lots of registers
2) The optimizer doesn't work very well
3) You are writing for just this one machine
4) You are a pretty competent assembler programmer
5) Speed is REALLY critical
    If there aren't a lot of registers, there won't be any to spare for
the purpose. If the optimizer already makes wise choices about when to
leave something in a register, there's nothing to improve (but note that
optimization may target space instead of time, or some compromise).
If your code runs on more than one machine, the answers will be
different (any benefits from "register" are not portable). If you aren't
an experienced programmer, you won't be able to judge which assignment
of registers would be fastest -- essentially one is optimizing the speed
of assembly code*. And at best, "register" isn't going to speed
things up very much**. If you need a critical portion of this program to
run 4 or 5 times faster, it's better logic*** you need, not
"register".

    By and large, you are unlikely to get much improvement from
"register", especially since Pentium-class CPUs have only a few
general-purpose registers (eight, counting the stack pointer) and gcc
optimizes well.
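For readers who haven't seen the keyword in use, here is a minimal C sketch; the function names are hypothetical and not from the thread. Both functions compute the same sum, and the second merely asks for the counter and accumulator to be kept in registers. With gcc and optimization turned on, the two are likely to end up as much the same machine code, which is the point made above.

```c
/* Hypothetical example functions; not from the original thread. */

/* Plain version: the compiler decides what to keep in registers. */
long sum_plain(const int *a, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same loop with "register" requested for the counter and the
   accumulator. The keyword is only a hint that an optimizing compiler
   may ignore. The one rule it does enforce is that the address of a
   register variable cannot be taken: writing &total in this function
   would be rejected by the compiler. */
long sum_register(const int *a, int n)
{
    register long total = 0;
    register int i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

For example, both return 31 for the array {3, 1, 4, 1, 5, 9, 2, 6}.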

Mike

* Essentially, what you'd need to do is examine the assembly code
produced by the compiler, see where things would be improved by a
different assignment of registers, and then try to get the compiler to
do just that by SELECTIVELY specifying "register". In other words, you
have to be able to spot WHICH item should get the "register"
specification. This is a situation where "more may be worse", because
the compiler almost surely will NOT be able to optimize that choice (if
you specify three things "register" and it has room for only one, it
will be chance whether it picks the one that makes the real
difference).

** By having an item in a register you save at most one load and one
store (in other words, you execute one machine instruction rather than
three). Admittedly, depending upon the architecture, different
instructions may take different amounts of time. Note that on devices
with a lot of on-chip cache you shouldn't assume that register
operations are a lot faster (individual loads and stores may not be
limited by bus speed). I was unable to use my "tuning exercise" the way
I intended, because the speed at which the routine ran "pre-tuning" made
it immediately obvious that it was NOT actually making as many "bus
operations" as I imagined.
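One way to see the load-and-store cost described here, using a technique the original post does not mention: declaring the accumulator volatile forces the compiler to actually read and write it in memory on every access, which approximates a variable that is never kept in a register. The function names are hypothetical.

```c
/* Forced memory case: every read and write of a volatile object must be
   performed, so each iteration loads total, adds a[i], and stores total
   back. (The load of a[i] happens in both versions.) */
long sum_in_memory(const int *a, int n)
{
    volatile long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Ordinary case: the compiler is free to keep total in a register for
   the whole loop, avoiding the extra load and store of total. */
long sum_in_register(const int *a, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Timing the two on a large array is one way to repeat the "tuning exercise"; as the footnote warns, on a machine with a fast cache the gap may be smaller than the instruction count suggests.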

*** If speed matters, we are presumably in an "inner loop". Serious
improvements in speed are much more likely from seeing which, if any,
computations, index calculations, etc. can be done outside the loop.
This requires some experience, because at first glance your "improved"
code may appear to be doing more, not less (it may have more lines of
code) -- but of course what matters isn't how many different lines are
executed at least once but how many TOTAL executions there are.
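A minimal sketch of moving a loop-invariant computation out of a loop, using a hypothetical clipping routine. The second version has one more line, but the multiply and divide run once per call instead of once or twice per element. Modern compilers often perform this transformation themselves (loop-invariant code motion), so any gain should be measured rather than assumed.

```c
#include <stddef.h>

/* Before: the limit is recomputed inside the loop on every iteration
   (twice when an element is clipped). */
void clip_naive(int *a, size_t n, int max_level, int percent)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] > max_level * percent / 100)
            a[i] = max_level * percent / 100;
}

/* After: the limit depends on nothing that changes inside the loop, so
   it is computed once, before the loop. */
void clip_hoisted(int *a, size_t n, int max_level, int percent)
{
    const int limit = max_level * percent / 100;
    for (size_t i = 0; i < n; i++)
        if (a[i] > limit)
            a[i] = limit;
}
```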

    You'll probably need an example to see what I am talking about.
Let's say we have a loop which processes deeply indexed "buckets" (in
other words, we will be executing the loop for all the different buckets
in turn). It might well pay to copy each bucket into a fixed-location
variable before the loop and copy that variable back to the proper
bucket afterwards, adding two "unnecessary" indexed operations but
saving repeated calculations of that index value within the loop. Note
that even in this case matters like "how many registers" and "how good
an optimizer" play a role, because with a good optimizer and enough
registers the compiler might be smart enough, and have a register to
spare, to keep the fully computed index value in a register for the
duration of loop execution. On the other hand......
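The bucket idea can be sketched as follows, assuming a hypothetical two-level array of buckets, each with a weight and two running results. The second version copies the bucket into a local before the inner loop and copies it back afterwards, so the body no longer goes through b[g][s] on every iteration. As noted above, a good optimizer may already produce equivalent code for the first version.

```c
#include <stddef.h>

#define SLOTS 8   /* hypothetical number of slots per group */

/* Hypothetical bucket: a weight and two running results. */
struct bucket {
    long weight;
    long total;
    long count;
};

/* Before: every access in the innermost loop goes through b[g][s]. */
void process_indexed(struct bucket b[][SLOTS], size_t groups,
                     const int *samples, size_t n)
{
    for (size_t g = 0; g < groups; g++)
        for (size_t s = 0; s < SLOTS; s++)
            for (size_t i = 0; i < n; i++) {
                b[g][s].total += samples[i] * b[g][s].weight;
                b[g][s].count++;
            }
}

/* After: one indexed copy in before the loop and one indexed copy out
   after it; the loop body works on a local at a fixed location. */
void process_local(struct bucket b[][SLOTS], size_t groups,
                   const int *samples, size_t n)
{
    for (size_t g = 0; g < groups; g++)
        for (size_t s = 0; s < SLOTS; s++) {
            struct bucket tmp = b[g][s];    /* the "unnecessary" load */
            for (size_t i = 0; i < n; i++) {
                tmp.total += samples[i] * tmp.weight;
                tmp.count++;
            }
            b[g][s] = tmp;                  /* the "unnecessary" store */
        }
}
```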

    A really experienced coder might "by habit" put no calculations
(either explicit or index) within a loop that can possibly be done
outside it. The theory here is that "there are never really enough
registers", and code written this way will be close to optimal speed
from the get-go. There is no need to go back over it later looking for
places where it might be sped up. Note also that unlike specifications
like "register", this kind of tuning is going to be portable.
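As a sketch of that habit applied to index calculations, assume a hypothetical routine that adds a value to a rectangular block inside a larger row-major matrix of width stride. The first version recomputes the full two-dimensional index on every inner iteration; the second computes the start of each row once per row, so the inner loop only adds c. It is plain C with no "register" hints, so it is as portable as the post says.

```c
#include <stddef.h>

/* Before: the whole index expression is evaluated for every element. */
void add_block_indexed(int *m, size_t stride, size_t row0, size_t col0,
                       size_t rows, size_t cols, int delta)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            m[(row0 + r) * stride + (col0 + c)] += delta;
}

/* After: everything that does not depend on c is computed once per row,
   outside the inner loop. */
void add_block_hoisted(int *m, size_t stride, size_t row0, size_t col0,
                       size_t rows, size_t cols, int delta)
{
    for (size_t r = 0; r < rows; r++) {
        int *row = m + (row0 + r) * stride + col0;  /* once per row */
        for (size_t c = 0; c < cols; c++)
            row[c] += delta;
    }
}
```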