[LC++]optimizing array access: x86 instruction timings

Thu Jan 29 13:49:01 UTC 2004

Hi,

I am trying to work out efficiency issues with using
loops.

If I have:

  int const n=600000;
  int a[n];
  int b[n];
  int c[n];

I can either do:

  for (int i=0; i!=n; ++i) {
    a[i]=1; b[i]=1; c[i]=1;
  }

or I can do:

  {
    int i;
    int* ap;
    int* bp;
    int* cp;
    for (i=0, ap=&(a[0]), bp=&(b[0]), cp=&(c[0]); i!=n; 
        ++i, ++ap, ++bp, ++cp) {
      *ap=1; *bp=1; *cp=1;
    }
  }

among other variations.

I want to know when pointer incrementing will be more efficient
than indexing and vice versa.  So I compiled my test program
using g++ -S and noticed that with indexing you get assembler
commands like:

        movl    $0, -808(%ebp,%eax,4)

whereas with pointer incrementing you get commands like:

        movl    $1, (%edx)

although later you have to update the pointer with:

        addl    $4, %edx

But the question is: do these instructions have different
timings, or are they all the same?  Ie is the addressing mode
used for indexing expensive enough that pointer incrementing
is worth the effort, or not?

But I haven't been able to find much in the way of x86 timings
for instructions such as these, and what information I have
found is conflicting.  Ie, one place it said that from around
the 80386 and later all addressing modes took the same amount
of time, meaning pointer incrementing is a bad solution.  But
elsewhere seemed to indicate this was not true.  Does anyone
know / is anyone able to point me to a good source for this
kind of information?

As an alternative approach I tried to write some test programs 
and time them.  Unfortunately I've been having problems with 
this as well.  To get test programs that take a reasonable 
time to run one needs to write redundant code, ie code that 
does a useless thing over and over.  Running the compiler 
with optimization can get rid of a lot of this redundancy, 
hence stuffing up the timings.  But we want to compare the 
timings of different methodologies _with_ optimization on.  
There are probably ways of doing this, but it seems non 
trivial.

Cheers,

Mark.