
Re: [LCP]Address out of bounds error on Linux 7.3



Let an old IBM mainframe programmer explain how a program can work on one machine but not another (or work until a trivial change in data storage has been made and not thereafter).

In most operating systems, the check for "out of bounds" is not made against the exact bytes of storage you defined but in terms of "pages" of some fixed size. It's simply easier to do it that way (on the mainframes, with hardware protection by page, MUCH easier). Remember, virtual memory is getting translated to real memory somehow, and that's usually done by "page". In other words, if your program "owns" any part of a page it owns all of it, and it won't be considered "oob" if it steps on the portion of that page it never defined.

Many times a puzzled programmer has brought this sort of problem to me. I remember very well one bug of this sort which survived in production for over 20 years. It caused much merriment when it finally crashed the application after a minor change (the size of some buffers was increased), because by then the programmer originally responsible for the bad code was the senior vice president in charge of all DP and supposedly safely beyond "winning" the purple wiener (a little "trophy" passed to whoever caused the latest serious production hang, who had to keep it until it could be passed on to the next "winner").

So, first rule out the possibility that you have indeed stepped out of bounds, in spite of the fact that the program works on another system. The other system's page size might be different, it might check by "page" while the new system checks "exact" bounds, it might assign space in a different order, etc. Any of these can cause an actual "oob" error to be noticed on one system/machine but not on another. And remember the rule: that's NOT a bug in the system which failed to catch the error, because the only guarantee is that a "correct" system works with correct code (all bets are off about what happens with bad code).
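To make the page arithmetic concrete, here is a rough sketch in C. It is only a toy model, not how any particular kernel does its checking: it assumes the region starts on a page boundary, and the names owned_bytes and access_is_caught are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption: the region starts on a page boundary).
   Protection is granted in whole pages, so a request for nbytes
   really "owns" this many bytes. */
size_t owned_bytes(size_t nbytes, size_t page_size)
{
    assert(page_size != 0);
    size_t pages = (nbytes + page_size - 1) / page_size;  /* round up */
    return pages * page_size;
}

/* True if touching byte `offset` (counted from the start of the region)
   would be flagged as out of bounds by a page-granular check. */
bool access_is_caught(size_t nbytes, size_t offset, size_t page_size)
{
    return offset >= owned_bytes(nbytes, page_size);
}
```

With a 4000-byte buffer and 4096-byte pages, access_is_caught(4000, 4050, 4096) is false: the overrun lands in the unused tail of a page the program owns. The same buffer touched at offset 5000 is caught with 4096-byte pages but not with 8192-byte pages, which is the "works on one machine, fails on another" effect.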

An undefined pointer or a runaway subscript wouldn't be the cause (that's always a hang), but subscripting just a little too far could work on one system and not on another. How about the assignment of space for your structures? (I don't know your style; I rarely use space IN my programs but allocate it at runtime, so if it were mine I would check the structure that comes LAST in physical allocation.)
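Why the LAST structure? Here is a small sketch in C of the reasoning. Again it is a toy model under stated assumptions: the structures sit back to back from the start of a page-aligned region, the check is by whole pages, and first_caught_overrun is a hypothetical helper, not a real system call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: n structures laid out back to back from the start of a
   page-aligned region. Returns the index of the first structure whose
   overrun by `extra` bytes leaves the owned pages, or -1 if none does. */
int first_caught_overrun(const size_t *sizes, int n, size_t extra,
                         size_t page_size)
{
    assert(page_size != 0);

    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += sizes[i];

    /* Whole pages owned by the program for this layout. */
    size_t owned = (total + page_size - 1) / page_size * page_size;

    size_t end = 0;
    for (int i = 0; i < n; i++) {
        end += sizes[i];          /* one past the last byte of structure i */
        if (end + extra > owned)  /* last byte touched is end + extra - 1 */
            return i;
    }
    return -1;
}
```

For structures of 1000, 2000 and 1000 bytes with 4096-byte pages, a 200-byte overrun is only caught off the end of the last one (index 2); the same overrun of the earlier ones just lands on a neighbour. With 8192-byte pages nothing is caught at all, and a 50-byte overrun is not caught even with 4096-byte pages.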

Mike