Trailing-Edge
-
PDP-10 Archives
-
decus_20tap1_198111
-
decus/20-0004/shallo.txt
There are no other files named shallo.txt in the archive.
Date: 22 April 1976
to: INTERLISP Users
From: W. Teitelman
Subject: Shallow Binding in INTERLISP-10
XEROX
Overview
Beginning with the 12-APR-76 release, INTERLISP-10 will
use the "shallow binding" scheme for binding variables, as
opposed to the "deep binding" scheme previously used. Under
the deep binding scheme, associated with each variable was a
special cell called its value cell which contained its "top
level" value. When a variable was rebound, its new value
was stored on the stack. Thus, the current value of a
variable was that found in its value cell only if the
variable had not been rebound. To obtain the value of a
variable, the stack would be searched to find where
(whether) the variable was bound, a process which could be
time consuming, especially if the depth of the computation,
i.e. stack, was great.
In order to minimize this lookup time, whenever a
compiled function was entered, the stack would be searched
(once only) for the bindings of all of the variables used
freely by that function. This in turn led to the
introduction of the LOCALFREEVAR kluge to override this
lookup for variables that were used freely somewhere within
Page 2
a block but nevertheless were local to the block, i.e.
bound elsewhere within the block. Similarly, the concept of
a global variable was introduced into the deep binding
system in an attempt to circumvent the lookup time by
informing the compiler not to bother to search the stack
when the indicated variable was referenced, but simply to
always reference its value cell (top level value). This in
turn led to the introduction of RESETVAR, so that global
variables could be rebound, albeit expensively.
All of this goes away under shallow binding. Under
shallow binding, the current value of a variable is always
stored in its value cell. When a variable is rebound, its
old value is stored on the stack, and the new value put in
its value cell. Thus, obtaining the value of or setting a
variable is fast and independent of the depth of the
computation. Essentially, under shallow binding every
variable acts like a global variable does with deep binding.
In addition, whereas the benefit of declaring a variable to
be global under deep binding is only realized in compiled
code, in the shallow binding system, variable lookup time is
eliminated for interpreted as well as compiled code.
Shallow binding also makes it less important to block
compile, since one of the main advantages of block
compilation in the deep binding system is the reduction of
time spend in free variable lookup, because free variables
are looked up only once for the entire block. (However, it
Page 3
is still the case that function calls within a block are
much faster than calls to a function outside of a block.)The
cost of shallow binding is that bind and unbinding are more
complicated operations than under deep binding. As a
result, stack operations involving context switches, e.g.
RETFROM, RETEVAL, STKEVAL, etc., are less efficient.
However, preliminary studies indicated that far more time
was spent in variable lookup then in variable binding. The
timing comparisons between the two systems bear this out.
Timing Comparisons
Here are some timing results of several benchmark test
cases (time in seconds):
(1) DWIMIFYing a fairly large function with lots of record
expressions and iterative statements
deep shallow
block compiled 5.3 5.6
compiled 14.8 7.6
(2) CLISPIFYing a large function
deep shallow
block compiled 5.8 5.2
(3) doing a MAKEFILE
deep shallow
block compiled 13.6 13.7
(4) J. Moore's theorem prover was converted to the shallow
binding system and run on a typical test case. This program
does not perform any context switching.
deep shallow
compiled 20.9 15.5
(5) Various examples using FLIP, a format directed list
processor. This program has not been tuned for either
system (and is in fact ten years old).
deep shallow
interpreted 7.03 4.54
compiled .83 .42
Page 4
block compiled .49 .40
(6) Finally, since switching contexts is more expensive in
shallow binding than deep binding, a test case was selected
which would be a "worst case" for shallow binding, in that
it would not use any free variables, and do a lot of context
switching. The example is to compare the leaves of two
trees, using coroutine and resume (the example found on page
12.12 of the INTERLISP manual). The two trees were lists of
length 30, whose first element was 30 cars deep, whose
second element was 29 cars deep, etc. The two trees were in
fact equal, so that the program had to run through all of
their leaves.
deep shallow
interpreted 5.36 5.60
compiled .54 .92
block compiled .43 .75
In addition, two large experimental systems under
development at PARC that make heavy use of the spaghetti
stack capability were converted to the shallow binding
system. Both ran slightly faster block compiled under
shallow binding than deep binding, and considerably faster
interpreted and compiled normally.
Converting to the Shallow Binding System
Conversion to the shallow binding system is quite
straightforward. For most programs, it simply requires
recompilation. For the rest, conversion can be
semi-automated using EDITFNS, as described below. In no
case has conversion taking more than a few hours. Here are
the things to look for:
Page 5
GETTOPVAL and SETTOPVAL
GETTOPVAL and SETTOPVAL work in the shallow binding
system. However, finding the top level value of a variable
involves searching the entire stack looking for bindings of
this variable. If the variable has been rebound, its top
level value will be stored on the stack at the place where
it was first rebound. Note that the entire stack will have
to be searched even in the case where the variable was not
rebound. In other words, asking for the top level value of
a variable that you are not rebinding is very costly, and
the exact opposite of what you probably intend. Thus, you
probably will want to change all references to the top level
value of a variable to simply reference the value of the
variable. In other words, instead of writing (GETTOPVAL X),
write (EVAL X) or (EVALV X) (the latter is much faster).
(If the top level value is being referenced as a result of
the variable being declared a GLOBALVAR, you don't have to
make any changes; GLOBALVAR declarations are simply ignored
by the compiler under shallow binding.) For backwards
compatibility with the deep binding system, the functions
GETATOMVAL, SETATOMVAL, and /SETATOMVAL are defined in both
the new shallow binding system and the current deep binding
system. In deep binding, they reference the top level
value, i.e. are the same as GETTOPVAL, SETTOPVAL, and
/SETTOPVAL. In the shallow binding system, they reference
the current value. In both systems, they compile open where
Page 6
possible. Thus, to convert to the shallow binding system,
simply change each occurrence of GETTOPVAL, SETTOPVAL and
/SETTOPVAL to GETATOMVAL, SETATOMVAL, and /SETATOMVAL,
respectively.
RESETVAR and RESETSAVE
RESETVAR works in shallow binding, but is inefficient
and totally unnecessary: you can simply rebind the variable
in question. Thus, in the shallow binding system, RESETVAR
could be replaced by an equivalent PROG. However, for
backwards compatibility, there is a new function, RESETVARS,
which is defined in both the shallow binding system and the
current deep binding system.. The form of RESETVARS is the
same as that of PROG, i.e. (RESETVARS variables expression
... expression). Labels are permitted, and a RETURN is
required to specify the value of the RESETVARS form. In the
deep binding system, RESETVARS compiles to an equivalent
expression using RESETVAR. In the shallow binding system,
RESETVARS is in fact simply PROG. Since RESETSAVE in
conjunction with RESETLST is occasionally used to rebind and
reset global variables in the deep binding system, you also
will need to examine all RESETSAVE expressions to see if
they are being used to reset a variable, and if so, to
rebind the variable using RESETVARS, and reset it with a
simple SETQ.
Page 7
RPAQ, RPAQQ, ADDTOVAR
These functions work on the current value of a variable
in the shallow binding system. If you don't use them in
your program, don't worry; PRETTYDEF and LOAD are properly
interfaced. If you do use them, you will have to look at
each case and decide what is the proper thing to do.
CAR of an atom
is no longer its value. CAR of an atom is now undefined,
except for CAR of NIL, which is guaranteed to be NIL. (CDR
of an atom continues to be its property list, but you really
should use GETPROPLIST instead in case this changes in the
future. However, you can be assured that CDR of NIL will
always be NIL. ) Thus, you should check your programs for
expressions of the form (CAR (QUOTE --)), (RPLACA (QUOTE --)
--), and (/RPLACA (QUOTE --) --). The first should be
changed to a simple variable reference, the second to a
simple SETQ, and the third to (/SETATOMVAL (QUOTE --) --).
You undoubtedly will have places where you "cleverly" took
CAAR of a list knowing that CAR was an atom, and CAR the
value of that atom. (I am still finding some of these in
the INTERLISP code!) There isn't anyway to find these except
to run your programs.
Page 8
Automating the conversion using EDITFNS
Thus, to convert to the shallow binding system, load
your file, and perform EDITFNS(filename (EXAM (CAR (QUOTE
--)) (RPLACA (QUOTE --) --) (/RPLACA (QUOTE --) --)
GETTOPVAL SETTOPVAL /SETTOPVAL RESETVAR RESETSAVE RPAQQ RPAQ
ADDTOVAR)), and make the indicated changes. Note that
(RESETVAR X Y form) needs to be changed to (RESETVARS ((X
Y)) (RETURN form)). However, several nested RESETVAR
expressions can be combined into a single RESETVARS
expression.
Block Compiling under Shallow Binding
If you blockcompile, you may see some new error
messages of the form ****(X SHOULD BE A SPECVAR). This
message means exactly what it says. The reason why this
message did not occur previously is the reason why some
spaghetti-stack programs did not work block compiled. If a
variable is used freely in a block, any function within the
block that binds the variable must make a new frame that
contains this binding. Otherwise, if you RESUME, RETFROM,
or otherwise change contexts from within the block, and
later come back, the variable will not be correctly
restored. If the variable is already declared to be a
LOCALFREEVAR, you don't have to do anything else (and in
fact won't even see this error message), because
Page 9
LOCALFREEVARS are the same as SPECVARS in the shallow
binding system. If not, simply declare the variable to be
either a SPECVAR or LOCALFREEVAR using the standard block
declaration format.
GLOBALVAR declarations are treated the same as SPECVAR
declarations in shallow binding system, except that the
compiler will not include the names of GLOBALVARS in the
list of free variables that it prints when compiling your
function.
Tuning Block Compiled Programs for Shallow Binding
Although there is some difference in efficiency
tradeoffs between shallow binding and deep binding, unless
you are a fanatic, it probably won't pay to bother tuning
your programs for shallow binding. As an exercise, I spent
a lot of time tuning up things like DWIM for shallow
binding, and the best improvement I was able to obtain was
about 10%. However, the process was instructive, and I
include it here for general education.
The main inefficiency introduced with shallow binding
comes as a result of the fact that binding is now more
expensive, especially if it results in having to make an
extra frame on an internal function call in the block.
(Strictly speaking, the extra frame is not an artifact of
shallow binding, but of the spaghetti stack. If we had not
Page 10
converted to shallow binding, it would have been necessary
to require this extra frame in the deep bound spaghetti
system in order to make block compiled code work properly,
as described earlier.) For example, suppose FOO binds the
variable X, and calls FIE which uses X freely, and FOO and
FIE are both in the same block. Every time that FOO is
called within the block, a new frame will be (has to be)
made. If FIE is called very often from FOO, i.e. if FOO
does a lot of computation, then this cost is probably not
significant. However, if FOO is called often, and FOO only
calls FIE once or twice, it may be worthwhile to make X be
an argument of FIE. Note that this only wins if X is the
only SPECVAR that FOO binds. If there are others, and a new
frame is going to be made anyway, don't worry about it; the
time required to bind another variable is not significant.
MASTERSCOPE is invaluable for finding these sorts of
situations. For example, you can say "WHO IN BLOCKS OF
function BINDS ANY IN SPECVARS OF function", and then look
at those functions which bind only one or two specvars and
see if you can get rid of these bindings.
Another related inefficiency comes about when you have
a variable which really has to be a free variable, and then
also use it elsewhere within the block as a local variable.
For example, suppose FOO calls FOO1, FOO2, and FOO3, and
Page 11
they all share the free variable TAIL, communicating among
one another by setting and resetting this variable so that
it is awkward to have to pass it back and forth as an
argument. Suppose FOO, FOO1, FOO2, and FOO3 are all in
single block along with the function FIE which also binds a
variable named TAIL. Each time that FIE is called, a new
frame will have to be made because TAIL, a SPECVAR, is being
rebound. (As far as the compiler is concerned, it's the
same TAIL.) If FIE doesn't call anybody that uses the value
of its TAIL, then this frame is unnecessary (but the
compiler doesn't know this!). The best thing to do in this
case is to change the name of TAIL to something else, e.g.
$TAIL.
For this situation, again I found MASTERSCOPE
invaluable. Basically, what you want to know is which
functions in a block bind a SPECVAR and don't call anybody
in the block that use the variable freely, without going
through a function which rebinds the variable. I found a
lot of these in DWIM (but remember eliminating all of them
only resulted in about 10% improvement) by asking
MASTERSCOPE for each SPECVAR "WHO IN functions BINDS
variable AND IS NOT ON PATH TO ANY USING variable FREELY
NOTRACE ANY BINDING variable". Since then, Larry Masinter
has included in MASTERSCOPE a function called MSCHECKBLOCKS
which given a file name performs this check, as well as
several other useful checks, e.g. functions which are not
Page 12
called by any functions in the block and also are not
entries, functions which are called by functions outside of
the block and are not entries, etc. MSCHECKBLOCKS is in the
shallow binding system. You don't even have to load the
file.
Other Miscellaneous Differences
The stack will look a little different when you do a
backtrace in the shallow binding system because there are
many more frames than previously. As described earlier, in
the shallow binding system, whenever a function in a block
binds a SPECVAR or LOCALFREEVAR, it must make a new frame.
If this function is a RETFN, the name of the frame is the
name of the function, i.e. if you do a STKPOS searching for
that function from below that point on the stack, the search
will stop at this frame. If the function is not a RETFN,
then its name will not appear on the stack. Instead, the
function name associated with that frame, i.e. the name
printed by backtrace, will be the result of putting a * at
the beginning and end of the function name, e.g. *FOO*,
*CLISPATOM*. Actually, such "function names" can be given
as arguments to the various stack functions, but this
procedure is frowned upon. In particular, the corresponding
programs will not work interpreted.
Page 13
When a new frame is created on a internal function call
within a block, only those variables that are SPECVARS or
LOCALFREEVARS will have their names stored with the
bindings. For example, if FOO is called from within a
block, and binds X and Y, and Y is a SPECVAR, if you do a
STKSCAN for X from below FOO, the search will not see the X,
even though there is a frame named *FOO* on the stack,
containing a binding for Y. The binding for X will be
"unnamed." However, for aid in debugging, BREAK will supply
the names of such variables when it can, using the same
convention of putting a * in front and behind the variable
name. For example, if you go into a break below FOO (i.e.
below *FOO*), and use the @ command to "point" at this
frame, and then type ?= you will see:
*X* = value of X
Y = value of Y
The same procedure is followed when a function is called
that is not inside of a block, but some of whose variables
are declared to be LOCALVARs (not LOCALFREEVARs).