Trailing-Edge
-
PDP-10 Archives
-
decus_20tap1_198111
-
decus/20-0004/03lisp.doc
There are no other files named 03lisp.doc in the archive.
SECTION 3
DATA TYPES, STORAGE ALLOCATION
1
GARBAGE COLLECTION, AND OVERLAYS
2
INTERLISP operates in an 18-bit address space. This address space is
divided into 512 word pages with a limit of 512 pages, or 262,144 words,
but only that portion of address space currently in use actually exists
on any storage medium. INTERLISP itself and all data storage are
contained within this address space. A pointer1 to a data element such
as a number, atom, etc., is simply the address of the data element in
this 18-bit address space.
3.1 DATA TYPES
The data-types2 of INTERLISP are lists,3 atoms,4 pnames,5 arrays,6 large and
small integers,78 floating point numbers,9 string characters10 and string
pointers.11 There is also a way to define new data-types. Compiled code
and hash arrays12 are currently included with arrays.
In the descriptions of the various data-types given below, for each data
type, first the input syntax and output format are described, that is,
what input sequence will cause the INTERLISP read program to construct
an element of that type, and how the INTERLISP print program will print
such an element. Next, those functions that construct elements of that
data-type are given. Note that some data-types cannot be input, they
can only be constructed, e.g. arrays. Finally, the format in which an
element of that data-type is stored in memory is described.
------------------------------------------------------------------------
1
This section was written by A. K. Hartley and J. W. Goodwin.
2
INTERLISP is currently implemented on (or implementations are in
progress for) at least four different machines. This section treats
subjects that are for the most part somewhat implementation
dependent. Where this is the case, the discussion refers to
INTERLISP-10, the implementation for the DEC PDP-10, on which
INTERLISP was first implemented.
3.1
3.1.1 LITERAL ATOMS
A literal atom13 is input as any string of non-delimiting characters that
cannot be interpreted as a number. The syntatic characters that delimit
atoms called separator or break characters (Section 14) and normally are
3
space,14 end-of-line, 15 line-feed,16 %17 (18 )19 "20 ]21 and [.22 However, these
characters may be included in atoms by preceding them with the escape
character26 %.
Literal atoms are printed by print27 and prin228 as a sequence of characters
with %'s inserted before all delimiting characters (so that the atom
will read back in properly). Literal atoms are printed by prin129 as a
sequence of characters without these extra %'s. For example, the atom
consisting of the five characters A, B, C, (, and D will be printed as
ABC%(D by print and ABC(D by prinl. The extra %'s are an artifact of
the print program; they are not stored in the atom's pname.30
Literal atoms can be constructed by 31pack, 32mkatom, and 33gensym (which uses
mkatom).
Literal atoms34 are unique. In other words, if two literal atoms have the
same pname, i.e. print the same, they will always be the same identical
atom, that is, they will always have the same address in memory, or
4
equivalently, they will always be eq. Thus if pack35 or mkatom36 is given a
list of characters corresponding to a literal atom that already exists,
they return a pointer to that atom, and do not make a new atom.
Similarly, if the read program is given as input of a sequence of
characters for which an atom already exists, it returns a pointer to
that atom.
A literal atom is a datum consisting of the following components: a
property list, initially NIL, (accessed by getproplist and setproplist,
Section 7), a top level value, (accessed by gettopval and settopval,
Section 5), a function definition, initially NIL, (accessed by getd and
putd, Section 8), and a pname, (not directly accessible).
------------------------------------------------------------------------
3
An end-of-line23 character is transmitted by 24TENEX when it sees a
25carriage-return.
4
Note that this is not true for strings, large integers, floating
point numbers, and lists, i.e. they all can print the same without
being eq.
3.2
3.1.2 PNAMES
5
The pnames37 of atoms comprise another data-type with storage assigned as
it is needed. This data-type only occurs as a component of an atom or a
string. It does not appear, for example, as an element of a list.
Pnames have no input syntax or output format as they cannot be directly
referenced by user programs.
A pname is a sequence of 7 bit characters packed 5 to a word, beginning
at a word boundary. The first character of a pname contains its length;
thus the maximum length of a pname is 126 characters.
3.1.3 NUMERICAL ATOMS
Numerical atoms, or simply numbers, do not have property lists, value
cells, functions definition cells, or explicit pnames. There are
currently two types of numbers in INTERLISP: integers,39 and floating
point numbers.40
INTEGERS
The input syntax for an integer is an optional sign (+ or -) followed by
6
a sequence of digits, followed by an optional Q. 41 If the Q is present,
the digits are interpreted in octal,42 otherwise in decimal, e.g. 77Q and
63 both correspond to the same integers, and in fact are
indistinguishable internally since no record is kept of how integers
were created.
The setting of radix43 (Section 14), determines how integers are printed:
signed or unsigned, octal or decimal.
Integers are created by pack44 and mkatom45 when given a sequence of
characters observing the above syntax, e.g.
(PACK (LIST 1 2 (QUOTE Q))) = 10. Integers are also created as a result
of arithmetic operations, as described in Section 13.
An integer is stored in one 36 bit word; thus its magnitude must be less
------------------------------------------------------------------------
5
All INTERLISP pointers have pnames,38 since we define a pname simply
to be how that pointer is printed. However, only literal atoms and
strings have their pnames explicitly stored. Thus, the use of the
term pname in a discussion of data-types or storage allocation means
pnames of atoms or strings, and refers to a sequence of characters
stored in a certain part of INTERLISP's memory.
6
and terminated by a delimiting character. Note that some data-types
are self-delimiting, e.g. lists.
3.3
7
than 2^35. To avoid having to store (and hence garbage collect) the
values of small integers,46 a few pages of address space, overlapping the
INTERLISP-10 machine language code, are reserved for their
representation. The small number pointer itself, minus a constant, is
the value of the number. Currently the range of "small" integers is
-1536 thru +1535. The predicate smallp47 is used to test whether an
integer is "small".
While small integers have a unique representation, large integers do
not. In other words, two large integers may have the same value, but
not the same address in memory, and therefore not be eq. For this
reason the function eqp48 (or equal) should be used to test equality of
large integers.49
FLOATING POINT NUMBERS
A floating point number is input as a signed integer, followed by a
decimal point, followed by another sequence of digits called the
fraction, followed by an exponent (represented by E followed by a signed
8
integer). 50 Both signs are optional, and either the fraction following
the decimal point, or the integer preceding the decimal point may be
omitted.51 One or the other of the decimal point or exponent may also be
omitted, but at least one of them must be present to distinguish a
floating point number from an integer. For example, the following will
be recognized as floating point numbers:
5. 5.00 5.01 .3 5E2 5.1E2
5E-3 -5.2E+6
Floating point numbers52 are printed using the facilities provided by
TENEX.53 INTERLISP-10 calls the floating point number to string conversion
9
routines using the format control specified by the function fltfmt
(Section 14). fltfmt54 is initialized to T, or free format. For example,
the above floating point numbers would be printed free format as:
5.0 5.0 5.01 .3 500.0 510.0
.005 -5.2E6
Floating point numbers55 are also created by 56pack and 57mkatom, and as a
result of arithmetic operations as described in section 13.
------------------------------------------------------------------------
7
If the sequence of digits used to create the integer is too large,
the high order portion is discarded. (The handling of overflow as a
result of arithmetic operations is discussed in Section 13.)
8
and terminated by a delimiter.
9
Additional information concerning these conversions may be obtained
from the TENEX JSYS Manual.
3.4
A floating point number is stored in one 36 bit word in standard PDP-10
format. The range is +2.94E-39 thru +1.69E38 (or 2^-128 thru 2^127).
3.1.4 LISTS
10
The input syntax for a list is a sequence (at least one) of INTERLISP
data elements, e.g. literal atoms numbers, other lists,59 etc. enclosed
in parentheses or brackets. A bracket can be used to terminate several
lists, e.g. (A (B (C], as described in Section 2.
If there are two or more elements in a list, the final element can be
preceded by a .60 (delimited on both sides), indicating that cdr of the
final node in the list is to be the element immediately following the .,
e.g. (A . B) or (A B C . D), otherwise cdr of the last node in a list
11
will be NIL. Note that the input sequence (A B C . NIL) is thus
equivalent to (A B C), and that (A B . (C D)) is thus equivalent to
(A B C D). Note however that (A B . C D) will create a list containing
the five literal atoms A B . C and D.
Lists are constructed by the primitive functions cons61 and list.62
Lists are printed by printing a left parenthesis, and then printing the
12
first element of the list, then printing a space, then printing the
second element, etc. until the final node is reached. Lists are
considered to terminate when cdr of some node is not a list. If cdr of
this terminal node is NIL (the usual case), car of the terminal node is
printed followed by a right parenthesis. If cdr of the terminal node is
not NIL, car of the terminal node is printed, followed by a space, a
period, another space, cdr of the terminal node, and then the right
parenthesis. Note that a list input as (A B C . NIL) will print as
(A B C), and a list input as (A B . (C D)) will print as (A B C D).
Note also that printlevel63 affects the printing of lists to teletype, and
that carriage returns may be inserted where dictated by linelength,64 as
described in Section 14.
A list is stored as a chain of list nodes.65 A list node is stored in one
36 bit word, the right half containing car of the list (a pointer to the
first element of the list), and the left half containing cdr of the list
(a pointer to the next node of the list).
------------------------------------------------------------------------
10
()58 is read as the atom NIL.
11
Note that in INTERLISP terminology, a list does not have to end in
NIL, it is simply a structure composed of one or more conses.
12
The individual elements of a list are printed using prin2 if the
list is being printed by print or prin2, and by prin1 if the list is
being printed by prin1.
3.5
3.1.5 ARRAYS
An array66 in INTERLISP is a one dimensional block of contiguous storage
of arbitrary length. Arrays do not have input syntax; they can only be
created by the function array.67 Arrays are printed by both 68print, 69prin2,
and 70prin1, as #71 followed by the address of the array pointer72 (in 73octal).
Array elements can be referenced by the functions elt74 and eltd,75 and set
by the functions seta76 and setd,77 as described in Section 10.
Arrays are partitioned into four sections: a header,78 a section
containing unboxed numbers,79 a section containing INTERLISP pointers, and
a section containing relocation information.80 The last three sections can
each be of arbitrary length (including 0); the header is two words long
and contains the length of the other sections as indicated in the
diagram below. The unboxed number region of an array is used to store
36 bit quantities that are not INTERLISP pointers, and therefore not to
be chased from during garbage collections, e.g. machine instructions.
The relocation informaion is used when the array contains the definition
of a compiled function, and specifies which locations in the unboxed
region of the array must be changed if the array is moved during a
garbage collection.
The format of an array in INTERLISP-10 is as follows:
3.6
Figure 3-1
The header contains:
word 0 right - length of entire block=ARRAYSIZE+2.
left - address of relocation information relative to
word 0 of block (> 0 if relocation information
exists, negative if array is a hash array, 0 if
ordinary array).
word 1 right - address of pointers relative to word 0 of block.
left - used by garbage collector.
3.1.6 STRINGS
The input syntax for a string is a ", followed by a sequence of any
characters except " and %, terminated by a ". " and % may be included in
a string by preceding them with the escape character %.
Strings81 are printed by 82print and 83prin2 with initial and final "'s,84 and
%'s85 inserted where necessary for it to read back in properly. Strings
are printed by 86prin1 without the delimiting "'s and extra %'s.
Strings are created by 87mkstring, 88substring, and 89concat.
Internally a string is stored in two parts; a string pointer and the
sequence of characters. The INTERLISP pointer to a string is the
address of the string pointer. The string pointer, in turn, contains
the character position at which the string characters90 begin, and the
number of characters. String pointers91 and string characters are two
3.7
13
separate data-types, and several string pointers may reference the
same characters. This method of storing strings permits the creation of
a substring by creating a new string pointer, thus avoiding copying of
the characters. For more details, see Section 10.
String characters92 are 7 bit bytes packed 5 to a word. The format of a
string pointer is:
Figure 3-2
The maximum length of a string is 32K (K=1024) characters.93
14 15
3.2 USER DEFINED DATA-TYPES
In addition to the several built-in data-types, INTERLISP provides a way
of defining completely new classes of objects, with a fixed number of
fields determined by the definition of the data-type. Facilities are
provided for declaring the number and type of the fields for a given
class, creating objects of a given class, accessing and replacing the
contents of each of the fields of such an object, as well as
interrogating such objects.
In order to define a new class of objects, the user must supply a name
for the new data-type and specifications for each of its fields. Each
field may contain either a pointer (i.e., any arbitrary INTERLISP
datum), an integer, a floating point integer, or an n-bit integer. The
latter may also have an offset, e.g., a 3-bit field with an offset of 10
may contain the integers 10 through 17. The offset redefines the zero
point.
------------------------------------------------------------------------
13
String characters are not directly accessible by user programs.
14
The most convenient way to define new data-types is via DATATYPE
declarations in the RECORD package (as described in Section 23).
15
User data types were implemented by D.C. Lewis and L.M. Masinter
3.8
The above operations are accomplished by calling the function
declaredatatype.
94declaredatatype[typename;fieldspecs]
typename is a literal atom, which specifies the
name of the data-type. fieldspecs is a list of
field specifications. Each field specification
may be one of the following:
POINTER field may contain any
INTERLISP
datum
FIXP field contains an integer
FLOATP field contains a floating
point number
(BITS n offset) field contains an integer, x,
whose
value satisfies
n
offset < x < 2 + offset
declaredatatype returns a list of field
descriptors, one for each element of fieldspecs.
The field descriptor contains information about
where within the datum the field is actually
stored. If typename is already declared a
datatype, it is re-declared. In addition, if
fieldspecs is NIL, typename is "undeclared".
95fetchfield[fieldesc;datum]
Returns the contents of the field described by
fieldesc from datum. fieldesc must be a "field
descriptor" as returned by declaredatatype. If
datum is not an instance of the datatype of
which fieldesc is a descriptor, causes error
16
DATUM OF INCORRECT TYPE96.
97replacefield[fieldesc;datum;newvalue]
Store newvalue into the field of datum described
by fieldesc. fieldesc must be a field
descriptor, as returned by declaredatatype. If
datum is not an instance of the datatype of
which fieldesc is a descriptor, causes error
DATUM OF INCORRECT TYPE. Value is newvalue.
98typename[datum] Returns the "name" of datum. If datum is an
------------------------------------------------------------------------
16
In INTERLISP-10, if fieldesc is quoted, fetchfield compiles open.
This is used by the record package.
3.9
instance of a user datatype, the value is the
name given to declaredatatype. Otherwise,
typename returns one of the atoms LISTP, FLOATP,
FIXP, STRINGP, LITATOM, STACKP, ARRAYP,
SWPARRAYP, SMALLP corresponding to the datatype
of datum.
99typenamep[datum;typename]
17
Equivalent to eq[typename[datum];name].
100getfieldspecs[typename] Returns a list which is equal to the fieldspecs
argument given to declaredatatype for typename;
if typename is not a currently declared
data-type, returns NIL.
101getdescriptors[typename]Returns a list of field descriptors, equal to
the value of declaredatatype for typename.
102userdatatypes[] Returns list of names of currently declared user
datatypes.
In INTERLISP-10, user datatypes are allocated starting with type number
31. The minimum amount of space that can be assigned to any one data-
type is one page, thus effectively limiting the number of possible data
types. The fields are rearranged internally so that pointer fields come
first; the remaining numeric fields, if any, are packed so as to
optimize the space used.
3.3 STORAGE ALLOCATION AND GARBAGE COLLECTION
In the following discussion,103 we will speak of a quantity of memory being
assigned to a particular data-type, meaning that the space is reserved
for storage of elements of that type. Allocation will refer to the
process used to obtain from the already assigned storage a particular
location for storing one data element.
A small amount of storage is assigned to each data-type when
INTERLISP-10 is started; additional storage is assigned only during a
garbage collection.104
The page105 is the smallest unit of memory that may be assigned for use by
a particular data-type. For each page of memory there is a one word
entry in a type table. The entry contains the data-type residing on the
page as well as other information about the page. The type of a pointer
is determined by examining the appropriate entry in the type table.
------------------------------------------------------------------------
17
typenamep compiles open.
3.10
Storage is allocated as is needed by the functions which create new data
elements, such as cons,106 pack,107 mkstring.108 For example, when a large
integer is created by iplus, the integer is stored in the next available
location in the space assigned to integers. If there is no available
location, a garbage collection is initiated, which may result in more
storage being assigned.
The storage allocation109 and garbage collection methods differ for the
various data-types.110 The major distinction is between the types with
elements of fixed length and the types with elements of arbitrary
length. List nodes,111 atoms,112 large integers,113 floating point numbers,114 and
string pointers115 are fixed length; all occupy 1 word except atoms which
use 3 words. Arrays,116 pnames,117 and strings (string characters)118 are
variable length.
Elements of fixed length types are stored so that they do not overlap
page boundaries. Thus the pages assigned to a fixed length type need
not be adjacent. If more space is needed, any empty page will be used.
The method of allocating storage for these types employs a free-list119 of
available locations; that is, each available location contains a pointer
to the next available location. A new element is stored at the first
18
location on the free-list, and the free-list pointer is updated.
Elements of variable length data-types are allowed to overlap page
boundaries. Consequently all pages assigned to a particular variable
length type must be contiguous. Space for a new element is allocated
following the last space used in the assigned block of contiguous
storage.
When INTERLISP-10 is first called, a few pages of memory are assigned to
each data-type. When the allocation routine for a type determines that
no more space is available in the assigned storage for that type, a
garbage collection is initiated. The garbage collector determines what
data is currently in use and reclaims that which is no longer in use. A
garbage collection may also be initiated by the user with the function
reclaim120 (Section 10).
Data in use (also called active data) is any data that can be "reached"
from the currently running program (i.e., variable bindings and
functions in execution) or from atoms. To find the active data the
garbage collector "chases" all pointers, beginning with the contents of
the push-down lists and the components (i.e., car, cdr, and function
definition cell) of all atoms with at least one non-trivial component.
When a previously unmarked datum is encountered, it is marked, and all
pointers contained in it are chased. Most data-types are marked using
bit tables; that is tables containing one bit for each datum. Arrays,121
however, are marked using a half-word in the array header.
------------------------------------------------------------------------
18
The allocation routine for list nodes is more complicated. Each page
containing list nodes has a separate free list. First a page is
chosen (see CONS for details), then the free list for that page is
used. Lists are the only data-type which operate this way.
3.11
When the mark and chase process is completed, unmarked (and therefore
unused) space is reclaimed. Elements of fixed length types that are no
longer active are reclaimed by adding their locations to the free-list122
for that type. This free list allocation method permits reclaiming
space without moving any data, thereby avoiding the time consuming
process of updating all pointers to moved data. To reclaim unused space
in a block of storage assigned to a variable length type, the active
elements are compacted123 toward the beginning of the storage block, and
then a scan of all active data that can contain pointers to the moved
data is performed to update the pointers.
19
Whenever a garbage collection of any type is initiated, unused space
for all fixed length types is reclaimed since the additional cost is
slight. However, space for a variable length type is reclaimed only
when that type initiated the garbage collection.
If the amount of storage reclaimed for the type that initiated the
garbage collection is less than the minimum free storage requirement for
that type, the garbage collector will assign enough additional storage
to satisfy the minimum free storage requirement. The minimum free
storage requirement for each data may be set with the function 125minfs
(Section 10). The garbage collector assigns additional storage to fixed
length types by finding empty pages, and adding the appropriate size
elements from each page to the free list. Assigning additional storage
to a variable length type involves finding empty pages and moving data
so that the empty pages are at the end of the block of storage assigned
to that type.
In addition to increasing the storage assigned to the type initiating a
garbage collection, the garbage collector will attempt to minimize
garbage collections by assigning more storage to other fixed length
20
types according to the following algorithm. If the amount of active
data of a type has increased since the last garbage collection by more
than 1/4 of the minfs126 value for that type, storage is increased (if
necessary), to attain the minfs value. If active data has increased by
less than 1/4 of the minfs value, available storage is increased to 1/2
minfs. If there has been no increase, no more storage is added. For
example, if the 127minfs setting is 2000 words, the number of active words
has increased by 700, and after all unused words have been collected
there are 1000 words available, 1024 additional words (two pages) will
be assigned to bring the total to 2024 words available. If the number
of active words had increased by only 300, and there were 500 words
available, 512 additional words would be assigned.128
------------------------------------------------------------------------
19
The "type of a garbage collection" or the "type that initiated a
garbage collection" means either the type that ran out of space and
called the garbage collector, or the argument to 124reclaim.
20
We may experiment with different algorithms.
3.12
3.4 SHARED INTERLISP-10
The INTERLISP-10 system initially obtained by the user is shared; that
is, all active users of INTERLISP-10 are actually using the same pages
of memory. As a user adds to the system, private pages are added to his
memory. Similarly, if the user changes anything in the original shared
INTERLISP-10, for example, by advising a system function, a private copy
of the changed page is created.
In addition to the swapping time saved by having several users accessing
the same memory, the sharing129 mechanism permits a large saving in garbage
collection time, since we do not have to garbage collect any data in the
shared system, and thus do not need to chase from any pointers on shared
pages130 during garbage collections.
This reduction in garbage collection time is possible because the shared
system usually is not modified very much by the user. If the shared
system is changed extensively, the savings in time will vanish, because
once a page that was initially shared is made private,131 every pointer on
it must be assumed active, because it may be pointed to by something in
the shared system. Since every pointer on an initially shared but now
private page can also point to private data, they must always be chased.
A user may create his own shared system132 with the function makesys. If
several people are using the same system, making the system be shared
will result in a savings in swapping time. Similarly, if a system is
large and seldom modified, making it be shared will result in a
reduction of garbage collection time, and may therefore be worthwhile
even if the system is only being used by one user.
makesys[file] 133creates a saved file in which all pages in this
system, including private user pages, are made
read execute, i.e. shared. This system can then
be run via the 134TENEX command 135RUN, or GET and
START.
For example, new INTERLISP-10 systems are brought up by loading the
21
appropriate compiled files and then performing makesys[LISP.SAV].
herald[string] 137makes string be the 'herald' for the system,
i.e. the message printed when the system is
first started. Primarily for use in conjunction
------------------------------------------------------------------------
21
makesys is also advised (see section 19) to set the variable
136makesysdate to (DATE), i.e. the time and date the system was made.
3.13
22
with makesys.
23
THE INTERLISP-10 SWAPPER
INTERLISP-10 provides a very large auxilary address space exclusively
for swappable arrays (primarily compiled function definitions). In
addition to the 256K of resident address space, this "shadow space" can
currently accomodate an additonal 256K words, can easily be expanded to
3.5 million words, and with some further modifications, could be
expanded to 128 million words. Thus, the overlay system provides
24
essentially unlimited space for compiled code.
Shadow space and the swapper are intended to be more or less transparent
to the user. However, this section is included in the manual to give
programmers a reasonable feeling for what overlays are like, without
getting unnecessarily technical, as well as to document some new
functions and system controls which may be of interest for authors of
exceptionally large systems.
3.4.1 OVERLAYS
The shadow space is a very large auxiliary address space used
exclusively for an INTERLISP data-type called a swappable array140. The
regular address space is called the "resident" space to distinguish it
------------------------------------------------------------------------
22
makesys is advised to set the variable heraldstring 138 to the
concatenation of "INTERLISP-10", the month and day of the makesys,
and "..." and to call herald on this string. Alternatively, makesys
can be given as a second argument a string to be used instead of
"INTERLISP-10", e.g. makesys[STREK.SAV;STAR-TREK] would cause the
message STAR-TREK followed by the date and "..." to be printed when
STREK.SAV was run.
23
The INTERLISP-10 swapper was designed by E. L. Wegbreit (PARC) and
J. W. Goodwin (BBN), and implemented by J. W. Goodwin.
24
Since compiled code arrays point to atoms for function names, and
strings for error messages, not to mention the fact that programs
usually have data base, which are typically lists rather than
arrays, there is still a very real and finite limit to the total
size of programs that INTERLISP-10 can accomodate. However, since
much of the system and user compiled code can be made swappable,
there is that much more resident space available for these other
data-types.
3.14
from shadow space. Any kind of resident array - compiled code, pointer
data, binary data, or a hash array - can be copied into shadow space
("made swappable"), from which it is referred to by a one-word resident
entity called a handle141. The resident space occupied by the original
array can then be garbage collected normally (assuming there are no
remaining pointers to it, and it has not been made shared by a makesys).
Similarly, a swappable array can be made resident again at any time, but
of course this requires (re)allocating the necessary resident space.
The main purpose and intent of the swapping system is to permit
utilization of swappable arrays directly and interchangeably with
resident arrays, thereby saving resident space which is then available
for other data-types, such as lists, atoms, strings, etc.
This is accomplished as follows: A section of the resident address space
25
is permanently reserved for a swapping buffer. When a particular
swappable array is requested, it is brought (swapped) in by mapping or
overlaying the pages of shadow space in which it lies onto a section of
the swapping buffer142. This process is the swapping or overlaying from
which the system takes its name. The array is now (directly)
accessible. However, further requests for swapping could cause the
array to be overlaid with something else, so in effect it is liable to
go away at any time. Thus all system code that relates to arrays must
recognize handles as a special kind of array, fetch them into the buffer
(if not already there), when necessary check that they have not
disappeared, fetch them back in if they have, and even be prepared for
the second fetch to bring the swappable array in at a different place
than did the first.
The major emphasis in the design of the overlay system has been placed
on running compiled code, because this accounts for the overwhelming
majority of arrays in typical systems, and for as much as 60% of the
overall data and code. The system supports the running of compiled code
directly from the swapping buffer, and the function calling mechanism
knows when a swappable definition is being called, finds it in the
buffer if it is already there, and brings it in otherwise. Thus, from
the user's point of view, there is no need to distinguish between
swappable and resident compiled definitions, and in fact ccodep143 will be
true for either.
3.4.2 EFFICIENCY
Once of the most important design goals for the overlay system was that
swappable code should not execute any extra instructions compared to
resident code, once it had been swapped in. Thus, the instructions of a
swappable piece of code are identical (except for two instructions at
the entry point) to those of the resident code from which it was
------------------------------------------------------------------------
25
Currently 64 512 word pages.
3.15
26
copied, and similarly when a swappable function calls another function
(of any kind) it uses the exact same calling sequence as any other code.
Thus, all costs associated with running of swappable code are paid at
27
the point of entry (both calling and returning).
The cost of the swapping itself, i.e. the fetch of a new piece of
swapped code into the buffer, is even harder to measure meaningfully,
since two successive fetches of the same function are not the same, due
to the fact that the instance created by the first fetch is almost
certain to be resident when the second is done, if no swapping is done
in between. Similarly, two successive PMAP's (the Tenex operation to
fetch one page) are not the same from one moment to another, even if the
virtual state of both forks is exactly the same - a difficult constraint
28
to meet in itself. Thus, all that can be reported is that empirical
measurements and observations have shown no consistent slowdown in
performance of systems containing swappable functions viz a viz resident
functions.
3.4.3 SPECIFICATIONS
Associated with the overlay system is a datatype called a swparray144,
(numeric datatype 4), which occupies one word of resident space, plus
however much of shadow space needed for the body of the array. arglist,
fntyp, nargs, getd, putd, argtype, arraysize, changename, calls,
printstructure, break, advise, and edita all work equally well with
swappable as resident programs. ccodep145 is true for all compiled
functions/definitions.
------------------------------------------------------------------------
26
The relocatable instructions are indexed by a base register, to make
them run equally well at any location in the buffer. The net
slowdown due to this extra level of indirection is too small to
measure accurately in the overall running of a program. On
analytical grounds, one would expect it to be around 2%.
27
If the function in question does nothing, e.g. a compiled
(LAMBDA NIL NIL), it costs approximately twice as much to enter its
definition if it is swappable as compared to resident. However,
very small functions are normally not made swappable (see mkswapp,
page 3.17), because they don't save much space, and are (typically)
entered frequently. Larger programs don't exhibit a measurable slow
down since they amortize the entry cost over longer runs.
28
The cost of fetching is probably not in the mapping operation itself
but in the first reference to the page, which has a high probability
of faulting. This raises the problem of measuring page fault
activity, another morass of uncertainty. The BBN INTERLISP group
has a project in progress to measure the interaction of INTERLISP-10
and TENEX.
3.16
146swparrayp[x] Analogous to arrayp147. Returns x if x is a
swappable array and, NIL otherwise.
148mkswap[x] If x is a resident array, returns a swappable
array which is a copy of x. If x is a literal
atom and ccodep[x] is true, its definition is
copied into a swappable array, and it is
(undoably) redefined with the latter. The value
of mkswap is x.
149mkunswap[x] the inverse of mkswap. x is either a swappable
array, or an atom with swapped definition on its
CODE150 property.
151mkswapp[fname;cdef] All compiled definitions begin life as resident
arrays, whether they are created by load, or by
compiling to core. Before they are stored away
into their atom's function cell, mkswapp is
applied to the atom and the array. If the value
of mkswapp is T, the definition is made
swappable; otherwise, it is left resident. By
redefining mkswapp or advising it, the user can
completely control the swappability of all
future definitions as they are created. The
initial definition of mkswapp will make a
function swappable if (1) noswapflg is NIL, and
(2) the name of the function is not on
noswapfns152, and (3) the size of its definition is
greater than mkswapsize153 words, initially 128.
154setsbsize[n] Sets the size of the swapping buffer to n, a
number of pages. Returns the previous value.
setsbsize[] returns the current size without
29
changing it. 155
------------------------------------------------------------------------
29
Currently, the system lacks error recovery routines for situations
such as a call to a swappable function which is too big for the
swapping buffer, or when the size is zero. Therefore, setsbsize
should be used with care.
3.17
**INDEX**
(
(pointer 1 1)
(BEGIN data-types 1 2)
(lists 1 3)
(atoms 1 4)
(pnames 1 5)
(arrays 1 6)
(large integers 1 7)
(small integers 1 8)
(floating point numbers 1 9)
(string characters 1 10)
(string pointers 1 11)
(hash arrays 1 12)
(literal atoms 2 13)
(space 2 14)
(end-of-line 2 15)
(line-feed 2 16)
(% (escape character) 2 17)
(( 2 18)
() 2 19)
(" 2 20)
(] 2 21)
([ 2 22)
(end-of-line 2 23)
(TENEX NIL 2 24)
(carriage-return 2 25)
(escape character 2 26)
(PRINT 2 27)
(PRIN2 2 28)
(PRIN1 2 29)
(pnames 2 30)
(PACK 2 31)
(MKATOM 2 32)
(GENSYM 2 33)
(literal atoms 2 34)
(PACK 2 35)
(MKATOM 2 36)
(pnames 3 37)
(pnames 3 38)
(integers 3 39)
(floating point numbers 3 40)
(Q (following a number) 3 41)
(octal 3 42)
(RADIX 3 43)
(PACK 3 44)
(MKATOM 3 45)
(small integers 4 46)
(SMALLP 4 47)
(EQP 4 48)
(large integers 4 49)
(E (in a floating point number) 4 50)
(. (in a floating point number) 4 51)
(floating point numbers 4 52)
(TENEX NIL 4 53)
(FLTFMT 4 54)
(floating point numbers 4 55)
(PACK 4 56)
(MKATOM 4 57)
(() 5 58)
(lists 5 59)
(. 5 60)
(CONS 5 61)
(LIST 5 62)
(PRINTLEVEL 5 63)
(LINELENGTH 5 64)
(list nodes 5 65)
(arrays 6 66)
(ARRAY 6 67)
(PRINT 6 68)
(PRIN2 6 69)
(PRIN1 6 70)
(# (followed by a number) 6 71)
(array pointer 6 72)
(octal 6 73)
(ELT 6 74)
(ELTD 6 75)
(SETA 6 76)
(SETD 6 77)
(array header 6 78)
(unboxed numbers (in arrays) 6 79)
(relocation information (in arrays) 6 80)
(strings 7 81)
(PRINT 7 82)
(PRIN2 7 83)
(" 7 84)
(% (escape character) 7 85)
(PRIN1 7 86)
(MKSTRING 7 87)
(SUBSTRING 7 88)
(CONCAT 7 89)
(string characters 7 90)
(string pointers 7 91)
(string characters 8 92)
(END data-types 8 93)
(DECLAREDATATYPE 9 94)
(FETCHFIELD 9 95)
(DATUM OF INCORRECT TYPE EM 9 96)
(REPLACEFIELD 9 97)
(TYPENAME 9 98)
(TYPENAMEP 10 99)
(GETFIELDSPECS 10 100)
(GETDESCRIPTORS 10 101)
(USERDATATYPES 10 102)
(storage allocation 10 103)
(BEGIN garbage collection 10 104)
(page 10 105)
(CONS 11 106)
(PACK 11 107)
(MKSTRING 11 108)
(storage allocation 11 109)
(data-types 11 110)
(list nodes 11 111)
(atoms 11 112)
(large integers 11 113)
(floating point numbers 11 114)
(string pointers 11 115)
(arrays 11 116)
(pnames 11 117)
(string characters 11 118)
(free-list 11 119)
(RECLAIM 11 120)
(arrays 11 121)
(free-list 12 122)
(compacting 12 123)
(RECLAIM 12 124)
(MINFS 12 125)
(MINFS 12 126)
(MINFS 12 127)
(END garbage collection 12 128)
(sharing 13 129)
(shared pages 13 130)
(private pages 13 131)
(shared system 13 132)
(MAKESYS 13 133)
(TENEX NIL 13 134)
(RUN TC 13 135)
(MAKESYSDATE SY 13 136)
(HERALD 13 137)
(HERALDSTRING SY 14 138)
(BEGIN overlays 14 139)
(swappable array 14 140)
(handle 15 141)
(swapping buffer 15 142)
(CCODEP 15 143)
(SWPARRAY 16 144)
(CCODEP 16 145)
(SWPARRAYP 17 146)
(ARRAYP 17 147)
(MKSWAP 17 148)
(MKUNSWAP 17 149)
(CODE PN 17 150)
(MKSWAPP 17 151)
(NOSWAPFNS OV 17 152)
(MKSWAPSIZE OV 17 153)
(SETSBSIZE 17 154)
(END overlays 17 155)
)