Trailing-Edge
-
PDP-10 Archives
-
decus_20tap1_198111
-
decus/20-0004/10lisp.doc
There are no other files named 10lisp.doc in the archive.
SECTION 10
ATOM, STRING, ARRAY, AND STORAGE MANIPULATION
10.1 Pnames and Atom Manipulation
The term "print name" (of an atom) in LISP 1.5 referred to the
characters that were output whenever the atom was printed. Since these
characters were stored on the atom's property list under the property
PNAME, pname was used interchangeably with "print name". In INTERLISP,
all pointers have pnames, although only literal atoms and strings have
their pname explicitly stored.
The pname of a pointer are those characters that are output when the
1
pointer is printed using prin1,
2
e.g., the pname of the atom ABC%(D consists of the five characters
ABC(D. The pname of the list (A B C) consists of the seven characters
(A B C) (two of the characters are spaces).
Sometimes we will have occasion to refer to the prin2-pname.
The prin2-pname are those characters output when the corresponding
pointer is printed using prin2.
3
Thus the prin2-pname of the atom ABC%(D is the six characters ABC%(D.
------------------------------------------------------------------------
1
except that for the purposes of the functions described in this
chapter, i.e., unpack, nchars, etc. the prin1-pname of an integer is
defined as though radix=10. Note that integers will still be printed
by prin1 using the current radix, as described in Section 14.
However, we want pack[unpack[X9]] to always be X9 (and not sometimes
X11) regardless of the setting of the radix.
2
% is the escape character. See Sections 2 and 14.
3
Note that the prin2-pname also depends on what readtable is being
used (see Section 14), since this determines where %'s will be
inserted. Note also that the prin2-pname of an integer depends on
the setting of radix.
10.1
pack[x] If x is a list of atoms, the value of pack is a
single atom whose pname is the concatenation of
the pnames of the atoms in x, e.g.,
pack[(A BC DEF G)]=ABCDEFG. If the pname of the
value of pack[x] is the same as that of a
number, pack[x] will be that number, e.g.,
pack[(1 3.4)]=13.4, pack[(1 E -2)]=.01.
Although x is usually a list of atoms, it can be
a list of arbitrary INTERLISP pointers. The
value of pack is still a single atom whose pname
is the same as the concatenation of the pnames
of all the pointers in x, e.g.,
pack[((A B)"CD")] = %(A% B%)CD.
In other words, mapc[x;prin1] and prin1[pack[x]]
4
always produce the same output. In fact, pack
actually operates by calling prin1 to convert
the pointers to a stream of characters (without
printing) and then makes an atom out of the
result.
Note: In INTERLISP-10, atoms are restricted to < 127 characters.
Attempting to create a larger atom either via pack or by typing one in
(or reading from a file) will cause an error, ATOM TOO LONG.
unpack[x;flg;rdtbl] The value of unpack is the pname of x as a list
5
of characters (atoms), e.g.,
unpack[ABC] = (A B C)
unpack["ABC(D"] = (A B C %( D)
In other words prin1[x] and
mapc[unpack[x];prin1] produce the same output.
If flg=T, the prin2-pname of x is used, (and
computed with respect to rdtbl) e.g.,
unpack["ABC(D";T]= (%" A B C %( D %").
Note: unpack[x] performs n conses, where n is the number of characters
in the pname of x.
dunpack[x;scratchlist;flg;rdtbl]
------------------------------------------------------------------------
4
Except for integers when radix is other than 10, e.g., mapc[(X
9);PRIN1] produces X11 when radix is 8, but pack[(X 11Q)]=X9. (See
footnote 1.)
5
There are no special "character-atoms" in INTERLISP, i.e., an atom
consisting of a single character is the same as any other atom.
10.2
a destructive version of unpack that does not
perform any conses but instead uses scratchlist
to make a list equal to unpack[x;flg]. If the
p-name is too long to fit in scratchlist,
dunpack calls unpack and returns unpack[x;flg].
Gives an error if scratchlist is not a list.
6
nchars[x;flg;rdtbl] number of characters in pname of x. If flg=T,
the prin2-pname is used. E.g. nchars["ABC"]=3,
nchars["ABC";T]=5.
nthchar[x;n;flg;rdtbl] Value is nth character of pname of x.
Equivalent to car[nth[unpack[x;flg];n]] but
faster and does no conses. n can be negative,
in which case counts from end of pname, e.g., -1
refers to the last character, -2 next to last,
etc. If n is greater than the number of
characters in the pname, or less than minus that
number, or 0, the value of nthchar is NIL.
packc[x] like pack except x is a list of character
7
codes, e.g., packc[(70 79 79)]=FOO.
chcon[x;flg;rdtbl] like unpack, except returns the pname of x as a
list of character codes, e.g., chcon[FOO] = (70
79 79). If flg=T, the prin2-pname is used.
chcon1[x] returns character code of first character of
pname of x, e.g., chcon1[FOO] = 70. Thus
chcon[x] could be written as
mapcar[unpack[x];chcon1].
dchcon[x;scratchlist;flg;rdtbl]
similar to dunpack
character[n] n is a character code. Value is the atom having
the corresponding single character as its
------------------------------------------------------------------------
6
Both nthchar and nchars work much faster on objects that actually
have an internal representation of their pname, i.e., literal atoms
and strings, than they do on numbers and lists, as they do not have
to simulate printing.
7
INTERLISP-10 uses ASCII code.
10.3
8
pname, e.g., character[70] = F. Thus,
unpack[x] could be written as
mapcar[chcon[x];character].
fcharacter[n] fast version of character that compiles open.
gensym[char] Generates a new atom of the form xnnnn, where
x=char (or A if char is NIL) in which each of
the n's is a digit. Thus, the first one
generated is A0001, the second A0002, etc.
gensym provides a way of generating new atoms
for various uses within the system. The value
of gennum, initially 10000, determines the next
gensym, e.g., if gennum is set to 10023,
gensym[]=A0024.
The term gensym is used to indicate an atom that was produced by the
function gensym. Atoms generated by gensym are the same as any other
literal atoms: they have property lists, and can be given function
definitions. Note that the atoms are not guaranteed to be new.
For example, if the user has previously created A0012, either by typing
it in, or via pack or gensym itself, when gennum gets to 10011, the next
value returned by gensym will be the A0012 already in existence.
mapatoms[fn] Applies fn to every literal atom in the system,
e.g.,
mapatoms[(LAMBDA(X)(AND(SUBRP X)(PRINT X)))]
will print every subr. Value of mapatoms is
NIL.
10.2 String Functions
stringp[x] Is x if x a string, NIL otherwise. Note: if x
is a string, nlistp[x] is T, but atom[x] is NIL.
strequal[x;y] Is x if x and y are both strings and equal,
i.e., print the same, otherwise NIL.. Equal
uses strequal. Note that strings may be equal
without being eq.
mkstring[x] Value is string corresponding to prin1 of x.
------------------------------------------------------------------------
8
See footnote 2.
10.4
rstring[] Reads a string - see Section 14.
substring[x;n;m] Value is the substring of x consisting of the
nth through mth characters of x. If m is NIL,
the substring is the nth character of x thru the
end of x. n and m can be negative numbers, as
with nthchar. Returns NIL if the substring is
not well defined, e.g., n or m > nchars[x] or
< minus[nchars[x]] or n corresponds to a
character in x to the right of the character
indicated by m.
If x is not a string, equivalent to
substring[mkstring[x];n;m], except substring
does not have to actually make the string if x
9
is a literal atom. For example,
substring[(A B C);4;6]="B C".
gnc[x] get next character of string x. Returns the
next character of the string, (as an atom), and
removes the character from the string. Returns
NIL if x is the null string. If x isn't a
string, a string is made. Used for sequential
access to characters of a string.
Note that if x is a substring of y, gnc[x] does
not remove the character from y, i.e., gnc
doesn't physically change the string of
characters, just the pointer and the byte
10
count.
glc[x] gets last character of string x. Above remarks
about gnc also supply to glc.
concat[x1;x2;...;xn] lambda nospread function. Concatenates (copies
of) any number of strings. The arguments are
transformed to strings if they aren't strings.
Value is the new string, e.g.,
concat["ABC";DEF;"GHI"] = "ABCDEFGHI". The
value of concat[] is the null string, "".
rplstring[x;n;y] Replace characters of string x beginning at
------------------------------------------------------------------------
9
See string storage section that follows.
10
See string storage section that follows.
10.5
character n with string y. n may be positive or
negative. x and y are converted to strings if
they aren't already. Characters are smashed
into (converted) x. Returns new x. Error if
there is not enough room in x for y, i.e., the
11
new string would be longer than the original.
Note that if x is a substring of z, z will also
be modified by the action of rplstring.
mkatom[x] Creates an atom whose pname is the same as that
of the string x or if x isn't a string, the same
as that of mkstring[x], e.g., mkatom[(A B C)] is
the atom %(A% B% C%). In INTERLISP-10, if the
atom would have > 126 characters, causes an
error, ATOM TOO LONG.
Searching Strings
strpos is a function for searching one string looking for another.
Roughly it corresponds to member, except that it returns a character
position number instead of a tail. This number can then be given to
substring or utilized in other calls to strpos.
strpos[x;y;start;skip;anchor;tail]
x and y are both strings (or else they are
converted automatically). Searches y beginning
at character number start, (or else 1 if start
is NIL) and looks for a sequence of characters
equal to x. If a match is found, the
corresponding character position is returned,
otherwise NIL, e.g.,
strpos["ABC","XYZABCDEF"]=4
strpos["ABC","XYZABCDEF";5]=NIL
strpos["ABC","XYZABCDEFABC";5]=10
skip can be used to specify a character in x
that matches any character in y, e.g.,
strpos["A&C&";"XYZABCDEF";NIL;&]=4
If anchor is T, strpos compares x with the
characters beginning at position start, or 1.
If that comparison fails, strpos returns NIL
without searching any further down y. Thus it
can be used to compare one string with some
portion of another string, e.g.,
strpos["ABC";"XYZABCDEF";NIL;NIL;T]=NIL
strpos["ABC";"XYZABCDEF";4;NIL;T]=4
------------------------------------------------------------------------
11
If y was not a string, x will already have been partially modified
since rplstring does not know whether y will "fit" without actually
attempting the transfer.
10.6
Finally, if tail is T, the value returned by
strpos if successful is not the starting
position of the sequence of characters
corresponding to x, but the position of the
first character after that, i.e., starting point
plus nchars[x] e.g.,
strpos["ABC";"XYZABCDEFABC";NIL;NIL;NIL;T]=7.
Note that strpos["A";"A";NIL;NIL;NIL;T]=2, even
though "A" has only one character.
Example Problem
Given the strings x, y, and z, write a function foo that will make a
string corresponding to that portion of x between y and z, e.g.,
foo["NOW IS THE TIME FOR ALL GOOD MEN";"IS";"FOR"] is " THE TIME ".
Solution:
(FOO
[LAMBDA (X Y Z)
(AND (SETQ Y (STRPOS Y X NIL NIL NIL T))
(SETQ Z (STRPOS Z X Y))
(SUBSTRING X Y (SUB1 Z])
strposl[a;str;start;neg]str is a string (or else it is converted
automatically to a string), a is a list of
12
characters or character codes. strposl
searches str beginning at character number start
(or else 1 if start=NIL) for one of the
characters in a. If one is found, strposl
returns as its value the corresponding character
position, otherwise NIL. E.g.,
strposl[(A B C);"XYZBCD"]=4. If neg=T, strposl
searches for a character not on a, e.g.,
strposl[(A B C); "ABCDEF";NIL;T]=4.
If a is an array, it is treated as a bit table.
The bits of (ELT A 1) correspond to character
codes 0 to 43Q, of (ELT A 2) to codes 44Q to
107Q, etc. Thus an array whose first element
was 17Q would be equivalent to a list
(40Q 41Q 42Q 43Q) or (% ! %" #).
------------------------------------------------------------------------
12
If any element of a is a number, it is assumed to be a character
code. Otherwise, it is converted to a character code via chcon1.
Therefore, it is more efficient to call strposl with a a list of
character codes.
10.7
If a is not a bit table (array), strposl first converts it to a bit
table using makebittable described below. If strposl is to be called
frequently with the same list of characters, a considerable savings can
be achieved by converting the list to a bit table once, and then passing
the bit table to strposl as its first argument.
makebittable[l;neg;a] makes a bit table suitable for use by strposl.
l and neg are as for strposl. If a is not a
suitable array, makebittable will create an
array and return that as its value. Otherwise
it uses (and changes) a.
Note: if neg=T, strposl must call makebittable whether a is a list or an
array. To obtain bit table efficiency with neg=T, makebittable should
be called with neg=T, to construct the "inverted" table, and the
resulting table (array) should be given to strposl with neg=NIL.
String Storage
A string is stored in 2 parts; the characters of the string, and a
pointer to the characters. The pointer, or "string pointer", indicates
the byte at which the string begins and the length of the string. It
occupies one word of storage. In INTERLISP-10, the characters of the
string are stored five characters to a word in a portion of the address
space devoted exclusively to storing characters.
Since the internal pname of literal atoms also consists of a pointer to
the beginning of a string of characters and a byte count, conversion
between literal atoms and strings does not require any additional
storage for the characters of the pname, although one cell is required
13
for the string pointer.
When the conversion is done internally, e.g., as in substring, strpos,
or strposl, no additional storage is required for using literal atoms
instead of strings.
The use of storage by the basic string functions is given below:
mkstring[x] x string no space
x literal atom new pointer
other new characters and pointer
substring[x;n;m] x string new pointer
x literal atom new pointer
other new characters and pointer
------------------------------------------------------------------------
13
Except when the string is to be smashed by rplstring. In this case,
its characters must be copied to avoid smashing the pname of the
atom. rplstring automatically performs this operation.
10.8
gnc[x] and glc[x] x string no space, pointer is modified
other like mkstring, but doesn't make
much sense
concat[x1;x2;...xn]args any type new characters for whole new
string, one new pointer
rplstring[x;n;y] x string no new space unless characters
are in pname space (as result of
mkstring[atom]) in which case x
is quietly copied to string space
x other new pointer and characters
y any type type of y doesn't matter
10.3 Array Functions
Space for arrays and compiled code are both allocated out of a common
array space. Arrays of pointers and unboxed numbers may be manipulated
by the following functions:
array[n;p;v] This function allocates a block of n+2 words, of
which the first two are header information. The
next p <= n are cells which will contain
unboxed numbers, and are initialized to unboxed
0. The last n-p >= 0 cells will contain
pointers initialized with v, i.e., both car and
cdr are available for storing information, and
each initially contain v. If p is NIL, 0 is
used (i.e., an array containing all INTERLISP
pointers). The value of array is the array,
also called an array pointer. If sufficient
space is not available for the array, a garbage
collection of array space, GC: 1, is initiated.
If this is unsuccessful in obtaining sufficient
space, an error is generated, ARRAYS FULL.
Array-pointers print as #n, where n is the octal representation of the
pointer. Note that #n will be read as a literal atom, and not an array
pointer.
arraysize[a] Returns the size of array a. Generates an
error, ARG NOT ARRAY, if a is not an array.
arrayp[x] Value is x if x is an array pointer otherwise
NIL. No check is made to ensure that x actually
addresses the beginning of an array.
swparrayp[x] Value is x if x is a swappable array, NIL
otherwise.
10.9
14
elt[a;n] Value is nth element of the array a. elt
generates an error, ARG NOT ARRAY, if a is not
15
the beginning of an array. If n corresponds to
the unboxed region of a, the value of elt is the
full 36 bit word, as a boxed integer. If n
corresponds to the pointer region of a, the
value of elt is the car half of the
corresponding element.
seta[a;n;v] sets the nth element of the array a. Generates
an error, ARG NOT ARRAY, if a is not the
beginning of an array. If n corresponds to the
unboxed region of a, v must be a number, and is
unboxed and stored as a full 36 bit word into
the nth element of a. If n corresponds to the
pointer region of a, v replaces the car half of
the nth element. The value of seta is v.
Note that seta and elt are always inverse operations.
eltd[a;n] same as elt for unboxed region of a, but returns
cdr half of nth element, if n corresponds to the
pointer region of a.
setd[a;n;v] same as seta for unboxed region of a, but sets
cdr half of nth element, if n corresponds to the
pointer region of a. The value of setd is v.
In other words, eltd and setd are always inverse operations.
10.4 Storage Functions
reclaim[n] Initiates a garbage collection of type n. Value
of reclaim is number of words available (for
that type) after the collection.
Garbage collections, whether invoked directly by the user or indirectly
by need for storage, do not confine their activity solely to the data
------------------------------------------------------------------------
14
elt[a;1] is the first element of the array (actually corresponds to
the 3rd cell because of the 2 word header).
15
arrayp is true for pointers into the middle of arrays, but elt and
seta must be given a pointer to the beginning of an array, i.e., a
value of array.
10.10
type for which they were called, but automatically collect some or all
of the other types (see Section 3).
ntyp[x] Value is type number for the data type of
INTERLISP pointer x, e.g., ntyp[(A . B)] is 8,
the type number for lists. Thus GC: 8 indicates
a garbage collection of list words.
type number
arrays, compiled code 1
machine code 2
swapped array handles 4
stack pointers 5
list words 8
atoms 12
floating point numbers 16
large integers 18
small integers 20
string pointers 24
pname storage 28
16
string storage 30
typep[x;n] eq[ntyp[x];n]
gcgag[message] affects messages printed by garbage collector.
If message=T, whenever a garbage collection is
begun, GC: is printed, followed by the type
number. When the garbage collection is
complete, two numbers are printed the number of
words collected for that type, and the total
number of words available for that type, i.e.,
allocated but not necessarily currently in use
(see minfs below).
Example:
_RECLAIM(18)
GC: 18
511, 3071 FREE WORDS
3071
_RECLAIM(12)
GC: 12
1020, 1020 FREE WORDS
1020
------------------------------------------------------------------------
16
New user data types (see Section 23) are assigned type numbers
beginning with 31.
10.11
If message=NIL, no garbage collection message is
printed, either on entering or leaving the
garbage collector.
If message is a list, car of message is printed
(using prin1) when the garbage collection is
begun, and cdr is printed when the collection is
finished. If message is a literal atom or
string, message is printed when the garbage
collection is begun, and nothing is printed when
the collection finishes.
If message is a number, the message is the same
as for gcgag[T], except if the total number of
free pages left after the collection is less
than message, the number of free pages is
printed, e.g.,
_GCGAG(100)
T
_RECLAIM()
GC:8
10369, 10369 FREE WORDS, 87 PAGES LEFT.
The initial setting for gcgag is 40.
The value of gcgag is its previous setting.
minfs[n;typ] Sets the minimum amount of free storage which
will be maintained by the garbage collector for
data types of type number typ. If, after any
garbage collection for that type, fewer than n
free words are present, sufficient storage will
be added (in 512 word chunks) to raise the level
to n.
If typ=NIL, 8 is used, i.e., the minfs refers to
list words.
If n=NIL, minfs returns the current minfs
setting for the corresponding type.
A minfs setting can also be changed dynamically, even during a garbage
collection, by typing control-S followed by a number, followed by a
17
period. If the control-S was typed during a garbage collection, the
------------------------------------------------------------------------
17
When the control-S is typed, INTERLISP immediately clears and saves
the input buffer, rings the bell, and waits for input, which is
terminated by any non-number. The input buffer is then restored, and
the program continues. If the input was terminated by other than a
period, it is ignored.
10.12
number is the new minfs setting for the type being collected, otherwise
for type 8, i.e., list words.
Note: A garbage collection of a "related" type may also cause more
storage to be assigned to that type. See discussion of garbage
collector algorithm, Section 3.
storage[flg] Prints amount of storage (by type number) used
by and assigned to the user, e.g.,
_STORAGE()
TYPE USED ASSIGNED
1 8927 12288
2 5120 5120
4 23 512
8 6037 15360
12 2169 3584
16 0 512
18 173 2048
24 110 2048
28 802 2048
30 312 512
SUM 23673 44032
If flg=T, includes storage used by and assigned
to the system. Value is NIL.
gctrp[n] garbage collection trap. Causes a (simulated)
control-H interrupt when the number of free list
words (type 8) remaining equals n, i.e., when a
garbage collection would occur in n more conses.
The message GCTRP is printed, the function
interrupt (Section 16) is called, and a break
occurs. Note that by advising (Section 19)
interrupt the user can program the handling of a
18
gctrp instead of going into a break.
Value of gctrp is its last setting.
gctrp[-1] will "disable" a previous gctrp since
there are never -1 free list words. gctrp is
initialized this way.
------------------------------------------------------------------------
18
For gctrp interrupts, interrupt is called with intype (its third
argument) equal to 3. If the user does not want to go into a break,
the advice should still allow interrupt to be entered, but first set
intype to -1. This will cause interrupt to "quietly" go away by
calling the function that was interrupted. The advice should not
exit interrupt via return, as in this case the function that was
about to be called when the interrupt occurred would not be called.
10.13
gctrp[] returns number of list words left, i.e.,
number of conses until next type 8 garbage
collection, see Section 21.
conscount[n] conscount[] returns number of conses since
INTERLISP started up. If n is not NIL, resets
conscount to n.
closer[a;x] Stores x into memory location a. Both x and a
must be numbers.
openr[a] Value is the number in memory location a, i.e.,
boxed.
10.14