PDP-10 Archive: info/cache.info from mit_emacs_170_teco

****Put into INFO format, into DIR****

    The EMACS CACHE library is designed to speed function
calling, especially for the case where several libraries are
loaded.  This library was designed by John Pershing and Eugene
Ciccarelli, written by the former, and now maintained by the
latter (ECC at MIT-MC).

    The use of CACHE is described in the documentation for
& Setup CACHE Library and the variables' documentation.  Do
	M-X Describe<altmode>& Setup CACHE Library
and	M-X Edit Options<altmode>CACHE
for details.  This info will briefly discuss some of the
performance aspects.

    While it would be a very interesting project, I have never
actually made any careful measurements of CACHE's effectiveness.
The measurements that follow are very rough, made in only
one particular environment and on one system (libraries EMACS,
TWENEX, WORDAB, IVORY, and CACHE loaded;  running on MIT-OZ).

    The basic statement is:  CACHE helps most when functions are
called by name frequently and several libraries are loaded.  The
more libraries are loaded, the slower normal EMACS by-name
function calling becomes;  with CACHE, the time to call a cached
function by name is independent of the number of loaded
libraries.  (In fact, it seems that even with no extra libraries
loaded, having the CACHE can speed things up a little, though
this effect isn't really worth much.)

    CACHE's effect on paging performance is the most interesting
and difficult question -- it may well depend a lot on the
circumstances of your environment and system.

    Some particulars that may help you understand CACHE a bit
better:

    Function calling in EMACS can happen in one of three ways.
The fundamental method is to directly run a TECO program object
(a string) that is in a directly-accessible register.  This
spends negligible time in the function call.  However, the number
of registers is pretty small compared to the number of functions
we want to address, the naming of registers is very restrictive,
and registers don't lend themselves to modularity and sharing
functions between users.  They therefore are used as temporaries
within a single function (to hold programs or other objects that
are going to be used a lot in a loop, for instance) and to hold
EMACS's most basic functions.

    EMACS has a slightly higher-level object naming mechanism, a
sorted, binary-searched symbol table (variables).  Variables are
an extension to registers -- nicer naming (full names instead of
just a letter or number) and an indefinite number of them -- but
they still have trouble with modularity and sharing.
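
    As a rough illustration of what a sorted, binary-searched
symbol table involves, here is a minimal Python sketch.  It is a
model only, not the TECO implementation;  the class and method
names are hypothetical.

```python
import bisect


class SymbolTable:
    """Sorted name table searched by bisection (hypothetical model)."""

    def __init__(self):
        self._names = []   # kept in sorted order
        self._values = []  # parallel to _names

    def define(self, name, value):
        """Insert a new name, or replace the value of an existing one."""
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def lookup(self, name):
        """Return the value bound to name, or None if it is absent."""
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            return self._values[i]
        return None


table = SymbolTable()
table.define("MM Something", "<program string>")
found = table.lookup("MM Something")
missing = table.lookup("No Such Name")  # None
```

    Keeping the names sorted makes each lookup logarithmic in the
table size, at the price of a slower insertion.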

    The third level of EMACS function calling builds on the
other two mechanisms to provide a more structured name space:
libraries of objects.  Each loaded library generally has three
parts:  a name-lookup function, a symbol table that function
uses, and the string objects (typically TECO programs and their
documentation strings) to be named.

    M.M is (i.e. "register .M holds") the standard EMACS
name-lookup function that takes a name as a string argument,
and returns the named object, generally a program string.  This
string can then be run directly just as if it were in a register.
Say that M.M is looking for the object named "Something", e.g.
you have just done Meta-X Something.

    M.M first checks the standard variable symbol table, for an
"MM-variable", one whose name is "MM Something".  MM-variables
are used for several purposes (e.g. patching, replacing
functions, and a cheap sort of cache), and in particular the MM
symbol table (unlike the others discussed) changes its contents;
this turns out to give CACHE some grief and limit its
effectiveness.

    Next, if there is no MM-variable, M.M tries each loaded
library in turn, most recently loaded first, calling each
library's name-lookup subroutine.  The first library that has the
name "Something" in its symbol table returns the named string.
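
    The search order just described can be modelled as follows.
This is a Python sketch under assumptions, not the actual M.M
code:  plain dictionaries stand in for the symbol tables, the
function name lookup_by_name is hypothetical, and the load order
of the libraries other than EMACS is assumed.

```python
def lookup_by_name(name, variables, libraries):
    """Model of the standard lookup order (hypothetical helper).

    variables  -- the variable symbol table, modelled as a dict
    libraries  -- list of library symbol tables, newest first
    Returns (object_or_None, number_of_symbol_tables_checked).
    """
    checks = 1  # the variable table is always checked first
    obj = variables.get("MM " + name)
    if obj is not None:
        return obj, checks
    for table in libraries:
        checks += 1
        if name in table:
            return table[name], checks
    return None, checks


variables = {}                      # no MM-variables defined
ivory, wordab, twenex = {}, {}, {}  # library contents omitted
emacs = {"Something": "<program string>"}
libraries = [ivory, wordab, twenex, emacs]  # assumed order, EMACS last

obj, checks = lookup_by_name("Something", variables, libraries)
# checks == 5: the variable table plus all 4 library tables
```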

    As an approximate measure, on OZ, each symbol table lookup in
this search takes about 2.1 milliseconds.  If there are 4
libraries loaded, a name-lookup that finds its object in the
EMACS basic library (the last, 4th, checked) will check 5 symbol
tables (4 libraries, plus variable table), and take about
5 * 2.1 = 10.5, or roughly 11 milliseconds.
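
    The same arithmetic, for an object found in the library at
each search position, can be tabulated with a short Python
sketch.  The 2.1 ms figure is the rough OZ measurement quoted
above;  the function name is hypothetical.

```python
LOOKUP_MS = 2.1  # rough per-table lookup time quoted for OZ


def uncached_cost_ms(position):
    """Cost when the object is in the library searched at `position`.

    position is 1-based: 1 means the most recently loaded library.
    """
    tables_checked = 1 + position  # variable table, then the libraries
    return tables_checked * LOOKUP_MS


for position in range(1, 5):
    print(position, round(uncached_cost_ms(position), 1))
```

    The cost grows linearly with the search position, which is
why objects in the EMACS basic library, checked last, are the
most expensive to find.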

    The CACHE library replaces the standard M.M lookup function
with its own version that checks a CACHE (another symbol table)
right after the MM-variable check, before checking any libraries.
(Because of the changeable nature of the variable symbol table,
it can't avoid the MM-variable check, unfortunately.)  Whenever
the cache lookup fails, and one of the libraries provides the
object, that object is added to the cache symbol table.
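
    A sketch of the replacement lookup, again in Python with
dictionaries standing in for symbol tables.  The function name
cached_lookup and the table contents are hypothetical;  only the
order of the checks is taken from the description above.

```python
def cached_lookup(name, variables, cache, libraries):
    """Model of lookup with a cache (hypothetical helper).

    Returns (object_or_None, number_of_symbol_tables_checked).
    """
    checks = 1  # the MM-variable check cannot be skipped
    obj = variables.get("MM " + name)
    if obj is not None:
        return obj, checks
    checks += 1  # the cache is checked before any library
    if name in cache:
        return cache[name], checks
    for table in libraries:  # newest library first
        checks += 1
        if name in table:
            cache[name] = table[name]  # remember it for next time
            return table[name], checks
    return None, checks


variables, cache = {}, {}
libraries = [{}, {}, {}, {}, {"Something": "<program string>"}]

first = cached_lookup("Something", variables, cache, libraries)
second = cached_lookup("Something", variables, cache, libraries)
# first checks 7 tables (n + 2 with n = 5); second checks only 2
```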

    Thus, if an object is in the cache (but not in an
MM-variable), lookup takes two symbol table checks.  If an object
is not in the cache, is found in the nth library searched, and
has to be put into the cache, lookup takes n+2 symbol table
checks plus some overhead (say about 3ms on OZ) to add the entry
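to the cache.  Continuing the previous example, with 4 loaded
libraries, plus an additional one now (CACHE) for a total of 5
libraries, the EMACS basic library is the 5th checked (n = 5), so
the first lookup of one of its objects takes about
7 * 2.1 + 3 = 17.7, or roughly 18 ms.  Each later lookup, while
the entry stays in the cache, takes about 2 * 2.1 = 4.2 ms.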
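
    To see how quickly the cache pays for itself in this
example, the rough figures can be fed into a small Python model.
It assumes the object lives in the EMACS basic library, that no
MM-variable exists for it, and that the cache is neither swept
nor cleared between calls;  the function names are hypothetical
and the numbers are only the rough OZ estimates given above.

```python
LOOKUP_MS = 2.1  # rough per-table lookup time on OZ
ADD_MS = 3.0     # rough overhead to add a cache entry on OZ


def total_without_cache(calls, libraries_searched=4):
    """Total time for `calls` lookups with no CACHE library loaded."""
    return calls * (1 + libraries_searched) * LOOKUP_MS


def total_with_cache(calls, libraries_searched=5):
    """Total time with CACHE loaded (one more library in the search)."""
    if calls == 0:
        return 0.0
    first = (libraries_searched + 2) * LOOKUP_MS + ADD_MS
    later = 2 * LOOKUP_MS  # variable table, then the cache
    return first + (calls - 1) * later


def break_even_calls():
    """Smallest number of calls for which the cache is cheaper overall."""
    calls = 1
    while total_with_cache(calls) >= total_without_cache(calls):
        calls += 1
    return calls
```

    Under these assumptions the cache comes out ahead from the
third call of the same function onward.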

    Note that the cache is cleared at various times, e.g. when a
library is loaded or unloaded.  Also, it is "swept" periodically
-- entries not used much are eliminated -- in order to keep the
size of the cache small, for better paging performance and
shorter cache table lookups.
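
    One way such clearing and sweeping could work is sketched
below in Python.  How CACHE actually measures "not used much" is
not stated here, so the per-entry use count, the min_uses
threshold, and all the names are assumptions made purely for
illustration.

```python
class NameCache:
    """Cache with use counts and periodic sweeping (hypothetical model)."""

    def __init__(self):
        self._entries = {}  # name -> [object, uses since last sweep]

    def add(self, name, obj):
        self._entries[name] = [obj, 1]

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        entry[1] += 1  # record the use
        return entry[0]

    def sweep(self, min_uses=2):
        """Drop little-used entries and restart counting for the rest."""
        self._entries = {
            name: [obj, 0]
            for name, (obj, uses) in self._entries.items()
            if uses >= min_uses
        }

    def clear(self):
        """Forget everything, e.g. when a library is loaded or unloaded."""
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


cache = NameCache()
cache.add("Often Used", "<program A>")
cache.add("Rarely Used", "<program B>")
cache.get("Often Used")
cache.sweep()  # only "Often Used" survives
```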