Trailing-Edge
-
PDP-10 Archives
-
decus_20tap1_198111
-
decus/20-0004/17lisp.doc
There are no other files named 17lisp.doc in the archive.
SECTION 17
1
AUTOMATIC ERROR CORRECTION - THE DWIM FACILITY
17.1 Introduction
A surprisingly large percentage of the errors made by INTERLISP users
are of the type that could be corrected by another LISP programmer
without any information about the purpose of the program or expression
in question, e.g., misspellings, certain kinds of parentheses errors,
etc. To correct these types of errors we have implemented in INTERLISP
a DWIM facility, short for DO-What-I-Mean. DWIM is called automatically
2
whenever an error occurs in the evaluation of an INTERLISP expression.
DWIM then proceeds to try to correct the mistake using the current
context of computation plus information about what the user had
previously been doing, (and what mistakes he had been making) as guides
to the remedy of the error. If DWIM is able to make the correction, the
computation continues as though no error had occurred. Otherwise, the
procedure is the same as though DWIM had not intervened: a break occurs,
or an unwind to the last errorset, as described in Section 16. The
following protocol illustrates the operation of DWIM.
Example
The user defines a function fact of one argument, n. The value of
fact[n] is to be n factorial.
_DEFINEQ((FACT (LAMBDA (N) (COND
((ZEROP N9 1) ((T (ITIMS N (FACCT 8SUB1 N]
(FACT)
_
Note that the definition of fact contains several mistakes: itimes and
fact have been misspelled; the 9 in N9 was intended to be a right
------------------------------------------------------------------------
1
DWIM was designed and implemented by W. Teitelman. It is discussed
in [Tei2].
2
Currently, DWIM only operates on unbound atoms and undefined
function errors.
17.1
parenthesis, but the shift key was not depressed; similarly, the 8 in
8SUB1 was intended to be a left parenthesis; and finally, there is an
extra left parenthesis in front of the T that begins the final clause in
the conditional.
_PRETTYPRNT((FACCT] [1]
=PRETTYPRINT [2]
=FACT [3]
(FACT
[LAMBDA (N)
(COND
((ZEROP N9 1)
((T (ITIMS N (FACCT 8SUB1 N])
(FACT)
_
After defining fact, the user wishes to look at its definition using
PRETTYPRINT, which he unfortunately misspells.[1] Since there is no
function PRETTYPRINT in the system, a U.D.F. error occurs, and DWIM is
called. DWIM invokes its spelling corrector, which searches a list of
functions frequently used (by this user) for the best possible match.
Finding one that is extremely close, DWIM proceeds on the assumption
that PRETTYPRNT meant PRETTYPRINT, notifies the user of this, [2] and
calls prettyprint.
At this point, PRETTYPRINT would normally print (FACCT NOT PRINTABLE)
and exit, since facct has no definition. Note that this is not an
INTERLISP error condition, so that DWIM would not be called as described
above. However, it is obviously not what the user meant.
This sort of mistake is corrected by having prettyprint itself
explicitly invoke the spelling corrector portion of DWIM whenever given
a function with no expr definition. Thus with the aid of DWIM,
prettyprint is able to determine that the user wants to see the
definition of the function fact,[3] and proceeds accordingly.
_FACT(3] [4]
N9 [IN FACT] -> N ) ? YES
[IN FACT] (COND -- ((T --))) ->
(COND -- (T --))
ITIMS [IN FACT] -> ITIMES [5]
FACCT [IN FACT] -> FACT
8SUB1 [IN FACT] -> ( SUB1 ? YES
6
_PP FACT [6]
(FACT
[LAMBDA (N)
(COND
((ZEROP N)
1)
(T (ITIMES N (FACT (SUB1 N])
FACT
_
The user now calls his function fact.[4] During its execution, five
errors occur, and DWIM is called five times.[5] At each point, the error
17.2
is corrected, a message printed describing the action taken, and the
computation allowed to continue as if no error had occurred. Following
the last correction, 6 is printed, the value of fact(3). Finally, the
user prettyprints the new, now correct, definition of fact.[6]
In this particular example, the user was shown operating in TRUSTING
mode, which gives DWIM carte blanche for most corrections. The user can
also operate in CAUTIOUS mode, in which case DWIM will inform him of
intended corrections before they are made, and allow the user to approve
or disapprove of them. For most corrections, if the user does not
respond in a specified interval of time, DWIM automatically proceeds
with the correction, so that the user need intervene only when he does
not approve. Sample output is given below. Note that the user
responded to the first, second, and fifth questions; DWIM responded for
him on the third and fourth.
_FACT(3)
N9 [IN FACT] -> N ) ? YES [1]
U.D.F. T [IN FACT] FIX? YES [2]
[IN FACT] (COND -- ((T --))) ->
(COND -- (T --))
ITIMS [IN FACT] -> ITIMES ? ...YES [3]
FACCT [IN FACT] -> FACT ? ...YES [4]
8SUB1 [IN FACT] -> ( SUB1 ? NO [5]
U.B.A.
(8SUB1 BROKEN)
:
We have put a great deal of effort into making DWIM "smart", and
experience with perhaps fifty different users indicates we have been
very successful; DWIM seldom fails to correct an error the user feels it
should have, and almost never mistakenly corrects an error. However, it
3
is important to note that even when DWIM is wrong, no harm is done:
since an error had occurred, the user would have had to intervene anyway
if DWIM took no action. Thus, if DWIM mistakenly corrects an error, the
user simply interrupts or aborts the computation, UNDOes the DWIM change
using UNDO described in Section 22, and makes the correction he would
have had to make without DWIM. It is this benign quality of DWIM that
makes it a valuable part of INTERLISP.
17.2 Interaction with DWIM
DWIM is enabled by performing either DWIM[C], for CAUTIOUS mode, or
------------------------------------------------------------------------
3
Except perhaps if DWIM's correction mistakenly caused a destructive
computation to be initiated, and information was lost before the
user could interrupt. We have not yet had such an incident occur.
17.3
4
DWIM[T] for TRUSTING mode. In addition to setting dwimflg to T and
redefining faulteval and faultapply as described on page 17.11, DWIM[C]
sets approveflg to T, while DWIM[T] sets approveflg to NIL. The setting
of approveflg determines whether or not the user wishes to be asked for
approval before a correction that will modify the definition of one of
his functions. In CAUTIOUS mode, i.e., approveflg=T, DWIM will ask for
approval; in TRUSTING mode, DWIM will not. For corrections to
5
expressions typed in by the user for immediate execution, DWIM always
6
acts as though approveflg were NIL, i.e., no approval necessary. In
either case, DWIM always informs the user of its action as described
below.
Spelling Correction Protocol
The protocol used by DWIM for spelling corrections is as follows: If the
correction occurs in type-in, print = followed by the correct spelling,
followed by a carriage-return, and then continue, e.g.,
user types: _(SETQ FOO (NCOCN FIE FUM))
DWIM types: =NCONC
If the correction does not occur in type-in, print the incorrect
spelling, followed by [IN function-name], ->, and then the correct
7
spelling, e.g., ITIMS [IN FACT] -> ITIMES as shown on page 17.2. Then,
if approveflg=NIL, print a carriage return, make the correction and
------------------------------------------------------------------------
4
INTERLISP arrives with DWIM enabled in CAUTIOUS mode. DWIM can be
disabled by executing DWIM[] or by setting dwimflg to NIL. See page
17.20.
5
Typed into lispx. lispx is used by evalqt and break, as well as for
processing the editor's E command. Functions that call the spelling
corrector directly, such as editdefault (Section 9), specify whether
or not the correction is to be handled as type-in. For example, in
the case of editdefault, commands typed directly to the editor are
treated as type-in, so that corrections to them will never require
approval. Commands given as an argument to the editor, or resulting
from macro expansions, or from IF, LP, ORR commands etc. are not
treated as type-in, and thus approval will be requested if
approveflg=T.
6
For certain types of corrections, e.g., run-on spelling corrections,
8-9 errors, etc., dwim always asks for approval, regardless of the
setting of approveflg.
7
The appearance of -> is to call attention to the fact that the
user's function will be or has been changed.
17.4
continue. Otherwise, print a few spaces and a ? and then wait for
8
approval. The user then has six options. He can:
1. Type Y; DWIM types ES, and proceeds with the correction.
2. Type N; DWIM types O, and does not make the correction.
3. Type ^; DWIM does not make the correction, and furthermore
guarantees that the error will not cause a break. See footnote
on page 17.12.
4. Type control-E; for error correction, this has the same effect
as typing N.
9
5. Do nothing; in which case DWIM will wait a specified interval,
and if the user has not responded, DWIM will type ... followed
10
by the default answer.
6. Type space or carriage-return; in which case DWIM will wait
indefinitely. This option is intended for those cases where
the user wants to think about his answer, and wants to insure
that DWIM does not get "impatient" and answer for him.
The procedure for spelling correction on other than INTERLISP errors is
analogous. If the correction is being handled as type-in, DWIM
prints = followed by the correct spelling, and returns it to the
function that called DWIM, e.g., =FACT as shown on page 17.2.
Otherwise, DWIM prints the incorrect spelling, followed by the correct
spelling. Then if approveflg=NIL, DWIM prints a carriage-return and
returns the correct spelling. Otherwise, DWIM prints a few spaces and a
? and then waits for approval. The user can then respond with Y, N,
control-E, space, carriage return, or do nothing as described.
Note that since the spelling corrector itself is not errorset protected,
------------------------------------------------------------------------
8
Whenever an interaction is about to take place and the user has
typed ahead, DWIM types several bells to warn the user to stop
typing, then clears and saves the input buffers, restoring them
after the interaction is complete. Thus if the user has typed ahead
before a DWIM interaction, DWIM will not confuse his type ahead with
the answer to its question, nor will his type ahead be lost.
9
Equal to dwimwait seconds. DWIM operates by dismissing for 500
milliseconds, then checking to see if anything has been typed. If
not, it dismisses again, etc. until dwimwait seconds have elapsed.
Thus, there will be a delay of at most 1/2 second before DWIM
responds to the user's answer.
10
The default is always YES unless otherwise stated.
17.5
typing N and typing control-E may have different effects when the
11
spelling corrector is called directly. The former simply instructs the
spelling corrector to return NIL, and lets the calling function decide
what to do next; the latter causes an error which unwinds to the last
errorset, however far back that may be.
Parentheses Errors Protocol
As illustrated earlier on page 17.2, DWIM will correct errors consisting
12
of typing 8 for left parenthesis and 9 for right parenthesis. In these
cases, the interaction with the user is similar to that for spelling
correction. If the error occurs in type-in, DWIM types = followed by
the correction, e.g.,
user types: _(SETQ FOO 8CONS FIE FUM]
DWIM types: = ( CONS
lispx types: (A B C D)
Otherwise, DWIM prints the offending atom, [IN function-name], ->, the
proposed correction, a few spaces and a ?, and then waits for approval,
e.g., N9 [IN FACT] -> N ) ? as shown on page 17.2. The user then has
13
the same six options as for spelling correction. If the user types Y,
DWIM then operates exactly the same as when approveflg=NIL, i.e., makes
the correction and prints its message.
U.D.F. T Errors Protocol
DWIM corrects certain types of parentheses errors involving a T clause
in a conditional, namely errors of the form:
1. (COND --) (T --), i.e., the T clause appears outside and
immediately following the COND;
2. (COND -- (-- & (T --))), i.e., the T clause appears inside a
previous clause; and
------------------------------------------------------------------------
11
The DWIM error correction routines are errorset protection.
12
Actually, DWIM uses the value of the variables lparkey and rparkey
to determine the corresponding lower case character for left and
right parentheses. lparkey and rparkey are initially 8 and 9
respectively, but they can be reset for other keyboard layouts,
e.g., on some terminals left parenthesis is over 9, and right
parenthesis is over 0.
13
except the waiting time is 3*dwimwait seconds.
17.6
3. (COND -- ((T --))), i.e., the T clause has an extra pair of
14
parentheses around it.
If the error occurs in type-in, DWIM simply types T FIXED and makes the
correction. Otherwise if approveflg=NIL, DWIM makes the correction, and
prints a message consisting of [IN function-name], followed by one of
the above incorrect forms of COND, followed by ->, then on the next line
the corresponding correct form of the COND, e.g.,
[IN FACT] (COND -- ((T --))) ->
(COND -- (T --))
as shown on page 17.2.
If approveflg=T, DWIM prints U.D.F. T, followed by [IN function-name],
several spaces, and then FIX? and waits for approval. The user then has
the same options as for spelling corrections and parenthesis errors. If
the user types Y or defaults, DWIM then proceeds exactly the same as
when approveflg=NIL, i.e., makes the correction and prints its message,
as shown on page 17.3.
Having made the correction, DWIM must then decide how to proceed with
the computation. In case 1, (COND --) (T --), DWIM cannot know whether
the last clause of the COND before the T clause succeeded or not, i.e.,
if the T clause had been inside of the COND, would it have been entered?
Therefore DWIM asks the user 'CONTINUE WITH T CLAUSE' (with a default of
YES). If the user types N, DWIM continues with the form after the COND,
i.e., the form that originally followed the T clause.
In case 2, (COND -- (-- & (T --))), DWIM has a different problem. After
moving the T clause to its proper place, DWIM must return as the value
of the COND, the value of the expression corresponding to &. Since this
value is no longer around, DWIM asks the user, 'OK TO REEVALUATE' and
then prints the expression corresponding to &. If the user types Y, or
defaults, DWIM continues by reevaluating &, otherwise DWIM aborts, and a
U.D.F. T error will then occur (even though the COND has in fact been
15
fixed).
In case 3, (COND -- ((T --))), there is no problem with continuation, so
no further interaction is necessary.
------------------------------------------------------------------------
14
For U.D.F. T errors that are not one of these three types, DWIM
takes no corrective action at all, i.e., the error will occur.
15
If DWIM can determine for itself that the form can safely be
reevaluated, it does not consult the user before reevaluating. DWIM
can do this if the form is atomic, or car of the form is a member of
the list okreevalst, and each of the arguments can safely be
reevaluated, e.g., (SETQ X (CONS (IPLUS Y Z) W)) is safe to
reevaluate because SETQ, CONS, and IPLUS are all on okreevalst.
17.7
17.3 Spelling Correction
The spelling corrector is given as arguments a misspelled word (word
means literal atom), a spelling list (a list of words), and a number:
xword, splst, and rel respectively. Its task is to find that word on
splst which is closest to xword, in the sense described below. This
word is called a respelling of xword. rel specifies the minimum
"closeness" between xword and a respelling. If the spelling corrector
cannot find a word on splst closer to xword than rel, or if it finds two
or more words equally close, its value is NIL, otherwise its value is
16
the respelling.
The exact algorithm for computing the spelling metric is described later
on page 17.17, but briefly "closeness" is inversely proportional to the
number of disagreements between the two words, and directly proportional
to the length of the longer word, e.g., PRTTYPRNT is "closer" to
PRETTYPRINT than CS is to CONS even though both pairs of words have the
same number of disagreements. The spelling corrector operates by
proceeding down splst, and computing the closeness between each word and
xword, and keeping a list of those that are closest. Certain
differences between words are not counted as disagreements, for example
a single transposition, e.g., CONS to CNOS, or a doubled letter, e.g.,
CONS to CONSS, etc. In the event that the spelling corrector finds a
word on splst with no disagreements, it will stop searching and return
this word as the respelling. Otherwise, the spelling corrector
continues through the entire spelling list. Then if it has found one
and only one "closest" word, it returns this word as the respelling.
For example, if xword is VONS, the spelling corrector will probably
return CONS as the respelling. However, if xword is CONZ, the spelling
corrector will not be able to return a respelling, since CONZ is equally
close to both CONS and COND. If the spelling corrector finds an
acceptable respelling, it interacts with the user as described earlier.
In the special case that the misspelled word contains one or more alt-
modes, the spelling corrector operates somewhat differently. Instead of
trying to find the closest word as above, the spelling corrector
searches for those words on splst that match xword, where an alt-mode
can match any number of characters (including 0), e.g., FOO$ matches
FOO1 and FOO, but not NEWFOO. $FOO$ matches all three. In this case,
the entire spelling list is always searched, and if more than one
respelling is found, the spelling corrector prints AMBIGUOUS, and
returns NIL. For example, CON$ would be ambiguous if both CONS and COND
were on the spelling list. If the spelling corrector finds one and only
one respelling, it interacts with the user as described earlier.
For both spelling correction and spelling completion, regardless of
whether or not the user approves of the spelling corrector's choice, the
respelling is moved to the front of splst. Since many respellings are
------------------------------------------------------------------------
16
The spelling corrector can also be given an optional functional
argument, fn, to be used for selecting out a subset of splst, i.e.,
only those members of splst that satisfy fn will be considered as
possible respellings.
17.8
of the type with no disagreements, this procedure has the effect of
considerably reducing the time required to correct the spelling of
frequently misspelled words.
Synonyms
Spelling lists also provide a way of defining synonyms for a particular
context. If a dotted pair appears on a spelling list (instead of just an
atom), car is interpreted as the correct spelling of the misspelled
word, and cdr as the antecedent for that word. If car is identical with
the misspelled word, the antecedent is returned without any interaction
or approval being necessary. If the misspelled word corrects to car of
the dotted pair, the usual interaction and approval will take place, and
then the antecedent, i.e., cdr of the dotted pair, is returned. For
example, the user could make IFLG synonymous with CLISPIFTRANFLG by
adding (IFLG . CLISPIFTRANFLG) to spellings3, the spelling list for
unbound atoms. Similarly, the user could make OTHERWISE mean the same
as ELSEIF by adding (OTHERWISE . ELSEIF) to clispifwordsplst, or make L
be synonymous with LAMBDA by adding (L . LAMBDA) to lambdasplst. Note
that L could also be used as a variable without confusion, since the
association of L with LAMBDA occurs only in the appropriate context.
Spelling Lists
Any list of atoms can be used as a spelling list, e.g., brokenfns,
filelst, etc. Various system packages have their own spellings lists,
e.g., lispxcoms, prettycomsplst, clispforwordsplst, editcomsa, etc.
These are documented under their corresponding sections, and are also
indexed under "spelling lists." In addition to these spelling lists, the
system maintains, i.e., automatically adds to, and occasionally prunes,
four lists used solely for spelling correction: spellings1, spellings2,
17
spellings3, and userwords.
Spellings1 is a list of functions used for spelling correction when an
input is typed in apply format, and the function is undefined, e.g.,
EDTIF(FOO). Spellings1 is initialized to contain defineq, break,
makefile, editf, tcompl, load, etc. Whenever lispx is given an input in
apply format, i.e., a function and arguments, the name of the function
18
is added to spellings1. For example, typing CALLS(EDITF) will cause
CALLS to be added to spellings1. Thus if the user typed CALLS(EDITF)
and later typed CALLLS(EDITV), since spellings1 would then contain
------------------------------------------------------------------------
17
All of the remarks on maintaining spelling lists apply only when
DWIM is enabled, as indicated by dwimflg=T.
18
Only if the function has a definition.
17.9
19
CALLS, DWIM would be successful in correcting CALLLS to CALLS.
Spellings2 is a list of functions used for spelling correction for all
other undefined functions. It is initialized to contain functions such
as add1, append, cond, cons, go, list, nconc, print, prog, return, setq,
etc. Whenever lispx is given a non-atomic form, the name of the
function is added to spellings2. For example, typing
(RETFROM (STKPOS (QUOTE FOO) 2)) to a break would add retfrom to
spellings2. Function names are also added to spellings2 by define,
defineq, load (when loading compiled code), unsavedef, editf, and
prettyprint.
Spellings3 is a list of words used for spelling correction on all
unbound atoms. Spellings3 is initialized to editmacros, breakmacros,
brokenfns, and advisedfns. Whenever lispx is given an atom to evaluate,
20
the name of the atom is added to spellings3. Atoms are also added to
spellings3 whenever they are edited by editv, and whenever they are set
via rpaq or rpaqq. For example, when a file is loaded, all of the
variables set in the file are added to spellings3. Atoms are also added
to spellings3 when they are set by a lispx input, e.g., typing
(SETQ FOO (REVERSE (SETQ FIE --))) will add both FOO and FIE to
spellings3.
Userwords is a list containing both functions and variables that the
user has referred to e.g., by breaking or editing. Userwords is used
for spelling correction by arglist, unsavedef, prettyprint, break,
editf, advise, etc. Userwords is initially NIL. Function names are
added to it by define, defineq, load, (when loading compiled code, or
loading exprs to property lists) unsavedef, editf, editv, editp,
prettyprint, etc. Variable names are added to userwords at the same
time as they are added to spellings3. In addition, the variable
lastword is always set to the last word added to userwords, i.e., the
last function or variable referred to by the user, and the respelling of
NIL is defined to be the value of lastword. Thus, if the user has just
defined a function, he can then edit it by simply typing EDITF(), or
prettyprint it by typing PP().
Each of the above four spelling lists are divided into two sections
separated by a NIL. The first section contains the "permanent" words;
the second section contains the temporary words. New words are added to
21
the corresponding spelling list at the front of its temporary section.
------------------------------------------------------------------------
19
If CALLLS(EDITV) were typed before CALLS had been "seen" and added
to spellings1, the correction would not succeed. However, the
alternative to using spelling lists is to search the entire oblist,
a procedure that would make spelling correction intolerably slow.
20
Only if the atom has a value other than NOBIND.
21
Except that functions added to spellings1 or spellings2 by lispx are
always added to the end of the permanent section.
17.10
(If the word is already in the temporary section, it is moved to the
front of that section; if the word is in the permanent section, no
action is taken.) If the length of the temporary section then exceeds a
specified number, the last (oldest) word in the temporary section is
forgotten, i.e., deleted. This procedure prevents the spelling lists
from becoming cluttered with unimportant words that are no longer being
used, and thereby slowing down spelling correction time. Since the
spelling corrector moves each word selected as a respelling to the front
of its spelling list, the word is thereby moved into the permanent
section. Thus once a word is mispelled and corrected, it is considered
important and will never be forgotten.
The maximum length of the temporary section for spellings1, spellings2,
spellings3 and userwords is given by the value of #spellings1,
#spellings2, #spellings3, and #userwords, initialized to 30, 30, 30, and
60 respectively. Using these values, the average length of time to
22
search a spelling list for one word is about 4 milliseconds.
Generators for Spelling Correction
For some applications, it is more convenient to generate candidates for
a respelling one by one, rather than construct a complete list of all
possible candidates, e.g., spelling correction involving a large
directory of files, or a natural language data base. For these
purposes, splst can be an array (of any size). The first element of
this array is the generator function, which is called with the array
itself as its argument. Thus the function can use the remainder of the
array to store "state" information, e.g., the last position on a file, a
pointer into a data structure, etc. The value returned by the function
is the next candidate for respelling. If NIL is returned, the spelling
"list" is considered to be exhausted, and the closest match is returned.
If a candidate is found with no disagreements, it is returned
immediately without waiting for the "list" to exhaust.
17.4 Error Correction
As described in Section 16, whenever the interpreter encounters an
atomic form with no binding, or a non-atomic form car of which is not a
function or function object, it calls the function faulteval.
Similarly, when apply is given an undefined function, it calls
faultapply. When DWIM is enabled, faulteval and faultapply are
redefined to first call dwimblock, a part of the DWIM package. If the
user aborts by typing control-E, or if he indicates disapproval of
DWIM's intended correction by answering N as described on page 17.5, or
------------------------------------------------------------------------
22
If the word is at the front of the spelling list, the time required
is only 1 millisecond. If the word is not on the spelling list,
i.e., the entire list must be searched, the time is proportional to
the length of the list; to search a spelling list of length 60 takes
about 7 milliseconds.
17.11
23
if DWIM cannot decide how to fix the error, dwimblock returns NIL. In
this case, faulteval and faultapply proceed exactly as described in
Section 16, by printing a U.B.A. or U.D.F. message, and going into a
break if the requirements of breakcheck are met, otherwise unwinding to
the last errorset.
If DWIM can (and is allowed to) correct the error, dwimblock exits by
performing reteval of the corrected form, as of the position of the call
to faulteval or faultapply. Thus in the example at the beginning of the
chapter, when DWIM determined that ITIMS was ITIMES misspelled, DWIM
called reteval with (ITIMES N (FACCT 8SUB1 N)). Since the interpreter
uses the value returned by faulteval exactly as though it were the value
of the erroneous form, the computation will thus proceed exactly as
though no error had occurred.
In addition to continuing the computation, DWIM also repairs the cause
24
of the error whenever possible. Thus in the above example, DWIM also
changed (with rplaca) the expression (ITIMS N (FACCT 8SUB1 N)) that
caused the error.
Error correction in DWIM is divided into three categories: unbound
atoms, undefined cars of form, and undefined function in apply.
Assuming that the user approves if he is asked, the action taken by DWIM
for the various types of errors in each of these categories is
summarized below. The protocol of DWIM's interaction with the user has
been described earlier.
Unbound Atoms
1. If the first character of the unbound atom is ', DWIM assumes that
the user (intentionally) typed 'atom for (QUOTE atom) and makes the
appropriate change. No message is typed, and no approval requested.
If the unbound atom is just ' itself, DWIM assumes the user wants
the next expression quoted, e.g., (CONS X '(A B C)) will be changed
to (CONS X (QUOTE (A B C))). Again no message will be printed or
approval asked. (If no expression follows the ', DWIM gives up.)
2. If CLISP (Section 23) is enabled, and the atom is part of a CLISP
construct, the CLISP transformation is performed and the result
returned, e.g., N-1 is transformed to (SUB1 N).
------------------------------------------------------------------------
23
If the user answers with ^, (see page 17.5) dwimblock is exited by
performing reteval[FAULTEVAL;(ERROR!)], i.e., an error is generated
at the position of the call to faulteval.
24
If the user's program had computed the form and called eval, e.g.,
performed (EVAL (LIST X Y)) and the value of x was a misspelled
function; it would not be possible to repair the cause of the error,
although DWIM could correct the misspelling each time it occurred.
17.12
25
3. If the atom contains an 8, DWIM assumes the 8 was intended to be a
left parenthesis, and calls the editor to make appropriate repairs
on the expression containing the atom. DWIM assumes that the user
did not notice the mistake, i.e., that the entire expression was
affected by the missing left parenthesis. For example, if the user
types
(SETQ X (LIST (CONS 8CAR Y) (CDR Z)) Y), the expression will be
changed to (SETQ X (LIST (CONS (CAR Y) (CDR Z)) Y)).
The 8 does not have to be the first character of the atom, e.g.,
DWIM will handle (CONS X8CAR Y) correctly.
26
4. If the atom contains a 9 DWIM assumes the 9 was intended to be a
right parenthesis and operates as in number 3.
5. If the atom begins with a 7, the 7 is treated as a ', e.g., 7FOO
becomes 'FOO, and then (QUOTE FOO).
6. If the atom is an edit command (a member of editcomsa), and the
error occurred in type-in, the effect is the same as though the user
typed EDITF(), followed by the atom, i.e., DWIM assumes the user
wants to be in the editor editing the last thing he referred to.
Thus, if the user defines the function foo and then types P, he will
see =FOO, followed by EDIT, followed by the printout associated with
the execution of the P command, followed by *, at which point he can
continue editing foo.
7. If dwimuserfn=T, DWIM calls dwimuserfn, and if it returns a non-NIL
value, DWIM returns this value. dwimuserfn is discussed below.
8. If the unbound atoms occurs in a function, DWIM attempts spelling
correction using as a spelling list the list of lambda and prog
variables of the function.
9. If the unbound atom occurred in a type-in to a break, DWIM attempts
spelling correction using the lambda and prog variables of the
broken function.
10. Otherwise, DWIM attempts spelling correction using spellings3.
------------------------------------------------------------------------
25
actually the value of lparkey, initially 8. See footnote on page
17.6.
26
actually the value of rparkey, initially 9. See footnote on page
17.6.
17.13
If all fail, DWIM gives up.
Undefined car of Form
1. If car of the form is T, DWIM assumes a misplaced T clause and
operates as described on page 17.6.
2. If car of the form is F/L, DWIM changes the F/L to
FUNCTION(LAMBDA,e.g., (F/L (Y) (PRINT (CAR Y))) is changed to
(FUNCTION (LAMBDA (Y) (PRINT (CAR Y))). No message is printed and
no approval requested. If the user omits the variable list, DWIM
supplies (X), e.g., (F/L (PRINT (CAR X))) becomes
(FUNCTION (LAMBDA (X) (PRINT (CAR X)))). DWIM determines that the
user has supplied the variable list when more than one expression
follows F/L, car of the first expression is not the name of a
function, and every element in the first expression is atomic. For
example, DWIM will supply (X) when correcting
(F/L (PRINT (CDR X)) (PRINT (CAR X))).
3. If car of the form is IF, if, or one of the CLISP iterative
statement operators, e.g., FOR, WHILE, DO et al, the indicated
transformation is performed, and the result returned as the
corrected form.
4. If car of the form has a function definition, DWIM attempts spelling
correction on car of the definition using as spelling list the value
27
of lambdasplst, initially (LAMBDA NLAMBDA).
5. If car of the form has an EXPR property, DWIM prints car of the
form, followed by 'UNSAVED', performs an unsavedef, and continues.
No approval is requested.
6. If car of the form has a property FILEDEF, the definition is to be
found on a file. If the value of the property is atomic, the entire
file is to be loaded. If a list, car is the name of the file and
cdr the relevant functions, and loadfns will be used. DWIM first
checks to see if the file appears in the attached directory,
<NEWLISP>'s directory, or <LISP>'s directory, and if found, types
"SHALL I LOAD" followed by the file name or list of functions. If
the user approves, DWIM loads the function(s) or file, and continues
the computation. TRANSOR is handled in this fashion.
------------------------------------------------------------------------
27
The user may wish to add to lambdasplst if he elects to define new
"function types" via an appropriate dwimuserfn. For example, the
QLAMBDAs of SRI's QLISP are handled in this way.
17.14
7. If CLISP is enabled, and car of the form is part of a CLISP
construct, the indicated transformation is performed, e.g., (N_N-1)
becomes (SETQ N (SUB1 N)).
8. If car of the form contains an 8, DWIM assumes a left parenthesis
was intended e.g., (CONS8CAR X).
9. If car of the form contains a 9, DWIM assumes a right parenthesis
was intended.
10. If car of the form is a list, DWIM attempts spelling correction on
caar of the form using lambdasplst as spelling list. If successful,
DWIM returns the corrected expression itself.
11. If car of the form is a small number, and the error occurred in
type-in, DWIM assumes the form is really an edit command and
operates as described in case 6 of unbound atoms.
12. If car of the form is an edit command (a member of editcomsl), DWIM
operates as in 11.
13. If dwimuserfn=T, dwimuserfn is called, and if it returns a non-NIL
value, DWIM returns this value.
14. If the error occurs in a function, or in a type-in while in a break,
DWIM checks to see if the last character in car of the form is one
of the lambda or prog variables, and if the first n-1 characters are
the name of a defined function, and if so makes the corresponding
change, e.g., (MEMBERX Y) will be changed to (MEMBER X Y). The
protocol followed will be the same as for that of spelling
correction, e.g., if approveflg=T, DWIM will type
MEMBERX [IN FOO] -> MEMBER X?
15. Otherwise, DWIM attempts spelling correction using spellings2 as the
spelling list.
If all fail, DWIM gives up.
Undefined Function in Apply
1. If the function has a definition, DWIM attempts spelling correction
on car of the definition using lambdasplst as spelling list.
2. If the function has an EXPR property, DWIM prints its name followed
by 'UNSAVED', performs an unsavedef and continues. No approval is
requested.
17.15
3. If the function has a property FILEDEF, DWIM proceeds as in case 6
of undefined car of form.
4. If the error resulted from type-in, and CLISP is enabled, and the
function name contains a CLISP operator, DWIM performs the indicated
transformation, e.g., the user types FOO_(APPEND FIE FUM).
5. If the function name contains an 8, DWIM assumes a left parenthesis
was intended, e.g., EDIT8FOO].
6. If the "function" is a list, DWIM attempts spelling correction on
car of the list using lambdasplst as spelling list.
7. If the function is a number and the error occurred in type-in, DWIM
assumes the function is an edit command, and operates as described
in case 6 of unbound atoms, e.g., the user types (on one line) 3 -
1 P.
8. If the function is the name of an edit command (on either editcomsa
or editcomsl), DWIM operates as in 7, e.g., user types F COND.
9. If dwimuserfn=T, dwimuserfn is called, and if it returns a non-NIL
value, this value is treated as the form used to continue the
computation, i.e., it will be eval-ed.
10. Otherwise DWIM attempts spelling correction using spellings1 as the
spelling list,
11. Otherwise DWIM attempts spelling correction using spellings2 as the
spelling list.
If all fail, DWIM gives up.
17.5 DWIMUSERFN
Dwimuserfn provides a convenient way of adding to the transformations
that DWIM performs, e.g., the user might want to change atoms of the
form $X to (QA4LOOKUP X). The user defines dwimuserfn as a function of
no arguments, and then enables it by setting dwimuserfn to T. DWIM will
call dwimuserfn before attempting spelling correction, but after
performing its other transformations, e.g., F/L, 8, 9, CLISP, etc. If
dwimuserfn returns a non-NIL value, this value is treated as a form to
be evaluated and returned as the value of faulteval or faultapply.
Otherwise, if dwimuserfn returns NIL, DWIM proceeds as when dwimuserfn
is not enabled, and attempts spelling correction. Note that in the
event that dwimuserfn is to handle the correction, it is also
responsible for any modifications to the original expression, i.e., DWIM
simply takes its value and returns it.
17.16
In order for dwimuserfn to be able to function, it needs to know various
things about the context of the error. Therefore, several of DWIM's
internal variables have been made SPECVARS (See Section 18) and are
therefore "visible" to dwimuserfn. Below are a list of those variables
that may be useful.
faultx for unbound atoms and undefined car of form,
faultx is the atom or form. For undefined
functions in apply, faultx is the name of the
function.
faultargs for undefined functions in apply, faultargs is
the list of arguments.
faultapplyflg Is T for undefined functions in apply. (Since
faultargs may be NIL, faultapplyflg is necessary
to distinguish between unbound atoms and
undefined function in apply, since faultx is
atomic in both cases).
tail for unbound errors, tail is the tail car of
which is the unbound atom. Thus dwimuserfn can
replace the atom by another expression by
performing (/RPLACA TAIL expr)
parent for unbound atom errors, parent is the form in
which the unbound atom appears, i.e., tail is a
tail of parent.
type-in? true if error occurred in type-in.
faultfn name of function in which error occurred.
(faultfn is TYPE-IN when the error occurred in
type-in, and EVAL or APPLY when the error
occurred under an explicit call to EVAL
or APPLY).
dwimifyflg true if error was encountered during dwimifying
as opposed to during running the program.
expr definition of faultfn, or argument to eval,
i.e., the superform in which the error occurs.
17.6 Spelling Corrector Algorithm
The basic philosophy of DWIM spelling correction is to count the number
of disagreements between two words, and use this number divided by the
length of the longer of the two words as a measure of their relative
disagreement. One minus this number is then the relative agreement or
closeness. For example, CONS and CONX differ only in their last
character. Such substitution errors count as one disagreement, so that
the two words are in 75% agreement. Most calls to the spelling
17.17
28
corrector specify rel=70, so that a single substitution error is
permitted in words of four characters or longer. However, spelling
correction on shorter words is possible since certain types of
differences such as single transpositions are not counted as
disagreements. For example, AND and NAD have a relative agreement of
100.
The central function of the spelling corrector is chooz. chooz takes as
arguments: a word, a spelling list, a minimum relative agreement, and an
29
optional functional argument, xword, splst, rel, and fn respectively.
chooz proceeds down splst examining each word. Words not satisfying fn,
or those obviously too long or too short to be sufficiently close to
xword are immediately rejected. For example, if rel=70, and xword is 5
30
characters long, words longer than 7 characters will be rejected.
If tword, the current word on splst, is not rejected, chooz computes the
number of disagreements between it and xword by calling a subfunction,
skor.
skor operates by scanning both words from left to right one character at
31
a time. Characters are considered to agree if they are the same
characters; or appear on the same teletype key (i.e., a shift mistake),
------------------------------------------------------------------------
28
Integers between 0 and 100 are used instead of numbers between 0 and
1 in order to avoid floating point arithmetic.
29
fn=NIL is equivalent to fn=(LAMBDA NIL T).
30
Special treatment is necessary for words shorter than xword, since
doubled letters are not counted as disagreements. For example,
CONNSSS and CONS have a relative agreement of 100. (Certain teletype
diseases actually produce this sort of stuttering.) chooz handles
this by counting the number of doubled characters in xword before it
begins scanning splst, and taking this into account when deciding
whether to reject shorter words.
31
skor actually operates on the list of character codes for each word.
This list is computed by chooz before calling skor using dchcon, so
that no storage is used by the entire spelling correction process.
17.18
32
for example, * agrees with :, 1 with !, etc.; or if the character in
xword is a lower case version of the character in tword. Characters
that agree are discarded, and the skoring continues on the rest of the
characters in xword and tword.
If the first character in xword and tword do not agree, skor checks to
see if either character is the same as one previously encountered, and
not accounted-for at that time. (In other words, transpositions are not
handled by lookahead, but by lookback.) A displacement of two or fewer
positions is counted as a tranposition; a displacement by more than two
positions is counted as a disagreement. In either case, both characters
are now considered as accounted for and are discarded, and skoring
continues.
If the first character in xword and tword do not agree, and neither are
equal to previously unaccounted-for characters, and tword has more
characters remaining than xword, skor removes and saves the first
character of tword, and continues by comparing the rest of tword with
xword as described above. If tword has the same or fewer characters
remaining than xword, the procedure is the same except that the
33
character is removed from xword. In this case, a special check is
first made to see if that character is equal to the previous character
in xword, or to the next character in xword, i.e., a double character
typo, and if so, the character is considered accounted-for, and not
34
counted as a disagreement.
When skor has finished processing both xword and tword in this fashion,
the value of skor is the number of unaccounted-for characters, plus the
number of disagreements, plus the number of tranpositions, with two
qualifications: (1) if both xword and tword have a character
unaccounted-for in the same position, the two characters are counted
only once, i.e., substitution errors count as only one disagreement, not
two; and (2) if there are no unaccounted-for characters and no
------------------------------------------------------------------------
32
For users on model 33 teletypes, as indicated by the value of
model33flg being T, @ and P appear on the same key, as do L and /, N
and L, and O and _, and DWIM will proceed accordingly. The initial
value for model33flg is NIL. Certain other terminals, e.g., Anderson
Jacobs terminal, have keyboard layouts similar to the model 33,
i.e., N on same key as ^, etc. In this case, the user might also
want to set model33flg to T.
33
Whenever more than two characters in either xword or tword are
unaccounted for, skoring is aborted, i.e., xword and tword are
considered to disagree.
34
In this case, the "length" of xword is also decremented. Otherwise
making xword sufficiently long by adding double characters would
make it be arbitrarily close to tword, e.g., XXXXXX would correct to
PP.
17.19
disagreements, transpositions are not counted. This permits spelling
35
correction on very short words, such as edit commands, e.g., XRT->XTR.
17.7 DWIM Functions
dwim[x] If x=NIL, disables DWIM; value is NIL. If x=C,
enables DWIM in cautious mode; value is
CAUTIOUS. If x=T, enables DWIM in trusting
mode; value is TRUSTING. For all other values
of x, generates an error.
dwimify[x] x is a form or the name of a function. dwimify
performs all corrections and transformations
that would occur if x were actually run.
dwimify is undoable.
DW edit macro. dwimifies current expression.
addspell[x;splst;n] Adds x to one of the four spelling lists as
36
follows:
if splst=NIL, adds x to userwords and to
spellings2. Used by defineq.
If splst=0, adds x to userwords. Used by load
when loading exprs to property lists.
If splst=1, adds x to spellings1 (at end of
permanent section). Used by lispx.
if splst=2, adds x to spellings2 (at end of
permanent section). Used by lispx.
If splst=3, adds x to userwords and spellings3.
splst can also be a spelling list, in which case
n is the (optional) length of the temporary
section.
------------------------------------------------------------------------
35
Transpositions are also not counted when fastypeflg=T, for example,
IPULX and IPLUS will be in 80% agreement with fastypeflg=T, only 60%
with fastypeflg=NIL. The rationale behind this is that
transpositions are much more common for fast typists, and should not
be counted as disagreements, whereas more deliberate typists are not
as likely to combine tranpositions and other mistakes in a single
word, and therefore can use more conservative metric. fastypeflg is
initially NIL.
36
If x is already on the spelling list, and in its temporary section,
addspell moves x to the front of that section. See page 17.10 for
complete description of algorithm for maintaining spelling lists.
17.20
addspell sets lastword to x when splst=NIL, 0 or
3.
If x is not a literal atom, addspell takes no
action.
misspelled?[xword;rel;splst;flg;tail;fn]
If xword=NIL or alt-mode, misspelled?
prints = followed by the value of lastword, and
returns this as the respelling, without asking
for approval. Otherwise, misspelled? checks to
see if xword is really misspelled, i.e., if fn
applied to xword is true, or xword is already
contained on splst. In this case, misspelled?
simply returns xword. Otherwise misspelled?
computes and returns
fixspell[xword;rel;splst;flg;tail;fn]
37
fixspell[xword;rel;splst;flg;tail;fn;tieflg]
The value of fixspell is either the respelling
38
of xword or NIL. fixspell performs all of the
interactions described earlier, including
requesting user approval if necessary.
If xword=NIL or $ (alt-mode), the respelling is
the value of lastword, and no approval is
requested.
If flg=NIL, the correction is handled in type-in
mode, i.e., approval is never requested, and
xword is not typed. If flg=T, xword is typed
(before the =) and approval is requested if
approveflg=T.
If tail is not NIL, and the correction is
successful, car of tail is replaced by the
respelling (using /rplaca). In addition,
fixspell will correct misspellings caused by
------------------------------------------------------------------------
37
fixspell has some additional arguments, for internal use by DWIM.
38
If for some reason xword itself is on splst, then fixspell aborts
and calls error!. If there is a possibility that xword is spelled
correctly, misspelled? should be used instead of fixspell.
17.21
39
running two words together. In this case, car
of tail is replaced by the two words, and the
value of fixspell is the first one. For
example, if fixspell is called to correct the
edit command (MOVE TO AFTERCOND 3 2) with
tail=(AFTERCOND 3 2), tail would be changed to
(AFTER COND 2 3), and fixspell would return
AFTER (subject to user approval where
40
necessary).
If tieflg=T and a tie occurs, i.e., more than
one word on splst is found with the same degree
of "closeness", the first word is taken as the
correct spelling. If tieflg=NIL and a tie
occurs, fixspell returns NIL, i.e., no
correction. If tieflg=ALL, the value of
fixspell is a list of the respellings (even if
there is only one), and fixspell will not
perform any interaction with the user, nor
modify tail, the idea being that the calling
program will handle those tasks.
The time required for a call to fixspell with a spelling list of length
60 when the entire list must be searched is .5 seconds. If fixspell
determines that the first word on the spelling list is the respelling
and does not need to search any further, the time required is .02
seconds. In other words, the time required is proportional to the
number of words with which xword is compared, with the time for one
comparison, i.e., one call skor, being roughly .01 seconds (varies
slightly with the number of characters in the words being compared.)
------------------------------------------------------------------------
39
In this case, user approval is always requested. In addition, if the
first word contains either fewer than 3 characters, or fewer
characters than the second word, the default will be N. 'Run-on'
spelling corrections can be suppressed by setting the variable
runonflg to NIL (initially T).
40
If tail=T, fixspell will also perform run-on corrections, returning
a dotted pair of the two words in the event the correction is of
this type.
17.22
The function chooz is provided for users desiring spelling correction
without any output or interaction:
41
chooz[xword;rel;splst;tail;fn;tieflg]
The value of chooz is the corrected spelling of
42
xword or NIL; chooz performs no interaction
and no output. rel, splst, tail, tieflg, and fn
are as described under fixspell above. If tail
is not NIL and the misspelling consists of
running two words together, e.g., (BREAKFOO) for
(BREAK FOO), the value of chooz will be a dotted
pair of the two words, e.g., (BREAK . FOO). If
the correction involves a synonym (see page
17.9), i.e., if xword is or corrects to word1 in
(word1 . word2), the value of chooz will be
43
(word1 word2).
fncheck[fn;noerrorflg;spellflg;propflg;tail]
The task of fncheck is to check whether fn is
the name of a function and if not, to correct
44
its spelling. If fn is the name of a function
or spelling correction is successful, fncheck
adds the (corrected) name of the function to
userwords using addspell, and returns it as its
value.
noerrorflg informs fncheck whether or not the
calling function wants to handle the
unsuccessful case: if noerrorflg is T, fncheck
simply returns NIL, otherwise it prints fn NOT A
FUNCTION and generates a non-breaking error.
------------------------------------------------------------------------
41
chooz has some additional arguments, for internal use by DWIM.
42
chooz does not perform spelling completion, only spelling
correction.
43
not (word1 . word2), since this would correspond to a run-on
correction.
44
Since fncheck is called by many low level functions such as arglist,
unsavedef, etc., spelling correction only takes place when
dwimflg=T, so that these functions can operate in a small INTERLISP
system which does not contain DWIM.
17.23
If fn does not have a definition, but does have
an EXPR property, then spelling correction is
not attempted. Instead, if propflg=T, fn is
considered to be the name of a function, and is
returned. If propflg=NIL, fn is not considered
to be the name of a function, and NIL is
returned or an error generated, depending on the
value of noerrorflg.
fncheck calls misspelled? to perform spelling
correction, so that if fn=NIL, the value of
lastword will be returned. spellflg corresponds
to misspelled?'s fourth argument, flg. If
spellflg=T, approval will be asked if DWIM was
enabled in CAUTIOUS mode, i.e., if approveflg=T.
tail corresponds to the fifth argument to
misspelled?.
fncheck is currently used by arglist, unsavedef, prettyprint, break0,
breakin, chngnm, advise, printstructure, firstfn, lastfn, calls, and
edita. For example, break0 calls fncheck with noerrorflg=T since if
fncheck cannot produce a function, break0 wants to define a dummy one.
printstructure however calls fncheck with noerrorflg=NIL, since it
cannot operate without a function.
Many other system functions call misspelled? or fixspell directly. For
example, break1 calls fixspell on unrecognized atomic inputs before
attempting to evaluate them, using as a spelling list a list of all
break commands. Similarly, lispx calls fixspell on atomic inputs using
a list of all lispx commands. When unbreak is given the name of a
function that is not broken, it calls fixspell with two different
spelling lists, first with brokenfns, and if that fails, with userwords.
makefile calls misspelled? using filelst as a spelling list. Finally,
load, bcompl, brecompile, tcompl, and recompile all call misspelled? if
their input file(s) won't open.
17.24