-*-Text-*-
This is the file <info>spell.info, which documents the ISPELL
spelling checker/corrector. The canonical "R" source for this is
[MIT-XX]SRC:<WBA>SPDOC.R
File: ISPELL Node: Top Up: (DIR) Next: Correcting
1. INTRODUCTION
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
&                                                                  &
&  LATE BULLETIN ----- On MIT-OZ, this program is called SPELL!    &
&                      The one supplied by DEC is called OSPELL.   &
&                                                                  &
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
This memo documents a spelling check/correction program maintained on
the four MIT ITS computers and on the MIT-XX Twenex computer. The program
is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on Twenex.
(The name ISPELL is used on Twenex to avoid conflict with the standard DEC
TOPS-20 program called SPELL. This memo does NOT apply to the DEC TOPS-20
program or to programs called NSPELL or OSPELL on ITS.)
The program's behavior is nearly identical on ITS and Twenex. (The
principal difference is that arguments to a command are separated by commas
on ITS and spaces on Twenex.) A command consists of a command name and its
arguments on one line, for example
     SET SCRIBE
     LOAD MYDICT 3
     CORRECT FOO.BAR FOO.OUT
     CORRECT FOO.BAR FOO.OUT/LINE:13
     ASK PROCEDE
SPELL is designed to process input text for the text formatters TJ6, R,
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
structure of these programs to avoid attempting to check the spelling of
formatter commands. It can be told by means of special comments to
suppress checking of designated portions of the text.
SPELL's querying mechanism for correcting misspelled words works best
when it is used from a display terminal, but SPELL may be effectively used
from a printing terminal or slow display, especially if the "context
display" and "alternate spellings display" (see below) are disabled. It
may be used effectively from uppercase-only terminals, since it uses the
capitalization from the original word in the file when a spelling
correction is typed in.
SPELL has a dictionary of about 38000 words and requires about 43K of
memory to run.
Report bugs and complaints to BUG-SPELL@MIT-ML.
* Menu:
* Correcting:: The principal operation
* Training:: Making a list of all unknown words from a file
* Asking:: Asking about specific words, and other fun and games
* Emacs:(EMACS)Fixit
Using SPELL from EMACS to check the spelling of one word
* Load/Dump:: Loading and dumping dictionaries
* Options:: Switches that control formatter mode etc.
* Syntax:: Syntax of commands and file names, and operation from "JCL"
* Esoterica::
File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training
2. The CORRECT command
The principal operation which SPELL performs is that of "correcting" a
file. This consists of reading the file and optionally writing an output
file in which misspellings are corrected. The corrections are made either
by substituting a word known to SPELL or by retyping the word manually.
Whether output is being written or not, every misspelling (that is, every
word not known to SPELL) is displayed, and the user is queried about the
action to be taken.
The command name "CORRECT" is followed by the filespec of the input
file, the optional filespec of the output file, and an optional switch
specifying the starting line number. If this line number is given, all
earlier lines in the input file are copied to the output file without being
checked. If no output is desired, omit the second file specification.
     CORRECT INPUT.FILE OUTPUT.FILE
     CORRECT INPUT.FILE OUTPUT.FILE/LINE:13
     CORRECT INPUT.FILE                      (produce no corrected output)
2.1 Suppressing checking of designated portions of the text
When SPELL is being operated in a formatter mode, it can suppress
checking of regions containing figures drawn with strange fonts or other
nonsensical-looking text. It does this by recognizing comments of the
form (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF". The
comments must be exactly as shown: a control K, space, three ampersands,
and the word entirely in capitals. What follows later on the line is
immaterial. The SPELLOFF command disables checking until the next SPELLON
command. The unchecked text is still copied into the output file. The
commands are:
     R        "^K &&&SPELLON" and "^K &&&SPELLOFF"
     TEX      "% &&&SPELLON" and "% &&&SPELLOFF"
     TJ6      ".C &&&SPELLON" and ".C &&&SPELLOFF"
     PUB      ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
     SCRIBE   "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"
Remember that SPELL does not expand formatter macros. While it would be
convenient to put SPELLOFF and SPELLON commands in macros, it would not
have the desired effect. SPELL responds to the commands only when they
literally appear in the text.
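
The marker scheme above can be illustrated with a short Python sketch. It
is not SPELL's actual code. The table of prefixes is taken from the list
above; the function name "checked_lines" is hypothetical, and the sketch
assumes that a marker is recognized only at the very start of a line.

```python
# Comment prefixes that introduce a SPELLON/SPELLOFF marker in each
# formatter mode, copied from the table in this memo. "\x0b" is control-K.
MARKER_PREFIX = {
    "R": "\x0b &&&",
    "TEX": "% &&&",
    "TJ6": ".C &&&",
    "PUB": ".<< &&&",
    "SCRIBE": "@comment(&&&",
}


def checked_lines(lines, mode):
    """Yield (line, should_check) pairs for the given formatter mode.

    Assumption: markers are honored only at the start of a line. Whatever
    follows the marker on the same line is ignored, as the memo states.
    Every line is yielded, because unchecked text is still copied to the
    output file.
    """
    prefix = MARKER_PREFIX[mode]
    checking = True
    for line in lines:
        if line.startswith(prefix + "SPELLOFF"):
            checking = False
            yield line, False
        elif line.startswith(prefix + "SPELLON"):
            checking = True
            yield line, False  # the marker line itself is only a comment
        else:
            yield line, checking
```

Because the test is a literal prefix match on the text of the file, a
marker produced by a formatter macro would never be seen, which is the
point made in the paragraph above.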
2.2 Querying during a correction
When SPELL finds an unknown word during a correction, it displays:
(1) The unknown word.
(2) If the "LIST" option is on (which it normally is), a
list of words that are known to SPELL and are close to
the unknown word. One of these might be the word that
was intended. These words are displayed with index
numbers beginning with zero.
(3) If the "DISPLAY" option is on (which it normally is),
the line number and a few lines of the file showing the
context in which the word appeared.
The user then types one of the commands listed below. These commands are
acted on IMMEDIATELY. No <return> is typed after them. The intention is
to allow rapid interaction with the program. However, care is required to
avoid typing the wrong thing.
A or <space> Accept the word, but do not put it in the dictionary. It
will still be unknown, so SPELL will query again if it
encounters this word again.
I Accept the word and put it into dictionary number 1. The
word will henceforth be known, so SPELL will not query about
it again. If the dictionary is written out at the end of
the run, it can be read back in on future runs and SPELL
will know all such words.
D1 to D9 Accept the word and put it into the dictionary whose number
(1 through 9) follows the "D".
0 to 9 Substitute the numbered choice for the unknown word. (This
is only useful if corrected output is being written.) The
corrected word will be given the same capitalization as the
unknown word, if possible. If not possible, SPELL will say
so.
R The user types in a new word to replace the unknown one.
(This is only useful if corrected output is being written.)
As above, the capitalization of the original word will be
used. The capitalization of the typed-in word is ignored.
(Hence SPELL can be operated effectively from
upper-case-only terminals.) SPELL then checks the spelling
of the new word. If the new word is unknown, SPELL queries
again, and the user can reply with any of the commands in
this list, including another "R". Therefore, if you are not
sure the word you are typing in is correctly spelled, type
it anyway. If it is wrong, SPELL will give you another
chance to fix it, perhaps by displaying alternative
spellings. If the new word is correctly spelled but SPELL
does not know it, typing "I" when queried the second time
will put it in the dictionary.
+/-<letter> Sets or clears the indicated option, just as for the SET or
CLEAR command at the top level. The letter typed after the
"+" or "-" is a single letter abbreviation for the option
name. It is the first letter of the option name, except for
TJ6, whose abbreviation is "J". After giving this command,
the user must still dispose of the unknown word.
W The unknown word, and all future words, are accepted. That
is, the rest of the input file is immediately copied to the
output file without further checking. This is sometimes
useful if the system is going down in one minute. It also
complements the optional starting line feature of the
CORRECT command.
^L The display is refreshed.
^G The entire correction operation is aborted, and SPELL goes
back to the top level to await the next operation command.
? A brief summary of the available commands is displayed,
showing the currently selected options.
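
The capitalization rule used by the digit and "R" replies can be sketched
in Python as follows. This memo does not say which capitalization patterns
SPELL recognizes, so the three patterns below (all lower case, all upper
case, initial capital) are an assumption, and the name "transfer_case" is
hypothetical. Returning None stands in for the case where SPELL reports
that the capitalization cannot be preserved.

```python
def transfer_case(original, replacement):
    """Recase `replacement` to follow the capitalization of `original`.

    The capitalization of `replacement` as typed is ignored. Returns None
    when `original` fits none of the assumed patterns.
    """
    if original.islower():
        return replacement.lower()
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper() and original[1:].islower():
        return replacement[:1].upper() + replacement[1:].lower()
    return None  # mixed case such as "tEh": no simple pattern applies
```

For example, replacing "Teh" with a word typed as "THE" on an
upper-case-only terminal gives "The" under these assumptions.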
File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking
3. The TRAIN command
This is a sort of "off-line" version of correcting. Instead of
querying the user, SPELL lists all unknown words in an "exceptions" file,
and puts them into dictionary number 1 also. It is often possible to
determine quickly whether a file contains errors by training and then
examining the exceptions file.
The word "TRAIN" is followed by the filespecs of the input file and
the exceptions file.
SPELL can suppress its training of designated portions of the text in
exactly the same way as during corrections, by using the "&&&SPELLON" and
"&&&SPELLOFF" comments.
File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump
4. The ASK command
This is used for manual interrogation of the dictionary about a single
word. The command is "ASK" followed by the word. SPELL will indicate
whether it knows the word. If the word is known, SPELL will also indicate
whether suffix removal was used, and whether the word has suffix flags.
(Users generally do not need to be concerned with this information.) If
the word is not known, SPELL will list similar words that it knows, if
any.
5. The JUMBLE command
This command is provided to solve "scrambled word" problems of the
sort that appear in some newspapers. The command "JUMBLE" is followed by
the string (not more than 8 letters). All permutations of the letters in
the string that are known to SPELL are displayed. Because the program is
not clever about redundant permutations, it will display words more than
once if they contain duplicated letters.
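
The duplicate output described above follows directly from enumerating
every ordering of the letters. A minimal Python sketch, assuming the known
words are held in a set of upper-case strings (the function name "jumble"
and the word set are illustrative, not part of SPELL):

```python
from itertools import permutations


def jumble(letters, known_words):
    """Return every ordering of `letters` that is a known word.

    Orderings are generated position by position, so a string with a
    repeated letter produces the same word more than once, which mirrors
    the behavior described in this memo.
    """
    if len(letters) > 8:
        raise ValueError("JUMBLE accepts at most 8 letters")
    hits = []
    for perm in permutations(letters.upper()):
        candidate = "".join(perm)
        if candidate in known_words:
            hits.append(candidate)
    return hits
```

With the letters "OTO" and a word set containing "TOO", the word is
reported twice, once for each way of ordering the two O's.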
File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options
6. The LOAD and DUMP commands
The LOAD command reads a file and places the words in it into one of 9
dictionaries that SPELL has in addition to its main dictionary. The word
LOAD is followed by the dictionary filespec and an optional dictionary
number (1 through 9). If the number is not given, dictionary number 1 will
be loaded. (One incremental dictionary is usually sufficient.) The "DUMP"
operation has the same syntax and writes the indicated dictionary onto a
file.
A word may be in only one dictionary. If it is in any dictionary it
is "known" for purposes of correcting, training, etc. When loading a
dictionary, SPELL ignores any word that it already knows, even if it is in
a different dictionary than the one specified.
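
The rule that a word lives in at most one dictionary can be modeled as a
single table from word to dictionary number. The Python sketch below is
only an illustration: the class and method names are hypothetical, words
are folded to upper case as an assumption, and number 0 stands for the
main dictionary.

```python
class Dictionaries:
    """Maps each known word to the one dictionary number that holds it."""

    def __init__(self, main_words):
        # Dictionary 0 stands in for SPELL's main dictionary.
        self.where = {w.upper(): 0 for w in main_words}

    def load(self, words, number=1):
        """Add words to dictionary `number`; skip words already known."""
        if not 1 <= number <= 9:
            raise ValueError("dictionary number must be 1 through 9")
        added = []
        for w in words:
            w = w.upper()
            if w not in self.where:  # known anywhere means ignored here
                self.where[w] = number
                added.append(w)
        return added

    def dump(self, number=1):
        """Return the words currently held in dictionary `number`."""
        return sorted(w for w, n in self.where.items() if n == number)

    def known(self, word):
        return word.upper() in self.where
```

One consequence is visible in the sketch: loading the same word list into
dictionary 1 and then into dictionary 3 leaves dictionary 3 empty.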
Incremental dictionaries are useful for maintaining private lists of
words that appear in one's files and are known to be correct but are not in
SPELL's main dictionary. When correcting a file, the user should type "I"
when such a word is encountered, and then dump the incremental dictionary
at the end of the run. This dictionary may then be read back in at the
start of the next run.
Dictionaries other than number 1 are occasionally useful for
maintaining different lists of jargon words or words specific to particular
subjects.
File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax
7. The SET and CLEAR commands
SPELL has 7 "option" flags to control its operation:
"R" option Reading "R" text: ignore commands, font indicators,
comments, string and number register names, and inline
macro names. It correctly handles "hexadecimal" fonts as
in ^FAword^F*. Warning: Since command lines are ignored,
titles appearing in chapter, section, and figure macros
are not checked and should be inspected manually. The
same is true of string registers that are to expand to
actual text. Also, SPELL is not clever about things that
are quoted or translated, or other obscure features of
text formatters. For example, it thinks "^Q^K" starts a
comment, because it doesn't know about the action of ^Q.
This is true, to varying degrees, for all of the
formatter options.
"TJ6" option Reading TJ6 text: ignore spelling of command lines and
font indicators. See the warning under the "R" option.
"SCRIBE" option Reading SCRIBE text: ignore all words preceded by "@",
and the arguments to those commands whose arguments are
not expected to be text. Only comments of the form
@comment(...) are recognized and ignored. Comments of
the form @begin(comment)...@end(comment) are not. The
latter types of comments may be protected by additional
"&&&SPELLOFF" and "&&&SPELLON" comments, if desired. See
the warning under the "R" option.
"PUB" option Reading PUB text: like TJ6.
"TEX" option Reading TEX text: ignore commands, font indicators, and
comments. See the warning under the "R" option.
"LIST" option List suggested alternate spellings when it encounters an
unknown word. If you don't care about the alternate
spellings, you can make it run faster by turning this
off.
"DISPLAY" option Display the context and line number of the unknown word.
It saves wear and tear on printing terminals to turn this
off if you don't really need it.
Options are turned on by the command SET (option name), off by CLEAR
(option name). The HELP command displays the currently enabled options.
Only one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and "TEX")
may be on at one time. Turning any of them on clears the others.
The initial options are "R", "LIST", and "DISPLAY". In addition to
the top level commands SET and CLEAR, options may be set or cleared when
SPELL is querying during a correction. To do this, type a plus sign to
set, or a minus sign to clear, followed by a SINGLE LETTER abbreviation for
the option (or "J" for TJ6). For example, if you had DISPLAY off and you
decide you want to see the context of a particular word after all, you can
turn it on by typing "+D".
8. The EXIT, QUIT, and KILL commands
"KILL" exits from SPELL and kills the job. "EXIT" or "QUIT" exits but
allows it to be restarted.
File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica
9. ENTERING AND EDITING COMMANDS
A command consists of a command name ("CORRECT" etc.) followed by
whatever arguments (e.g. filespecs) are appropriate, terminated by a
carriage return. The command name may be abbreviated to any unambiguous
prefix. In the command line, rubout deletes the last character. ^W
deletes the last word. ^U deletes the entire line, allowing one to start
over. ^R redisplays the material typed so far, in case the screen got
messed up. ^F and altmode provide the usual completion and prompting
functions. A question mark gives a help message. ^V quotes the following
character in a filename.
Filespecs and other arguments are separated from each other by spaces,
for example
     CORR FOO.BAR FOO.OUT
     LOAD MYDICT 3
To specify the starting line number for a correction operation, follow the
last filespec by a slash, the switch "LINE", a colon, and the number, e.g.
     CORR FOO.BAR FOO.OUT/LINE:13
In the JCL line, a comma is equivalent to a carriage return in terminating
the command, making it possible to put multiple commands into a JCL string
meaningfully.
In a filespec, the device and directory always default to the user's
connected device and directory. The filename extension defaults to the
following: "DCT" for dictionaries being loaded or dumped, "EXC" for a
training exception file, and, for text files, the usual extension for the
selected formatter mode, if there is one: "R" for R, "MSS" for Scribe, and
"TXT" for Pub. For corrected output, the default extension is the actual
extension of the input file. The generation number defaults to the highest
existing number when reading, one higher than that when writing.
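
The extension defaults just described amount to a small lookup. In the
Python sketch below the role names ("dictionary", "exceptions", "input",
"output") and the function name are hypothetical. Only the three formatter
extensions named in this memo are listed; for any other mode the sketch
returns None rather than guessing.

```python
# Default input extensions for the formatter modes this memo names.
FORMATTER_EXT = {"R": "R", "SCRIBE": "MSS", "PUB": "TXT"}


def default_extension(role, formatter=None, input_ext=None):
    """Return the default filename extension for a file in a given role."""
    if role == "dictionary":   # LOAD and DUMP
        return "DCT"
    if role == "exceptions":   # TRAIN exception file
        return "EXC"
    if role == "output":       # corrected output follows the input file
        return input_ext
    if role == "input":        # depends on the selected formatter mode
        return FORMATTER_EXT.get(formatter)
    raise ValueError("unknown role: " + role)
```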
10. STARTING SPELL WITH A COMMAND LINE
If a command line ("JCL") is supplied when SPELL is started, it will
use that line, for as long as it lasts, instead of taking commands from the
terminal. Any command error cancels the rest of the command line and
reverts to interactive operation.
Since <return> can't be put into the command line, SPELL allows comma
to separate commands. This is allowed only when the command is being read
from JCL.
example:
     ISPELL LOAD MYDICT,CORRECT FOO BAR
          Loads dictionary MYDICT.DCT and corrects FOO.R, writing
          corrected output to BAR.R.
     ISPELL LOAD MYDICT,TRAIN FOO FOO.ERRS,KILL
          Loads MYDICT.DCT and trains FOO.R, putting exceptions
          into FOO.ERRS, and then kills itself.
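
A program that invokes SPELL can build or inspect such a line with very
little code. The Python sketch below splits a Twenex-style command line
into commands at commas and into arguments at spaces. It treats the first
token as the program name, which is an assumption about how the line is
typed, and it does not handle ^V quoting. The name "split_jcl" is
hypothetical.

```python
def split_jcl(line):
    """Split 'PROGRAM cmd args,cmd args,...' into (program, commands).

    Each command is returned as a list of its space-separated words.
    """
    program, _, rest = line.strip().partition(" ")
    commands = [part.split() for part in rest.split(",") if part.strip()]
    return program, commands
```

Applied to the second example above, this yields the three commands
LOAD, TRAIN, and KILL in order.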
File: ISPELL Node: Esoterica Previous: Syntax Up: Top
11. ESOTERICA
The following information should not be necessary for normal operation
of SPELL. It is included for completeness.
11.1 File operations
SPELL reads and writes files as strings of 128 bytes, 36 bits per
byte. On writing, the last word is padded with ^@. On reading, any
padding of ^@, ^A, ^B, or ^C at the end of the last word is removed.
11.2 Word identification algorithm
A word is any uninterrupted sequence of letters and apostrophes, which
does not begin or end with an apostrophe. Any punctuation, digit, or
control character separates words. Words are not checked if they are being
ignored under control of the text formatter mode. Any word consisting of a
single letter, or any word more than 40 letters long, is considered to be
correctly spelled.
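
The word definition above maps naturally onto a regular expression. The
following Python sketch is an illustration only: it ignores formatter
modes entirely, and the function name "words_to_check" is hypothetical.

```python
import re

# A run of letters and apostrophes that begins and ends with a letter.
WORD_RE = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")


def words_to_check(text):
    """Return the words in `text` that would be looked up.

    Single-letter words and words of more than 40 letters are treated as
    correctly spelled and are therefore skipped.
    """
    result = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        letters = sum(ch.isalpha() for ch in word)
        if letters == 1 or letters > 40:
            continue
        result.append(word)
    return result
```

Digits and punctuation act as separators, so "PDP-10" contributes only
the word "PDP", and a quoted word loses its surrounding apostrophes.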
11.3 Closeness algorithm
Two words are considered to be "close" if they differ in only one of
the following ways:
Two adjacent letters are interchanged.
One letter is different.
One letter is missing in one and present in the other.
This criterion is used in suggesting close words when an unknown word
is found while correcting a file or when an unknown word is asked for with
the "ASK" command. Hence the word SEQUENCE will be suggested if the program
encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.
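
The three criteria can be written as a direct test on a pair of words.
This Python sketch is an independent illustration of the criteria, not
SPELL's implementation. It treats two identical words as not "close",
which is an assumption, and the name "is_close" is hypothetical.

```python
def is_close(a, b):
    """True if a and b differ by one transposition of adjacent letters,
    one changed letter, or one letter missing from either word."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # one letter is different
        # two adjacent positions whose letters are swapped
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]]
                and a[diff[1]] == b[diff[0]])
    if abs(len(a) - len(b)) == 1:
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # deleting some single letter of the longer word gives the shorter
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False
```

All four misspellings of SEQUENCE listed above satisfy this test.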
11.4 Dictionary policy
It is the policy of this program to contain only one spelling of a
word, even if ordinary dictionaries show two or more "acceptable"
spellings. Hence, the dictionary contains LABELED and LABELING, but not
LABELLED or LABELLING, even though all four are actually acceptable. The
intention is to enforce uniformity within each document. The author
apologizes for the restriction on creativity and diversity that this
necessitates, but believes that it is the best policy for this program.
The dictionary contains many technical and computer terms such as
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words such
as CONTROLIFY or VALRET. The dictionary contains no proper names other
than names of countries and states of the United States. The reason is
that it would be virtually impossible to contain all of the proper names
that commonly arise in normal use. Users should keep proper names (and
other correctly spelled words) that arise in their own work in private
dictionaries to avoid having to repeatedly tell SPELL to accept them.
The dictionary is significantly smaller than that found in other
spelling checkers, such as the DEC TOPS-20 program. The author believes
that the larger dictionary would not reduce the number of false misspelling
indications by very much. Users who find words that SPELL does not know,
but should, are urged to mail pointers to lists of such words (or the
documents in which they appear) to BUG-SPELL@MIT-ML. They will be
considered for addition to the dictionary. Users who wish to argue that an
extremely large dictionary would be better should mail pointers to specific
documents demonstrating this. The argument might be valid, but evidence is
needed.
11.5 The "BASK" command
The "BASK" command is a variant of the "ASK" command intended for
invocations of SPELL by other programs. The command line (presumably from
a JCL line) is "BASK" followed by the word to be looked up, and then the
name of the file to which the information is to be written, for example:
     ISPELL BASK WHEREVER TEST.OUT,KILL
looks up the word "wherever", and writes a report in the file "TEST.OUT".
The filename extension of this file defaults to "RPT". The report is
similar to the information displayed after the "ASK" command, except that
formatting characters are absent and the result of the search is indicated
by the first character of the file.
If the word was found directly, the file consists of a "*" followed by
any dictionary flags that the word may have had.
If the word was found through suffix removal, the file consists of a
"+", the root word, and the dictionary flags of the root word.
If the word was not found and there are no words close to it, the file
consists of a "#".
If the word was not found but there are close words, the file consists
of a "&" immediately followed by a list of close words, separated
from each other by newlines.
The file is always terminated by a <newline>.
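
A calling program might read the report along the following lines. The
Python sketch classifies the report by its first character, as described
above. Because this memo does not specify how the flags (or the root word
and its flags) are separated, the sketch returns that part as an
uninterpreted string. The status names and the function name are
hypothetical.

```python
REPORT_KINDS = {
    "*": "found",
    "+": "found-by-suffix-removal",
    "#": "unknown",
    "&": "unknown-with-close-words",
}


def classify_bask_report(text):
    """Return (status, payload) for the contents of a BASK report file.

    For "&" the payload is the list of close words; otherwise it is the
    rest of the file with the final newline removed, left unparsed.
    """
    if not text or text[0] not in REPORT_KINDS:
        raise ValueError("not a BASK report")
    mark, rest = text[0], text[1:]
    if mark == "&":
        payload = [word for word in rest.splitlines() if word]
    else:
        payload = rest.rstrip("\r\n")
    return REPORT_KINDS[mark], payload
```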
11.6 Dictionary flags
Words in SPELL's main dictionary (but not the other dictionaries) may
have flags associated with them to indicate the legality of suffixes
without the need to keep the full suffixed words in the dictionary. The
flags have "names" consisting of single letters. Their meaning is as
follows:
Let # and @ be "variables" that can stand for any letter. Upper case
letters are constants. "..." stands for any string of zero or more
letters, but note that no word may exist in the dictionary which is not at
least 2 letters long, so, for example, FLY may not be produced by placing
the "Y" flag on "F". Also, no flag is effective unless the word that it
creates is at least 4 letters long, so, for example, WED may not be
produced by placing the "D" flag on "WE".
"V" flag:
...E --> ...IVE as in CREATE --> CREATIVE
if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
"N" flag:
...E --> ...ION as in CREATE --> CREATION
...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
"X" flag:
...E --> ...IONS as in CREATE --> CREATIONS
...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
"H" flag:
...Y --> ...IETH as in TWENTY --> TWENTIETH
if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
"Y" FLAG:
... --> ...LY as in QUICK --> QUICKLY
"G" FLAG:
...E --> ...ING as in FILE --> FILING
if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
"J" FLAG:
...E --> ...INGS as in FILE --> FILINGS
if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
"D" FLAG:
...E --> ...ED as in CREATE --> CREATED
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IED as in IMPLY --> IMPLIED
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ED as in CROSS --> CROSSED
or CONVEY --> CONVEYED
"T" FLAG:
...E --> ...EST as in LATE --> LATEST
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IEST as in DIRTY --> DIRTIEST
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#EST as in SMALL --> SMALLEST
or GRAY --> GRAYEST
"R" FLAG:
...E --> ...ER as in SKATE --> SKATER
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ER as in BUILD --> BUILDER
or CONVEY --> CONVEYER
"Z" FLAG:
...E --> ...ERS as in SKATE --> SKATERS
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#ERS as in BUILD --> BUILDERS
or SLAY --> SLAYERS
"S" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@IES as in IMPLY --> IMPLIES
if # .eq. S, X, Z, or H,
...# --> ...#ES as in FIX --> FIXES
if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
...@# --> ...@#S as in BAT --> BATS
or CONVEY --> CONVEYS
"P" FLAG:
if @ .ne. A, E, I, O, or U,
...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
if # .ne. Y, or @ = A, E, I, O, or U,
...@# --> ...@#NESS as in LATE --> LATENESS
or GRAY --> GRAYNESS
"M" FLAG:
... --> ...'S as in DOG --> DOG'S
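
As an illustration of how the rules above read as code, here is a Python
sketch of the "D" and "S" flags only, together with the two length
restrictions stated earlier (a root of at least 2 letters, a result of at
least 4 letters). It is written from the rules in this memo, not from
SPELL's source, and the function name "expand" is hypothetical.

```python
VOWELS = "AEIOU"


def expand(root, flag):
    """Return the word a "D" or "S" flag produces from `root`, or None
    if the flag would be ineffective under the length restrictions."""
    root = root.upper()
    if len(root) < 2:
        return None
    last, prev = root[-1], root[-2]
    consonant_y = last == "Y" and prev not in VOWELS
    if flag == "D":
        if last == "E":
            word = root + "D"            # CREATE --> CREATED
        elif consonant_y:
            word = root[:-1] + "IED"     # IMPLY --> IMPLIED
        else:
            word = root + "ED"           # CROSS --> CROSSED
    elif flag == "S":
        if consonant_y:
            word = root[:-1] + "IES"     # IMPLY --> IMPLIES
        elif last in "SXZH":
            word = root + "ES"           # FIX --> FIXES
        else:
            word = root + "S"            # BAT --> BATS
    else:
        raise ValueError("only the D and S flags are sketched here")
    return word if len(word) >= 4 else None
```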
Note: The existence of a flag on a root word in the dictionary is not by
itself sufficient to cause SPELL to recognize the indicated word ending.
If there is more than one root for which a flag will indicate a given word,
only one of the roots is the correct one for which the flag is effective;
generally it is the longest root. For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield PASSED. The flag must be
on PASSE; it will be ineffective on PASS. This is because, when SPELL
encounters the word PASSED and fails to find it in its dictionary, it
strips off the "D" and looks up PASSE. Upon finding PASSE, it then accepts
PASSED if and only if PASSE has the "D" flag. Only if the word PASSE is
not in the main dictionary at all does the program strip off the "E" and
search for PASS. Furthermore, some combinations of flags are forbidden to
allow for dense flag encoding to save space. For example, only one of the
"P", "J", or "V" flags may be on in any one word.
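
The lookup order in the PASSED example can be sketched in Python as
follows. The sketch covers only the two candidate roots discussed above
(the word minus "D", then the word minus "ED"); it omits the "...IED"
case and the other flags. The main dictionary is modeled as a mapping
from root word to a set of flag letters, which is an assumption made for
illustration, as is the function name.

```python
def accepts_ed(word, main):
    """Decide whether `word`, ending in ED, is accepted via the "D" flag.

    The longer root is tried first. The first root that exists in the
    main dictionary decides the outcome, whether or not it has the flag.
    """
    word = word.upper()
    if len(word) < 4 or not word.endswith("ED"):
        return False
    for root in (word[:-1], word[:-2]):   # PASSE first, then PASS
        if root in main:
            return "D" in main[root]
    return False
```

With both PASSE and PASS present, a "D" flag on PASS alone is therefore
ineffective; it takes effect only when PASSE is absent.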
Therefore, be careful when installing dictionary flags. When in doubt
about how to install a flag, don't; let the program do it for you. When a
word is read into the main dictionary, whenever possible SPELL sets the
appropriate flag in an existing word instead of entering the word itself.
So, for example, reading PASSED into the dictionary would result in the "D"
flag being set correctly.
Warning: The above definitions are part of the internal behavior of SPELL,
and are subject to change without notice. Users copying the dictionary
should always get the then-current version of this document to get an
accurate description of the flags.
11.7 Preparing new versions of SPELL
The main dictionary is dictionary 0, and may be accessed by that
number. When it is dumped, its flags will be written also. A file
containing flags must not be read into a dictionary other than zero, nor
should it be read in if dictionaries other than zero contain any words.
(Otherwise, it may attempt unsuccessfully to set a flag on a word whose
dictionary number is nonzero.) Therefore, flags should be used only on the
dictionary that is loaded when a new version of SPELL is created.
To create a new version of SPELL, assemble it under MIDAS and start
it. Give the "LOAD" command and read in the master dictionary, specifying
dictionary zero. Then give the "WRITE" command, with the name of the
desired image file. On ITS, this should be a file such as "TS SPELL". On
Twenex, it should be a file such as "<SUBSYS>ISPELL.EXE". On ITS, it will
ask whether to purify it. Say "N", for the purification stuff is still not
repaired from an ancient version. On Twenex it will always be written in
sharable form.