Trailing-Edge - PDP-10 Archives - decuslib20-02 - decus/20-0041/kwic.rno
.title KWIC.RNO%1-1[5 Nov 1977]/HDT
.s 2
.CENTER;Key-Word-In-Context Listing Program
.s 2
	KWIC produces a listing of a file in which the key words are listed
in alphabetical order within the context of the lines in which the words
appear.  The listing is useful for maintaining bibliographies indexed by
author and subject, or for maintaining any kind of catalog.
.s 2
.subttl User Instructions
	The program uses two input files: a master file which contains the lines
to be keyword-indexed and a "stop" file which contains words which are ^&not\&
to be considered keywords.  That is, keywords are those words in the master
file which are ^&not\& listed in the "stop" file.  A default file with
common words such as "and", "the", "a", "an", etc., is available for
those who don't need a special stop file.
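The keyword rule above can be sketched in a few lines of Python.  This is only an illustration, not the KWIC program itself: the function name `keywords` and the small stop set are hypothetical (the full contents of KWIC.STP are not shown here), and folding words to upper case is an assumption based on the sample listing later in this document.

```python
# Hypothetical stand-in for the default stop file; the real KWIC.STP
# contains more words than the four quoted in the text.
DEFAULT_STOP_WORDS = {"A", "AN", "AND", "THE"}

def keywords(text, stop_words=DEFAULT_STOP_WORDS):
    """Return the words of `text` that are not in the stop list, in order.

    Words are split on runs of spaces or tabs and folded to upper case
    (the case folding is an assumption, not documented behavior).
    """
    return [w for w in text.upper().split() if w not in stop_words]
```

For example, with a stop set containing "THE" and "OF", the title "The Science of Computing" yields the keywords SCIENCE and COMPUTING.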
	The program produces two output files: a file which contains the
listing and a file which lists the frequency of occurrence of the keywords.
	The default device for all files is DSK:.
	To use KWIC, create the master file according to the instructions below
and optionally create the "stop" file according to the instructions below.
Then type the command:
.s 1;.i 5;&_.L KWIC
.S 1
If the program file is not on the library area, see the programmer/consultant.
	KWIC will start by requesting the name of the master file:
.s 1;.i 5;^&MASTER FILE:\&
.S 1
This filename is required.  Type in the name and extension of the file
(note that all filenames must be typed in uppercase characters).
	KWIC will request the name of the "stop" file:
.s 1;.i 5;^&STOP FILE:\&
.s 1
Supply the name and extension of your stop file if you have one.  The default
extension is ".STP".  If you do not want to create your own stop file, just
type a carriage return to use the default stop file, KWIC.STP, which is
available from the library area.
	KWIC will ask for the name of the listing (index) file:
.s 1;.i 5;^&INDEX FILE:\&
.s 1
If you type a carriage return, KWIC will give the listing file the name of your
master file and the extension of ".NDX".  If this is not acceptable, type in
your own filename and extension.  (Default extension is ".NDX".)
	KWIC will now ask for the name of the output frequency file:
.s 1;.i 5;^&FREQUENCY FILE:\&
.s 1
Again, if you type a carriage return, KWIC will give the frequency file the name of your
master file and the extension ".FRQ".  If this is not acceptable, type in
your own filename and extension.  (Default extension is ".FRQ".)
	Finally, KWIC asks for a listing title:
.s 1;.i 5;^&LISTING TITLE:\&
.S 1
The title you type in will be printed at the top
of each page.  It can be up to 80 characters long.
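The default-name rules for the index and frequency files described above can be sketched as follows.  The function `default_output_name` is hypothetical, and the handling of a reply typed without an extension is one reading of "Default extension is"; the actual program may behave differently.

```python
def default_output_name(master, reply, ext):
    """Resolve an output filename from the reply to an INDEX FILE: or
    FREQUENCY FILE: prompt.

    master -- master filename, e.g. "EXAMP.MAS"
    reply  -- text typed by the user ("" for a bare carriage return)
    ext    -- default extension for this prompt, ".NDX" or ".FRQ"
    """
    reply = reply.strip()
    if not reply:
        # Bare carriage return: master file name plus the default extension.
        return master.split(".", 1)[0] + ext
    if "." not in reply:
        # Name typed without an extension (assumed to get the default).
        return reply + ext
    return reply
```

With the master file EXAMP.MAS, a carriage return at the INDEX FILE: prompt gives EXAMP.NDX, which matches the example later in this document.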
	KWIC now reads in the stop file and informs you of the core
required, then reads in the master file and informs you of the additional
core required.  Finally, KWIC generates the listing and frequency files.
	If an error is detected in the master file, KWIC tries to indicate the line in error by typing
its line number.  If an error occurs and your master file does not
have sequence numbers, run your master file through SOS or LINED to generate line
sequence numbers, then run KWIC again.  Sequence numbers can be removed when 
the master file has been edited to be correct.
.s 2
.center;Master File Format
	The master file consists of the data to be indexed by the KWIC program.
This may be any type of alphanumeric data.
For example, the data could be
many book titles in a specific area of study or possibly
a whole library's catalog.  But the program is flexible enough to allow KWIC
indexing of a thesis paper or similar document.
	The lines of the file contain three or optionally four fields of data:
.le;A standard DEC line sequence number (optional).
.le;The field of data to be indexed.  This may be continued on any number
of lines in the file itself.  That is, carriage return-line feeds are ignored
within this field.  The field is terminated by an "=" character.  (Note that
this means you can't have an "=" character within the text of the
data field!)
The words in this field are the ones used to create the keyword-index
listing.  Words are strings of characters delimited by one or more spaces or
.le;A field of auxiliary information which you want
to be contained in the master file for listing purposes
but over which you don't want indexing to be done.  For example,
on a bibliographic listing, you might list the publisher name and copyright
date in this field. (Optional)
.le;The identification field.  This consists of a pair of "[" and "]"
characters surrounding a field of up to 10 characters used to uniquely
identify each item to be indexed.  Warning: use no tabs or spaces in this
field.  The "[" character starts this field and delimits the end of the
auxiliary information field, and the "]" character delimits the end
of the ID field and of the line to be indexed.
Since the carriage return-line feed is ignored, to continue a word on another
line the user merely types the rest of the word with no leading spaces.  To
separate two words across a line break, a space must be typed either at the
end of the first line or at the beginning of the continuation line.  For example,
.s 1;.i 5;The Ohio State Univ
.i 5;ersity
.s 1
represents the continuation of the same word across a line.  
.s 1;.i 5;The Ohio State
.i 5;#University
.s 1
represents two separate words.
.end note
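The field layout described above can be illustrated with a small parser.  This is a sketch under stated assumptions, not the KWIC source: the function `parse_master` is hypothetical, line sequence numbers are assumed to have been removed already, and runs of spaces and tabs are collapsed to single spaces for readability.  Only the name IDSIZ comes from this document.

```python
IDSIZ = 10  # maximum ID length, as given under Restrictions

def parse_master(text):
    """Split master-file text into (data, auxiliary, id) records.

    Line breaks are dropped without inserting a space, so a word may be
    continued on the next line.  A record runs up to its closing "]".
    """
    flat = text.replace("\r", "").replace("\n", "")
    records = []
    pos = 0
    while flat[pos:].strip():
        end = flat.find("]", pos)
        if end < 0:
            raise ValueError("missing ID field")
        record = flat[pos:end]
        pos = end + 1
        eq = record.find("=")          # "=" terminates the data field
        if eq < 0:
            raise ValueError('missing "="')
        lb = record.find("[", eq)      # "[" ends the auxiliary field
        if lb < 0:
            raise ValueError("missing ID field")
        ident = record[lb + 1:]
        if len(ident) > IDSIZ:
            raise ValueError("ID field too long")
        data = " ".join(record[:eq].split())
        aux = " ".join(record[eq + 1:lb].split())
        records.append((data, aux, ident))
    return records
```

Because the line break is simply removed, "Univ" followed by "ersity" on the next line is read as one word, while a leading space on the continuation line keeps the two words apart.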
	There are two conventions which may help new users of KWIC organize
their indices better.  The first is to place all author names (for example) in
parentheses.  This will make all the author names appear in one group in the
listing.  The second convention is to use a "/#" in front of any
word which is not in the title but which is of value in classifying
the index item.
.s 2
.center;Stop File Format
	The stop file consists of words which the user feels have no value
as index terms for his particular application.  One such stop list is available
from the library as KWIC.STP.  But if a user finds that other "useless" terms
appear in his listing, he may supplement the original stop list by creating his own.
	The stop list is a file of words, one word per line, which
is sorted in alphabetical order.  The file may have standard DEC sequence 
numbers.  Tabs and spaces are ignored in this file.
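The stop file rules above (one word per line, alphabetical order, tabs and spaces ignored) can be checked with a short sketch.  The function `load_stop_words` is hypothetical; it assumes any line sequence numbers have already been removed, and it folds words to upper case, which is an assumption rather than documented behavior.

```python
def load_stop_words(lines):
    """Read stop words from an iterable of lines, one word per line.

    Tabs and spaces are dropped, blank lines are skipped, and the
    result must already be in alphabetical order.
    """
    words = []
    for line in lines:
        word = "".join(line.split()).upper()  # removes tabs, spaces, newline
        if word:
            words.append(word)
    if words != sorted(words):
        raise ValueError("stop list is not in alphabetical order")
    return words
```

A file that fails the order check can be repaired by sorting it before running KWIC, as suggested in the instructions that follow.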
	To create a stop file of his own, the user may start by
copying KWIC.STP from the library area.  It might be named 
KWIC.STP in the user's area, but any other name will do.
It is most convenient to use the extension ".STP", but that is not required.
Now edit that stop file and insert into it the additional
words which are to be excluded from the keyword indexing.  Insert these words
in alphabetical order (unless you know how to use the SORT program to reorder
the file).
Delete from the stop file any words which you feel would
be of use to you as keywords.
	Now when you run KWIC, give the name of this file as the STOP
.subttl Example
.s 2
	The following file, named EXAMP.MAS, was created as a disk file:
.s 2
(McCracken) A Guide to ALGOL Programming=
	Wiley, 1967 [M-101]
(McCracken) A Guide to FORTRAN Programming=
	Wiley, 2nd edition, 1972 [M-102]
(Didday) and (Page) FORTRAN for Humans=West, 1975[DP-101]
(Meissner) The Science of Computing= Wadsworth, 1972 [M-103]
(Murrill) and (Smith) Intro to Computer Science=
	IEP, 1970 [M-104]
(Pizer) Numerical Computing and Math Analysis=SRA, 1974[P-101]
.s 2
The KWIC program was run with the following commands:
.s 1
.i 5;&_.L KWIC
.S 1
.S 1
.I 5;^&STOP FILE:\&#<typed RETURN key>
.i 5;^&INDEX FILE:\&#<typed RETURN key>
.i 5;^&FREQUENCY FILE:\&#<typed RETURN key>
.S 1
The listing file, EXAMP.NDX, which was created from this example file and
command sequence was the following:
KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77                 PAGE 1

.s 2
	(a listing of stop words appears here)
.s 3
KEY-WORD-IN-CONTEXT VERSION 1 14:13 5-NOV-77                 PAGE 2

.s 2
	(the following listing has been truncated on the
	 left side for the purposes of printing this
.s 2
            (DIDDAY) AND (PAGE) FORTRAN FOR HUMANS                [DP-101
            (MCCRACKEN) A GUIDE TO ALGOL PROGRAMMING              [M-101
            (MCCRACKEN) A GUIDE TO FORTRAN PROGRAMMING            [M-102
            (MEISSNER) THE SCIENCE OF COMPUTING                   [M-103
IDDAY) AND  (PAGE) FORTRAN FOR HUMANS                             [DP-101
RRILL) AND  (SMITH) INTRO TO COMPUTER SCIENCE                     [M-104
A GUIDE TO  ALGOL PROGRAMMING                                     [M-101
G AND MATH  ANALYSIS                                              [P-101
) INTRO TO  COMPUTER SCIENCE                                      [M-104
SCIENCE OF  COMPUTING                                             [M-103
 NUMERICAL  COMPUTING AND MATH ANALYSIS                           [P-101
A GUIDE TO  FORTRAN PROGRAMMING                                   [M-102
AND (PAGE)  FORTRAN FOR HUMANS                                    [DP-101
CRACKEN) A  GUIDE TO ALGOL PROGRAMMING                            [M-101
CRACKEN) A  GUIDE TO FORTRAN PROGRAMMING                          [M-102
ORTRAN FOR  HUMANS                                                [DP-101
ND (SMITH)  INTRO TO COMPUTER SCIENCE                             [M-104
PUTING AND  MATH ANALYSIS                                         [P-101
E TO ALGOL  PROGRAMMING                                           [M-101
TO FORTRAN  PROGRAMMING                                           [M-102
SSNER) THE  SCIENCE OF COMPUTING                                  [M-103
O COMPUTER  SCIENCE                                               [M-104
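The way such a listing is built can be sketched as follows.  This is a minimal illustration of the general keyword-in-context technique, not the KWIC source: the functions `kwic_entries` and `format_entry` and the column widths are hypothetical.  The stable sort by keyword, with ties left in master-file order, is an assumption that matches the order of the entries shown above.

```python
def kwic_entries(records, stop_words):
    """Build index entries from (data, ident) pairs in master-file order.

    Each entry is (keyword, left context, text from keyword on, ident).
    """
    entries = []
    for data, ident in records:
        words = data.upper().split()
        for i, word in enumerate(words):
            if word in stop_words:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            entries.append((word, left, right, ident))
    # Python's sort is stable, so equal keywords keep master-file order.
    entries.sort(key=lambda entry: entry[0])
    return entries

def format_entry(entry, left_width=10, right_width=56):
    """Lay out one entry: truncated left context, keyword column, ID."""
    _, left, right, ident = entry
    tail = left[-left_width:] if left else ""
    return f"{tail:>{left_width}}  {right:<{right_width}}[{ident}]"
```

A short usage example, using two of the sample records and an assumed stop set:

```python
records = [
    ("(McCracken) A Guide to FORTRAN Programming", "M-102"),
    ("(Didday) and (Page) FORTRAN for Humans", "DP-101"),
]
for entry in kwic_entries(records, {"A", "AND", "FOR", "TO"}):
    print(format_entry(entry))
```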
.subttl Restrictions
.le;The line size for the listing (.NDX) file is set at 132 (line-printer width).
This can be changed as a parameter in the program source file.
.le;The listing file contains 50 lines per page. (Can be changed in the
source program.)
.le;The maximum number of characters per word is 50. (Can be changed in the
source program.)
.le;Maximum number of occurrences of a keyword is 300. (Can be changed in the
source program.)
.le;Size of the ID field is 10 characters. (Can be changed in the
source program.)
.le;Stop words are truncated to 12 characters when the stop list is printed; all
characters of each stop word are still used in the processing, however.
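The numeric limits above can be expressed as a short validation sketch.  The parameter names SIZWRD, MAXSAM and IDSIZ are the ones quoted in the error messages in this document; the function `check_limits` is hypothetical and only shows how such limits might be enforced.

```python
from collections import Counter

SIZWRD = 50   # maximum characters per word
MAXSAM = 300  # maximum occurrences of one keyword
IDSIZ = 10    # maximum characters in an ID field

def check_limits(keywords, idents):
    """Raise ValueError if any of the three limits is exceeded."""
    for word in keywords:
        if len(word) > SIZWRD:
            raise ValueError(f"word longer than SIZWRD: {word}")
    for word, count in Counter(keywords).items():
        if count > MAXSAM:
            raise ValueError(f"keyword occurs more than MAXSAM times: {word}")
    for ident in idents:
        if len(ident) > IDSIZ:
            raise ValueError(f"ID longer than IDSIZ: {ident}")
```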
.subttl Error Messages
.s 2
Error Messages
.s 1;Device specified in an input parameter or a default specification was
not correct or is unavailable to the user.
.S 1;The file specified ('XXXXX') could not be found.  Probably a typing error
(note that KWIC accepts only uppercase letters in file names).
.S 1;The directory on the output device specified for the 'XXXXX' listing was full.
.S 1;A device error occurred on the 'XXXXX' file while reading.  Probably
a hardware error.
.S 1;A device error occurred on the 'XXXXX' file while writing.  Probably
a hardware error.
.s 1;The program releases the master file for a short period while it reads
in the stop list file, so that on a DECtape system the two files may be
on the same drive.  This error occurs when the program looks for the master
file the second time (after the stop list has been read in) and cannot find it.
This should never occur.  Fatal error.
.S 1;A CORE UUO failed while deallocating core.  Should be an impossible situation.
Fatal error.
.s 1;A word exceeded the maximum length specified by the 'SIZWRD' parameter in
the KWIC source program (currently set at 50).  The string 'CCCCCC' will
be the word in error.
Fatal error.
.S 1;A keyword occurred more times than the number of identical index items
allowed by the 'MAXSAM' parameter in the KWIC source program (currently set at 300).
The word 'CCCCCC' is the word which occurred an excessive number of times.
Fatal error.
.s 1;If the CORE UUO fails while trying to read in data, this
message is printed.  After a 30 second delay, the program will try again
to allocate the core.  It continues looping until it gets the core.
This is useful for non-swapping systems where a user can wait for the core
needed to be freed.  Should be considered a fatal error for a swapping system.
.s 1;The line NNNNN or one of the nearby lines is bad.  If the master input
file does not have line sequence numbers, no number 'NNNNN' is printed.
In this case, the user should run SOS or LINED on the master file, then
rerun KWIC to find the line with the error.  The syntax error
types are listed below.  The errors are fatal.
.S 1;The ID field is longer than the size given by the
'IDSIZ' parameter in the source program (currently set at 10).
.s 1;A line is missing the ID field ("[" and
"]" characters).
.S 1;A line is missing the "=" character that terminates the data field.
.S 1;Undiagnosable error.  (Examine the line and preceding line
.subttl Origin
.s 2
	The KWIC program was written by G. B. Moersdorf at Ohio State
University.  It is available from the DECUS DEC-10/20 program library.
.subttl Additional Documentation
.s 2
	Additional, more technical documentation is available as KWIC.DOC
from the DECUS library tapes.
.s 3
[End of KWIC.RNO]