LIBRARY PROGRAM #3.7.1
CALLING NAME:	KWIC
PROGRAMMED BY*
ADAPTED BY:	RUSSELL BARR,  III
PREPARED BY:	RUSSELL R BARR III
APPROVED BY:	JACK R. MEAGHER
DATE:	DECEMBER, 1973


			KEY-WORD-IN-CONTEXT PROGRAM

TABLE OF CONTENTS
1.0	PURPOSE AND EXPLANATION
2.0	LIMITATIONS
3.0	PROGRAM QUESTIONS AND HOW TO ANSWER THEM
4.0	FORMAT OF MASTER FILE AND STOP FILE
5.0	SAMPLE RUN


1.0	PURPOSE AND EXPLANATION

KWIC PRODUCES AN ALPHABETIZED LISTING OF EVERY 'KEY' WORD IN A MASTER
FILE OF TEXT.  KEY WORDS ARE THOSE WHICH ARE NOT DEFINED BY THE USER
IN A "STOP" FILE.  IN OTHER WORDS THE STOP FILE IS A LIST OF WORDS TO
BE IGNORED BY THE KWIC PROGRAM.

EACH KEY WORD IS PRINTED IN CONTEXT, MEANING THAT IT IS PRINTED
SURROUNDED BY THE WORDS AMONG WHICH IT APPEARS IN THE MASTER FILE.  NOTE
THAT IN THE EXAMPLE IN SECTION 5.0 THE KEYWORDS ARE, IN ORDER, READING
DOWN THE CENTER COLUMN:	'AREWORDS', 'BRACKETS', 'ENDED', ETC.

KWIC ALSO PRODUCES A LIST OF ALL KEYWORDS AND THE NUMBER OF TIMES EACH
APPEARS.
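
THE INDEXING DESCRIBED ABOVE CAN BE ILLUSTRATED WITH A SHORT PYTHON
SKETCH.  IT IS NOT THE KWIC PROGRAM ITSELF.  THE FUNCTION NAME
kwic_index AND THE (KEYWORD, LEFT, RIGHT) TUPLE LAYOUT ARE ASSUMPTIONS
MADE FOR ILLUSTRATION, AND THE I.D. FIELD IS LEFT OUT.

```python
def kwic_index(lines, stop_words):
    """Return (keyword, left_context, right_context) tuples, sorted by keyword."""
    stop = {w.upper() for w in stop_words}
    entries = []
    for line in lines:
        words = line.split()  # splits on any run of whitespace
        for i, word in enumerate(words):
            if word.upper() in stop:
                continue  # stop words never become key words
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word, left, right))
    # Alphabetize on the key word; ties keep their input order.
    entries.sort(key=lambda entry: entry[0].upper())
    return entries


# Print each key word with its left context right-aligned beside it.
for word, left, right in kwic_index(["NOT THIS ONE THOUGH", "MULTIPLE LINES"],
                                    ["NOT", "THIS", "THOUGH"]):
    print(f"{left:>20}  {word} {right}")
```

THE 20-COLUMN LEFT MARGIN IN THE PRINT LOOP IS ARBITRARY AND DOES NOT
REPRODUCE THE LAYOUT OF THE REAL INDEX.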


2.0	LIMITATIONS
	(1)	THE MAXIMUM NUMBER OF CHARACTERS IN A WORD IS 50.
	(2)	THE MAXIMUM NUMBER OF TIMES A KEYWORD MAY APPEAR IN A
		MASTER FILE IS 300.
	(3)	THE I.D. FIELD IS LIMITED TO 10 CHARACTERS.
	(4)	THE MAXIMUM NUMBER OF KEYWORDS IN THE MASTER FILE IS
		APPROXIMATELY 5000.
	(5)	THIS PROGRAM USES A LARGE AMOUNT OF COMPUTING TIME FOR
		LARGE FILES.

3.0	PROGRAM QUESTIONS AND HOW TO ANSWER THEM
IN THIS SECTION TEXT TYPED OUT BY THE COMPUTER IS ENCLOSED IN QUOTES.

  3.1	"MASTER FILE:"
	ENTER SPECIFICATIONS DEFINING THE SOURCE OF THE MASTER TEXT.
	THE FOLLOWING EXPLANATION OF THE SPECIFICATIONS APPLIES TO 
	QUESTIONS 3.1-3.4.  STATEMENTS ABOUT INPUT APPLY TO QUESTIONS
	3.1 AND 3.2.  STATEMENTS ABOUT OUTPUT APPLY TO 3.3 AND 3.4.

*WRITTEN BY G.B. MOERSDORF OF OHIO STATE UNIVERSITY.  RECEIVED THRU
DIGITAL EQUIPMENT COMPUTER USERS SOCIETY AND MODIFIED FOR WMU BY 
RUSSELL BARR III.



	THE NORMAL RESPONSE TO EACH OF THESE QUESTIONS CONSISTS OF 
	THREE BASIC PARTS:  A DEVICE, A FILENAME, AND A PROJECT-
	PROGRAMMER NUMBER.

	THE GENERAL FORMAT FOR THESE PARTS IS AS FOLLOWS:
			DEV:FILE.EXT[PROJ,PROG]

1)  DEV:  ANY OF THE FOLLOWING DEVICES ARE APPROPRIATE WHERE INDICATED:

                 DEVICE LIST         DEFINITION           STATEMENT USE
 
                    TTY:             TERMINAL             INPUT OR OUTPUT
                    DSK:             DISK                 INPUT OR OUTPUT
                    CDR:             CARD READER          INPUT ONLY
                    LPT:             LINE PRINTER         OUTPUT ONLY
                    DTA0:            DECTAPE 0            INPUT OR OUTPUT
                    DTA1:            DECTAPE 1            INPUT OR OUTPUT
                    DTA2:            DECTAPE 2            INPUT OR OUTPUT
                    DTA3:            DECTAPE 3            INPUT OR OUTPUT
                    DTA4:            DECTAPE 4            INPUT OR OUTPUT
                    DTA5:            DECTAPE 5            INPUT OR OUTPUT
                    DTA6:            DECTAPE 6            INPUT OR OUTPUT
                    DTA7:            DECTAPE 7            INPUT OR OUTPUT
                    MTA0:            MAGNETIC TAPE 0      INPUT OR OUTPUT
                    MTA1:            MAGNETIC TAPE 1      INPUT OR OUTPUT
 
INPUT MAY NOT BE DONE FROM THE LINE PRINTER NOR MAY OUTPUT GO TO
THE CARD READER.

2)  FILE.EXT IS THE NAME AND EXTENSION OF THE FILE TO BE USED.  THIS
    PART OF THE SPECIFICATION IS USED ONLY IF DISK OR DECTAPE IS USED.

3)  [PROJ,PROG]  IF A DISK IS USED AND THE USER WISHES TO READ A FILE
    IN ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE 
     PROJECT-PROGRAMMER NUMBER OF THE DIRECTORY FROM WHICH HE WISHES
     TO READ.  THE PROJECT NUMBER AND THE PROGRAMMER NUMBER MUST BE
     SEPARATED BY A COMMA AND ENCLOSED IN BRACKETS.  OUTPUT MUST GO
     TO YOUR OWN AREA.

4)   USING CDR: FOR INPUT OR LPT: FOR OUTPUT IS NOT RECOMMENDED
     EXCEPT THRU BATCH.

5)   THE MASTER FILE AND THE STOP FILE MAY NOT BE READ FROM THE SAME
     DECTAPE AND THE INDEX FILE AND THE FREQUENCY FILE MAY NOT BE 
     WRITTEN TO THE SAME DECTAPE.

     EXAMPLE:

	MASTER FILE:	DSK:DATA.DAT[71171,71026]

     IN THE EXAMPLE, THE MASTER FILE IS A DISK FILE NAMED DATA.DAT
     IN USER DIRECTORY [71171,71026].

     DEFAULTS:

1)   IF NO DEVICE IS SPECIFIED BUT A FILENAME IS SPECIFIED THE
     DEFAULT DEVICE WILL BE DSK:

2)   IF NO FILENAME IS GIVEN, THE DEVICE DSK: WILL BE ASSUMED AND THE
     DEFAULT FILENAME "KWIC" IS USED WITH THE FOLLOWING DEFAULT 
    EXTENSIONS:

			'MAS'	FOR MASTER FILE
			'STP'	FOR STOP FILE
			'NDX'	FOR INDEX FILE
			'FRQ'	FOR FREQUENCY FILE

3)   TO ENTER A FILENAME WITH A NULL EXTENSION ENTER THE FILENAME
     FOLLOWED BY A PERIOD AND A SPACE, OTHERWISE THE DEFAULT EXTENSION
     WILL BE USED.

4)   IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER
     WILL BE ASSUMED.
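
THE SPECIFICATION FORMAT AND THE DEFAULTS ABOVE CAN BE SKETCHED IN
PYTHON AS FOLLOWS.  THIS IS AN ILLUSTRATION ONLY, NOT THE PARSER USED
BY KWIC.  THE NAMES parse_spec AND FileSpec AND THE role ARGUMENT ARE
HYPOTHETICAL.  THE SKETCH ASSUMES THAT A DEVICE GIVEN WITHOUT A
FILENAME (E.G. TTY:) IS KEPT, AS IN THE SAMPLE RUN, AND IT REPRESENTS
A MISSING PROJECT-PROGRAMMER NUMBER (THE USER'S OWN) BY None.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Default extensions from the list above, keyed by a hypothetical role name.
DEFAULT_EXT = {"master": "MAS", "stop": "STP", "index": "NDX", "frequency": "FRQ"}

# DEV:FILE.EXT[PROJ,PROG], every part optional.
SPEC_RE = re.compile(
    r"^(?:(?P<dev>[A-Z0-9]+):)?"
    r"(?P<name>[A-Z0-9]*)"
    r"(?P<dot>\.(?P<ext>[A-Z0-9]*))?"
    r"(?:\[(?P<proj>[0-9]+),(?P<prog>[0-9]+)\])?$"
)


class FileSpec(NamedTuple):
    device: str
    name: str
    ext: str
    ppn: Optional[Tuple[str, str]]  # None means the user's own directory


def parse_spec(response, role):
    """Parse one answer to questions 3.1-3.4 and fill in the defaults."""
    m = SPEC_RE.match(response.strip().upper())
    if m is None:
        raise ValueError(f"cannot parse file specification: {response!r}")
    device = m.group("dev") or "DSK"   # default device is DSK:
    name = m.group("name") or "KWIC"   # default filename is KWIC
    if m.group("dot"):
        ext = m.group("ext")           # "FILE." gives a null extension
    else:
        ext = DEFAULT_EXT[role]        # no period: default extension
    ppn = (m.group("proj"), m.group("prog")) if m.group("proj") else None
    return FileSpec(device, name, ext, ppn)
```

FOR EXAMPLE, parse_spec("SAMPLE", "master") YIELDS DEVICE DSK, NAME
SAMPLE, AND EXTENSION MAS, AND AN EMPTY RESPONSE TO THE FREQUENCY FILE
QUESTION YIELDS DSK:KWIC.FRQ.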

QUESTION 3.2 IS NEXT.

  3.2	"STOP FILE:"
	ENTER SPECIFICATIONS DEFINING THE SOURCE OF THE STOP WORDS
	IN THE SAME MANNER AS QUESTION 3.1.  QUESTION 3.3 IS NEXT.

  3.3	"INDEX FILE:"
	ENTER SPECIFICATIONS DEFINING THE DESTINATION OF THE INDEX
	THE SAME WAY AS QUESTION 3.1.  THE NEXT QUESTION IS 3.4.

  3.4	"FREQUENCY FILE:"
	ENTER SPECIFICATIONS DEFINING THE DESTINATION OF THE 
	FREQUENCY LIST IN THE SAME MANNER AS 3.1.  QUESTION 3.5 IS NEXT.

  3.5	"LISTING TITLE"
	ENTER A TITLE OR HEADER OF UP TO 80 CHARACTERS TO BE USED FOR
	THE OUTPUT.  TYPE A RETURN IF NO TITLE IS DESIRED.  AFTER THE
	TITLE IS ENTERED A SERIES OF STATEMENTS WILL BE PRINTED
	INDICATING THE PROGRESS AND AMOUNTS OF CORE USED BY THE 
	PROGRAM.  WHEN THE OUTPUT IS COMPLETE, THE PROGRAM WILL EXIT
	TO MONITOR.

4.0	FORMAT OF MASTER FILE AND STOP FILE
  4.1	MASTER FILE
	THE MASTER FILE CONSISTS OF FIELDS OF DATA TO BE PROCESSED
	BY THE KWIC PROGRAM AND OUTPUT IN INDEX FORM.  THIS MAY BE
	ANY TYPE OF ALPHANUMERIC DATA.

		GENERAL FORMAT OF EACH FIELD:
		1)  STANDARD SEQUENCE NUMBERS FROM LINED (OPTIONAL)
		2)  THE GROUP OF DATA TO BE INDEXED.  IT MAY BE CONTINUED
		    ON ANY NUMBER OF LINES (I.E. CARRIAGE RETURNS
		    ARE IGNORED).  IF NO SPACE OR TAB APPEARS EITHER
		    AT THE END OF A CONTINUED LINE OR AT THE BEGINNING
		    OF THE NEXT, THE CHARACTERS AT THE END OF THE
		    FIRST LINE AND THE BEGINNING OF THE NEXT ARE
		    CONSIDERED ONE WORD.  (SEE LINES [3] AND [123]
		    IN THE EXAMPLE IN SECTION 5.0).
		3)  THE DELIMITER FOR THIS FIELD, AN '='
		4)  A FIELD OF DATA THAT WILL BE IGNORED (OPTIONAL)
		5)  THE IDENTIFICATION DELIMITER CHARACTER, A '['
		6)  THE IDENTIFICATION TO BE ASSOCIATED WITH THE FIELD.
		    THIS IS A MAXIMUM OF 10 CHARACTERS WITH NO SPACES
		    OR TABS ALLOWED.
		7)  THE END OF I.D. DELIMITER, A ']'


NOTE 1: A SPACE AND A TAB ARE THE ONLY CHARACTERS THAT DELIMIT A WORD FROM ITS
	NEIGHBOR.  SEQUENTIAL SPACES OR TABS ARE REDUCED TO ONE SPACE ON 
	THE INDEX.

NOTE 2: DO NOT USE A STRING OF MORE THAN 5 BLANK CHARACTERS IN A DATA FILE.

NOTE 3: SEQUENCE NUMBERS ASSIST IN FINDING ERRORS IN THE MASTER FILE BUT ARE
	NOT NECESSARY.

NOTE 4: THE "=" AND THE TEXT BETWEEN THE "=" AND THE "[" DO NOT APPEAR IN
	THE OUTPUT.

NOTE 5: THE "[" AND THE IDENTIFICATION APPEAR AT THE RIGHT MARGIN OF THE
	INDEX.
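
THE FIELD FORMAT ABOVE CAN BE SKETCHED IN PYTHON AS FOLLOWS.  THIS IS
A MINIMAL ILLUSTRATION, NOT THE KWIC PROGRAM'S OWN READER.  THE NAME
parse_master IS HYPOTHETICAL, AND THE SKETCH ASSUMES THAT SEQUENCE
NUMBERS HAVE ALREADY BEEN REMOVED AND THAT THE INDEXED TEXT CONTAINS
NO '=', '[' OR ']' CHARACTERS.

```python
import re


def parse_master(text):
    """Split master-file text into (indexed_words, ident) pairs."""
    # Line terminators are dropped, not turned into spaces, so a word that
    # runs up to the end of a line joins the first word of the next line.
    flat = text.replace("\r", "").replace("\n", "")
    fields = []
    pos = 0
    while True:
        eq = flat.find("=", pos)          # end of the text to be indexed
        if eq < 0:
            break
        lb = flat.find("[", eq)           # start of the I.D.
        rb = flat.find("]", lb) if lb >= 0 else -1
        if lb < 0 or rb < 0:
            raise ValueError("field has no [I.D.] after its '='")
        ident = flat[lb + 1:rb]
        if len(ident) > 10 or re.search(r"[ \t]", ident):
            raise ValueError(f"bad I.D. field: {ident!r}")
        # Only spaces and tabs separate words; runs of them count once.
        words = [w for w in re.split(r"[ \t]+", flat[pos:eq]) if w]
        fields.append((words, ident))     # text between '=' and '[' is dropped
        pos = rb + 1
    return fields
```

APPLIED TO THE TWO LINES ENDING IN [3] IN THE EXAMPLE IN SECTION 5.0,
THIS SKETCH RETURNS THE WORDS MULTIPLE, LINESARE, ENDED, WITH AND AN,
PAIRED WITH THE I.D. 3.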

SEE BEGINNING OF SECTION 5.0 FOR EXAMPLES OF FIELDS TO BE SUBMITTED.

  4.2	STOP FILE
	THE STOP FILE CONSISTS OF A LIST OF WORDS THAT ARE NOT TO BE USED
	AS KEY WORDS FOR INDEXING.  THERE MUST BE AT LEAST ONE WORD AND
	EVERY WORD MUST BE FOLLOWED BY A CARRIAGE RETURN.  THE STOP  FILE
	IS NOT PRINTED AS PART OF THE OUTPUT IF THE OUTPUT IS TO DEVICE 'TTY:'.
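
THE STOP FILE RULES ABOVE CAN BE SKETCHED IN PYTHON AS FOLLOWS.  THE
NAME load_stop_words IS HYPOTHETICAL.  SKIPPING BLANK LINES AND
FOLDING WORDS TO UPPER CASE ARE ASSUMPTIONS OF THE SKETCH, NOT
DOCUMENTED BEHAVIOR OF KWIC.

```python
def load_stop_words(text):
    """Return the set of stop words found in stop-file text, one per line."""
    words = set()
    for line in text.splitlines():
        word = line.strip(" \t")
        if not word:
            continue  # assumption: blank lines are skipped
        if " " in word or "\t" in word:
            raise ValueError(f"more than one word on a line: {line!r}")
        words.add(word.upper())
    if not words:
        raise ValueError("the stop file must contain at least one word")
    return words
```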


5.0  SAMPLE RUN

THE SAMPLE RUN USES THE FOLLOWING DATA FILE NAMED SAMPLE.MAS.

=THIS LINE IS IGNORED[1]
NOT THIS ONE THOUGH=[2]
MULTIPLE LINES
ARE ENDED WITH AN =[3]
NUMBERS IN BRACKETS ARE IGNORED AS ARE
WORDS AFTER =THE EQUAL SIGN[123]


THE SAMPLE RUN USES THE FOLLOWING STOP FILE NAMED KWIC.STP ON AREA
[1,4].  THIS STOP FILE IS AVAILABLE TO ALL USERS.

NOTE:   THE FILE ITSELF HAS ONE WORD PER LINE.  THE WORDS ARE SHOWN HERE
	IN COLUMNS TO SAVE SPACE.

A        ABOUT      ABOVE    ACROSS   AFTER    AGAINST  ALL      ALONG
ALSO     ALTHOUGH   ALWAYS   AMONG    AN       AND      ANOTHER  ANY
ARE      AROUND     AS       AT       BE       BECAUSE  BEEN     BEFORE
BEHIND   BELOW      BENEATH  BESIDE   BETWEEN  BEYOND   BUT      BY
CAN      DO         DONE     DOWN     DURING   EACH     ENOUGH   EVER
EXCEPT   FOR        FOUND    FROM     GET      GETTING  HAS      HAVE
HER      HERS       HIM      HIS      I        IF       IN       INDEED
INSIDE   INTO       IS       IT       ITS      KNOW     LESS     LEST
LIKE     MAY        MORE     MUST     MY       NEAR     NEEDS    NO
NOT      OF         OFF      ON       ONTO     OR       OUR      OVER
SELDOM   SHE        SINCE    SO       SOME     THAN     THAT     THE
THEIR    THEIRS     THEM     THEN     THESE    THEY     THIS     THOSE
THOUGH   THROUGH    THUS     TO       TOWARD   UNDER    UNLESS   UNTIL
UP       UPON       WE       WELL     WHAT     WHEN     WHENEVER WHERE
WHEREAS  WHEREVER   WHETHER  WHICH    WHILE    WHO      WHOM     WHOSE
WILL     WITH       WITHIN   WITHOUT  WOULD    YOU      


THE SAMPLE RUN FOLLOWS.  EACH <CR>, AND ALL TEXT PRECEDING IT ON THE
SAME LINE EXCEPT THE PROMPT, IS ENTERED BY THE USER.

THE MASTER FILE IS SAMPLE.MAS.  THE STOP FILE IS KWIC.STP[1,4].  THE
INDEX OUTPUT IS TTY: AND THE FREQUENCY FILE IS KWIC.FRQ.

.R KWIC<CR>


KEY-WORD-IN-CONTEXT  WMU VERSION

MASTER FILE: SAMPLE<CR>
STOP FILE: KWIC.STP[1,4]<CR>
INDEX FILE: TTY:<CR>
FREQUENCY FILE: <CR>
LISTING TITLE
:SAMPLE<CR>

STOP LIST 1K CORE USED

MASTER FILE 1K CORE USED
KEY-WORD-IN-CONTEXT  WMU VERSION 2 17:01 14-AUG-78          PAGE 1
KWIC INDEX---SAMPLE


NUMBERS IN BRACKETS ARE IGNORED AS  AREWORDS AFTER            [123
                        NUMBERS IN  BRACKETS ARE IGNORED AS [123
                 MULTIPLE LINESARE  ENDED WITH AN          [3
           NUMBERS IN BRACKETS ARE  IGNORED AS ARE WORDS AFTE[123
                          MULTIPLE  LINESARE ENDED WITH AN [3
                                    MULTIPLE LINESARE ENDED [3
                                    NUMBERS IN BRACKETS ARE [123
                          NOT THIS  ONE THOUGH             [2

INDEX COMPLETE
TOTAL CORE USED 0+2K CORE USED

EXIT


THE FREQUENCY LISTING IN KWIC.FRQ LOOKS LIKE THIS:


KEY-WORD-IN-CONTEXT  WMU VERSION 2 17:01 14-AUG-78          PAGE 1
FREQUENCY LIST---SAMPLE


AREWORDS                                           1
BRACKETS                                           1
ENDED                                              1
IGNORED                                            1
LINESARE                                           1
MULTIPLE                                           1
NUMBERS                                            1
ONE                                                1
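
A LISTING OF THIS SHAPE CAN BE PRODUCED FROM A LIST OF KEYWORDS WITH
THE FOLLOWING PYTHON SKETCH.  THE NAME frequency_lines IS HYPOTHETICAL,
AND THE 51-COLUMN WORD FIELD IS AN ASSUMPTION READ OFF THE SAMPLE
ABOVE, NOT A DOCUMENTED FORMAT.

```python
from collections import Counter


def frequency_lines(keywords, width=51):
    """Return one 'WORD   COUNT' line per distinct keyword, alphabetized."""
    counts = Counter(keywords)
    # Left-justify each word in a fixed-width field, then append its count.
    return [f"{word:<{width}}{count}" for word, count in sorted(counts.items())]


for line in frequency_lines(["ONE", "AREWORDS", "ONE"]):
    print(line)
```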