LIBRARY PROGRAM #3.7.1
CALLING NAME: KWIC
PROGRAMMED BY*
ADAPTED BY: RUSSELL BARR, III
PREPARED BY: RUSSELL R BARR III
APPROVED BY: JACK R. MEAGHER
DATE: DECEMBER, 1973
KEY-WORD-IN-CONTEXT PROGRAM
TABLE OF CONTENTS
1.0 PURPOSE AND EXPLANATION
2.0 LIMITATIONS
3.0 PROGRAM QUESTIONS AND HOW TO ANSWER THEM
4.0 FORMAT OF THE MASTER AND STOP FILES
5.0 SAMPLE RUN
1.0 PURPOSE AND EXPLANATION
KWIC PRODUCES AN ALPHABETIZED LISTING OF EVERY 'KEY' WORD IN A MASTER
FILE OF TEXT. KEY WORDS ARE THOSE WHICH ARE NOT DEFINED BY THE USER
IN A "STOP" FILE. IN OTHER WORDS THE STOP FILE IS A LIST OF WORDS TO
BE IGNORED BY THE KWIC PROGRAM.
THE LIST OF KEY WORDS IS PRINTED IN CONTEXT, MEANING THAT EACH KEY WORD IS
PRINTED SURROUNDED BY THE WORDS AMONG WHICH IT APPEARS IN THE MASTER FILE.
NOTE THAT IN THE EXAMPLE IN SECTION 5.0 THE KEYWORDS ARE, IN ORDER, READING
DOWN THE CENTER COLUMN: 'AREWORDS', 'BRACKETS', 'ENDED', ETC.
KWIC ALSO PRODUCES A LIST OF ALL KEYWORDS AND THE NUMBER OF TIMES EACH
APPEARS.
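THE BEHAVIOR DESCRIBED ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON. THIS
IS ONLY AN ILLUSTRATION OF THE KEY-WORD-IN-CONTEXT IDEA, NOT THE ORIGINAL
PROGRAM. THE FUNCTION NAME KWIC_INDEX, THE (TEXT, I.D.) PAIR LAYOUT, AND THE
PRINT WIDTHS ARE ALL ASSUMPTIONS MADE FOR THE EXAMPLE.

```python
from collections import Counter

def kwic_index(fields, stop_words):
    """Return (entries, frequencies) for a list of (text, ident) fields.

    Hypothetical helper: names and data layout are assumptions, not the
    internals of the original KWIC program.
    """
    stop = {w.upper() for w in stop_words}
    entries = []
    freq = Counter()
    for text, ident in fields:
        words = text.upper().split()       # spaces and tabs delimit words
        for i, word in enumerate(words):
            if word in stop:
                continue                   # stop words are never key words
            left = " ".join(words[:i])     # context before the key word
            right = " ".join(words[i + 1:])  # context after the key word
            entries.append((word, left, right, ident))
            freq[word] += 1
    entries.sort(key=lambda e: e[0])       # alphabetize by key word
    return entries, freq

fields = [("NOT THIS ONE THOUGH", "2"),
          ("MULTIPLE LINESARE ENDED WITH AN", "3")]
stop_words = ["NOT", "THIS", "THOUGH", "WITH", "AN"]
entries, freq = kwic_index(fields, stop_words)

# Key word in a center column, I.D. at the right (widths are arbitrary here).
for word, left, right, ident in entries:
    print(f"{left:>20}  {word} {right:<20} [{ident}")
```

THE COUNTER RETURNED AS THE SECOND VALUE PLAYS THE ROLE OF THE FREQUENCY LIST.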
2.0 LIMITATIONS
(1) THE MAXIMUM NUMBER OF CHARACTERS IN A WORD IS 50.
(2) THE MAXIMUM NUMBER OF TIMES A KEYWORD MAY APPEAR IN A
MASTER FILE IS 300.
(3) THE I.D. FIELD IS LIMITED TO 10 CHARACTERS.
(4) THE MAXIMUM NUMBER OF KEYWORDS IN THE MASTER FILE IS
APPROXIMATELY 5000.
(5) THIS PROGRAM USES A LARGE AMOUNT OF COMPUTING TIME FOR
LARGE FILES.
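A FILE CAN BE CHECKED AGAINST LIMITS (1) THROUGH (4) BEFORE IT IS SUBMITTED.
THE FOLLOWING PYTHON SKETCH SHOWS ONE WAY TO DO SO. THE FUNCTION NAME
CHECK_LIMITS AND THE (TEXT, I.D.) PAIR LAYOUT ARE ASSUMPTIONS, AND LIMIT (4)
IS READ HERE AS A COUNT OF DISTINCT KEY WORDS, WHICH THE TEXT DOES NOT STATE.

```python
from collections import Counter

MAX_WORD_CHARS = 50      # limitation (1)
MAX_OCCURRENCES = 300    # limitation (2)
MAX_ID_CHARS = 10        # limitation (3)
MAX_KEYWORDS = 5000      # limitation (4), documented only as approximate

def check_limits(fields, stop_words=()):
    """Return a message for every documented limit that is exceeded.

    `fields` is a list of (text, ident) pairs; this layout is an assumption.
    """
    stop = {w.upper() for w in stop_words}
    problems = []
    counts = Counter()
    for text, ident in fields:
        if len(ident) > MAX_ID_CHARS:
            problems.append(f"I.D. {ident!r} exceeds {MAX_ID_CHARS} characters")
        for word in text.upper().split():
            if len(word) > MAX_WORD_CHARS:
                problems.append(
                    f"word starting {word[:10]!r} exceeds {MAX_WORD_CHARS} characters")
            if word not in stop:
                counts[word] += 1          # only key words are counted
    for word, n in counts.items():
        if n > MAX_OCCURRENCES:
            problems.append(
                f"key word {word!r} appears {n} times (limit {MAX_OCCURRENCES})")
    if len(counts) > MAX_KEYWORDS:
        problems.append(
            f"{len(counts)} distinct key words (limit about {MAX_KEYWORDS})")
    return problems
```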
3.0 PROGRAM QUESTIONS AND HOW TO ANSWER THEM
IN THIS SECTION TEXT TYPED OUT BY THE COMPUTER IS ENCLOSED IN QUOTES.
3.1 "MASTER FILE:"
ENTER SPECIFICATIONS DEFINING THE SOURCE OF THE MASTER TEXT.
THE FOLLOWING EXPLANATION OF THE SPECIFICATIONS APPLIES TO
QUESTIONS 3.1-3.4. STATEMENTS ABOUT INPUT APPLY TO QUESTIONS
3.1 AND 3.2. STATEMENTS ABOUT OUTPUT APPLY TO 3.3 AND 3.4.
*WRITTEN BY G.B. MOERSDORF OF OHIO STATE UNIVERSITY. RECEIVED THRU
DIGITAL EQUIPMENT COMPUTER USERS SOCIETY AND MODIFIED FOR WMU BY
RUSSELL BARR III.
THE NORMAL RESPONSE TO EACH OF THESE QUESTIONS CONSISTS OF
THREE BASIC PARTS: A DEVICE, A FILENAME, AND A PROJECT-
PROGRAMMER NUMBER.
THE GENERAL FORMAT FOR THESE PARTS IS AS FOLLOWS:
DEV:FILE.EXT[PROJ,PROG]
1) DEV: ANY OF THE FOLLOWING DEVICES ARE APPROPRIATE WHERE INDICATED:
DEVICE LIST   DEFINITION         STATEMENT USE
TTY:          TERMINAL           INPUT OR OUTPUT
DSK:          DISK               INPUT OR OUTPUT
CDR:          CARD READER        INPUT ONLY
LPT:          LINE PRINTER       OUTPUT ONLY
DTA0:         DECTAPE 0          INPUT OR OUTPUT
DTA1:         DECTAPE 1          INPUT OR OUTPUT
DTA2:         DECTAPE 2          INPUT OR OUTPUT
DTA3:         DECTAPE 3          INPUT OR OUTPUT
DTA4:         DECTAPE 4          INPUT OR OUTPUT
DTA5:         DECTAPE 5          INPUT OR OUTPUT
DTA6:         DECTAPE 6          INPUT OR OUTPUT
DTA7:         DECTAPE 7          INPUT OR OUTPUT
MTA0:         MAGNETIC TAPE 0    INPUT OR OUTPUT
MTA1:         MAGNETIC TAPE 1    INPUT OR OUTPUT
INPUT MAY NOT BE DONE FROM THE LINE PRINTER NOR MAY OUTPUT GO TO
THE CARD READER.
2) FILE.EXT IS THE NAME AND EXTENSION OF THE FILE TO BE USED. THIS
PART OF THE SPECIFICATION IS USED ONLY IF DISK OR DECTAPE IS USED.
3) [PROJ,PROG] IF A DISK IS USED AND THE USER WISHES TO READ A FILE
IN ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE
PROJECT-PROGRAMMER NUMBER OF THE DIRECTORY FROM WHICH HE WISHES
TO READ. THE PROJECT NUMBER AND THE PROGRAMMER NUMBER MUST BE
SEPARATED BY A COMMA AND ENCLOSED IN BRACKETS. OUTPUT MUST GO
TO YOUR OWN AREA.
4) IT IS NOT RECOMMENDED THAT CDR: OR LPT: BE USED AS INPUT OR
OUTPUT RESPECTIVELY EXCEPT THRU BATCH.
5) THE MASTER FILE AND THE STOP FILE MAY NOT BE READ FROM THE SAME
DECTAPE AND THE INDEX FILE AND THE FREQUENCY FILE MAY NOT BE
WRITTEN TO THE SAME DECTAPE.
EXAMPLE:
MASTER FILE: DSK:DATA.DAT[71171,71026]
IN THE EXAMPLE, THE MASTER FILE IS A DISK FILE NAMED DATA.DAT
IN USER DIRECTORY [71171,71026].
DEFAULTS:
1) IF NO DEVICE IS SPECIFIED BUT A FILENAME IS SPECIFIED THE
DEFAULT DEVICE WILL BE DSK:
2) IF NO FILENAME IS GIVEN, THE DEVICE DSK: WILL BE ASSUMED AND THE
DEFAULT FILENAME "KWIC" IS USED WITH THE FOLLOWING DEFAULT
EXTENSIONS:
'MAS' FOR MASTER FILE
'STP' FOR STOP FILE
'NDX' FOR INDEX FILE
'FRQ' FOR FREQUENCY FILE
3) TO ENTER A FILENAME WITH A NULL EXTENSION, ENTER THE FILENAME
FOLLOWED BY A PERIOD AND A SPACE; OTHERWISE THE DEFAULT EXTENSION
WILL BE USED.
4) IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER
WILL BE ASSUMED.
QUESTION 3.2 IS NEXT.
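THE SPECIFICATION FORMAT AND THE DEFAULT RULES ABOVE CAN BE SUMMARIZED AS A
SMALL PARSER. THE PYTHON SKETCH BELOW IS AN ILLUSTRATION ONLY. THE NAMES
PARSE_SPEC AND DEFAULT_EXT, THE ROLE KEYWORDS, AND THE RETURNED DICTIONARY
ARE ASSUMPTIONS. IT ALSO ASSUMES THAT A DEVICE GIVEN WITHOUT A FILENAME
(SUCH AS TTY:) KEEPS THAT DEVICE, AND IT LEAVES A MISSING PROJECT-PROGRAMMER
NUMBER AS NONE TO STAND FOR THE USER'S OWN NUMBER.

```python
import re

# Default extensions by question, as listed under DEFAULTS.
DEFAULT_EXT = {"master": "MAS", "stop": "STP", "index": "NDX", "frequency": "FRQ"}

SPEC_RE = re.compile(
    r"^(?:(?P<dev>[A-Z0-9]+):)?"                     # optional device, e.g. DSK:
    r"(?P<name>[A-Z0-9]*)"                           # optional file name
    r"(?P<dot>\.(?P<ext>[A-Z0-9]*))?"                # optional .EXT; bare '.' = null
    r"(?:\[(?P<proj>[0-9]+),(?P<prog>[0-9]+)\])?$"   # optional [PROJ,PROG]
)

def parse_spec(response, role):
    """Parse DEV:FILE.EXT[PROJ,PROG] and fill in the documented defaults."""
    m = SPEC_RE.match(response.strip().upper())
    if m is None:
        raise ValueError(f"cannot parse file specification: {response!r}")
    dev = m["dev"] or "DSK"                  # default device is DSK:
    name = m["name"] or "KWIC"               # default file name is KWIC
    ext = m["ext"] if m["dot"] else DEFAULT_EXT[role]
    ppn = (m["proj"], m["prog"]) if m["proj"] else None  # None = user's own
    return {"dev": dev, "name": name, "ext": ext, "ppn": ppn}

print(parse_spec("DSK:DATA.DAT[71171,71026]", "master"))
print(parse_spec("", "frequency"))
```

THE FILE NAME AND EXTENSION RETURNED FOR A DEVICE SUCH AS TTY: ARE SIMPLY
UNUSED, SINCE THAT PART OF THE SPECIFICATION APPLIES ONLY TO DISK AND DECTAPE.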
3.2 "STOP FILE:"
ENTER SPECIFICATIONS DEFINING THE SOURCE OF THE STOP WORDS
IN THE SAME MANNER AS QUESTION 3.1. QUESTION 3.3 IS NEXT.
3.3 "INDEX FILE:"
ENTER SPECIFICATIONS DEFINING THE DESTINATION OF THE INDEX
IN THE SAME MANNER AS QUESTION 3.1. QUESTION 3.4 IS NEXT.
3.4 "FREQUENCY FILE:"
ENTER SPECIFICATIONS DEFINING THE DESTINATION OF THE
FREQUENCY LIST IN THE SAME MANNER AS 3.1. QUESTION 3.5 IS NEXT.
3.5 "LISTING TITLE"
ENTER A TITLE OR HEADER OF UP TO 80 CHARACTERS TO BE USED FOR
THE OUTPUT. TYPE A RETURN IF NO TITLE IS DESIRED. AFTER THE
TITLE IS ENTERED A SERIES OF STATEMENTS WILL BE PRINTED
INDICATING THE PROGRESS AND AMOUNTS OF CORE USED BY THE
PROGRAM. WHEN THE OUTPUT IS COMPLETE, THE PROGRAM WILL EXIT
TO MONITOR.
4.0 FORMAT OF THE MASTER AND STOP FILES
4.1 MASTER FILE
THE MASTER FILE CONSISTS OF FIELDS OF DATA TO BE PROCESSED
BY THE KWIC PROGRAM AND OUTPUT IN INDEX FORM. THIS MAY BE
ANY TYPE OF ALPHANUMERIC DATA.
GENERAL FORMAT OF EACH FIELD:
1) STANDARD SEQUENCE NUMBERS FROM LINED (OPTIONAL)
2) GROUP OF DATA TO BE INDEXED. IT MAY BE CONTINUED
ON ANY NUMBER OF LINES (I.E. CARRIAGE RETURNS
ARE IGNORED). IF NO SPACE OR TAB APPEARS EITHER
AT THE END OF A CONTINUED LINE OR AT THE BEGINNING
OF THE NEXT, THE CHARACTERS AT THE END OF THE
FIRST LINE AND THE BEGINNING OF THE NEXT ARE
CONSIDERED ONE WORD. (SEE LINES [3] AND [123]
IN THE EXAMPLE IN SECTION 5.0).
3) THE DELIMITER FOR THIS FIELD, AN '='
4) A FIELD OF DATA THAT WILL BE IGNORED (OPTIONAL)
5) THE IDENTIFICATION DELIMITER CHARACTER, A '['
6) THE IDENTIFICATION TO BE ASSOCIATED WITH THE FIELD.
THIS IS A MAXIMUM OF 10 CHARACTERS WITH NO SPACES
OR TABS ALLOWED.
7) THE END OF I.D. DELIMITER, A ']'
NOTE 1: A SPACE AND A TAB ARE THE ONLY CHARACTERS THAT DELIMIT A WORD FROM ITS
NEIGHBOR. SEQUENTIAL SPACES OR TABS ARE REDUCED TO ONE SPACE ON
THE INDEX.
NOTE 2: DO NOT USE A STRING OF MORE THAN 5 BLANK CHARACTERS IN A DATA FILE.
NOTE 3: SEQUENCE NUMBERS ASSIST IN FINDING ERRORS IN THE MASTER FILE BUT ARE
NOT NECESSARY.
NOTE 4: THE "=" AND THE TEXT BETWEEN THE "=" AND THE "[" DO NOT APPEAR IN
THE OUTPUT.
NOTE 5: THE "[" AND THE IDENTIFICATION APPEAR AT THE RIGHT MARGIN OF THE
INDEX.
SEE BEGINNING OF SECTION 5.0 FOR EXAMPLES OF FIELDS TO BE SUBMITTED.
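THE FIELD FORMAT ABOVE CAN BE ILLUSTRATED WITH A SHORT PYTHON SKETCH THAT
SPLITS MASTER-FILE TEXT INTO (DATA, I.D.) PAIRS. IT IS NOT THE ORIGINAL
PROGRAM'S LOGIC. THE NAMES PARSE_MASTER AND FIELD_RE ARE ASSUMPTIONS, AND THE
SKETCH ASSUMES A FILE WITHOUT SEQUENCE NUMBERS AND WITH NO '[' IN THE DATA
TO BE INDEXED. THE INPUT IS THE MASTER FILE SHOWN IN SECTION 5.0.

```python
import re

# data to index, '=', optional ignored data, '[', identification, ']'
FIELD_RE = re.compile(r"([^=]*)=([^\[]*)\[([^\]]*)\]")

def parse_master(text):
    """Split master-file text into (data, ident) pairs."""
    joined = "".join(text.splitlines())    # carriage returns are ignored
    fields = []
    for data, _ignored, ident in FIELD_RE.findall(joined):
        data = " ".join(data.split())      # runs of spaces/tabs become one space
        fields.append((data, ident))
    return fields

sample = """=THIS LINE IS IGNORED[1]
NOT THIS ONE THOUGH=[2]
MULTIPLE LINES
ARE ENDED WITH AN =[3]
NUMBERS IN BRACKETS ARE IGNORED AS ARE
WORDS AFTER =THE EQUAL SIGN[123]
"""
fields = parse_master(sample)
for data, ident in fields:
    print(repr(data), ident)
```

BECAUSE LINES ARE JOINED WITH NOTHING BETWEEN THEM, 'LINES' AND 'ARE' RUN
TOGETHER AS 'LINESARE', AND 'ARE' AND 'WORDS' AS 'AREWORDS', WHICH MATCHES
THE KEYWORDS IN THE SAMPLE FREQUENCY LIST.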
4.2 STOP FILE
THE STOP FILE CONSISTS OF A LIST OF WORDS THAT ARE NOT TO BE USED
AS KEY WORDS FOR INDEXING. THERE MUST BE AT LEAST ONE WORD AND
EVERY WORD MUST BE FOLLOWED BY A CARRIAGE RETURN. THE STOP FILE
IS NOT PRINTED AS PART OF THE OUTPUT IF THE OUTPUT IS TO DEVICE 'TTY:'.
5.0 SAMPLE RUN
THE SAMPLE RUN USES THE FOLLOWING DATA FILE NAMED SAMPLE.MAS.
=THIS LINE IS IGNORED[1]
NOT THIS ONE THOUGH=[2]
MULTIPLE LINES
ARE ENDED WITH AN =[3]
NUMBERS IN BRACKETS ARE IGNORED AS ARE
WORDS AFTER =THE EQUAL SIGN[123]
THE SAMPLE RUN USES THE FOLLOWING STOP FILE NAMED KWIC.STP ON AREA
[1,4]. THIS STOP FILE IS AVAILABLE TO ALL USERS.
NOTE: THIS FILE IS ONE WORD PER LINE.
A ABOUT ABOVE ACROSS AFTER AGAINST ALL ALONG
ALSO ALTHOUGH ALWAYS AMONG AN AND ANOTHER ANY
ARE AROUND AS AT BE BECAUSE BEEN BEFORE
BEHIND BELOW BENEATH BESIDE BETWEEN BEYOND BUT BY
CAN DO DONE DOWN DURING EACH ENOUGH EVER
EXCEPT FOR FOUND FROM GET GETTING HAS HAVE
HER HERS HIM HIS I IF IN INDEED
INSIDE INTO IS IT ITS KNOW LESS LEST
LIKE MAY MORE MUST MY NEAR NEEDS NO
NOT OF OFF ON ONTO OR OUR OVER
SELDOM SHE SINCE SO SOME THAN THAT THE
THEIR THEIRS THEM THEN THESE THEY THIS THOSE
THOUGH THROUGH THUS TO TOWARD UNDER UNLESS UNTIL
UP UPON WE WELL WHAT WHEN WHENEVER WHERE
WHEREAS WHEREVER WHETHER WHICH WHILE WHO WHOM WHOSE
WILL WITH WITHIN WITHOUT WOULD YOU
SAMPLE RUN FOLLOWS. EACH <CR> AND ALL INFORMATION PRECEDING IT ON THE SAME
LINE, EXCEPT THE PROMPT, ARE ENTERED BY THE USER.
THE MASTER FILE IS SAMPLE.MAS. THE STOP FILE IS KWIC.STP[1,4]. THE
INDEX OUTPUT IS TTY: AND THE FREQUENCY FILE IS KWIC.FRQ.
.R KWIC<CR>
KEY-WORD-IN-CONTEXT WMU VERSION
MASTER FILE: SAMPLE<CR>
STOP FILE: KWIC.STP[1,4]<CR>
INDEX FILE: TTY:<CR>
FREQUENCY FILE: <CR>
LISTING TITLE
:SAMPLE<CR>
STOP LIST 1K CORE USED
MASTER FILE 1K CORE USED
KEY-WORD-IN-CONTEXT WMU VERSION 2 17:01 14-AUG-78 PAGE 1
KWIC INDEX---SAMPLE
NUMBERS IN BRACKETS ARE IGNORED AS  AREWORDS AFTER           [123
                        NUMBERS IN  BRACKETS ARE IGNORED AS  [123
                 MULTIPLE LINESARE  ENDED WITH AN            [3
           NUMBERS IN BRACKETS ARE  IGNORED AS ARE WORDS AFTE[123
                          MULTIPLE  LINESARE ENDED WITH AN   [3
                                    MULTIPLE LINESARE ENDED  [3
                                    NUMBERS IN BRACKETS ARE  [123
                          NOT THIS  ONE THOUGH               [2
INDEX COMPLETE
TOTAL CORE USED 0+2K CORE USED
EXIT
THE FREQUENCY LISTING IN KWIC.FRQ LOOKS LIKE THIS:
KEY-WORD-IN-CONTEXT WMU VERSION 2 17:01 14-AUG-78 PAGE 1
FREQUENCY LIST---SAMPLE
AREWORDS 1
BRACKETS 1
ENDED 1
IGNORED 1
LINESARE 1
MULTIPLE 1
NUMBERS 1
ONE 1