SAIL TUTORIAL
Nancy W. Smith
SUMEX Computer Project
August, 1976
Note: This is a computer-readable copy of the SAIL Tutorial
published by the Stanford Artificial Intelligence Project.
The original publication is the preferable form.
T A B L E O F C O N T E N T S
SECTION
1 Introduction
2 The ALGOL-Part of Sail
    1 Blocks
    2 Declarations
    3 Statements
    4 Expressions
    5 Scope of Blocks
    6 More Control Statements
    7 Procedures
3 Macros
4 String Scanning
5 Input/Output
    1 Simple Terminal I/O
    2 Notes on Terminal I/O for TENEX Sail Only
    3 Setting Up a Channel for I/O
    4 Input from a File
    5 Output to a File
6 Records
    1 Declaring and Creating Records
    2 Accessing Fields of Records
    3 Linking Records Together
7 Conditional Compilation
8 Systems Building in Sail
    1 The Load Module
    2 Source Files
    3 Macros and Conditional Compilation
APPENDIX A: Sail and ALGOL W Comparison
REFERENCES
INDEX
SECTION 1
Introduction
The Sail manual [1] is a reference manual containing complete
information on Sail but may be difficult for a new user of the language
to work with. The purpose of this TUTORIAL* is to introduce new users
to the language. It does not deal in depth with advanced features like
the LEAP portion of Sail, and it uses pointers to the relevant portions of
the manual for some descriptions. Following the pointers and reading
specific portions of the manual will help you to develop some
familiarity with the manual. After you have gained some Sail
programming experience, it will be worthwhile to browse through the
complete reference manual to find a variety of more advanced structures
which are not covered in the TUTORIAL but may be useful in your
particular programming tasks. The Sail manual also covers use of the
BAIL debugger for Sail.
The TUTORIAL is not at an appropriate level for a computer novice. The
following assumptions are made about the background of the reader:
1) Some experience with the PDP-10 including knowledge of an
editor, understanding of the file system, and familiarity with
routine utility programs and system commands. If you are a new
user or have previous experience only on a non-timesharing system,
you should read the TENEX EXEC MANUAL [7] (for TENEX systems) or
the DEC USERS HANDBOOK [6] (for standard TOPS-10 systems) or the
MONITOR MANUAL [3] and UUO MANUAL [2] (for Stanford AI Lab
users). In addition, you might want to glance through and keep
ready for reference: the TENEX JSYS MANUAL [8] and/or the DEC
ASSEMBLY LANGUAGE HANDBOOK [5]. Also, each PDP-10 system usually
has its own introductory material for new users describing the
operation of the system.
2) Some experience with a programming language--probably
FORTRAN, ALGOL or an assembly language. If you have no
programming experience, you may need help getting started even
with this TUTORIAL. Sail is based on ALGOL so the general
concepts and most of the actual statements are the same in what is
often called the "ALGOL part" of Sail. The major additions to
Sail are its input/output routines. Appendix A contains a list of
the differences between the ALGOL W syntax and Sail.
Programs written in standard Sail (which will henceforth be called TOPS-
10 Sail) will usually run on a TENEX system through the emulator
(PA1050) which simulates the TOPS-10 UUO's, but such use is quite
inefficient. Sail also has a version for TENEX systems which we refer
to as TENEX Sail. (The new TOPS-20 system is very similar to TENEX;
either TENEX Sail or a new Sail version should be running on TOPS-20
shortly.) Note that the Sail compiler on your system will be called
simply Sail but will in fact be either the TENEX Sail or TOPS-10 Sail
version of the compiler. Aside from implementation differences which
will not be discussed here, the language differences are mainly in the
input/output (I/O) routines. And of course the system level commands to
compile, load, and run a finished program differ slightly in the TENEX
and TOPS-10 systems.
___________
* I would like to thank Robert Smith for editing the final version,
and Scott Daniels for his contributions to the RECORD section. John
Reiser, Les Earnest, Russ Taylor, Marney Beard, and Mike Hinckley all
made valuable suggestions.
SECTION 2
The ALGOL-Part of Sail
2.1 Blocks
Sail is a block-structured language. Each block has the form:
BEGIN
<declarations>
.
.
<statements>
.
.
END
Your entire program will be a block with the above format. This program
block is a somewhat special block called the outer block. BEGIN and END
are reserved words in Sail that mark the beginning and end of blocks,
with the outermost BEGIN/END pair also marking the beginning and end of
your program. (Reserved words are words that automatically mean
something to Sail; they are called "reserved" because you should not try
to give them your own meaning.)
Declarations are used to give the compiler information about the data
structures that you will be using so that the compiler can set up
storage locations of the proper types and associate the desired name
with each location.
Statements form the bulk of your program. They are the actual commands
available in Sail to use for coding the task at hand.
All declarations in each block must precede all statements in that
block. Here is a very simple one-block program that outputs the square
root of 5:
                    BEGIN
DECLARATIONS ==>    INTEGER i;
                    REAL x;
STATEMENTS   ==>    i _ 5;
                    x _ SQRT(i);
                    PRINT("SQUARE ROOT OF ", i,
                          " IS ", x);
                    END
which will print out on the terminal:
SQUARE ROOT OF 5 IS 2.236068 .
2.2 Declarations
A list of all the kinds of declarations is given in the Sail manual
(Sec. 2.1). In this section we will cover type declarations and array
declarations. Procedure declarations will be discussed in Section
2.7. Consult the Sail manual for details on all of the other
varieties of declarations listed.
2.2.1 Type Declarations
The purpose of type declarations is to tell the compiler what it needs
to know to set up the storage locations for your data. There are four
data types available in the ALGOL portion of Sail:
1) INTEGERs are whole numbers like -1, 0, 1, 2, 3, etc.
(Note that commas cannot be used in numbers, e.g., 15724 not
15,724.)
2) REALs are decimal numbers like -1.2, 3.14159, 100087.2,
etc.
3) BOOLEANs are assigned the values TRUE or FALSE (which are
reserved words). These are predefined for you in Sail (TRUE = -1
and FALSE = 0).
4) STRINGs are a data type not found in all programming
languages. Very often what you will be working with are not
numbers at all but text. Your program may need to output text to
the user's terminal while he/she is running the program. It may
ask the user questions and input text which is the answer to the
question. It may in fact process whole files of text. One simple
example of this is a program which works with a file containing a
list of words and outputs to a new file the same list of words in
alphabetical order. It is possible to do these things in
languages with only the integer and real data types but very
clumsy. Text has certain properties different from those of
numbers. For example, it is very useful to be able to point to
certain of the characters in the text and work with just those
temporarily or to take one letter off of the text at a time and
process it. Sail has the data type STRING for holding "strings"
of text characters. And associated with the STRING data type are
string operations that work in a way analogous to how the numeric
operators (+,-,*, etc.) work with the numeric data types. We
write the actual strings enclosed in quotation marks. Any of the
characters in the ASCII character set can be used in strings
(control characters, letters, numerals, punctuation marks). Some
examples of strings are:
"OUTPUT FILE= "
"HELP"
"Please type your name."
"aardvark"
"0123456789"
"!""#$%&"
"AaBbCcDdEeFf"
"" (the empty string)
NULL (also the empty string)
Upper and lowercase letters are not equivalent in strings, i.e.,
"a" is a different string than "A". (Note that to put a " in a
string, you use "", e.g., "quote a ""word""".)
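As a small sketch of the doubled-quote rule (the variable name msg is
only illustrative, and the assignment and PRINT statements used here are
explained in Section 2.3):

    BEGIN
    STRING msg;
    msg _ "She said ""hello"" twice.";
    PRINT(msg);
    END

Each "" inside the constant stands for a single " character, so the
line printed on the terminal should read: She said "hello" twice.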
In your programs, you will have both variables and constants. We have
already given some examples of constants in each of the data types.
REAL and INTEGER constants are just numbers as you usually see them
written (2, 618, -4.35, etc.); the BOOLEAN constants are TRUE and FALSE;
and STRING constants are a sequence of text characters enclosed in
double quotes (and NULL for the empty string).
Variables are used rather than constants when you know that a value will
be needed in the given computation but do not know in advance what the
exact value will be. For example, you may want to add 4 numbers, but
the numbers will be specified by the user at runtime or taken from a
data file. Or the numbers may be the results of previous computations.
You might be computing weekly totals and then when you have the results
for each week adding the four weeks together for a monthly total. So
instead of an expression like 2 + 31 + 25 + 5 you need an expression
like X + Y + Z + W or WEEK1 + WEEK2 + WEEK3 + WEEK4. This is done by
declaring (through a declaration) that you will need a variable of a
certain data type with a specified name. The compiler will set up a
storage location of the proper type and enter the name and location in
its symbol table. Each time that you have an intermediate result which
needs to be stored, you must set up the storage location in advance.
When we discuss the various statements available, you will see how
values are input from the user or from a file or saved from a
computation and stored in the appropriate location. The names for these
variables are often referred to as their identifiers. Identifiers can
be as long (or short) as you want. However, if you will be debugging
with DDT or using TOPS-10 programs such as the CREF cross-referencing
program, you should make your identifiers unique within the first six
characters; for example, DDT can distinguish LONGSYMBOL from LONGNAME but
not from LONGSYNONYM because the first 6 characters are the same.
Identifiers must begin with a letter but following that can be made up
of any sequence of letters and numbers. The characters ! and $ are
considered to be letters. Certain reserved words and predeclared
identifiers are unavailable for use as names of your own identifiers. A
list of these is given in the Sail manual in Appendices B and C.
Typical declarations are:
INTEGER i,j,k;
REAL x,y,z;
STRING s,t;
where these are the letters conventionally used as identifiers of the
various types. There is no reason why you couldn't have INTEGER x; REAL
i; except that other people reading your program might be confused. In
some languages the letter used for the variable automatically tells its
type. This is not true in Sail. The type of the variable is
established by the declaration. In general, simple one-letter
identifiers like these are used for simple, straightforward and usually
temporary purposes such as to count an iteration. (ALGOL W users note
that iteration variables must be declared in Sail.)
Most of the variables in your program will be declared and used for a
specific purpose and the name you specify should reflect the use of the
variable.
INTEGER nextWord, page!count;
REAL total, subTotal;
STRING lastname, firstname;
BOOLEAN partial, abortSwitch, outputsw;
Both upper and lowercase letters are equivalent in identifiers and so
the case as well as the use of ! and $ can contribute to the readability
of your programs. Of course, the above examples contain a mixture of
styles; you will want to choose some style that looks best to you and
use it consistently. The equivalence of upper and lowercase also means
that
TOTAL | total | Total | toTal | etc.
are all instances of the same identifier. So while it is desirable
to be consistent, occasionally forgetting doesn't hurt anything.
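A minimal sketch of this equivalence (the values are arbitrary):

    BEGIN
    INTEGER total;
    TOTAL _ 5;
    Total _ total + 1;
    PRINT(toTal);
    END

All four spellings refer to the one variable set up by the INTEGER
declaration, so the program prints 6.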
Some programmers use uppercase for the standard words like BEGIN,
INTEGER, END, etc. and lowercase for their identifiers. Others reverse
this. Another approach is uppercase for actual program code and
lowercase for comments. It is important to develop some style which you
feel makes your programs as easy to read as possible.
Another important element of program clarity is the format. The Sail
compiler is free format which means that blank lines, indentations,
extra spaces, etc. are ignored. Your whole program could be on one line
and the compiler wouldn't know the difference. (Lines should be less
than 250 characters if a listing is being made using the compiler
listing options.) But programs usually have each statement and
declaration on a separate line with all lines of each block indented the
same number of spaces. Some programmers put BEGIN and END on lines by
themselves and others put them on the closest line of code. It is very
important to format your programs so that they are easy to read.
2.2.2 Array Declarations
An array is a data structure designed to let you deal with a group of
variables together. For example, if you were accumulating weekly totals
over a period of a year, it would be cumbersome to declare:
REAL week1, week2, week3,.....,week52 ;
and then have to work with the 52 variables each having a separate name.
Instead you can declare:
REAL ARRAY weeks [1:52] ;
The array declaration consists of one of the data type words (REAL,
INTEGER, BOOLEAN, STRING) followed by the word ARRAY followed by the
identifier followed by the dimensions of the array enclosed in [ ]'s.
The dimensions give the bounds of the array. The lower bound does not
need to be 1. Another common value for the lower bound is 0, but you
may make it anything you like. (The LOADER will have difficulties if
the lower bound is a very large positive or negative number.)
You may declare more than one array in the same declaration provided
they are the same type and have the same dimensions. For example, one
array might be used for the total employee salary paid in the week which
will be a real number, but you might also need to record the total
employee hours worked and the total profit made (one integer and one
real value) so you could declare:
INTEGER ARRAY hours [1:52];
REAL ARRAY salaries, profits [1:52];
These 3 arrays are examples of parallel arrays.
It is also possible to have multi-dimensioned arrays. A common example
is an array used to represent a chessboard:
INTEGER ARRAY chessboard [1:8,1:8];
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
8,1 8,2 8,3 8,4 8,5 8,6 8,7 8,8
In fact even the terminology used is the same. Arrays, like matrices
and chessboards, have rows (across) and columns (up-and-down). Arrays
which are statically allocated (all outer block and OWN arrays) may have
at most 5 dimensions. Arrays which are allocated dynamically may have
any number of dimensions.
Each element of the array is a separate variable and can be used
anywhere that a simple variable can be used. We refer to the elements
by giving the name of the array followed by the particular coordinates
(called the subscripts) of the given element enclosed in []'s, for
example: weeks[34], weeks[27], chessboard[2,5], and chessboard[8,8].
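The declarations and subscripting described above can be put together in
a short sketch. The values stored are made up for illustration, and the
assignment and PRINT statements are explained in Section 2.3:

    BEGIN
    INTEGER ARRAY hours [1:52];
    REAL ARRAY salaries, profits [1:52];
    INTEGER ARRAY chessboard [1:8,1:8];
    hours[1] _ 1840;
    salaries[1] _ 27543.82;
    profits[1] _ 3120.5;
    chessboard[2,5] _ 1;
    PRINT("Week 1: ", hours[1], " hours, salaries ", salaries[1]);
    END

Because these arrays are declared in the outer block, their bounds are
written as constants. Each subscripted element is used exactly as a
simple variable of the same type would be.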
2.3 Statements
All of the statements available in Sail are listed in the Sail manual
(Sec. 1.1 with the syntax for the statements in Sec. 3.1). For now, we
will discuss the assignment statement, the PRINT statement, and the
IF...THEN statement which will allow us to give some sample programs.
2.3.1 Assignment Statement
Assignment statements are used to assign values to variables:
variable _ expression
The variable being assigned to and the expression whose value is being
assigned to it are separated by the character which is a backwards arrow
in 1965 ASCII (and Stanford ASCII) and is an underbar (underlining
character) in 1968 ASCII. The assignment statement is often read as:
variable becomes expression
OR variable is assigned the value of expression
OR variable gets expression
You may assign values to any of the four types of variables (INTEGER,
REAL, BOOLEAN, STRING) or to the individual variables in arrays.
Essentially, an expression is something that has a value. An expression
is not a statement (although we will see later that some of the
constructions of the language can be either statements or expressions
depending on the current use). It is most important to remember that an
expression can be evaluated. It is a symbol or sequence of symbols that
when evaluated produces a value that can be assigned, used in a
computation, tested (e.g. for equality with another value), etc. An
expression may be
a) a constant
b) a variable
c) a construction using constants, variables, and the various
operators on them.
Examples of these 3 types of expressions in assignment statements are:
DON'T FORGET TO DECLARE VARIABLES FIRST!
INTEGER i,j;
REAL x,y;
STRING s,t;
BOOLEAN isw,osw,iosw;
INTEGER ARRAY arry [1:10];
a) i _ 2; COMMENT now i = 2;
x _ 2.4; COMMENT now x = 2.4;
s _ "abc"; COMMENT now EQU(s,"abc");
isw _ TRUE; COMMENT now isw = TRUE;
osw _ FALSE; COMMENT now osw = FALSE;
arry[4] _ 22; COMMENT now arry[4] = 22;
b) j _ i; COMMENT now i = j = 2;
y _ x; COMMENT now x = y = 2.4;
t _ s; COMMENT now EQU(s,"abc")
AND EQU(t,"abc");
arry[8] _ j; COMMENT i=j=arry[8]=2;
c) i _ j + 4; COMMENT j = 2 AND i = 6;
x _ 2*y - i; COMMENT y=2.4 AND i=6
AND x = -1.2;
arry[3] _ i/j; COMMENT i=6 AND j=2
AND arry[3]=3;
iosw _ isw OR osw; COMMENT isw = TRUE
AND osw = FALSE
AND iosw = TRUE;
NOTE1: Most of the operators for strings are different than those
for the arithmetic variables. The difference between = and
EQU will be covered later.
NOTE2: Logical operators such as AND and OR are also available
for boolean expressions.
NOTE3: You may put "comments" anywhere in your program by using
the word COMMENT followed by the text of your comment and
ended with a semi-colon (no semi-colons can appear within the
comment). Generally comments are placed between declarations
or statements rather than inside of them.
NOTE4: In all our examples, you will see that the declarations
and statements are separated by semi-colons.
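The rule in NOTE3 that a comment ends at the first semi-colon is easy to
trip over. A small sketch, using the INTEGER variables i and j declared
above:

    i _ 0; COMMENT reset the counter;
    COMMENT reset i first; j _ 0;

In the second line the comment ends at the semi-colon after "first", so
j _ 0; is compiled as an ordinary assignment statement rather than being
treated as part of the comment text.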
In a later section, we will discuss: 1) type conversion which occurs
when the data types of the variable and the expression are not the same,
2) the order of evaluation in the expression, and 3) many more
complicated expressions including string expressions (first we need to
know more of the string operators).
2.3.2 PRINT Statement
PRINT is a relatively new but very useful statement in Sail. It is used
for outputting to the user's terminal. You can give it as many
arguments as you want and the arguments may be of any type. PRINT first
converts each argument to a string if necessary and then outputs it.
Remember that only strings can be printed anywhere. Numbers are stored
internally as 36-bit words and when they are output in 7-bit bytes for
text the results are very strange. Fortunately PRINT does the
conversion to strings for you automatically, e.g., the number 237 is
printed as the string "237". The format of the PRINT statement is the
word PRINT followed by a list of arguments separated by commas with the
entire list enclosed in parentheses. Each argument may be any constant,
variable, or complex expression. For example, if you wanted to output
the weekly salary totals from a previous example and the number of the
current week was stored in INTEGER curWeek, you might use:
PRINT("WEEK ", curWeek,
": Salaries ", salaries[curWeek]);
which for curWeek = 28 and the array element salaries[28] = 27543.82
would print out:
WEEK 28: Salaries 27543.82
NOTE: The printing format for reals (number of leading zeroes
printed and places after the decimal point) is discussed in
the Sail manual under type conversions.
2.3.3 Built-in Procedures
Using just the assignment statement, the PRINT statement, and three
built-in procedures, we can write a sample program. Procedures are a
very important feature of Sail and you will be writing many of your own.
The details of procedure writing and use will be covered in Section
2.7. Without giving any details now, we will just say that some
procedures to handle very common tasks have been written for you and are
available as built-in procedures. The SQRT, INCHWL and CVD procedures
that we will be using here are all procedures which return values.
Examples are:
s _ INCHWL;
i _ CVD(s);
x _ 2 + SQRT(i);
Procedures may have any number of arguments (or none). SQRT and CVD
have a single argument and INCHWL has no arguments (but does return a
value). The procedure call is made by writing the procedure name
followed by the argument(s) in parentheses. In the expression in which
it is used, the procedure call is equivalent to the value that it
returns.
SQRT returns the square root of its argument.
CVD returns the result of converting its string argument to an
integer. The string is assumed to contain a number in decimal
representation--CVO converts strings containing octal numbers,
e.g., after executing
i _ CVD("14724"); j _ CVO("14724");
then the following
i = 14724 AND j = 6612
would be true.
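The octal result can be checked by hand. Reading the digits 14724 as a
base-8 number gives:

    1*4096 + 4*512 + 7*64 + 2*8 + 4*1
      = 4096 + 2048 + 448 + 16 + 4
      = 6612

so the same string of digits yields 14724 through CVD and 6612 through
CVO.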
INCHWL returns the next line of typing from the user at the
controlling terminal.
NOTE: In TENEX-Sail the INTTY procedure is available and SHOULD be
used in preference to the INCHWL procedure for inputting
lines. This may not be mentioned in every example, but is
very important for TENEX users to remember.
So, for the statement s _ INCHWL; , the value of INCHWL will be the line
typed at the terminal (minus the terminator which is usually carriage
return). This value is a string and is assigned here to the string
variable s.
So far we have seen five uses of expressions: as the right-hand-side of
the assignment statement, as an actual parameter or argument in a
procedure call, as an argument to the PRINT statement, for giving the
bounds in an array declaration (except for arrays declared in the outer
block which must have constant bounds), and for the array subscripts for
the elements of arrays. In fact the whole range of kinds of expressions
can be used in nearly all the places that constants and variables (which
are particular kinds of expressions) can be used. Two exceptions to
this that we have already seen are 1) the left-hand-side of the
assignment statement (you can assign a value to a variable but not to a
constant or a more complicated expression) and 2) the array bounds for
outer block arrays which come at a point in the program before any
assignments have been made to any of the variables so only constants may
be used--the declarations in the outer block are before any program
statements at all.
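As a sketch of an expression used as an array bound, the array below is
declared in an inner block, after a value has been stored in n. The
names n and entries are only illustrative, and the sketch assumes the
user types a number of at least 1:

    BEGIN
    INTEGER n;
    PRINT("How many entries? ");
    n _ CVD(INCHWL);
        BEGIN
        REAL ARRAY entries [1:n];
        entries[n] _ 0.0;
        END
    END

The same declaration placed in the outer block would not be accepted,
since n has no value at that point and only constant bounds are allowed
there.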
In general, any construction that makes sense to you is probably legal
in Sail. By using some of the more complicated expressions, you can
save yourself steps in your program. For example,
BEGIN
REAL sqroot;
INTEGER numb;
STRING reply;
PRINT("Type number: ");
reply_INCHWL;
numb_CVD(reply);
sqroot_SQRT(numb);
PRINT("ANS: ",sqroot);
END;
can be shortened by several steps. First, we can combine INCHWL with
CVD:
numb _ CVD (INCHWL);
and eliminate the declaration of the STRING reply. Next we can
eliminate numb and take the SQRT directly:
sqroot _ SQRT (CVD(INCHWL));
At first you might think that we could go a step further to
PRINT ("ANS: ",SQRT(CVD(INCHWL)));
and we could as far as the Sail syntax is concerned but it would produce
a bug in our program. We would be printing out "ANS: " right after
"Type number: " before the user would have time to even start typing.
But we have considerably simplified our program to:
BEGIN
REAL sqroot;
PRINT ("Type number: ");
sqroot _ SQRT (CVD(INCHWL));
PRINT ("ANS: ",sqroot);
END;
Remember that intermediate results do not need to be stored unless you
will need them again later for something else. By not storing results
unnecessarily, you save the extra assignment statement and the storage
space by not needing to declare a variable for temporary storage.
2.3.4 IF...THEN Statement
The previous example included no error checking. There are several
fundamental programming tasks that cannot be handled with just the
assignment and PRINT statements such as 1) conditional tasks like
checking the value of a number (is it negative?) and taking action
according to the result of the test and 2) looping or iterative tasks so
that we could go back to the beginning and ask the user for another
number to be processed. These sorts of functions are performed by a
group of statements called control statements. In this section we will
cover the IF..THEN statement for conditionals. More advanced control
statements will be discussed in Section 2.6.
There are two kinds of IF...THEN statements:
IF boolean expression THEN statement
IF boolean expression THEN statement
ELSE statement
A boolean expression is an expression whose value is either true or
false. A wide variety of expressions can effectively be used in this
position. Any arithmetic expression can be a boolean; if its value = 0
then it is FALSE. For any other value, it is TRUE. For now we will
just consider the following three cases:
1) BOOLEAN variables (where errorsw, base8, and miniVersion
are declared as BOOLEANs):
IF errorsw THEN
PRINT("There's been an error.") ;
IF base8 THEN digits _ "01234567"
ELSE digits _ "0123456789" ;
IF miniVersion THEN counter _ 10
ELSE counter _ 100;
2) Expressions with relational operators such as EQU, =, <,
>, LEQ, NEQ, and GEQ:
IF x < currentSmallest THEN
currentSmallest _ x;
IF divisor NEQ 0 THEN
quotient_dividend/divisor;
IF i GEQ 0 THEN i_i+1 ELSE i_i-1;
3) Complex expressions formed with the logical operators
AND, OR, and NOT:
IF NOT errorsw THEN
answers[counter] _ quotient;
IF x<0 OR y<0 THEN
PRINT("Negative numbers not allowed.")
ELSE z _ SQRT(x)+SQRT(y);
In the IF..THEN statement, the boolean expression is evaluated. If it
is true then the statement following the THEN is executed. If the
boolean expression is false and the particular statement has no ELSE
part then nothing is done. If the boolean is false and there is an ELSE
part then the statement following the ELSE will be executed.
BEGIN BOOLEAN bool; INTEGER i,j;
bool_TRUE; i_1; j_1;
IF bool THEN i_i+1; COMMENT i=2 AND j=1;
IF bool THEN i_i+1 ELSE j_j+1;
COMMENT i=3 AND j=1;
bool_false;
IF bool THEN i_i+1; COMMENT i=3 AND j=1;
IF bool THEN i_i+1 ELSE j_j+1;
COMMENT i=3 AND j=2;
END;
It is VERY IMPORTANT to note that NO semi-colon appears between the
statement and the ELSE. Semi-colons are used a) to separate
declarations from each other, b) to separate the final declaration from
the first statement in the block, c) to separate statements from each
other, and d) to mark the end of a comment. The key point to note is
that semi-colons are used to separate and NOT to terminate. In some
cases it doesn't hurt to put a semi-colon where it is not needed. For
example, no semi-colon is needed at the end of the program but it
doesn't hurt. However, the format
IF expression THEN statement ; ELSE statement ;
makes it difficult for the compiler to understand your code. The first
semi-colon marks the end of what could be a legitimate IF...THEN
statement and it will be taken as such. Then the compiler is faced with
ELSE statement ;
which is meaningless and will produce an error message.
The following is a part of a sample program which uses several IF...THEN
statements:
BEGIN BOOLEAN verbosesw; STRING reply;
PRINT("Verbose mode? (Type Y or N): ");
reply _ INCHWL; COMMENT INTTY for TENEX;
IF reply="Y" OR reply="y" THEN verbosesw _ TRUE
ELSE
IF reply="N" OR reply="n" THEN verbosesw_FALSE;
IF verbosesw THEN PRINT("-long msg-")
ELSE PRINT("-short msg-");
COMMENT now all our messages printed out to
terminal will be conditional on verbosesw;
END;
There are two interesting points to note about this sample program.
First is the use of = rather than EQU to check the user's reply. EQU is
used to check the equality of variables of type STRING and = is used to
check the equality of variables of type INTEGER or REAL. If we were
asking the user for a full word answer like "yes" or "no" instead of the
single character then we would need the EQU to check what the input
string was. However, in this case where we only have a single
character, we can use the fact that when a string (either a string
variable or a string constant) is put someplace in a program where an
integer is expected then Sail automatically converts to the integer
which is the ASCII code for the FIRST character in the string. For
example, in the environment
STRING str; str _ "A";
all of the following are true:
"A" = str = 65 = '101
"A" NEQ "a"
str NEQ "a"
str + 1 = "A" + 1 = '102 = "B"
str = "Aardvark"
NOT EQU(str,"Aardvark")
('101 is an octal integer constant.)
When you are dealing with single character strings (or are only
interested in the first character of a string) then you can treat them
like integers and use the arithmetic operators like the = operator
rather than EQU. In general (over 90% of the time), EQU is slower.
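For comparison, here is a sketch of the full-word version of the test,
assuming the same reply and verbosesw variables as in the sample program
above:

    IF EQU(reply,"yes") THEN verbosesw _ TRUE
    ELSE IF EQU(reply,"no") THEN verbosesw _ FALSE;

Writing reply = "yes" here would compare only the first characters, so a
reply such as "yellow" would also pass the test. Note too that EQU
distinguishes upper and lowercase, so "YES" does not match "yes".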
A second point to note in the above IF...THEN example is the use of a
nested IF...THEN. The statements following the THEN and the ELSE may be
any kind of statement including another IF..THEN statement. For
example,
IF upperOnly THEN letters _ "ABC"
ELSE IF lowerOnly THEN letters _ "abc"
ELSE letters _ "ABCabc";
This is a very common construction when you have a small list of
possibilities to check for. (Note: if there are a large number of cases
to be checked use the CASE statement instead.) The nested
IF..THEN..ELSE statements save a lot of processing if used properly.
For example, without the nesting this would be:
IF upperOnly THEN letters _ "ABC";
IF lowerOnly THEN letters _ "abc";
IF NOT upperOnly AND NOT lowerOnly THEN
letters _ "ABCabc";
Regardless of the values of upperOnly and lowerOnly, the boolean
expressions in the three IF..THEN statements need to be checked. In the
nested version, if upperOnly is TRUE then lowerOnly will never be
checked. For greatest efficiency, the most likely case should be the
first one tested in a nested IF...THEN statement. If that likely case
is true, no further testing will be done.
To avoid ambiguity in parsing the nested IF..THEN..ELSE construction,
the following rule is used: Each ELSE matches up with the last
unmatched THEN. So that
IF exp1 THEN IF exp2 THEN s1 ELSE s2 ;
will group the ELSE with the second THEN which is equivalent to
IF exp1 THEN
BEGIN
IF exp2 THEN s1 ELSE s2;
END;
and also equivalent to
IF exp1 AND exp2 THEN s1;
IF exp1 AND NOT exp2 THEN s2; .
You can change the structure with BEGIN/END to:
IF exp1 THEN
BEGIN
IF exp2 THEN s1
END ELSE s2 ;
which is equivalent to
IF exp1 AND exp2 THEN s1;
IF NOT exp1 THEN s2;
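To make the difference concrete, here is a sketch in which s1 and s2 are
PRINT statements and a and b are BOOLEAN variables (all names are
illustrative). The first form is the plain nested statement and the
second uses BEGIN/END to attach the ELSE to the outer IF:

    IF a THEN IF b THEN PRINT("s1") ELSE PRINT("s2");

    IF a THEN
    BEGIN
    IF b THEN PRINT("s1")
    END ELSE PRINT("s2");

Following the matching rule, the two forms behave as shown:

    a      b      first form   second form
    TRUE   TRUE   s1           s1
    TRUE   FALSE  s2           (nothing)
    FALSE  TRUE   (nothing)    s2
    FALSE  FALSE  (nothing)    s2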
There is another common use of BEGIN/END in IF..THEN statements. All
the examples so far have shown a single simple statement to be executed.
In fact, you often will have a variety of tasks to perform based on the
condition tested for. For example, before you make an entry into an
array, you may want to check that you are within the array bounds and if
so then both make the entry and increment the pointer so that it will be
ready for the next entry:
IF pointer LEQ max THEN
BEGIN
data[pointer] _ newEntry;
pointer_pointer + 1;
END
ELSE PRINT("Array DATA is already full.");
Here we see the use of a compound statement. Compound statements are
exactly like blocks except that they have no declarations. It would
also be perfectly acceptable to use a block with declarations where the
compound statement is used here. In fact both blocks and compound
statements ARE statements and can be used ANY place that a simple
statement can be used. All of the statements between BEGIN and END are
executed as a unit (unless one of the statements itself causes the flow
of execution to be changed).
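As a sketch of a block with a declaration used where a compound
statement could appear, the following exchanges two values when they are
out of order. It assumes x and y have been declared as REALs; temp is an
illustrative name for a variable needed only inside the block:

    IF x > y THEN
    BEGIN
    REAL temp;
    temp _ x;
    x _ y;
    y _ temp;
    END;

The three assignments are executed as a unit when the test is true and
skipped entirely when it is false.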
2.4 Expressions
We have already seen many of the operators used in expressions.
Sections 4 and 8 of the Sail manual cover the operators, the order of
evaluation of expressions, and type conversions. Appendix 1 of the
manual gives the word equivalents for the single character operators,
e.g., LEQ for the less-than-or-equal-to sign, which are not available
except at SU-AI. You should read these sections especially for a
complete list of the arithmetic and boolean operators available (the
string operators will be covered shortly in this TUTORIAL). A short
discussion of type conversion will be given later in this section but
you should also read these sections in the Sail manual for complete
details on type conversions.
There are three kinds of expressions that we have not used yet:
assignment, conditional, and case expressions. These are much like the
statements of the same names.
2.4.1 Assignment Expressions
Anywhere that you can have an expression, you may at the same time make
an assignment. The value will be used as the value of the expression
and also assigned to the given variable. For example:
IF (reply_INCHWL) = "?" THEN ....
COMMENT inputs reply and makes first test
on it in single step;
IF (counter_counter+1) > maxEntry THEN ....
COMMENT updates counter and checks it for
overflow in one step;
counter_ptr_nextloc_0;
COMMENT initializes several variables to 0
in one statement;
arry[ptr_ptr+1] _ newEntry ;
COMMENT updates ptr & fills next array
slot in single step;
Note that the assignment operator has low precedence and so you will
often need to use parenthesizing to get the proper order of evaluation.
This is an area where many coding errors commonly occur.
IF i_j OR boole THEN ....
is parsed like
IF i_(j OR boole) THEN ....
rather than
IF (i_j) OR boole THEN ....
See the sections in the Sail manual referenced above for a more complete
discussion of the order of evaluation in expressions. In general it is
the normal order for the arithmetic operators; then the logical
operators AND and OR (so that OR has the lowest precedence of any
operator except the assignment operator); and left to right order is
used for two operators at the same level (but the manual gives examples
of exceptions). You can use parentheses anywhere to specify the order
that you want. As an example of the effect of left-to-right evaluation,
note that
indexer_2;
arry[indexer]_(indexer_indexer+1);
will put the value 3 in arry[2], since the destination is evaluated
before indexer is incremented.
A word of caution is needed about assignment expressions. Make sure if
you put an ordinary assignment in an expression that that expression is
in a position where it will ALWAYS be evaluated. Of course,
IF i<j THEN i_i+1;
will not always increment i but this is the intended result. However,
the following is unintended and incorrect:
IF verbosesw THEN
PRINT("The square root of ",numb," is ",
sqroot_SQRT(numb)," .")
ELSE PRINT(sqroot) ;
If verbosesw = FALSE, the THEN portion is not executed and the
assignment to sqroot is not made. Thus sqroot will not have the
appropriate value when it is PRINTed. Assigning the result of a
computation to a variable to save recomputing it is an excellent
practice but be careful where you put the assignment.
Another very bad place for assignment expressions is following either
the AND or OR logical operators. The compiler handles these by
performing as little evaluation as possible so in
exp1 OR exp2
the compiler will first evaluate exp1 and if it is TRUE then the
compiler knows that the entire boolean expression is true and doesn't
bother to evaluate exp2. Any assignments in exp2 will not be made since
exp2 is not evaluated. (Of course, if exp1 is FALSE then exp2 will be
evaluated.) Similarly for
exp1 AND exp2
if exp1 is FALSE then the compiler knows the whole AND-expression is
FALSE and doesn't bother evaluating exp2.
As with nested IF...THEN...ELSE statements, it is a good coding practice
to choose the order of the expressions carefully to save processing.
The most likely expression should be first in an OR expression and the
least likely first in an AND expression.
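As a small sketch of the hazard just described, the fragment below puts an assignment to the right of OR and then shows a safer rewrite. The names count and known and the block name are hypothetical and used only for this illustration.

```sail
BEGIN "shortdemo"
INTEGER count;
BOOLEAN known;
count _ 0;
known _ TRUE;

COMMENT known is TRUE so the right operand of OR is skipped
        and count is never incremented;
IF known OR ((count _ count + 1) > 0) THEN
    PRINT("count after OR = ", count, " ");

COMMENT safer form which does the assignment first and then tests;
count _ count + 1;
IF known OR (count > 0) THEN
    PRINT("count after rewrite = ", count);
END "shortdemo"
```

Following the rule given above, count should still be 0 when the first PRINT is reached and 1 at the second.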
2.4.2 Conditional Expressions
Conditionals can also be used in expressions. These have a more rigid
structure than conditional statements. It must be
IF boolean expression THEN exp1 ELSE exp2
where the ELSE is not optional.
N. B. The type of a conditional expression is the type of exp1. If
exp2 is evaluated, it will be converted to the type of exp1. (At
compile time it is not known which will be used so an arbitrary decision
is made by always using the type of exp1.) Thus the statement,
x_IF flag THEN 2 ELSE y; , will always assign an INTEGER to x. If x and
y are REALs then y is converted to INTEGER and then converted to REAL
for the assignment to x. x_IF flag THEN 2 ELSE 3.5; will assign either
2.0 or 3.0 to x (assuming x is REAL). Examples are:
REAL ARRAY results
[1:IF miniversion THEN 10 ELSE 100];
PRINT (IF found THEN words[i]
ELSE "Word not found.");
COMMENT words[i] must be a string;
profit _ IF (net _ income-cost) > 0 THEN net
ELSE 0;
These conditional expressions will often need to be parenthesized.
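One way to sidestep the conversion described in the note above is to make exp1 itself a real. The sketch below assumes that a constant written with a decimal point, such as 2.0, is taken as a REAL constant; check the manual's section on constants before relying on this.

```sail
BEGIN "conddemo"
REAL x;
BOOLEAN flag;
flag _ FALSE;

COMMENT exp1 is the integer 2 so 3.5 is converted to an integer
        before the assignment to x;
x _ IF flag THEN 2 ELSE 3.5;

COMMENT with exp1 written as a real constant the conditional
        is real and 3.5 should be kept as it is;
x _ IF flag THEN 2.0 ELSE 3.5;
END "conddemo"
```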
2.4.3 CASE Expressions
CASE statements are described in Section 2.6.4 below. CASE
expressions are also allowed with the format:
CASE integer OF (exp0,exp1,...,expN)
where the first case is always 0. This takes the value you give which
must be an integer between 0 and N and uses the corresponding expression
from the list. A frequent use is for error handling where each error is
assigned a number and the number of the current error is put in a
variable. Then a statement like the following can be used to print the
proper error message:
PRINT(CASE errno OF
("Zero division attempted",
"No negative numbers allowed",
"Input not a number"));
Remember that errno here must range from 0 to 2; otherwise, a case
overflow occurs.
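If errno might fall outside the expected range, a conditional expression can guard the CASE expression. This is only a sketch; the message "Unknown error" and the test value 7 are invented for the illustration.

```sail
BEGIN "casedemo"
INTEGER errno;
errno _ 7;

COMMENT the range test keeps an unexpected error number from
        reaching the CASE expression;
PRINT(IF 0 LEQ errno LEQ 2 THEN
          (CASE errno OF
              ("Zero division attempted",
               "No negative numbers allowed",
               "Input not a number"))
      ELSE "Unknown error");
END "casedemo"
```

Note that exp1 of the conditional is a string here, so the conditional expression as a whole is a string.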
2.4.4 String Operators
The STRING operators are:
EQU Test for string equality:
s_"ABC"; t_"abc"; test_EQU(s,t);
RESULT: test = FALSE .
& Concatenate two strings together:
s_"abc"; t_"def"; u_s&t;
RESULT: EQU(u,"abcdef") = TRUE .
LENGTH Returns the length of a string:
s_"abc"; i_LENGTH(s);
RESULT: i = 3 .
LOP Removes the first char in a string
and returns it:
s_"abc"; t_LOP(s);
RESULT: (EQU(s,"bc") AND
EQU(t,"a")) = TRUE .
Although LENGTH and LOP look like procedures syntactically, they actually
compile code "in-line". This means that they compile very fast code.
However, one unfortunate side-effect is that LOP cannot be used as a
statement, i.e., you cannot say LOP(s); if you just want to throw away
the first character of the string. You must always either use or assign
the character returned by LOP even if you don't want it for anything,
e.g., junk_LOP(s); . Another point to note about LOP is that it
actually removes the character from the original string. If you will
need the intact string again, you should make a copy of it before you
start LOP'ing, e.g., tempCopy_s; .
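A typical pattern is to LOP through a copy one character at a time while the original stays intact. The following sketch counts the blanks in a string; the names blanks, temp, and char are chosen only for this example.

```sail
BEGIN "lopdemo"
STRING s, temp;
INTEGER char, blanks;
s _ "a b c";
temp _ s;
COMMENT the LOPs below shorten temp only and s is left intact;
blanks _ 0;
WHILE LENGTH(temp) DO
    IF (char _ LOP(temp)) = " " THEN blanks _ blanks + 1;
PRINT(blanks, " blanks in ", s);
END "lopdemo"
```

The loop ends when temp has length 0, and s can still be printed in full afterwards.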
A little background on the implementation of strings should help you to
use them more efficiently. Careless use of strings can be a
significant source of inefficiency in your programs. Sail sets up an area of
memory called string space where all the actual strings are stored. The
runtime system increases the size of this area dynamically as it begins
to become full. The runtime system also performs garbage collections to
retrieve space taken by strings that are no longer needed so that the
space can be reused. The text of the strings is stored in string space.
Nothing is put in string space until you actually specify what the
string is to be, i.e., by an assignment statement. At the time of the
declaration, nothing is put in string space. Instead the compiler sets
up a 2-word string descriptor for each string declared. The first word
contains in its left-half an indication of whether the string is a
constant or a variable and in its right-half the length of the string.
The second word is a byte pointer to the location of the start of the
string in string space. At the time of the declaration, the length will
be zero and the byte pointer word will be empty since the string is not
yet in string space.
From this we can see that LENGTH and LOP are very efficient operations.
LENGTH picks up the length from the descriptor word; and LOP decrements
the length by 1, picks up the character designated by the byte pointer,
and increments the byte pointer. LOP does not need to do anything with
string space. Concatenations with & are however fairly inefficient
since in general new strings must be created. For s & t, there is
usually no way to change the descriptor words to come up with the new
string (unless s and t are already adjacent in string space). Instead
both s and t must be copied into a new string in string space. In
general since the pointer is kept to the beginning of the string, it is
less expensive to look at the beginning than the end. On the other
hand, when concatenating, it is better to keep building onto the end of
a given string rather than the beginning. The runtime routines know
what is at the end of string space and, if you happen to concatenate to
the end of the last string put in, the routines can do that efficiently
without needing to copy the last string.
Assigning one string variable to another, e.g., for making a temporary
copy of the string, is also fast since the string descriptor rather than
the text is copied.
These are general guidelines rather than strict rules. Different
programs will have different specific needs and features.
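As a sketch of the guideline about building onto the end of a string, the loop below always adds the new piece on the right of the string being built. The array words and its contents are invented for the example, and whether the runtime can actually avoid a copy on a given pass depends on what else has been put into string space in the meantime.

```sail
BEGIN "catdemo"
STRING ARRAY words[1:3];
STRING line;
INTEGER i;
words[1] _ "build";
words[2] _ "at";
words[3] _ "end";

line _ NULL;
FOR i _ 1 STEP 1 UNTIL 3 DO
    line _ line & words[i] & " ";
COMMENT each new piece was added on the right of line;
PRINT(line);
END "catdemo"
```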
2.4.5 Substrings
Sail provides a way of dealing with selected subportions of strings
called substrings. There are two different ways to designate the
desired substring:
s[i TO j]
s[i FOR j]
where [i TO j] means the substring starting at the ith character in the
string through the jth character and [i FOR j] is the substring starting
at the ith character that is j characters long. The numbering starts
with 1 at the first character on the left. The special symbol INF can
be used to refer to the last character (the rightmost) in the string.
So, s[INF FOR 1] is the last character; and s[7 TO INF] is all but the
first six characters. If you are using a substring of a string array
element then the format is arry[index][i TO j].
Suppose you have made the assignment s_"abcdef" . Then,
        s[1 TO 3]                   is  "abc"
        s[2 FOR 3]                  is  "bcd"
        s[1 TO INF]                 is  "abcdef"
        s[INF-1 TO INF]             is  "ef"
        s[1 TO 3]&"X"&s[4 TO INF]   is  "abcXdef" .
Since substrings are parts of the text of their source strings, it is a
very cheap operation to break a string down, but is fairly expensive to
build up a new string out of substrings.
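Breaking a string down is the cheap direction, as in this sketch which splits a line into its first five characters and everything from the seventh character on. The variable names and the sample text are made up for the illustration.

```sail
BEGIN "subdemo"
STRING line, verb, rest;
line _ "PRINT hello";
verb _ line[1 FOR 5];
rest _ line[7 TO INF];
PRINT("verb=", verb, " rest=", rest);
END "subdemo"
```

Here verb is "PRINT" and rest is "hello"; the blank in position 6 is simply skipped over.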
2.4.6 Type Conversions
If you use an expression of one type where another type was expected,
then automatic type conversion is performed. For example,
INTEGER i;
i _ SQRT(5);
will cause 5 to be converted to real (because SQRT expects a real
argument) and the square root of 5.0 to be automatically converted to an
integer before it is assigned to i which was declared as an integer
variable and can only have integer values. As noted in Section 4.2 of
the Sail manual, this conversion is done by truncating the real value.
Another example of automatic type conversion that we have used here in
many of the sample programs is:
IF reply = "Y" THEN .....
where the = operator always expects integer or real arguments rather
than strings. Both the value of the string variable reply and the
string constant "Y" will be converted to integer values before the
equality test. The manual shows that this conversion, string-to-
integer, is performed by taking the first character of the string and
using its ASCII value. Similarly converting from integer to string is
done by interpreting the integer (or just the rightmost seven bits if it
is less than 0 or it is too large--that is any number over 127 or '177)
as an ASCII code and using the character that the code represents as the
string. So, for example,
STRING s;
s _ '101 & '102 & '103;
will make the string "ABC".
The other common conversions that we have seen are integer/real to
boolean and string to boolean. Integers and reals are true if non-zero;
strings are true if they have a non-zero length and the first character
of the string is not the NUL character (which is ASCII code 0).
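The conversions described so far can be collected in one short sketch. The variable names are hypothetical, and the comments state what the rules above lead one to expect rather than the output of an actual run.

```sail
BEGIN "convdemo"
INTEGER i, code;
STRING s, reply;

i _ SQRT(5);
COMMENT 5 becomes 5.0 and the root of about 2.236 is
        truncated to 2;

code _ "ABC";
COMMENT string to integer takes the first character which
        is A or octal 101;

s _ code + 1;
COMMENT integer to string treats octal 102 as a character
        code so s is the one character string B;

reply _ NULL;
IF reply THEN PRINT("non-empty ") ELSE PRINT("empty ");
COMMENT a string of length zero counts as FALSE;

PRINT("i=", i, " code=", code, " s=", s);
END "convdemo"
```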
You may also call one of the built-in type conversion procedures
explicitly. We have used CVD extensively to convert strings containing
digits to the integer number which the digits represent. CVD and a
number of other useful type conversion procedures are described in
Section 8.1 of the Sail manual. Also this section discusses the
SETFORMAT procedure which is used for specifying the number of leading
zeroes and the maximum length of the decimal portion of the real when
printing. SETFORMAT is extremely useful if you will be outputting
numbers as tables and need to have them automatically line up
vertically.
2.5 Scope of Blocks
So far we have seen basically only one use of inner blocks. With the
IF..THEN statement, we saw that you sometimes need a block rather than a
simple statement following the THEN or ELSE so that a group of
statements can be executed as a unit.
In fact, blocks can be used within the program any place that you can
use a single statement. Syntactically, blocks are statements. A
typical program might look like this:
        BEGIN "prog"
            .
            .
            BEGIN "initialization"
                .
                .
            END "initialization"
            BEGIN "main part"
                BEGIN "process data"
                    .
                    .
                    BEGIN "output results"
                        .
                        .
                    END "output results"
                END "process data"
                .
                .
            END "main part"
            BEGIN "finish up"
                .
                .
            END "finish up"
        END "prog"
The declarations in each block establish variables which can only be
used in the given block. So another reason for using inner blocks is to
manage variables needed for a specific short range task.
Each block can (should) have a block name. The name is given in quotes
following the BEGIN and END of the block. The case of the letters,
number of spaces, etc. are important (as in string constants) so that
the names "MAIN LOOP", "Main Loop", "main loop", and "Main loop" are all
different and will not match. There are several advantages to using
block names: your programs are easier to read, the names will be used by
the debugger and thus will make debugging easier, and the compiler will
check block names and report any mismatches to help you pinpoint missing
END's (a very common programming error).
The above example shows us how blocks may nest. Any block which is
completely within the scope of another block is said to be nested in
that block. In any program, all of the inner blocks are nested in the
outer block. Here, in addition to all the blocks being within the
"prog" block, we find "output results" nested in "process data" and both
"output results" and "process data" nested in "main part". The three
blocks called "initialization", "main part" and "finish up" are not
nested with relation to each other but are said to be at the same level.
None of the variables declared in any of these three blocks is available
to any of the others. In order to have a variable shared by these
blocks, we need to declare it in a block which is "outer" to all of
them, which is in this case the very outermost block "prog".
Variables are available in the block in which they are declared and in
all the blocks nested in that block UNLESS the inner block also has a
variable of the same name declared (a very bad idea in general). The
portion of the program, i.e., the blocks, in which the variable is
available is called the scope of the variable.
        BEGIN "main"
        INTEGER i, j;
        i_5;
        j_2;
        PRINT("CASE A: i=",i," j=",j);
            BEGIN "inner"
            INTEGER i, k;
            i_10;
            k_3;
            PRINT("CASE B: i=",i," j=",j," k=",k);
            j_4;
            END "inner" ;
        PRINT("CASE C: i=",i," j=",j);
        END "main"
Here we cannot access k except in block "inner". The variable j is the
same throughout the entire program. There are 2 variables both named i.
So the program will print out:
CASE A: i=5 j=2
CASE B: i=10 j=2 k=3
CASE C: i=5 j=4
Variables are referred to as local variables in the block in which they
are declared. They are called global variables in relation to any of
the blocks nested in the block of their declaration. With both a local
and a global variable of the same name, the local variable takes
precedence. There are three relationships that a variable can have to a
block:
1) It is inaccessible to the block if the variable is
declared in a block at the same level as the given block or it is
declared in a block nested within the given block.
2) It is local to the block if it is declared in the block.
3) It is global to the block if it is declared in one of the
blocks that the given block is nested within.
Often the term "global variables" is used specifically to mean the
variables declared in the outer block which are global to all the other
blocks.
In reading the Sail manual, you will see the terms: allocation,
deallocation, initialization, and reinitialization. It is not important
to completely understand the implementation details, but it is extremely
important to understand the effects. The key point is that allocating
storage for data can be handled in one of two ways. Storage allocation
refers to the actual setting up of data locations in memory. This can
be done 1) at compile time or 2) at runtime. If it is done at runtime
then we say that the allocation is dynamic. Basically, it is arrays
which are dynamically allocated (excluding outer block arrays and other
arrays which are declared as OWN). LISTS, SETS, and RECORDS which we
have not discussed in this section are also allocated dynamically. The
following are allocated at compile time and are NOT dynamic: scalar
variables (INTEGER, BOOLEAN, REAL and STRING) except where the scalar
variable is in a recursive procedure, outer block arrays, and other OWN
arrays. ALGOL users should note this as an important ALGOL/Sail
difference.
Dynamic storage (inner block arrays, etc.) will be allocated at the
point that the block is entered and deallocated when the block is
exited. This makes for quite efficient use of large amounts of storage
space that serve a short term need. Also, it allows you to set variable
size bounds for these arrays since the value does not need to be known
at compile time.
At the time that storage is allocated, it is also initialized. This
means that the initial value is assigned---NULL for strings and 0 for
integers, reals, and booleans. Since arrays are allocated each time the
block is entered, they are reinitialized each time. We have not yet
seen any cases where the same block is executed more than once but this
is very frequent with the iterative and looping control statements.
Scalar variables and outer block arrays are not dynamically allocated.
They are allocated by the compiler and will receive the initial null or
zero value when the program is loaded but they will never be
reinitialized. While you are not in the block, the variables are not
accessible to you but they are not deallocated so they will have the
same value when you enter the block the next time as when you exited it
on the previous use. Usually you will find that this is not what you
want. You should initialize all local scalar variables yourself
somewhere near the start of the block--usually to NULL for strings and 0
for arithmetic variables unless you need some other specific initial
value. You should also initialize all global scalars (and outer block
arrays) at the start of your program to be on the safe side. They are
initialized for you when the compiled program is later run, but their
values will not be reinitialized if the program is restarted while
already in core and the results will be very strange.
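The sketch below shows the recommended habit for a block that is entered several times. The names count and note are invented for the example. Without the two initializing assignments, the description above suggests that count and note would carry their values over from one pass to the next.

```sail
BEGIN "retain"
INTEGER pass;
FOR pass _ 1 STEP 1 UNTIL 3 DO
    BEGIN "body"
    INTEGER count;
    STRING note;
    COMMENT these two assignments are what guarantee a fresh
            start on every pass through the block;
    count _ 0;
    note _ NULL;
    count _ count + pass;
    note _ note & "*";
    PRINT("pass ", pass, ": count=", count, " note=", note, " ");
    END "body";
END "retain"
```

With the initialization in place, count equals pass and note is a single "*" on every pass.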
One exception is the blocks in RECURSIVE PROCEDUREs which do have all
non-OWN variables properly handled and initialized as recursive calls
are made on the blocks.
If you should want to clear an array, the command
ARRCLR(arry)
will clear arry (set string arrays to NULL and arithmetic to 0). For
arithmetic (NOT string) arrays,
ARRCLR(arry,val)
will set the elements of arry to val.
See Sections 2.2-2.4 of the Sail manual for more information on OWN,
SAFE, and PRELOADED arrays and Section 8.5 for the ARRBLT and ARRTRAN
routines for moving the contents of arrays.
2.6 More Control Statements
2.6.1 FOR Statement
The FOR statement is used for a definite number of iterations. Many
times you will want to repeat certain code a specific number of times
(where usually the number in the sequence of repetitions is also
important in the code performed). For example,
FOR i _ 1 STEP 1 UNTIL 5 DO
PRINT(i, " ", SQRT(i));
which will print out a table of the square roots of the numbers 1 to 5.
The syntax of the (simple) FOR statement is
FOR variable _ starting-value STEP increment
UNTIL end-value DO statement
The iteration variable is assigned the starting-value and tested to
check if it exceeds the end-value; if it is within the range then the
statement after the DO is executed (otherwise the FOR statement is
finished). This completes the first execution of the FOR-loop.
Next the increment is added to the variable and it is tested to see if
it now exceeds the end-value. If it does then the statement is not
executed again and the FOR statement is finished. If it is within the
maximum (or equal to it) then the statement is executed again but all
instances of the iteration variable in the statement will now have the
new value. This incrementing and checking and executing loop is
repeated until the iteration variable exceeds the end-value.
For those users familiar with GOTO statements and LABELs, the following
two program fragments for computing ans_FACT(n) are equivalent.
ans _ 1;
FOR i _ 2 STEP 1 UNTIL n DO ans _ ans * i;
is equivalent to:
ans _ 1;
i _ 2;
loop: IF i > n THEN GOTO beyond;
ans _ ans * i;
i _ i + 1;
GOTO loop;
beyond:
There is considerable dispute on whether or not the use of GOTO
statements should be encouraged and if so under what conditions. These
statements are available in Sail but will not be discussed in this
Tutorial.
Very often FOR-loops are used for indexing through arrays. For example,
if you are computing averages, you will need to add together numbers
which might be stored in an array. The following program allows a
teacher to input the total number of tests taken and a list of the
scores; then the program returns the average score.
        BEGIN "averager"
        REAL average; INTEGER numbTests, total;
        average_numbTests_total_0;
        COMMENT remember to initialize variables;
        PRINT("Total number of tests: ");
        numbTests_CVD(INCHWL);
            BEGIN "useArray"
            INTEGER ARRAY testScores[1:numbTests];
            COMMENT array has variable bounds so must
                    be in inner block;
            INTEGER i;
            COMMENT for use as the iteration variable;
            FOR i _ 1 STEP 1 UNTIL numbTests DO
                BEGIN "fillarray"
                PRINT("Test Score #",i," : ");
                testScores[i] _ CVD(INCHWL);
                END "fillarray";
            FOR i _ 1 STEP 1 UNTIL numbTests DO
                total_total+testScores[i];
            COMMENT note that total was initialized to
                    0 above;
            END "useArray";
        IF numbTests NEQ 0 THEN average_total/numbTests;
        PRINT("The average is ",average,".");
        END "averager";
In the first FOR-loop, we see that i is used in the PRINT statement to
tell the user which test score is wanted then it is used again as the
array subscript to put the score into the i'th element of the array.
Similarly it is used in the second FOR-loop to add the i'th element to
the cumulative total.
The iteration variable, start-value, increment, and end-value can all be
reals as well as integers. They can also be negative (in which case the
maximum is taken as a minimum). See the Sail manual for details on
other variations where multiple values can be given for more complex
statements (these aren't used often). One point to note is that in Sail
the end-value expression is evaluated each time through the loop, while
the increment value is evaluated only at the beginning if it is a
complex expression, as opposed to a constant or a simple variable. This
means that for efficiency, if your loop will be performed very many
times you should not have very complicated expressions in the end-value
position. If you need to compute the end-value, do it before the FOR-
loop and assign the value to a variable that can be used in the FOR-loop
to save having to recompute the value each time. This doesn't save much
and probably isn't worth it for 5 or 10 iterations but for 500 or 1000
it can be quite a savings. For example use:
max_(ptr-offset)/2;
FOR i_offset STEP 1 UNTIL max DO s ;
rather than
FOR i_offset STEP 1 UNTIL (ptr-offset)/2 DO s;
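The FOR statement also runs downward when the increment is negative, as mentioned above. A minimal sketch (the block name is arbitrary):

```sail
BEGIN "countdown"
INTEGER i;
COMMENT with a negative increment the end-value acts as a minimum;
FOR i _ 5 STEP -1 UNTIL 1 DO
    PRINT(i, " ");
END "countdown"
```

This should print the numbers from 5 down to 1, stopping once i has dropped below the end-value.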
2.6.2 WHILE...DO Statement and DO...UNTIL Statement
Often you will want to repeat code but not know in advance how many
times. Instead the iteration will be finished when a certain condition
is met. This is called indefinite iteration and is done with either a
WHILE...DO or a DO...UNTIL statement.
The syntax of WHILE statements is:
WHILE boolean-expression DO statement
The boolean is checked and if FALSE nothing is done. If TRUE the
statement is executed and then the boolean is checked again, etc.
For example, suppose we want to check through the elements of an integer
array until we find an element containing a given number n:
INTEGER ARRAY arry[1:max];
ptr _ 1;
WHILE (arry[ptr] NEQ n) AND (ptr < max) DO
ptr_ptr+1;
If the array element currently pointed to by ptr is the number we are
looking for OR if the ptr is at the upper bound of the array then the
WHILE statement is finished. Otherwise the ptr is incremented and the
boolean (now using the next element) is checked again. When the
WHILE...DO statement is finished, either ptr will point to the array
element with the number or, if arry[ptr] still does not equal n, ptr=max
and nothing was found.
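Because ptr can equal the upper bound both when the last element matches and when nothing matches, it is safest to test the element itself after the loop. The following sketch uses a five-element array filled with made-up values.

```sail
BEGIN "search"
INTEGER ARRAY arry[1:5];
INTEGER ptr, n, i;
FOR i _ 1 STEP 1 UNTIL 5 DO arry[i] _ 10 * i;
n _ 50;

ptr _ 1;
WHILE (arry[ptr] NEQ n) AND (ptr < 5) DO
    ptr _ ptr + 1;

COMMENT test the element rather than ptr to decide the outcome;
IF arry[ptr] = n THEN PRINT(n, " found at position ", ptr)
ELSE PRINT(n, " not found");
END "search"
```

Here the number sought is in the last slot, so the loop stops with ptr at 5 and the element test reports that it was found.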
The WHILE...DO statement is equivalent to the following format with
LABELs and the GOTO statement:
loop: IF NOT boolean expression THEN
GOTO beyond;
statement;
GOTO loop;
beyond:
The DO...UNTIL statement is very similar except that 1) the statement is
always executed the first time and then the check is made before each
subsequent loop through and 2) the loop continues UNTIL the boolean
becomes true rather than WHILE it is true.
DO statement UNTIL boolean-expression
For example, suppose we want to get a series of names from the user and
store the names in a string array. We will finish inputting the names
when the user types a bare carriage-return (which results in a string of
length 0 from INCHWL or INTTY).
i_0;
DO PRINT("Name #",i_i+1," is: ")
UNTIL (LENGTH(names[i]_INCHWL) = 0 );
The equivalent of the DO...UNTIL statement using LABELs and the GOTO
statement is:
loop: statement;
IF NOT boolean expression THEN GOTO loop;
Note that the checks in the WHILE...DO and DO...UNTIL statements are the
reverse of each other. WHILE...DO continues as long as the expression
is true but DO...UNTIL continues as long as the expression is NOT true.
Thus
WHILE i < 100 DO .....
is equivalent to
DO ..... UNTIL i GEQ 100
except that the statement is guaranteed to be executed at least once
with the DO...UNTIL but not with the WHILE...DO.
The WHILE and DO statements can be used, for example, to check that a
string which we have input from the user is really an integer. CVD
stops converting if it hits a non-digit and returns the results of the
conversion to that point but does not give an error indication so that a
check of this sort should probably be done on numbers input from the
user before CVD is called.
        INTEGER numb, char;
        STRING reply,temp; BOOLEAN error;
        PRINT("Type the number: ");
        DO
            BEGIN
            error_FALSE;
            temp_reply_INCHWL;
            WHILE LENGTH(temp) DO
                IF NOT ("0" LEQ (char_LOP(temp)) LEQ "9")
                    THEN error_TRUE;
            IF error THEN PRINT("Oops, try again: ");
            END
        UNTIL NOT error;
        numb_CVD(reply);
2.6.3 DONE and CONTINUE Statements
Even with definite and indefinite iterations available, there will still
be times when you need a greater degree of control over the loop. This
is accomplished by the DONE and CONTINUE statements which can be used in
any loop which begins with DO, e.g.,
FOR i_1 STEP 1 UNTIL j DO ...
DO ... UNTIL exp
WHILE exp DO ...
(See the manual for a discussion of the NEXT statement which is not
often used.) DONE means to abort execution of the entire FOR,
DO...UNTIL or WHILE...DO statement immediately. CONTINUE means to stop
executing the current pass through the loop and continue to the next
iteration.
Suppose a string array is being used as a "dictionary" to hold a list of
100 words and we want to look up one of the words which is now stored in
a string called target:
FOR i _ 1 STEP 1 UNTIL 100 DO
IF EQU(words[i],target) THEN DONE;
IF i>100 THEN PRINT(target," not found.");
If the target is found, the FOR-loop will stop regardless of the current
value of i. Note that the iteration variable can be checked after the
loop is terminated to determine whether the DONE forced the termination
(i LEQ 100) or the target was never found and the loop terminated
naturally (i > 100).
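CONTINUE is used in much the same way when only some passes through the loop should do the work. The sketch below adds up the positive entries of an array and skips the rest; the array contents and names are invented for the example.

```sail
BEGIN "skipdemo"
INTEGER ARRAY arry[1:5];
INTEGER i, total;
arry[1] _ 4;  arry[2] _ -1;  arry[3] _ 0;
arry[4] _ 6;  arry[5] _ -3;
total _ 0;
FOR i _ 1 STEP 1 UNTIL 5 DO
    BEGIN "addpositive"
    IF arry[i] LEQ 0 THEN CONTINUE;
    COMMENT reached only for entries greater than zero;
    total _ total + arry[i];
    END "addpositive";
PRINT("total of positive entries = ", total);
END "skipdemo"
```

Only the entries 4 and 6 reach the addition, so total ends up as 10.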
If the loops are nested then the DONE or CONTINUE applies to the
innermost loop unless there are names on the blocks to be executed by
each loop and the name is given explicitly, e.g., DONE "someloop". With
the DONE and CONTINUE statements, we can now give the complete code to
be used for the sample program given earlier where a number was accepted
from the user and the square root of the number was returned. A variety
of error checks are made and the user can continue giving numbers until
finished. In this example, block names will be used with DONE and
CONTINUE only where they are necessary for the correctness of the
program; but use of block names everywhere is a good practice for clear
programming.
        BEGIN "prog" STRING temp,reply; INTEGER numb;
        WHILE TRUE DO
        COMMENT a very common construction which just
                loops until DONE;
            BEGIN "processnumb"
            PRINT("Type a number, <CR> to end, or ? :");
            WHILE TRUE DO
                BEGIN "checker"
                IF NOT LENGTH(temp_reply_INCHWL) THEN
                    DONE "processnumb";
                IF reply = "?" THEN
                    BEGIN
                    PRINT("..helptext & reprompt..");
                    CONTINUE;
                    COMMENT defaults to "checker";
                    END;
                WHILE LENGTH(temp) DO
                    IF NOT ("0" LEQ LOP(temp) LEQ "9") THEN
                        BEGIN
                        PRINT("Oops, try again: ");
                        CONTINUE "checker";
                        END;
                IF (numb_CVD(reply)) < 0 THEN
                    BEGIN
                    PRINT("Negative, try again: ");
                    CONTINUE;
                    END;
                DONE;
                COMMENT if all the checks have been
                        passed then done;
                END "checker";
            PRINT("The Square Root of ",numb," is ",
                  SQRT(numb),".");
            COMMENT now we go back to top of loop
                    for next input;
            END "processnumb";
        END "prog"
2.6.4 CASE Statement
The CASE statement is similar to the CASE expression where S0,S1,...Sn
represent the statements to be given at these positions.
CASE integer OF
BEGIN
S0;
; COMMENT the empty statement;
S2;
.
.
Sn
END;
where ;'s are included for those cases where no action is to be taken.
Another version of the CASE statement is:
CASE integer OF
BEGIN
[0] S0;
[4] S4; COMMENT cases can be skipped;
[3] S3; COMMENT need not be in order;
[5] S5;
[6][7] S6; COMMENT may be same statement;
[8] S8;
.
.
[n] Sn
END;
where explicit numbers in []'s are given for the cases to be included.
It is very IMPORTANT not to use a semi-colon after the final statement
before the END. Also, do NOT use CASE statements if you have a sparse
number of cases spread over a wide range because the compiler will make
a giant table, e.g.,
CASE number OF
BEGIN
[0] S0;
[1000] S1000;
[2000] S2000
END;
would produce a 2001 word table!
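For a few widely spaced values like these, a chain of IF..THEN..ELSE statements is one alternative that needs no table. The sketch below is only an illustration, with PRINT statements standing in for S0, S1000, and S2000.

```sail
BEGIN "sparse"
INTEGER number;
number _ 1000;
IF number = 0 THEN PRINT("case 0")
ELSE IF number = 1000 THEN PRINT("case 1000")
ELSE IF number = 2000 THEN PRINT("case 2000")
ELSE PRINT("no such case");
END "sparse"
```

Each ELSE pairs with the nearest unmatched THEN, so the tests are tried in order and the final ELSE catches any other value.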
Remember that the first case is 0 not 1. An example is using a CASE
statement to process lettered options:
INTEGER char;
PRINT("Type A,B,C,D, or E : ");
char_INCHWL;
CASE char-"A" OF
COMMENT "A"-"A" is 0, and is thus case 0;
BEGIN
<code for A option>;
<code for B option>;
.
.
<code for E option>
END;
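Since the user may type something other than A through E, it can be worth testing the range before the CASE statement is reached. The following is a sketch only; the PRINT statements stand in for the option code and the wording of the messages is made up.

```sail
BEGIN "options"
INTEGER char;
PRINT("Type A,B,C,D, or E : ");
char _ INCHWL;
IF "A" LEQ char LEQ "E" THEN
    CASE char-"A" OF
        BEGIN
        PRINT("option A");
        PRINT("option B");
        PRINT("option C");
        PRINT("option D");
        PRINT("option E")
        END
ELSE PRINT("Please type a letter from A to E.");
END "options"
```

Note that there is no semicolon after the last PRINT before the END, and that a lower case reply falls outside the tested range and gets the reminder.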
2.7 Procedures
We have been using built-in procedures and in fact would be lost without
them if we had to do all our own coding for the arithmetic functions,
the interactions with the system like Input/Output, and the general
utility routines that simplify our programming. Similarly, good
programmers would be lost without the ability to write their own
procedures. It takes a little time and practice getting into the habit
of looking at programming tasks with an eye to spotting potential
procedure components in the task, but it is well worth the effort.
Often in programming, the same steps must be repeated in different
places in the program. Another way of looking at it is to say that the
same task must be performed in more than one context. The way this is
usually handled is to write a procedure which is the sequence of
statements that will perform the task. This procedure itself appears in
the declaration portion of one of the blocks in your program and we will
discuss later the details of how you declare the procedure. Essentially
at the time that you are writing the statement portion of your program,
you can think of your procedures as black boxes. You recognize that you
have an instance of the task that you have designed one of your
procedures to perform and you include at that point in your sequence of
statements a procedure call statement. The procedure will be invoked
and will handle the task for you. In the simplest case, the procedure
call is accomplished by just writing the procedure's name.
For example, suppose you have a calculator-type program that accepts an
arithmetic expression from the user and evaluates it. At suitable
places in the program you will have checks to make sure that no
divisions by zero are being attempted. You might write a procedure
called zeroDiv which prints out a message to the user saying that a zero
division has occurred, repeats the current arithmetic expression, and
asks if the user would like to see the prepared help text for the
program. Every time you check for zero division anyplace in your
program and find it, you will call this procedure with the statement:
zeroDiv;
and it will do everything it is supposed to do.
Sometimes the general format of the task will be the same but some
details will be different. These cases can be covered by writing a
parameterized procedure. Suppose that we wanted something like our
zeroDiv procedure, but more general, that would handle a number of other
kinds of errors. It still needs to print out a description of the
error, the current expression being evaluated, and a suggestion that the
user consult the help text; but the description of the error will be
different depending on what the error was. We accomplish this by using
a variable when we write the procedure; in this case an integer variable
for the error number. The procedure includes code to print out the
appropriate message for each error number; and the integer variable
errno is added to the parameter list of the procedure. Each of the
parameters is a variable that will need to have a value associated with
it automatically at the time the procedure is called. (Actually arrays
and other procedures can also be parameters; but they will be discussed
later.) We won't worry about the handling of parameters in procedure
declarations now. We are concerned with the way the parameters are
specified in the procedure call. Our procedure errorHandler will have
one integer parameter so we call it with the expression to be associated
with the integer variable errno given in parentheses following the
procedure name in the procedure call. For example,
errorHandler(0)
errorHandler(1)
errorHandler(2)
would be the valid calls possible if we had three different possible
errors.
If there is more than one parameter, they are put in the order given in
the declaration and separated by commas. (Arguments is another term
used for the actual parameters supplied in a procedure call.) Any
expression can be used for the parameter, e.g., for the built-in
procedure SQRT:
SQRT(4)
SQRT(numb)
SQRT(CVD(INCHWL))
SQRT(numb/divisor)
When Sail compiles the code for these procedure calls, it first includes
code to associate the appropriate values in the procedure call with the
variables given in the parameter list of the procedure declaration and
then includes the code to execute the procedure. When errorHandler
PRINTs the error message, the variable errno will have the appropriate
value associated with it. This is not an assignment such as those done
by the assignment statement and we will also be discussing calls by
REFERENCE as well as calls by VALUE; but we don't need to go into the
details of the actual implementation -- see the manual if you are
interested in how procedure calls are implemented and arguments pushed
on the stack.
Just as we often perform the same task many times in a given program so
there are tasks performed frequently in many programs by many
programmers. The authors of Sail have written procedures for a number
of such tasks which can be used by everyone. These are the built-in
procedures (CVD, INCHWL, etc.) and are actually declared in the Sail
runtime package so all that is needed for you to use them is placing the
procedure calls at the appropriate places. Thus these procedures are
indeed black boxes when they are used.
However, for our own procedures, we do need to write the code ourselves.
An example of a useful procedure is one which converts a string argument
to all uppercase characters. First, the program with the procedure call
to upper at the appropriate place and the position marked where the
procedure declaration will go:
BEGIN
STRING reply,name;
***procedure declaration here***
PRINT("Type READ, WRITE, or SEARCH: ");
reply_upper(INCHWL);
IF EQU(reply,"READ") THEN ....
ELSE IF EQU(reply,"WRITE") THEN ....
ELSE IF EQU(reply,"SEARCH") THEN ....
ELSE .... ;
END;
We put the code for the procedure right in the procedure declaration
which goes in the declaration portion of any block. Remember that the
procedure must be declared in a block which will make it accessible to
the blocks where you are going to use it; in the same way that a
variable must be declared in the appropriate place. Also, any variables
that appear in the code of the procedure must already be declared (a
declaration immediately preceding the procedure declaration is fine).
Here is the procedure declaration for upper which should be inserted at
the marked position in the above code:
STRING PROCEDURE upper (STRING rawstring);
BEGIN "upper"
STRING tmp; INTEGER char;
tmp_NULL;
WHILE LENGTH(rawstring) DO
BEGIN
char_LOP(rawstring);
tmp_tmp&(IF "a" LEQ char LEQ "z"
THEN char-'40 ELSE char);
END;
RETURN(tmp);
END "upper";
The syntax is:
type-qualifier PROCEDURE identifier ;
statement
for procedures with no parameters OR
type-qualifier PROCEDURE identifier
( parameter-list ) ; statement
where the parameter-list is enclosed in ()'s and a semi-colon precedes
the statement (which is often called the procedure body). The
<type-qualifier>'s will be discussed shortly.
The parameter list includes the names and types of the parameters and
must NOT have a semi-colon following the final item on the list.
Examples are:
PROCEDURE offerHelp ;
INTEGER PROCEDURE findWord
(STRING target; STRING ARRAY words) ;
SIMPLE PROCEDURE errorHandler
(INTEGER errno) ;
RECURSIVE INTEGER PROCEDURE factorial
(INTEGER number) ;
PROCEDURE sortEntries
(INTEGER ptr,first; REAL ARRAY unsorted) ;
STRING PROCEDURE upper (STRING rawString) ;
Each of these now needs a procedure body.
PROCEDURE offerHelp ;
BEGIN "offerHelp"
COMMENT the procedure name is usually used
as block name;
PRINT("Would you like help (Y or N): ");
IF upper(INCHWL) = "Y" THEN PRINT("..help..")
ELSE RETURN;
PRINT("Would you like more help (Y or N): ");
IF upper(INCHWL) = "Y" THEN
PRINT("..more help..");
END "offerHelp";
This offers a brief help text and if it is rejected then RETURNs from
the procedure without printing anything. A RETURN statement may be
included in any procedure at any time. Otherwise the brief help message
is printed and the extended help offered. After the extended help
message is printed (or not printed), the procedure finishes and returns
without needing a specific RETURN statement because the code for the
procedure is over. Note that we can use procedure calls to other
procedures such as upper provided that we declare them in the proper
order with upper declared before offerHelp.
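The errorHandler procedure described earlier could be given a body
along the same lines. This is only a sketch: the three messages and the
global string currentExpr (assumed to hold the expression being
evaluated) are hypothetical, and offerHelp is assumed to be declared
before errorHandler so that it can be called here.

```sail
STRING currentExpr;
COMMENT hypothetical global holding the expression being evaluated;

PROCEDURE errorHandler (INTEGER errno);
BEGIN "errorHandler"
    COMMENT pick the message that goes with this error number;
    CASE errno OF
    BEGIN
        [0] PRINT("Division by zero");
        [1] PRINT("Unbalanced parentheses");
        [2] PRINT("Unknown operator")
    END;
    PRINT(" in: ",currentExpr);
    offerHelp;
END "errorHandler";
```

With this declaration, errorHandler(0), errorHandler(1), and
errorHandler(2) are the three valid calls.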
PROCEDURE declarations will usually have type-qualifiers. There are two
kinds: 1) the simple types--INTEGER, STRING, BOOLEAN, and REAL and 2)
the special ones--FORWARD, RECURSIVE, and SIMPLE.
FORWARD is typically used if two procedures call each other. This
creates a problem because a procedure must be declared before it can be
called. For example, if offerHelp called upper, and upper also called
offerHelp then we would need:
FORWARD STRING PROCEDURE upper
(STRING rawstring) ;
PROCEDURE offerHelp ;
BEGIN "offerHelp"
. . .
<code for offerHelp including call to upper>
. . .
END "offerHelp";
STRING PROCEDURE upper (STRING rawstring) ;
BEGIN "upper"
. . .
<code for upper including call to offerHelp>
. . .
END "upper";
The FORWARD declaration does not include the body but does include the
parameter list (if any). This declaration gives the compiler enough
information about the upper procedure for it to process the offerHelp
procedure. FORWARD is also used when there is no order of declaration
of a series of procedures such that every procedure is declared before
it is used. FORWARD declarations can sometimes be eliminated by putting
one of the procedures in the body of the other, which can be done if you
don't need to use both of them later.
RECURSIVE is used to qualify the declaration of any procedure which
calls itself. The compiler will add special handling of variables so
that the values of the variables in the block are preserved when the
block is called again and restored after the return from the recursive
call. For example,
RECURSIVE INTEGER PROCEDURE factorial
(INTEGER i);
RETURN(IF i = 0 THEN 1 ELSE factorial(i-1)*i);
The compiler adds some overhead to procedures that can be omitted if you
do not use any complicated structures. Declaring procedures SIMPLE
inhibits the addition of this overhead. However, there are severe
restrictions on SIMPLE procedures; and also, BAIL can be used more
effectively with non-SIMPLE procedures. So the appropriate use of
SIMPLE is during the optimization stage (if any) after the program is
debugged. At this time the SIMPLE qualifier can be added to the short,
simple procedures which will save some overhead. The restrictions on
SIMPLE procedures are:
1) Cannot allocate storage dynamically, i.e., no non-OWN
arrays can be declared in SIMPLE procedures.
2) Cannot do GO TO's outside of themselves (the GO TO
statement has not been covered here).
3) Cannot, if declared inside other procedures, make any use
of the parameters of the other procedures.
Procedures which are declared as one of the simple types (REAL, INTEGER,
BOOLEAN, or STRING) are called typed procedures as opposed to untyped
procedures (note that the SIMPLE, FORWARD, and RECURSIVE qualifiers have
no effect on this distinction). Typed procedures can return values.
Thus typed procedures are like FORTRAN functions and untyped procedures
are like FORTRAN subroutines. The type of the value returned
corresponds to the type of the procedure declaration. Only a single
value may be returned by any procedure. The format is
RETURN( expression ) where the expression is enclosed in ()'s.
Procedure upper which was given above is a typed procedure which returns
as its value the uppercase version of the string. Another example is:
REAL PROCEDURE averager
(INTEGER ARRAY scores; INTEGER max);
BEGIN "averager" REAL total; INTEGER i;
total _ 0;
FOR i _ 1 STEP 1 UNTIL max DO
total _ total + scores[i];
IF max NEQ 0 THEN RETURN(total/max)
ELSE RETURN(0);
END "averager";
We might have a variety of calls to this procedure:
testAverage _ averager(testScores,numberScores);
salaryAverage _ averager(salaries,numberEmployees);
speedAverage _ averager(speeds,numberTrials);
where testScores, salaries, and speeds are all INTEGER ARRAYs.
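To see one such call in context, here is a minimal sketch of a block
that fills a small array and averages it. The block name, the three
scores, and the bounds are invented for illustration. Since averager as
written indexes from 1 up to max, the array is declared with a lower
bound of 1.

```sail
BEGIN "scoreDemo"
    INTEGER ARRAY testScores[1:3];
    REAL testAverage;
    COMMENT the declaration of averager given above goes here;
    testScores[1]_70; testScores[2]_80; testScores[3]_90;
    testAverage_averager(testScores,3);
    PRINT("The average is ",testAverage);
END "scoreDemo"
```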
Procedure calls can always be used as statements, e.g.,
1) IF divisor=0 THEN errorHandler(1);
2) offerHelp;
3) upper(text);
but as in 3) it makes little sense to use a procedure that returns a
value as a statement since the value is lost. Typed procedures which
return values are therefore normally used as expressions, e.g.,
reply_upper(INCHWL);
PRINT(upper(name));
It is not necessary to have a RETURN statement in untyped procedures.
If you do have a RETURN statement in an untyped procedure it CANNOT
specify a value; and if you have a RETURN statement in a typed procedure
it MUST specify a value to be returned. If there is no RETURN statement
in a typed procedure then the value returned will be garbage for integer
and real procedures or the null string for string procedures; this is
not good coding practice.
Procedures frequently will RETURN(true) or RETURN(false) to indicate
success or a problem. For example, a procedure which is supposed to get
a filename from the user and open the file will return true if
successful and false if no file was actually opened:
IF getFile THEN processInput
ELSE errorHandler(22) ;
This is quite typical code where you can see that all the tasks have
been procedurized. Many programs will have 25 pages of procedure
declarations and then only 1 or 2 pages of actual statements calling the
appropriate procedures at the appropriate times. In fact, programs can
be written with pages of procedures and then only a single statement to
call the main procedure.
Basically there are two ways of giving information to a procedure and
three ways of returning information. To give information you can 1) use
parameters to pass the information explicitly or 2) make sure that the
appropriate values are in global variables at the time of the call and
code the procedures so that they access those variables. There are
several disadvantages to the latter approach although it certainly does
have its uses.
First, once a piece of information has been assigned to a parameter, the
coding proceeds smoothly. When you write the procedure call, you can
check the parameter list and see at a glance what arguments you need.
If you instead use a global variable then you need to remember to make
sure it has the right value at the time of each procedure call. In fact
in a complicated program you will have enough trouble remembering the
name of the variable. This is one of the beauties of procedures. You
can think about the task and all the components of the task and code
them once and then when you are in the middle of another larger task,
you only need to give the procedure name and the values for all the
parameters (which are clearly specified in the parameter list so you
don't have to remember them) and the subtask is taken care of. If you
don't modularize your programs in this way, you are juggling too many
open tasks at the same time. Another approach is to tackle the major
tasks first and every time you see a subtask put in a procedure call
with reasonable arguments and then later actually write the procedures
for the subtasks. Usually a mixture of these approaches is appropriate;
and you will also find yourself carrying particularly good utility
procedures over from one program to another, building a library of your
own general utility routines.
The second advantage of parameters over global variables is that the
global variables will actually be changed by any code within the
procedures but variables passed as value parameters (the default for
scalars) will not. The changing of global variables is sometimes called
a side-effect of the procedure.
Here is a pair of procedures that illustrate both these points:
BOOLEAN PROCEDURE Ques1 (STRING s);
BEGIN "Ques1"
IF "?" = LOP(s) THEN RETURN(true)
ELSE RETURN(false);
END "Ques1";
STRING str;
BOOLEAN PROCEDURE Ques2 ;
BEGIN "Ques2"
IF "?" = LOP(str) THEN RETURN(true)
ELSE RETURN(false);
END "Ques2";
The second procedure has these problems: 1) we have to make sure our
string is in the string variable str before the procedure call and 2)
str is actually modified by the LOP so we have to make sure we have
another copy of it. With the first procedure, the string to be checked
can be anywhere and no copy is needed. For example, if we want to check
a string called command, we give Ques1(command) and the LOP done on the
string in Ques1 will not affect command.
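The difference can be observed directly by checking string lengths
after each call. The fragment below is a sketch that assumes Ques1,
Ques2, and str are declared as above; the test string is invented.

```sail
STRING command;
command_"?help";

IF Ques1(command) THEN PRINT("Ques1 saw a question mark.");
PRINT(LENGTH(command));
COMMENT still 5, since Ques1 LOPped its own copy of the string;

str_command;
IF Ques2 THEN PRINT("Ques2 saw a question mark.");
PRINT(LENGTH(str));
COMMENT now 4, since Ques2 LOPped the global str itself;
```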
Information can be returned from procedures in three ways:
1) With a RETURN(value) statement.
2) Through global variables. You may sometimes actually want
to change a global variable. Also, procedures can only return a
single value so if you have several values being generated in the
procedure, you may use global variables for the others.
3) Through REFERENCE parameters. Parameters can be either
VALUE or REFERENCE. By default all scalar parameters are VALUE
and array parameters are REFERENCE. Array parameters CANNOT be
value; but scalars can be declared as reference parameters. Value
parameters as we have seen are simply used to pass a value to the
variable which appears in the procedure. Reference parameters
actually associate the variable address given in the procedure
call with the variable in the procedure so that any changes made
will be made to the calling variable.
PROCEDURE manyReturns
(REFERENCE INTEGER i,j,k,l,m);
BEGIN
i_i+1; j_j+1; k_k+1; l_l+1; m_m+1;
END;
when called with
manyReturns(var1,var2,var3,var4,var5);
will actually change the var1,..,var5 variables themselves.
Arrays are always passed by reference. This is useful; for
example, you might have a
PROCEDURE sorter (STRING ARRAY arry) ;
which sorts a string array alphabetically. It will actually do
the sorting on the array that you give it so that the array will
be sorted when the procedure returns. Note that arrays cannot be
returned with the RETURN statement so this eliminates the need for
making all your arrays global as a means of returning them.
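A small sketch of a REFERENCE parameter in use is a procedure that
exchanges two integers. The names swap, low, and high are chosen only
for illustration.

```sail
PROCEDURE swap (REFERENCE INTEGER a,b);
BEGIN "swap"
    INTEGER tmp;
    COMMENT a and b stand for the caller's own variables;
    tmp_a; a_b; b_tmp;
END "swap";

COMMENT later, in the statement portion, with INTEGER low,high declared;
low_9; high_2;
IF low > high THEN swap(low,high);
COMMENT low is now 2 and high is now 9;
```

Had a and b been ordinary value parameters, the exchange would have
been made on copies and low and high would be unchanged.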
See the Sail manual (Sec. 2) for details on using procedures as
parameters to other procedures.
SECTION 3
Macros
Sail macros are basically string substitutions made in your source code
by the scanner during compilation. Think of your source file as being
read by a scanner that substitutes definitions into the token stream
going to a logical "inner compiler". Anything that one can do with
macros, one could have done without them by editing the file
differently. Macros are used for several purposes.
They are used to define named constants, e.g.,
BEGIN
REQUIRE "{}{}" DELIMITERS;
DEFINE maxSize = {100} ;
REAL ARRAY arry [1:maxSize];
.
.
The {}'s are used as delimiters placed around the right-hand-side of the
macro definition. Wherever the token maxSize appears, the scanner will
substitute 100 before the code is compiled. This substitution of the
source text on the right-hand-side of the DEFINE for the token on the
left-hand-side wherever it subsequently appears in the source file is
called expanding the macro. The above array declaration after macro
expansion is:
BEGIN
REAL ARRAY arry [1:100];
.
.
which is more efficient than using:
BEGIN INTEGER maxSize;
maxSize_100;
BEGIN
REAL ARRAY arry [1:maxSize];
.
.
Also, in this example, the use of the integer variable for assignment of
the maxSize means that the array bounds declaration is variable rather
than constant so it must be in an inner block; with the macro, maxSize
is a constant so the array can be declared anywhere.
Other advantages to using macros to define names for constants are 1) a
name like maxSize used in your code is easier to understand than an
arbitrary number when you or someone else is reading through the program
and 2) maxSize will undoubtedly appear in many contexts in the program
but if it needs to be changed, e.g., to 200, only the single definition
needs changing. If you had used 100 instead of maxSize throughout the
program then you would have to change each 100 to 200.
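As a sketch of the second point, here maxSize appears in three
different contexts, all of which follow the single DEFINE. The block
name and the loop are invented for illustration.

```sail
BEGIN "fill"
    REQUIRE "{}{}" DELIMITERS;
    DEFINE maxSize = {100};
    REAL ARRAY arry [1:maxSize];
    INTEGER i;
    COMMENT each maxSize below expands to 100 before compilation;
    FOR i _ 1 STEP 1 UNTIL maxSize DO arry[i] _ 0;
    PRINT("Cleared ",maxSize," entries.");
END "fill"
```

Changing the definition to {200} changes the array bound, the loop
limit, and the printed count together.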
Before giving your DEFINEs you should require some delimiters. {}{},
[][], or <><> are good choices. If you don't require any delimiters
then the defaults are """" which are probably a poor choice since they
make it hard to define string constants. The first pair of delimiters
given in the REQUIRE statement are for the right-hand-side of the
DEFINE. See the Sail manual for details on use of the second pair of
delimiters.
DEFINEs may appear anywhere in your program. They are neither
statements nor declarations. REQUIREs can be either declarations or
statements so they can also go anywhere in your program.
Another use of macros is to define octal characters. If you have tried
to use any of the sample programs here you will have discovered a
glaring bug. Each time we have output our results with the PRINT
statement, no account has been taken of the need for a CRLF (carriage
return and line feed) sequence. So all the lines will run together.
Here are 4 possible solutions to the problem:
1) PRINT("Some text.", ('15&'12));
2) PRINT("Some text.
");
3) STRING crlf;
crlf_"
"; PRINT ("Some text.",crlf);
4) REQUIRE "{}{}" DELIMITERS;
DEFINE crlf = {"
"}; PRINT("Some text.",crlf);
The first solution is hard to type frequently with the octals. (In
general, concatenations should be avoided if possible since new strings
must usually be created for them; but in this case with only constants
in the concatenation, it will be done at compile time so that is not a
consideration.) The second solution with the string extending to the
next line to get the crlf is unwieldy to use in your code. The fourth
solution is both the easiest to type and the most efficient.
You may also want to define a number of the other commonly used control
characters:
REQUIRE "<><>" DELIMITERS;
DEFINE ff = <('14&NULL)>,
lf = <('12&NULL)>,
cr = <('15&NULL)>,
tab = <('11&NULL)>,
ctlO = <'17>;
The characters which will be used as arguments in the PRINT statement
must be forced to be strings. If ff = <'14> were used, then PRINT(ff)
would print the number 12 (which is '14) rather than a formfeed
because PRINT would treat the '14 as an integer. For all the other
places that you can use these single character definitions, they will
work correctly whether defined as strings or integers, e.g.,
IF char = ctlO THEN ....
as well as
IF char = ff THEN ....
Note that string constants like '15&'12 and '14&NULL do not ordinarily
need parenthesizing but ('15&'12) and ('14&NULL) were used above. This
is a little trick to compile more efficient code. The compiler will not
ordinarily recognize these as string constants when they appear in the
middle of a concatenated string, e.g.,
"....line1..."&'15&'12&"....line2..."
but with the proper parenthesizing
"....line1..."&('15&'12)&"....line2..."
the compiler will treat the crlf as a string constant at compile time
and not need to do a concatenation on '15 and '12 every time at runtime.
Another very common use of macros is to "personalize" the Sail language
slightly. Usually macros of this sort are used either to save
repetitive typing of long sequences or to make the code where they are
used clearer. (Be careful--this can be carried too far.)
Here are some sample definitions followed by an example of their use on
the next line:
REQUIRE "<><>" DELIMITERS;
DEFINE upto = <STEP 1 UNTIL>;
FOR i upto 10 DO ....;
DEFINE ! = <COMMENT>;
i_i+1; ! increment i here;
DEFINE forever = <WHILE TRUE>;
forever DO ....;
DEFINE eif = <ELSE IF>;
IF ... THEN ....
EIF .... THEN ....
EIF .... THEN ....;
Macros may also have parameters:
DEFINE append(x,y) = <x_x&y>;
IF LENGTH(s) THEN append(t,LOP(s));
DEFINE inc(n) = <(n_n+1)>,
dec(n) = <(n_n-1)>;
IF inc(ptr) < maxSize THEN ....;
COMMENT watch that you don't forget
needed parentheses here;
DEFINE ctrl(n) = <("n"-'100)>;
IF char = ctrl(O) THEN abortPrint;
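The warning about parentheses matters because the macro body is
substituted as text, and the surrounding operators then apply to
whatever text results. The two hypothetical macros below differ only in
the parentheses around the body; i and j are assumed to be INTEGER
variables.

```sail
REQUIRE "<><>" DELIMITERS;
DEFINE twice(n) = <(n+n)>,
       badTwice(n) = <n+n>;

i _ twice(3)*2;
COMMENT expands to (3+3)*2 which is 12;
j _ badTwice(3)*2;
COMMENT expands to 3+3*2 which is 9;
```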
As we saw in some of the sample macros, the macro does not need to be a
complete statement, expression, etc. It can be just a fragment.
Whether or not you want to use macros like this is a matter of personal
taste. However, it is quite clear that something like the following is
simply terrible code although syntactically correct (and rumored to have
actually occurred in a program):
DEFINE printer = <PRINT(>;
printer "Hi there.");
which expands to
PRINT("Hi there.");
On the other hand, those who completely shun macros are erring in the
other direction. One of the best coding practices in Sail is to DEFINE
all constant parameters such as array bounds.
SECTION 4
String Scanning
We have not yet covered Input/Output which is one of the most important
topics. Before we do that, however, we will cover the SCAN function for
reading strings. SCAN which reads existing strings is very similar to
INPUT which is used to read in text from a file.
Both SCAN and INPUT use break tables. When you are reading, you could
of course read the entire file in at once but this is not what you
usually want even if the file would all fit (and with the case of SCAN
for strings it would be pointless). A break table is used to 1) set up
a list of characters which when read will terminate the scan, 2) set up
characters which are to be omitted from the resulting string, and 3)
give instructions for what to do with the break character that
terminated the scan (append it to the result string, throw it away,
leave it at the new beginning of the old string, etc.). During the
course of a program, you will want to scan strings in different ways,
for example: scan and break on a non-digit to check that the string
contains only digits, scan and break on linefeed (lf) so that you get
one line of text at a time, scan and omit all spaces so that you have a
compact string, etc. For each of these purposes (which will have
different break characters, omit characters, disposition of the break
character, and setting of certain other modes available), you will need
a different break table. You are allowed to set up as many as 54
different break tables in a program. These are set up with a SETBREAK
command.
A break table is referred to by its number (1 to 54). The GETBREAK
procedure is used to get the number of the next free table and the
number is stored in an integer variable. GETBREAK is a relatively new
feature. Previously, programmers had to keep track of the free numbers
themselves. GETBREAK is highly recommended especially if you will be
interfacing your program with another program which is also assigning
table numbers and may use the same number for a different table.
GETBREAK will know about all the table numbers in use. You assign this
number to a break table by giving it as the first argument to the
SETBREAK function. You can also use RELBREAK(table#) to release a table
number for reassignment when you no longer need that break table.
SETBREAK(table#, "break-characters",
"omit-characters", "modes") ;
where the first argument is an integer and the ""'s around the other
arguments here are a standard way of indicating, in a sample procedure
call, that the argument expected is a string. For example:
REQUIRE "<><>" DELIMITERS;
DEFINE lf = <'12>, cr = <'15>, ff = <'14>;
INTEGER lineBr, nonDigitBr, noSpaces;
SETBREAK(lineBr_GETBREAK, lf, ff&cr, "ins");
SETBREAK(noSpaces_GETBREAK, NULL, " ", "ina");
SETBREAK(nonDigitBr_GETBREAK, "0123456789",
NULL, "xns");
The characters in the "break-characters" string will be used as the
break characters to terminate the SCAN or INPUT. SCAN and INPUT return
that portion of the initial string up to the first occurrence of one of
the break-characters.
The characters in the "omit-characters" string will be omitted from the
string returned.
The "modes" establish what is to be done with the break character that
terminated the SCAN or INPUT. Any combination of the following modes
can be given by putting the mode letters together in a string constant:
CHARACTERS USED FOR BREAK CHARACTERS:
"I" (inclusion) The characters in the break-characters string are the
set of characters which will terminate the SCAN or INPUT.
"X" (eXclusion) Any character except those in the break-characters
string will terminate the SCAN or INPUT, e.g., to break on any digit
use:
INTEGER tbl;
SETBREAK(tbl_GETBREAK,"0123456789",NULL,"i");
and to break on any non-digit use:
INTEGER tbl;
SETBREAK(tbl_GETBREAK,"0123456789","","x");
where NULL or "" can be used to indicate no characters are being
given for that argument.
DISPOSITION OF BREAK CHARACTER:
"S" (skip) The character which actually terminates the SCAN or INPUT
will be "skipped" and thus will not appear in the result string
returned nor will it be still in the original string.
"A" (append) The terminating character will be appended to the end of
the result string.
"R" (retain) The terminating character will be retained in its
position in the original string so that it will be the first
character read by the next SCAN or INPUT.
OTHER MISCELLANEOUS MODES:
"K" This mode will convert characters to be put in the result string to
uppercase.
"N" This mode will discard SOS line numbers if any and should probably
be used for break tables which will be scanning text from a file.
This is a very good Sail coding practice even if it seems highly
unlikely that an SOS file will ever be given to your program.
"result-string" _ SCAN(@"source",table#, @brchar);
In these sample formats, the ""'s mean the argument is a string and the
@ prefix means that the argument is an argument by reference.
When you call the SCAN function, you give it as arguments 1) the source
string, 2) the break table number and 3) the name of an INTEGER variable
where it will put a copy of the character that terminated the scan.
Both the source string and the break character integer are reference
parameters to the SCAN procedure and will have new values when the
procedure is finished. The following example illustrates the use of the
SCAN procedure and also shows how the "S", "A", and "R" modes affect the
resulting strings with the disposition of the break character.
INTEGER skipBr, appendBr, retainBr, brchar;
STRING result, skipStr, appendStr, retainStr;
SETBREAK(skipBr_GETBREAK,"*",NULL,"s");
SETBREAK(appendBr_GETBREAK,"*",NULL,"a");
SETBREAK(retainBr_GETBREAK,"*",NULL,"r");
skipStr_appendStr_retainStr_"first*second";
result _ SCAN(skipStr, skipBr, brchar);
COMMENT EQU(result,"first") AND
EQU(skipStr,"second");
result _ SCAN(appendStr, appendBr, brchar);
COMMENT EQU(result,"first*") AND
EQU(appendStr,"second");
result _ SCAN(retainStr, retainBr, brchar);
COMMENT EQU(result,"first") AND
EQU(retainStr,"*second");
COMMENT in each case above brchar = "*"
after the SCAN;
Now we can look again at the break tables given above:
SETBREAK(lineBr,lf,ff&cr,"ins");
This break table will return a single line up to the lf. Any carriage
returns or formfeeds (usually used as page marks) will be omitted and
the break character is also omitted (skipped) so that just the text of
the line will be returned in the result string. The more conventional
way to read line by line where the line terminators are preserved is
SETBREAK(readLine,lf,NULL,"ina");
Note here that it is extremely important that lf rather than cr be used
as the break character since it follows the cr in the actual text.
Otherwise, you'll end up with strings like
text of line<cr>
<lf>text of line<cr>
<lf>
instead of
text of line<cr><lf>
text of line<cr><lf>
After the SCAN, the brchar variable can be either the break character
that terminated the scan (lf in this case) or 0 if no break character
was encountered and the scan terminated by reaching the end of the
source string.
DO processLine(SCAN(str,readLine,brchar))
UNTIL NOT brchar;
This code would be used if you had a long multi-lined text stored in a
string and wanted to process it one line at a time with PROCEDURE
processLine.
SETBREAK(nonDigitBr,"0123456789",NULL,"xs");
This break table could be used to check if a number input from the user
contains only digits.
WHILE true DO
BEGIN
PRINT("Type a number: ");
reply_INCHWL; ! INTTY for TENEX;
SCAN(reply,nonDigitBr,brchar);
IF brchar THEN
PRINT(brchar&NULL," is not a digit.",crlf)
ELSE DONE;
END;
Here the value of brchar (converted to a string since the
integer character code will probably be meaningless to the user) was
printed out to show the user the offending character. There are many
other uses of the brchar variable particularly if a number of characters
are specified in the break-characters string of the break table and
different actions are to be taken depending on which one actually was
encountered.
SETBREAK(noSpaces,NULL," ","ina");
Here there are no break-characters but the omit-character(s) will be
taken care of by the scan, e.g.,
str_"a b c d";
result_SCAN(str,noSpaces,brchar);
will return "abcd" as the result string.
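The break-table examples in this section can be checked against a small model. The following Python sketch is not Sail and is not how the runtime implements SCAN; the function name scan and its tuple return value are hypothetical. It covers only the "i"/"x" and "s"/"a"/"r" modes and the omit characters, ignores "n", and assumes "s" when no disposition is given. The source string "first*second" is assumed from the results shown above.

```python
def scan(source, break_chars, omit_chars, modes):
    """Model one SCAN call. Returns (result, remaining_source, brchar)."""
    modes = modes.lower()
    exclusive = "x" in modes          # "x": break on any char NOT listed
    # Disposition of the break character; defaulting to "s" is an assumption.
    disposition = next((m for m in modes if m in "sar"), "s")
    result = ""
    for pos, ch in enumerate(source):
        listed = ch in break_chars
        if listed != exclusive:       # this character is a break character
            if disposition == "a":    # append it to the result
                return result + ch, source[pos + 1:], ord(ch)
            if disposition == "r":    # retain it at the front of the source
                return result, source[pos:], ord(ch)
            return result, source[pos + 1:], ord(ch)   # "s": skip it
        if ch not in omit_chars:      # omit characters never reach the result
            result += ch
    return result, "", 0              # end of source reached: brchar is 0

print(scan("first*second", "*", "", "ia"))
print(scan("a b c d", "", " ", "ina"))
print(scan("12a4", "0123456789", "", "xs"))
```

In Sail the source string is a reference argument that SCAN shortens in place; the model returns the shortened string as the second element instead.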
If you need to scan a number which is stored in a string, two special
scanning functions, INTSCAN and REALSCAN, have been set up which do not
require break tables but have the appropriate code built in:
integerVar _ INTSCAN("number-string",@brchar);
realVar _ REALSCAN("number-string",@brchar);
where the integer or real number read is returned; and the string
argument after the call contains the remainder of the string with the
number removed. We could use INTSCAN to check if a string input from a
user is really a proper number.
PRINT("Type the number: ");
reply _ INCHWL; ! INTTY for TENEX;
numb _ INTSCAN(reply,brchar);
IF brchar THEN error;
SECTION 5
Input/Output
5.1 Simple Terminal I/O
We have been doing input/output (I/O) from the controlling terminal with
INCHWL (or INTTY for TENEX) and PRINT. A number of other Teletype I/O
routines are listed in the Sail manual in Sections 7.5 and 12.4 but they
are less often used. Also any of the file I/O routines which will be
covered next can be used with the TTY: specified in place of a file.
Before we cover file I/O, a few comments are needed on the usual
terminal input and output.
The INCHWL (INTTY) that we have used is like an INPUT with the source of
input prespecified as the terminal and the break characters given as the
line terminators. Should you ever want to look at the break character
which terminated an INCHWL or INTTY, it will be in a special variable
called !SKIP! which the Sail runtimes use for a wide variety of
purposes. INTTY will input a maximum of 200 characters. If the INTTY
was terminated for reaching the maximum limit then !SKIP! will be set to
-1. Since this variable is declared in the runtime package rather than
in your program, if you are going to be looking at it, you will need to
declare it also, but as an EXTERNAL, to tell the compiler that you want
the runtime variable.
EXTERNAL INTEGER !SKIP!;
PRINT("Number followed by <CR> or <ALT>: ");
reply_INCHWL; ! INTTY for TENEX;
IF !SKIP! = cr THEN ......
ELSE IF !SKIP! = alt THEN .....
Altmode (escape, enter, etc.) is one of the characters which is
different in the different character sets. The standard for most of the
world including both TOPS-10 and TENEX is to have altmode as '33. At
some point in the past TOPS-10 used '176. This is now obsolete;
however, the SU-AI character set follows this convention but does so
incorrectly. It uses '175 as altmode. This will present a problem for
programs transported among sites. It also partially explains why most
systems when they believe they are dealing with a MODEL-33 Teletype or
other uppercase only terminal (or are in @RAISE mode in TENEX) will
convert the characters '175 and '176 to altmodes.
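For a program that must be moved among sites, one defensive approach is to accept all three codes named above as altmode. The following Python sketch only illustrates the idea; the names ALTMODE_CODES and is_altmode are hypothetical, and the approach is only sensible where '175 and '176 are not expected as ordinary text characters.

```python
# Octal codes that have served as altmode: the current standard ('33),
# the obsolete TOPS-10 code ('176), and the SU-AI code ('175).
ALTMODE_CODES = {0o33, 0o175, 0o176}

def is_altmode(code):
    """True if the integer character code should be treated as altmode."""
    return code in ALTMODE_CODES

print(is_altmode(0o33), is_altmode(0o175), is_altmode(ord("A")))
```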
5.2 Notes on Terminal I/O for TENEX Sail Only
If you are programming in TENEX Sail, you should use INTTY in preference
to the various teletype routines listed in the manual. TENEX does not
have a line editor built in. You can get the effect of a line editor by
using INTTY which allows the user to edit his/her typing with the usual
^A, ^R, ^X, etc. up until the point where the line terminator is typed.
If you use INCHWL, the editing characters are only DEL to rubout one
character and ^U to start over. Efforts have been made in TENEX Sail to
provide line-editing where needed in the various I/O routines when
accessing the controlling terminal. Complete details are contained in
Section 12 of the Sail manual.
TENEX also has a non-standard use of the character set which can
occasionally cause problems. The original design of TENEX called for
replacing crlf sequences with the '37 character (eol). This has since
been largely abandoned and most TENEX programs will not output text with
eol's but rather use the standard crlf. Eol's are still used by the
TENEX system itself. The Sail input routines INPUT, INTTY, etc. convert
eol's to crlf sequences. See the Sail manual for details, if necessary;
but in general, the only time that you should ever have a problem is if
you input from the terminal with some routine that inputs a single
character at a time, e.g., CHARIN. In these cases you will need to
remember that end-of-line will be signalled by an eol rather than a cr.
The user of course types a cr but TENEX converts to eol; and the Sail
single character input functions do not reconvert to cr as the other
Sail input functions do.
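A loop built on single-character input therefore has to test for eol itself. The following Python sketch models that logic; it is not Sail, read_line is a hypothetical helper, and the iterable of integer codes stands in for successive single-character reads. Accepting cr as well as eol is a design choice made here so that the same loop works on either kind of input.

```python
EOL, CR = 0o37, 0o15    # TENEX end-of-line and carriage return

def read_line(codes):
    """Collect character codes into a string, stopping at eol or cr.
    The terminator itself is not included in the result."""
    line = ""
    for code in codes:
        if code in (EOL, CR):
            break
        line += chr(code)
    return line

print(read_line([ord("h"), ord("i"), EOL, ord("x")]))
```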
5.3 Setting Up a Channel for I/O
Now we need I/O for files. The input and output operations to files are
much like what we have done for the terminal. CPRINT will write
arguments to a file as PRINT writes them to the terminal. It is also
possible with the SETPRINT command to specify that you would rather send
your PRINT's to a file (or to the terminal AND a named file). See the
manual for details.
There are a number of other functions available for I/O in addition to
INPUT and CPRINT, but they all have one common feature that we have not
seen before. Each requires as first argument a channel number. The CPU
performs I/O through input/output channels. Any device (TTY:, LPT:,
DTA:, DSK:, etc.) can be at the other end of the channel. Note that by
opening the controlling terminal (TTY:) on a channel, you can use any of
the input/output routines available. In the case of directory devices
such as DSK: and DTA:, a filename is also necessary to set up the I/O.
There are several steps in the process of establishing the
source/destination of I/O on a numbered channel and getting it ready for
the actual transfer. This is the area in which TOPS-10 and TENEX Sail
have the most differences due to the differences in the two operating
systems. Therefore separate sections will be included here for TOPS-10
and TENEX Sail and you should read only the one relevant for you.
5.3.1 TOPS-10 Sail Channel and File Handling
Routines for opening and closing files in TOPS-10 Sail correspond
closely to the UUO's available in the TOPS-10 system. The main routines
are:
GETCHAN OPEN LOOKUP ENTER RELEASE
Additional routines (not discussed here) are:
USETI USETO MTAPE CLOSE CLOSIN CLOSO
5.3.1.1 Device Opening
chan _ GETCHAN;
GETCHAN obtains the number of a free channel. On a TOPS-10 system,
channel numbers are 0 through '17. GETCHAN finds the number of a
channel not currently in use by Sail and returns that number. The user
is advised to use GETCHAN to obtain a channel number rather than using
absolute channel numbers.
OPEN(chan, "device", mode, inbufs,
outbufs, @count, @brchar, @eof);
The OPEN procedure corresponds to the TOPS-10 OPEN (or INIT) UUO. OPEN
has eight parameters. Some of these refer to parameters that the OPEN
UUO will need; other parameters specify the number of buffers desired,
with other UUO's called by OPEN to set up this buffering; still other
parameters are internal Sail bookkeeping parameters.
The parameters to OPEN are:
1) CHANNEL: channel number, typically the number returned by
GETCHAN.
2) "DEVICE": a string argument that is the name of the device
that is desired, such as "DSK" for the disk or "TTY" for the
controlling terminal.
3) MODE: a number indicating the mode of data transfer.
Reasonable values are: 0 for characters and strings and '14 for
words and arrays of words. Mode '17 for dump mode transfers of
arrays is sometimes used but is not discussed here.
4) INBUFS: the number of input buffers that are to be set up.
5) OUTBUFS: the number of output buffers.
6) COUNT: a reference parameter specifying the maximum number
of characters for the INPUT function.
7) BRCHAR: a reference parameter in which the character on
which INPUT broke will be saved.
8) EOF: a reference parameter which is set to TRUE when the
file is at the end.
The CHANNEL, "DEVICE", and MODE parameters are passed to the OPEN UUO;
INBUFS and OUTBUFS tell the Sail runtime system how many buffers should
be set up for data transfers; and the COUNT, BRCHAR and EOF variables
are cells that are used by Sail bookkeeping. N.B.: many of the above
parameters have additional meanings as given in the Sail manual. The
examples in this section are intended to demonstrate how to do simple
things.
RELEASE(chan);
The RELEASE function, which takes the channel number as an argument,
finishes all the input and output and makes the channel available for
other use.
The following routine illustrates how to open a device (in this case,
the device is only the teletype) and output to that device. The CPRINT
function, which is like PRINT except that its output goes to an
arbitrary channel destination, is used.
BEGIN
  INTEGER OUTCHAN;
  OPEN(OUTCHAN _ GETCHAN,"TTY",0,0,2,0,0,0);
  COMMENT
    (1) Obtain a channel number, using
        GETCHAN, and save it in variable OUTCHAN.
    (2) Specify device TTY, in mode 0,
        with 0 input and 2 output buffers.
    (3) Ignore the COUNT, BRCHAR, and EOF
        variables, which are typically not needed if
        the file is only for output. ;
  CPRINT(OUTCHAN, "Message for OUTCHAN
");
  COMMENT Actual data transfer.;
  RELEASE(OUTCHAN);
  COMMENT Close channel;
END;
The following example illustrates how to read text from a device, again
using the teletype as the device.
BEGIN
  INTEGER INCHAN, INBRCHAR, INEOF;
  OPEN (INCHAN _ GETCHAN, "TTY", 0, 2, 0, 200,
        INBRCHAR, INEOF);
  COMMENT
    Opens the TTY in mode 0 (characters), with
    2 input buffers, 0 output buffers. At most
    200 characters will be read in with each
    INPUT statement, and the break character
    will be put into variable INBRCHAR. The
    end-of-file will be signalled by INEOF
    being set to TRUE after some call to an
    input function has found that there is no
    more data in the file;
  WHILE NOT INEOF DO
    BEGIN
      ... code to do input -- see below. ...
    END;
  RELEASE(INCHAN);
END;
5.3.2 Reading and Writing Disk Files
Most input and output will probably be done to the disk. The disk (and,
typically, the DECtape) are directory devices, which means that
logically separate files are associated with the device. When using a
directory device, it is necessary to associate a file name with the
channel that is open to the device.
LOOKUP(CHAN, "FILENAME", @FLAG);
ENTER(CHAN, "FILENAME", @FLAG);
File names are associated with channels by three functions: LOOKUP,
ENTER, and RENAME. We will discuss LOOKUP and ENTER here. Both LOOKUP
and ENTER take three arguments: a channel number, such as returned by
GETCHAN, which has already been opened; a text string which is the name
of the file, using the file name conventions of the operating system;
and a reference flag that will be set to FALSE if the operation is
successful, or TRUE otherwise. (The TRUE value is a bit pattern
indicating the exact cause of failure, but we will not be concerned with
that here.) There are three permutations of LOOKUP and ENTER that are
useful:
1) LOOKUP alone: this is done when you want to read an
already existing file.
2) ENTER alone: this is done when you want to write a file.
If a file already exists with the selected name, then a new one is
created, and upon closing of the file, the old version is deleted
altogether. This is the standard way to write a file.
3) A LOOKUP followed by an ENTER using the same name: this is
the standard way to read and write an already existing file.
The following program will read an already existing text file (e.g.,
with the INPUT, REALIN, and INTIN functions, which scan ASCII text).
The name of the file is obtained from the user, and the LOOKUP function
is used to see if the file is there; LOOKUP sets FLAG to FALSE when it
succeeds. See below for details about the functions that are used for
the actual reading of the data in the file.
BEGIN
  INTEGER INCHAN, INBRCHAR, INEOF, FLAG;
  STRING FILENAME;
  OPEN (INCHAN _ GETCHAN, "DSK", 0, 2, 0, 200,
        INBRCHAR, INEOF);
  WHILE TRUE DO
    BEGIN
      PRINT("Input file name *");
      LOOKUP(INCHAN, FILENAME _ INCHWL, FLAG);
      COMMENT FLAG is FALSE if the LOOKUP succeeded;
      IF NOT FLAG THEN DONE ELSE
        PRINT("Cannot find file ", FILENAME,
              " try again.
");
    END;
  WHILE NOT INEOF DO
    BEGIN "INPUT"
      .... see below for reading characters...
    END "INPUT";
  RELEASE(INCHAN);
END;
The following program opens a file for writing characters.
BEGIN
  INTEGER OUTCHAN, FLAG;
  STRING FILENAME;
  OPEN (OUTCHAN _ GETCHAN, "DSK", 0, 0, 2, 0,
        0, 0);
  WHILE TRUE DO
    BEGIN
      PRINT("Output file name *");
      ENTER(OUTCHAN, FILENAME _ INCHWL, FLAG);
      IF NOT FLAG THEN DONE ELSE
        PRINT("Cannot write file ", FILENAME,
              " try again.
");
    END;
  ... now write the text to OUTCHAN ...
  RELEASE(OUTCHAN);
END;
5.3.2.1 Reading and Writing Full Words
Reading 36-bit PDP-10 words, using WORDIN and ARRYIN, and writing words
using WORDOUT and ARRYOUT, is accomplished by opening the file using a
binary mode such as '14. We recommend the use of binary mode, with 2 or
more input and/or output buffers selected in the call to the OPEN
function. There are other modes available, such as mode '17 for dump
mode transfers; see the timesharing manual for the operating system.
5.3.2.2 Other Input/Output Facilities
Files can be renamed using the RENAME function. Some random input and
output is offered by the USETI and USETO functions, but random input and
output produces strange results in TOPS-10 Sail. Best results are
obtained by using USETI and USETO and reading or writing 128-word arrays
to the disk with ARRYIN and ARRYOUT.
Magnetic tape operations are performed with the MTAPE function.
See the Sail manual (Sec. 7) for more details about these functions. In
particular, we stress that we have not covered all the capabilities of
the functions that we have discussed.
5.3.3 TENEX Sail Channel and File Handling
TENEX Sail has included all of the TOPS-10 Sail functions described in
Section 7.2 of the Sail manual for reasons of compatibility and has
implemented them suitably to work on TENEX. Descriptions of how these
functions actually work in TENEX are given in Section 12.2 of the
manual. However, they are less efficient than the new set of
specifically TENEX routines which have been added to TENEX Sail so you
probably should skip these sections of the manual. The new TENEX
routines are also greatly simplified for the user so that a number of
the steps to establishing the I/O are done transparently.
Basically, you only need to know three commands: 1) OPENFILE which
establishes a file on a channel, 2) SETINPUT which establishes certain
parameters for the subsequent inputs from the file, and 3) CFILE which
closes the file and releases the channel when you are finished.
chan# _ OPENFILE("filename","modes")
The OPENFILE function takes 2 arguments: a string containing the device
and/or filename and a string constant containing a list of the desired
modes. OPENFILE returns an integer which is the channel number to be
used in all subsequent inputs or outputs. If you give NULL as the
filename then OPENFILE goes to the user's terminal to get the name. (Be
sure if you do this that you first PRINT a prompt to the terminal.) The
modes are listed in the Sail manual (Sec. 12.3) but not all of those
listed are commonly used. The following are the ones that you will
usually give:
R or W or A   for Read, Write, or Append depending on what you
              intend to do with the file.
*             if you are allowing multi-file specifications, e.g.,
              data.*;* .
C             if the user is giving the filename from the terminal,
              C mode will prompt for [confirm].
E             if the user is giving the filename and an error
              occurs (typically when the wrong filename is typed), the
              E mode returns control to your program. If E is not
              specified the user is automatically asked to try again.
Modes O and N for Old or New File are also allowed but probably
shouldn't be used. They are misleading. The defaults, i.e., with
neither O nor N specified, are the usual conditions (read an old version
and write a new version). The O and N options are peculiar. For
example, "NW" means that you must specify a completely new filename for
the file to be written, i.e., a name that has not been used before. N
does not mean a new version as one might have expected. In general, the
I/O routines use the relevant JSYS's directly and thus include all of
the design errors and bugs in the JSYS's themselves.
INTEGER infile, outfile, defaultsFile;
PRINT("Input file: ");
inFile _ OPENFILE(NULL,"rc");
PRINT("Output file: ");
outFile _ OPENFILE(NULL,"wc");
defaultsFile _
OPENFILE("user-defaults.tmp","w");
We now have files "open" on 3 channels--one for reading and two for
writing. We have the channel numbers stored in inFile, outFile, and
defaultsFile so that we can refer to the appropriate channel for each
input or output. Next we need to do a SETINPUT on the channel open for
input (reading).
SETINPUT(chan#, count, @brchar, @eof)
There are four arguments:
1) The channel number.
2) An integer number which is the maximum number of
characters to be read in any input operation (the default if no
SETINPUT is done is 200).
3) A reference integer variable where the input function will
put the break character.
4) A reference integer variable where the input function will
put true or false for whether or not the end-of-file was reached
(or the error number if an error was encountered while reading).
So here we need:
INTEGER infileBrChr, infileEof;
SETINPUT (infile, 200, infilebrchr, infileEof);
Now we do the relevant input/output operations and when finished:
CFILE(infile);
CFILE(outfile);
CFILE(defaultsFile);
A simple example of the use of these routines for opening a file and
outputting to it is:
INTEGER outfile;
PRINT("Type filename for output: ");
outfile_OPENFILE(NULL,"wc");
CPRINT(outfile, "message...");
CFILE(outfile);
where CPRINT is like PRINT except for the additional first argument
which is the channel number.
The OPENFILE, SETINPUT, and CFILE commands will handle most situations.
If you have unusual requirements or like to get really fancy then there
are many variations of file handling available. A few of the more
commonly used will be covered in the next section; but do not read this
section until you have tried the regular routines and need to do more
(if ever). On first reading, you should now skip to Section 5.4.
5.3.4 Advanced TENEX Sail Channel and File Handling
If you want to use multiple file designators with *'s, you should give
"*" as one of the options to OPENFILE. Then you will need to use
INDEXFILE to sequence through the multiple files. The syntax is
found!another!file _ INDEXFILE(chan#)
where found!another!file is a boolean variable. INDEXFILE accomplishes
two things. First, if there is another file in the sequence, it is
properly initialized on the channel; and second, INDEXFILE returns TRUE
to indicate that it has gotten another file. Note that the original
OPENFILE gets the first file in the sequence on the channel so that you
don't use the INDEXFILE until you have finished processing the first
file and are ready for the second. This is done conveniently with a
DO...UNTIL where the test is not made until after the first time through
the loop, e.g.,
multiFiles _ OPENFILE("data.*","r*");
DO
  BEGIN
    ...<input and process current file>...
  END
UNTIL NOT INDEXFILE(multiFiles);
Another available option to the OPENFILE routine which you should
consider using is the "E" option for error handling. If you specify
this option and the user gives an incorrect filename then OPENFILE will
return -1 rather than a channel number and the TENEX error number will
be returned in !SKIP!. Remember to declare EXTERNAL INTEGER !SKIP! if
you are going to be looking at it. Handling the errors yourself is
often a good idea. TENEX is unmerciful. If the user gives a bad
filename, it will ask again and keep on asking forever even when it is
obvious after a certain number of tries that there is a genuine problem
that needs to be resolved.
Another use for the "E" mode is to offer the user the option of typing a
bare <CR> to get a default file. If the "E" mode has been specified and
the user types a carriage-return for the filename then we know that the
error number returned in !SKIP! will be the number (listed in the JSYS
manual) for "Null filename not allowed." so we can intercept this error
and simply do another OPENFILE with the default filename, e.g.,
EXTERNAL INTEGER !SKIP!;
outfile_-1;
WHILE outfile = -1 DO
  BEGIN
    PRINT("Filename (<CR> for TTY:) *");
    outfile_OPENFILE(NULL,"we");
    IF !skip! = '600115 THEN
      outfile_OPENFILE("TTY:","w");
  END;
The GTJFNL and GTJFN routines are useful if you need more options than
are provided in the OPENFILE routine, but neither of these actually
opens the file so you will need an OPENF or OPENFILE after the GTJFNL or
GTJFN unless your purpose in using the GTJFN is specifically that you do
not want to open the file. The GTJFNL routine is actually the long form
of the GTJFN JSYS; and the GTJFN routine is the short form of the GTJFN
JSYS. See the TENEX JSYS manual for details.
Another use of GTJFNL is to combine filename specification from a string
with filename specification from the user. This is a simple way to
preprocess the filename from the user, i.e., to check if it is really a
"?" rather than a filename. First, you need to declare !SKIP! and ask
the user for a filename:
EXTERNAL INTEGER !SKIP!;
WHILE TRUE DO
BEGIN "getfilename"
PRINT("Type input filename or ? : ");
Next do a regular INTTY to get the reply into a string:
s _ INTTY;
Then you process the string in any way that you choose, e.g., check if
it is a "?" or some other special keyword:
IF s = "?" THEN BEGIN
givehelp;
CONTINUE "getfilename";
END;
If you decide it is a proper filename and want to use it then you give
that string (with the break character from INTTY which will be in !SKIP!
appended back on to the end of the string) to the GTJFNL.
chan# _ GTJFNL(s&!SKIP!, '160000000000,
'000100000101, NULL, NULL, NULL,
NULL, NULL, NULL);
If the string ended in altmode meaning that the user wanted filename
recognition then that will be done; and if the string is not enough for
recognition and more typein is needed then the GTJFNL will ring the bell
and go back to the user's terminal without the user knowing that any
processing has gone on in the meantime, i.e., to the user it looks
exactly like the ordinary OPENFILE. Thus the GTJFNL goes first to the
string that you give it but can then go to the terminal if more is
needed.
After the GTJFNL don't forget that you still need to OPENF the file.
For reading a disk file,
OPENF (chan#, '440000200000);
is a reasonable default, and for writing:
OPENF (chan#, '440000100000);
The arguments to GTJFNL are:
chan# _ GTJFNL("filename", flags, jfnjfn,
"dev", "dir", "name", "ext",
"protection", "acct");
where the flag specification is made by looking up the FLAGS for the
GTJFN JSYS in the JSYS manual and figuring out which bits you want
turned on and which off. The 36-bit resulting word can be given here in
its octal representation. '160000000000 means bits 2 (old file only), 3
(give messages) and 4 (require confirm) are turned on. Remember that
the bits start with Bit 0 on the left. The jfnjfn will probably always
be '000100000101. This argument is for the input and output devices to
be used if the string needs to be supplemented. Here the controlling
terminal is used for both. Devices on the system have an octal number
associated with them. The controlling terminal as input device is '100
and as output is '101. For most purposes you can refer to the terminal
by its "name" which is TTY: but here the number is required. The input
and output devices are given in half word format which means that '100
is in the left and '101 in the right half of the word with the
appropriate 0's filled out for the rest.
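The octal constants can be rebuilt from the bit numbers and device numbers given above, which is a useful check when choosing different flags. The following Python sketch only does the arithmetic; it is not Sail, and the helper names bit and halfwords are hypothetical.

```python
WORD_BITS = 36

def bit(n):
    """Mask for bit n of a 36-bit word, with bit 0 the leftmost bit."""
    return 1 << (WORD_BITS - 1 - n)

def halfwords(left, right):
    """Pack two 18-bit values into the left and right halves of a word."""
    return (left << 18) | right

# Bits 2 (old file only), 3 (give messages) and 4 (require confirm).
flags = bit(2) | bit(3) | bit(4)

# Terminal as input device ('100) in the left half, output ('101) in the right.
jfnjfn = halfwords(0o100, 0o101)

print(f"{flags:012o}")     # 160000000000
print(f"{jfnjfn:012o}")    # 000100000101
```

A flag word with only bit 2 set works out the same way to '100000000000.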
The next six arguments to GTJFNL are for defaults if you want to give
them for: device, directory, file name, file extension, file protection,
and file account. If no default is given for a field then the standard
default (if any) is used, e.g., DSK: for device and Connected Directory
for directory. This is another reason why you may choose GTJFNL over
OPENFILE for getting a filename. In this way, you can set up defaults
for the filename or extension. You can also use GTJFNL to simulate a
directory search path. For example, the EXEC when accepting the name of
a program to be run follows a search path to locate the file. First it
looks on <SUBSYS> for a file of that name with a .SAV extension. Next
it looks on the connected directory and finally on the login directory.
If you have an analogous situation, you can use a hierarchical series of
GTJFNL's with the appropriate defaults specified:
EXTERNAL INTEGER !SKIP!;
INTEGER logdir,condir,ttyno,tempChan;
STRING logdirstr,condirstr,name;
GJINF(logdir,condir,ttyno);
COMMENT puts the directory numbers for login
  and connected directory and the tty# in
  its reference integer arguments;
logdirstr_DIRST(logdir);
condirstr_DIRST(condir);
COMMENT returns a string for the name
  corresponding to directory# ;
WHILE true DO
  BEGIN "getname"
    PRINT("Type the name of the program: ");
    IF EQU (upper(NAME _ INTTY),"EXEC") THEN
      BEGIN
        name_"<SYSTEM>EXEC.SAV";
        DONE "getname";
      END;
    IF name = "?" THEN
      BEGIN
        givehelp;
        CONTINUE "getname";
      END;
    name_name&!SKIP!;
    COMMENT put the break char back on;
    DEFINE flag = <'100000000000>,
           jfnjfn = <'100000101>;
    IF (tempChan_GTJFNL(name,flag,jfnjfn,NULL,
        "SUBSYS",NULL,"SAV",NULL,NULL)) = -1
    THEN
      IF (tempChan_GTJFNL(name,flag,
          jfnjfn,NULL,condirstr,NULL,
          "SAV",NULL,NULL)) = -1 THEN
        IF (tempChan_GTJFNL(name,flag,
            jfnjfn,NULL,logdirstr,NULL,
            "SAV",NULL,NULL)) = -1 THEN
          BEGIN
            PRINT(" ?",crlf);
            CONTINUE "getname";
          END;
    COMMENT try each default and if not found
      then try next until none are found then
      print ? and try again;
    name _ JFNS(tempChan, 0);
    COMMENT gets name of file on chan--0
      means in normal format;
    CFILE(tempChan);
    COMMENT channel not opened but does
      need to be released;
    DONE "getname";
  END;
In this case, we did not want to open a channel at all since we will not
be either reading or writing the .SAV file. At the end of the above
code, the complete filename is stored in STRING name. We might wish to
run the program with the RUNPRG routine. GTJFN and GTJFNL are often
used for the purpose of establishing filenames even though they are not
to be opened at the moment. However, the Sail channel does need to be
released afterwards.
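The search-path logic itself is independent of TENEX: try each directory in order and stop at the first one that holds the file. The following Python sketch is an analogy only. The function find_program is hypothetical, ordinary temporary directories stand in for <SUBSYS>, the connected directory and the login directory, and the extension is always appended, whereas GTJFNL applies a default only to a field the user left out.

```python
import os
import tempfile

def find_program(name, directories, extension=".SAV"):
    """Return the first existing path for name in directories, else None."""
    for directory in directories:
        candidate = os.path.join(directory, name + extension)
        if os.path.exists(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    subsys, connected, login = (
        os.path.join(root, d) for d in ("subsys", "connected", "login")
    )
    for d in (subsys, connected, login):
        os.mkdir(d)
    # The program exists in two places; the earlier directory must win.
    for d in (connected, login):
        open(os.path.join(d, "DEMO.SAV"), "w").close()

    search_path = [subsys, connected, login]
    found = find_program("DEMO", search_path)
    missing = find_program("NOSUCH", search_path)
    found_in_connected = found == os.path.join(connected, "DEMO.SAV")

print(found_in_connected, missing)
```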
Some of the other JSYS's which have been implemented in the runtime
package were used in this program: GJINF, DIRST, and JFNS. JFNS in
particular is very useful. It returns a string which is the name of the
file open on the channel. You might need this name to record or to
print on the terminal or because you will be outputting to a new version
of the input file which you can't do unless you know its name.
These and a number of other routines are covered in Section 12 of the
Sail manual. You should probably glance through and see what is there.
Many of these commands correspond directly to utility JSYS's available
in TENEX and will be difficult to use if you are not familiar with the
JSYS's and the JSYS manual.
5.4 Input from a File
In this section, we will assume that you have a file opened for reading
on some channel and are ready to input. We also assume that you have
appropriately established the end-of-file and break character variables
to be used by the input routines, and the break table if needed.
Another function which can be used in conjunction with the various input
functions is SETPL:
SETPL (chan#, @line#, @page#, @sos#)
This allows you to set up the three reference integer variables line#,
page#, and sos# to be associated with the channel so that any input
function on the channel will update their values. The line# variable is
incremented each time a '12 (lf) is input and the page# variable is
incremented (and line# reset to 0) each time a '14 (formfeed) is input.
The last SOS line number input (if any) will be in the sos# variable.
The SETPL should be given before the inputting begins.
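The bookkeeping that SETPL requests can be modeled in a few lines. The following Python sketch is not Sail: track_position is a hypothetical function, the starting values of 0 are an assumption (in Sail they are whatever the variables hold when input begins), and the sos# variable is left out.

```python
LF, FF = "\n", "\f"    # '12 (line feed) and '14 (formfeed)

def track_position(text, line=0, page=0):
    """Return (line, page) after all of text has been input."""
    for ch in text:
        if ch == LF:
            line += 1
        elif ch == FF:
            page += 1
            line = 0           # a new page restarts the line count
    return line, page

print(track_position("a\nb\n\fc\n"))
```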
The major input function for text is INPUT.
"result" _ INPUT(chan#, table#);
where you give as arguments the channel number and the break table
number; and the resulting input string is returned. This is very
similar to SCAN.
To input one line at a time from a file (where infile is the channel
number and infileEof is the end-of-file variable):
SETBREAK(readLine_GETBREAK,lf,NULL,"ina");
DO
  BEGIN
    STRING line;
    line_INPUT(infile,readLine);
    ...<process the line>...
  END
UNTIL infileEof;
If the INPUT function sets the eof variable to TRUE then either the
end-of-file was encountered or there was a read error of some sort.
If the INPUT terminated because a break character was read then the
break character will be in the brchar variable. If brchar=0 then you
have to look at the eof variable also to determine what happened: If
eof=TRUE then that was what terminated the INPUT but if eof=FALSE and
brchar=0 then the INPUT was terminated by reaching the maximum count per
input that was specified for the channel.
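The three cases can be written as a short decision procedure. The following Python sketch restates the rule above; it is not Sail, and the function name and the returned descriptions are hypothetical.

```python
def why_input_stopped(brchar, eof):
    """Classify why an INPUT call returned, from the two channel variables."""
    if brchar != 0:
        return "break character"
    if eof:                    # TRUE, or an error number, both count here
        return "end of file or read error"
    return "maximum count reached"

print(why_input_stopped(0o12, 0))
print(why_input_stopped(0, -1))
print(why_input_stopped(0, 0))
```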
If you are inputting numbers from the channel then
realVar _ REALIN(chan#)
integerVar _ INTIN(chan#)
which are like REALSCAN and INTSCAN can be used. The brchar established
for the channel will be used rather than needing to give it as an
argument as in the REALSCAN and INTSCAN.
INPUT is designed for files of text. Several other input functions are
available for other sorts of files.
Number _ WORDIN(chan#)
will read in a 36-bit word from a binary format file. For details see
the manual.
ARRYIN(chan#, @loc, count)
is used for filling arrays with data from binary format files. Count is
the number of 36-bit words to be read in from the file. They are placed
in consecutive locations starting with the location specified by loc,
e.g.,
INTEGER ARRAY numbs [1:max];
ARRYIN(dataFile,numbs[1],max);
ARRYIN can only be used for INTEGER and REAL arrays (not STRING arrays).
5.4.1 Additional TENEX Sail Input Routines
Two extra input routines which are quite fast have been added to TENEX
Sail to utilize the available input JSYS's.
char _ CHARIN (chan#)
inputs a single character which can be assigned to an integer variable.
If the file is at the end then CHARIN returns 0.
"result" _
SINI (chan#, maxlength, break-character)
does a very fast input of a string which is terminated by either reading
maxlength characters or encountering the break-character. Note that the
break-character here is not a reference integer where the break
character is to be returned; rather it actually is the break character
to be used like the "break-characters" established in a break table
except that only one character can be specified. If the SINI terminated
for reaching maxlength then !SKIP! = -1 else !SKIP! will contain the
break character.
TENEX Sail also offers random I/O which is not available in TOPS-10
Sail. A file bytepointer is maintained for each file and is initialized
to point at the beginning of the file which is byte 0. It subsequently
moves through the file always pointing to the character where the next
read or write will begin. In fact the same file may be read and written
at the same time (assuming it has been opened in the appropriate way).
If the pointer could only move in this way then only sequential I/O
would be available. However, you can reset the pointer to any random
position in the file and begin the read/write at that point which is
called random I/O.
charptr _ RCHPTR (chan#)
returns the current position of the character pointer. This is given as
an integer representing the number of characters (bytes) from the start
of the file which is byte 0. You can reset the pointer by
SCHPTR (chan#, newptr)
If newptr is given as -1 then the pointer will be set to the
end-of-file.
There are many uses for random I/O. For example, you can store the help
text for a program in a separate file and keep track of the bytepointer
to the start of each individual message. Then when you want to print
out one of the messages, you can set the file pointer to the start of
the appropriate message and print it out.
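The help-text idea can be sketched as follows. This is a Python analogy, not Sail: an in-memory file stands in for the help file, tell and seek play the roles of RCHPTR and SCHPTR, and the message texts and the name show_help are made up for the illustration. Each message is assumed to be a single line.

```python
import io

messages = {
    "open": "OPEN: open a file.\n",
    "quit": "QUIT: leave the program.\n",
}

# Build the help file, recording where each message starts.
help_file = io.BytesIO()
offsets = {}
for key, text in messages.items():
    offsets[key] = help_file.tell()      # like RCHPTR: current position
    help_file.write(text.encode("ascii"))

def show_help(key):
    """Jump to the recorded position and read that one message."""
    help_file.seek(offsets[key])         # like SCHPTR: reposition the pointer
    return help_file.readline().decode("ascii")

print(show_help("quit"), end="")
print(show_help("open"), end="")
```

Messages can be fetched in any order because each read starts from a saved position rather than from wherever the previous read ended.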
RWDPTR and SWDPTR are also available for random I/O with words (36-bit
bytes) as the primary unit rather than characters (7-bit bytes).
5.5 Output to a File
The CPRINT function is used for outputting to text files.
CPRINT (chan#, arg1, arg2, ...., argN)
CPRINT is just like PRINT except that the channel must be given as the
first argument.
FOR i_1 STEP 1 UNTIL maxWorkers DO
CPRINT(outfile, name[i], " ",
salary[i],crlf);
Each subsequent argument is converted to a string if necessary and
printed out to the channel.
WORDOUT(chan#, number)
writes a single 36-bit word to the channel.
ARRYOUT(chan#, @loc, count)
writes out an array by outputting count number of consecutive words
starting at location loc.
REAL ARRAY results [1:max];
.
.
ARRYOUT(resultFile,results[1],max);
TENEX Sail also has the routine:
CHAROUT(chan#, char)
which outputs a single character to the channel.
The OUT function is generally obsolete now that CPRINT is available.
SECTION 6
Records
Records are the newest data structure in Sail. They take us beyond the
basic part of the language, but we describe them here in the hope that
they will be very useful to users of the language. Sail records are
similar to those in ALGOL W (see Appendix A for the differences). Some
other languages that contain record-like structures are SIMULA and
PASCAL.
Records can be extremely useful in setting up complicated data
structures. They allow the Sail programmer: 1) a means of program
controlled storage allocation, and 2) a simple method of referring to
bundles of information. (Location(x) and memory[x], which are not
discussed here and should be thought of as liberation from Sail, allow
one to deal with addresses of things.)
6.1 Declaring and Creating Records
A record is rather like an array that can have objects of different
syntactic types. Usually the record represents different kinds of
information about one object. For example, we can have a class of
records called person that contains records with information about
people for an accounting program. Thus, we might want to keep: the
person's name, address, account number, monetary balance. We could
declare a record class thus:
RECORD!CLASS person (STRING name, address;
INTEGER account;
REAL balance)
This occurs at declaration level, and the identifier person is available
within the current block -- just like any other identifier.
RECORD!CLASS declarations do not actually reserve any storage space.
Instead they define a pattern or template for the class, showing what
fields the pattern has. In the above, name, address, account and
balance are all fields of the RECORD!CLASS person.
To create a record (e.g., when you get the data on an actual person) you
need to call the NEW!RECORD procedure, which takes as its argument the
RECORD!CLASS. Thus,
rp _ NEW!RECORD (person);
creates a person, with all fields initially 0 (or NULL for strings,
etc). Records are created dynamically by the program and are garbage
collected when there is no longer a way to access them.
When a record is created, NEW!RECORD returns a pointer to the new
record. This pointer is typically stored in a RECORD!POINTER.
RECORD!POINTERs are variables which must be declared. The RECORD!POINTER
rp was used above. There is a very important distinction to be made
between a RECORD!POINTER and a RECORD. A RECORD is a block of variables
called fields, and a RECORD!POINTER is an entity that points to some
RECORD (hence can be thought of as the "name" or "address" of a RECORD).
A RECORD has fields, but a RECORD!POINTER does not, although its
associated RECORD may have fields. The following is a complete program
that declares a RECORD!CLASS, declares a RECORD!POINTER, and creates a
record in the RECORD!CLASS with the pointer to the new record stored in
the RECORD!POINTER.
BEGIN
RECORD!CLASS person (STRING name,address;
INTEGER account;
REAL balance);
RECORD!POINTER (person) rp;
COMMENT program starts here.;
rp _ NEW!RECORD (person);
END;
RECORD!POINTERs are usually associated with particular record class(es).
Notice that in the above program the declaration of RECORD!POINTER
mentions the class person:
RECORD!POINTER (person) rp;
This means that the compiler will do type checking and make sure that
only pointers to records of class person will be stored into rp. A
RECORD!POINTER can be of several classes, as in:
RECORD!POINTER (person, university) rp;
assuming that we had a RECORD!CLASS university.
RECORD!POINTERs can be of any class if we say:
RECORD!POINTER (ANY!CLASS) rp;
but declaring the class(es) of record pointers gives compilation time
checking of record class agreement. This becomes an advantage when you
have several classes, since the compiler will complain about many of the
simple mistakes you can make by mis-assigning record pointers.
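A small sketch of what the class checking allows and forbids is given
here. The class university and the pointer names p, u, either and
anything are made up for the illustration, and the exact wording of the
compiler's complaint is not shown because it is not given in this
TUTORIAL.

```sail
BEGIN
RECORD!CLASS person (STRING name);
RECORD!CLASS university (STRING name);
RECORD!POINTER (person) p;
RECORD!POINTER (university) u;
RECORD!POINTER (person, university) either;
RECORD!POINTER (ANY!CLASS) anything;

p _ NEW!RECORD (person);
u _ NEW!RECORD (university);
either _ p;
either _ u;
anything _ u;
COMMENT if the word COMMENT were deleted here, the compiler
        should complain about this statement: p _ u;
END;
```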
6.2 Accessing Fields of Records
The fields of records can be read/written just like the elements of
arrays. Developing the above program a bit more, suppose we have
created a new record of class person, and stored the pointer to that
record in rp. Then, we can give the "person" a name, address, etc.,
with the following statements.
person:name[rp] _ "John Doe";
person:address[rp] _ "101 East Lansing Street";
person:account[rp] _ 14;
person:balance[rp] _ 3000.87;
and we could write these fields out with the statement:
PRINT ("Name is ", person:name[rp], crlf,
"Address is ", person:address[rp], crlf,
"Account is ", person:account[rp], crlf,
"Balance is ", person:balance[rp], crlf);
The syntax for fields has the following features:
1) The fields are available within the lexical scope where
the RECORD!CLASS was declared, and follow ALGOL block structure.
2) The fields in different classes may have the same name,
e.g., parent:name and child:name.
3) The syntax is rather like that for arrays -- using
brackets to surround the record pointer in the same way brackets
are used for the array index.
4) The fields can be read or written into, also like array
locations.
5) It is necessary to write class:field[pointer] -- i.e., you
have to include the name of the class (here person) with a ":"
before the name of the field.
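The points above can be illustrated with two classes that both have a
field called name. This is a sketch only; the classes parent and child
and the field mother are invented for the example.

```sail
BEGIN
RECORD!CLASS parent (STRING name);
RECORD!CLASS child (STRING name;
                    RECORD!POINTER (parent) mother);
RECORD!POINTER (parent) p;
RECORD!POINTER (child) c;

p _ NEW!RECORD (parent);
parent:name [p] _ "Jane Doe";
c _ NEW!RECORD (child);
child:name [c] _ "John Doe";
child:mother [c] _ p;
COMMENT a field reference may itself supply the record pointer;
PRINT (child:name [c], " is the child of ",
       parent:name [child:mother [c]]);
END;
```

Under these assumptions the program should print
John Doe is the child of Jane Doe.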
6.3 Linking Records Together
Notice, in the above example, that as we create the persons, we have to
store the pointers to the records somewhere or else they will become
"missing persons". One way to do this would be to use an array of
record pointers, allocating as many pointers as we expect to have
people. If the number of people is not known in advance then the more
customary approach is to link the records together, which is done by
using additional fields in the records.
Suppose we upgrade the above example to the following:
RECORD!CLASS person (STRING name, address;
INTEGER account;
REAL balance;
RECORD!POINTER(ANY!CLASS) next);
Notice now that there is a RECORD!POINTER field in the template. This
may be used to keep a pointer to the next person. The header to the
entire list of persons will be kept in a single RECORD!POINTER.
Thus, the following program would create persons dynamically and put
them into a "linked list" with the newest at the head of the list. This
technique allows you to write programs that are not restricted to some
fixed maximum number of persons, but instead allocate the memory space
necessary for a new person when you need it.
BEGIN
RECORD!CLASS person (STRING name, address;
INTEGER account; REAL balance;
RECORD!POINTER(ANY!CLASS) next);
RECORD!POINTER (ANY!CLASS) header;
WHILE TRUE DO
BEGIN
STRING s;
RECORD!POINTER (ANY!CLASS) temp;
PRINT("Name of next person, CR if done:");
IF NOT LENGTH(s _ INCHWL) THEN DONE;
COMMENT put new person at head of list;
temp _ NEW!RECORD(person);
COMMENT make a new record;
person:next[temp] _ header;
COMMENT the old head becomes the second;
header _ temp;
COMMENT the new record becomes the head;
COMMENT now fill information fields;
person:name[temp] _ s;
COMMENT now we can fill address, account,
balance if we want...;
END;
END;
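Once the list has been built, it can be traversed by following the next
fields until NULL!RECORD is reached. The following procedure is a sketch
of such a traversal. The name printPersons is hypothetical; the procedure
is assumed to be declared in the above program after header, and crlf is
assumed to be defined as in the other examples of this TUTORIAL.

```sail
PROCEDURE printPersons;
BEGIN
  RECORD!POINTER (ANY!CLASS) p;
  p _ header;
  WHILE p NEQ NULL!RECORD DO
    BEGIN
    PRINT (person:name [p], crlf);
    COMMENT step to the next record in the list;
    p _ person:next [p];
    END;
END;
```

Because each new person is put at the head of the list, the names come
out in the reverse of the order in which they were typed.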
A very powerful feature of record structures is the ability to have
different sets of pointers. For example, there might be both forward
and backward links (in the above, we used a forward link). Structures
such as binary trees, sparse matrices, deques, priority queues, and so
on are natural applications of records, but it will take a little study
of the structures in order to understand how to build them, and what
they are good for.
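As a sketch of a structure with both forward and backward links, the
person class can be given a second pointer field. The field prev and the
procedure addPerson are invented for this illustration; the remaining
fields of person are left out to keep it short.

```sail
BEGIN
RECORD!CLASS person (STRING name;
                     RECORD!POINTER (ANY!CLASS) next, prev);
RECORD!POINTER (ANY!CLASS) header;

PROCEDURE addPerson (STRING s);
BEGIN
  RECORD!POINTER (ANY!CLASS) temp;
  temp _ NEW!RECORD (person);
  person:name [temp] _ s;
  COMMENT forward link from the new record to the old head;
  person:next [temp] _ header;
  COMMENT backward link from the old head, if there is one;
  IF header NEQ NULL!RECORD THEN person:prev [header] _ temp;
  header _ temp;
END;

addPerson ("John Doe");
addPerson ("Jane Doe");
COMMENT go forward one record and then back again;
PRINT (person:name [person:prev [person:next [header]]]);
END;
```

Going forward from the head and then backward returns to the head, so
this should print Jane Doe.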
Be warned about the difference between records, record pointers, record
classes, and the fields of records: they are all distinct things, and
you can get in trouble if you forget it. Perhaps a simple example will
show you what is meant:
BEGIN
RECORD!CLASS pair (INTEGER i, j);
RECORD!POINTER (pair) a, b, c, d;
a _ NEW!RECORD (pair);
pair:i [a] _ 1;
pair:j [a] _ 2;
d _ a;
b _ NEW!RECORD (pair);
pair:i [b] _ 1;
pair:j [b] _ 2;
c _ NEW!RECORD (pair);
pair:i [c] _ 1;
pair:j [c] _ 3;
IF a = b THEN PRINT( " A = B " );
pair:j [d] _ 3;
IF a = c THEN PRINT( " A = C " );
IF c = d THEN PRINT( " C = D " );
IF a = d THEN PRINT( " A = D " );
PRINT( " (A I:", pair:i [a], ", J:",
pair:j [a], ")" );
PRINT( " (B I:", pair:i [b], ", J:",
pair:j [b], ")" );
PRINT( " (C I:", pair:i [c], ", J:",
pair:j [c], ")" );
PRINT( " (D I:", pair:i [d], ", J:",
pair:j [d], ")" );
END;
will print:
A = D (A I:1, J:3) (B I:1, J:2)
(C I:1, J:3) (D I:1, J:3)
Note that two RECORD!POINTERs are only equal if they point to the same
record (regardless of whether the fields of the records that they point
to are equal). At the end of executing the previous example, there are
3 distinct records, one pointed to by RECORD!POINTER b, one pointed to
by RECORD!POINTER c, and one pointed to by RECORD!POINTERs a and d.
When the line that reads: pair:j [d] _ 3; is executed, the j-field of
the record pointed at by RECORD!POINTER d is changed to 3, not the
j-field of d (RECORD!POINTERs have no fields). Since that is the same
record as the one pointed to by RECORD!POINTER a, when we print
pair:j [a], we get the value 3, not 2.
Records can also help your programs to be more readable, by using a
record as a means of returning a collection of values from a procedure
(no Sail procedure can return more than one value). If you wish to
return a RECORD!POINTER, then the procedure declaration must indicate
this as an additional type-qualifier on the procedure declaration, for
example:
RECORD!POINTER (person) PROCEDURE maxBalance;
BEGIN
RECORD!POINTER (person) tempHeader,
currentMaxPerson;
REAL currentMax;
tempHeader _ header;
currentMax _ person:balance [tempHeader];
currentMaxPerson _ tempHeader;
WHILE tempHeader _ person:next [tempHeader] DO
IF person:balance [tempHeader] > currentMax THEN
BEGIN
currentMax _ person:balance [tempHeader];
currentMaxPerson _ tempHeader;
END;
RETURN(currentMaxPerson);
END;
This procedure goes through the linked list of records and finds the
person with the highest balance. It then returns a record pointer to
the record of that person. Thus, through the single RETURN statement
allowed, you get both the name of the person and the balance.
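A call of maxBalance might look like the following fragment. It assumes
the declarations of the linked-list program above (the class person with
its next field, and header), that crlf is defined as in the other
examples, and a hypothetical pointer named richest. Since maxBalance as
written looks at the balance of the first record without testing for an
empty list, the sketch makes that test before the call.

```sail
RECORD!POINTER (person) richest;
...
IF header NEQ NULL!RECORD THEN
  BEGIN
  richest _ maxBalance;
  PRINT ("Largest balance: ", person:name [richest], " ",
         person:balance [richest], crlf);
  END;
```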
RECORD!POINTERs can also be used as arguments to procedures; they are by
default VALUE parameters when used. Consider the following quite
complicated example:
RECORD!CLASS pnt (REAL x,y,z);
RECORD!POINTER (pnt) PROCEDURE midpoint
(RECORD!POINTER (pnt) a,b);
BEGIN
RECORD!POINTER (pnt) retval;
retval _ NEW!RECORD (pnt);
pnt:x [retval] _ (pnt:x [a] + pnt:x [b]) / 2;
pnt:y [retval] _ (pnt:y [a] + pnt:y [b]) / 2;
pnt:z [retval] _ (pnt:z [a] + pnt:z [b]) / 2;
RETURN( retval );
END;
...
p _ midpoint( q, r );
...
While this procedure may appear a bit clumsy, it makes it easy to talk
about such things as pnts later, using simply a record pointer to
represent each pnt. Another common method for "returning" more than one
thing from a procedure is to use REFERENCE parameters, as in the
following example:
PROCEDURE midpoint (REFERENCE REAL rx,ry,rz;
REAL ax,ay,az,bx,by,bz);
BEGIN
rx _ (ax + bx) / 2;
ry _ (ay + by) / 2;
rz _ (az + bz) / 2;
END;
...
MIDPOINT( px, py, pz, qx, qy, qz, rx, ry, rz );
...
Here the code for the procedure looks quite simple, but there are so
many arguments to it that you can easily get lost in the main code.
Much of the confusion comes about because procedures simply cannot
return more than one value, and the record structure allows you to
return the name of a bundle of information.
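To see the record version at work, here is a sketch of a complete program
built around it. The helper procedure newPnt and the coordinate values
are invented for the illustration; midpoint itself is the procedure given
earlier.

```sail
BEGIN
RECORD!CLASS pnt (REAL x, y, z);
RECORD!POINTER (pnt) p, q, r;

RECORD!POINTER (pnt) PROCEDURE newPnt (REAL ax, ay, az);
BEGIN
  RECORD!POINTER (pnt) retval;
  retval _ NEW!RECORD (pnt);
  pnt:x [retval] _ ax;
  pnt:y [retval] _ ay;
  pnt:z [retval] _ az;
  RETURN (retval);
END;

RECORD!POINTER (pnt) PROCEDURE midpoint
                       (RECORD!POINTER (pnt) a, b);
BEGIN
  RECORD!POINTER (pnt) retval;
  retval _ NEW!RECORD (pnt);
  pnt:x [retval] _ (pnt:x [a] + pnt:x [b]) / 2;
  pnt:y [retval] _ (pnt:y [a] + pnt:y [b]) / 2;
  pnt:z [retval] _ (pnt:z [a] + pnt:z [b]) / 2;
  RETURN (retval);
END;

q _ newPnt (0.0, 0.0, 0.0);
r _ newPnt (2.0, 4.0, 6.0);
p _ midpoint (q, r);
PRINT ("Midpoint is (", pnt:x [p], ", ", pnt:y [p], ", ",
       pnt:z [p], ")");
END;
```

The main code deals with three pointers rather than nine REAL variables.
The printed form of the REAL values depends on the current format
settings.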
SECTION 7
Conditional Compilation
Conditional compilation is available so that the same source file can be
used to compile slightly different versions of the program for different
purposes. Conditional compilation is handled by the scanner in a way
similar to the handling of macros. The text of the source file is
manipulated before it is compiled. The format is
IFCR boolean THENC code ELSEC code ENDC
This construction is not a statement or an expression. It is not
followed by a semi-colon but just appears at any point in your program.
The ELSEC is optional. The ENDC must be included to mark the end but no
begin is used. The code which follows the THENC (and ELSEC if used) can
be any valid Sail syntax or fragment of syntax. As with macros, the
scanner is simply manipulating text and does not check that the text is
valid syntax.
The boolean must be one which has a value at compile time. This means
it cannot be any value computed by your program. Usually, the boolean
will be DEFINE'd by a macro. For example:
DEFINE smallVersion = <TRUE>;
. . .
IFCR smallVersion THENC max _ 10*total;
ELSEC max _ 100*total; ENDC
. . .
where every difference in the program between the small and large
versions is handled with a similar IFCR...THENC...ENDC construction.
For this construction, the scanner checks the value of the boolean. If
it is TRUE, the text following THENC is inserted in the source being
sent to the inner compiler; otherwise that text is simply thrown away and
the code following the ELSEC (if any) is used. Here the code used for
the above will be max _ 10*total;, and if you edit the program and
instead write
DEFINE smallVersion = <FALSE>;
the result will be max _ 100*total;.
The code following the THENC and ELSEC will be taken exactly as is so
that statements which need final semi-colons should have them. The
above format of statement ; ELSEC is correct.
If this feature were not available then the following would have to be
used:
BOOLEAN smallVersion;
smallVersion _ TRUE;
...
IF smallVersion THEN max _ 10*total
ELSE max _ 100*total;
...
so that a conditional would actually appear in your program.
Some typical uses of conditional compilation are:
1) Insertion of debugging or testing code for experimental
versions of a program and then removal for the final version.
Note that the code will still be in your source file and can be
turned back on (recompilation is of course required) at any time
that you again need to debug. When you do not turn on debugging,
the code completely disappears from your program but not from your
source file.
2) Maintenance of a single source file for a program which
is to be exported to several sites with minor differences.
DEFINE sumex = <TRUE>,
isi = <FALSE>;
...
IFCR sumex THENC docdir _ "DOC"; ENDC
IFCR isi THENC docdir _ "DOCUMENTATION"; ENDC
...
where only one site is set to TRUE for each compilation.
3) "Commenting out" large portions of the program. Sometimes
you need to temporarily remove a large section of the program.
You can insert the word COMMENT preceding every statement to be
removed but this is a lot of extra work. A better way is to use:
IFCR FALSE THENC
...
<all the code to be "removed">
...
ENDC
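As an illustration of the first use (debugging code), here is a sketch of
a small complete program. The switch name debugging and the loop are
invented for the example, and the REQUIRE ... DELIMITERS line is assumed
to be needed to establish < and > as the macro delimiters.

```sail
BEGIN
REQUIRE "<><>" DELIMITERS;
DEFINE debugging = <TRUE>;
INTEGER i, total;

total _ 0;
FOR i _ 1 STEP 1 UNTIL 10 DO
  BEGIN
  total _ total + i;
  IFCR debugging THENC
  PRINT ("i = ", i, " total = ", total, " ");
  ENDC
  END;
PRINT ("Total is ", total);
END;
```

With debugging defined as <FALSE> the inner PRINT statement is never seen
by the compiler, and only the final total (55) is printed.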
SECTION 8
Systems Building in Sail
Many new Sail users will find that their first Sail project involves
adding to an already-existing system of large size that has been worked
on by many people over a period of years. These systems include the
speech recognition programs at Carnegie-Mellon, the hand-eye software at
Stanford AI, large CAI systems at Stanford IMSSS, and various medical
programs at SUMEX and NIH. This section does not attempt to deal with
these individual systems in any detail, but instead tries to describe
some of the features of Sail that are frequently used in systems
building, and are common to all these systems. The exact documentation
of these features is given elsewhere; this is intended to be a guide to
those features.
The Sail language itself is procedural, and this means that programs can
be broken down into components that represent conceptual blocks
comprising the system. The block structuring of ALGOL also allows for
local variables, which should be used wherever possible. The first rule
of systems building is: break the system down into modules corresponding
to conceptual units. This is partly a question of the design of the
system--indeed, some systems by their very design philosophy will defy
modularity to a certain extent. As a theory about the representation of
knowledge in computer programs, this may be necessary; but programs
should, most people would agree, be as modular "as possible".
Once modularized, most of the parts of the system can be separate files,
and we shall show below how this is possible. Of course, the modules
will have to communicate together, and may have to share common data
(global arrays, flags, etc.). Also, since the modules will be sharing
the same core image (or job), there are certain Sail and timesharing
system resources that will have to be commonly shared. The rules to
follow here are:
1) Make the various modules of a system as independent and
separate as design philosophy allows.
2) Code them in a similar "style" for readability among
programmers.
3) Make the points of interface and communication between the
programs as clear and explicit as possible.
4) Clear up questions about which modules govern system
resources (Sail and the timesharing system), such as files,
terminals, etc. so that they are not competing with each other
for these resources.
8.1 The Load Module
The most effective separation of modules is achieved through separate
compilations. This is done by having two or more separate source files,
which are compiled separately and then loaded together. Consider the
following design for an AI system QWERT. QWERT will contain three
modules: a scanner module XSCAN, a parser module PARSE, and a main
program QWERT. We give below the three files for QWERT.
First, the QWERT program, contained in file QWERT.SAI:
BEGIN"QWERT"
EXTERNAL STRING PROCEDURE XSCAN(STRING S);
REQUIRE "XSCAN" LOAD!MODULE;
EXTERNAL STRING PROCEDURE PARSE(STRING S);
REQUIRE "PARSE" LOAD!MODULE;
WHILE TRUE DO
BEGIN
PRINT("*",PARSE(XSCAN(INCHWL)));
END;
END"QWERT";
Notice two features about QWERT.SAI:
1) There are two EXTERNAL declarations. An EXTERNAL
declaration says that some identifier (procedure or variable) is
to be used in the current program, but it will be found somewhere
else. The EXTERNAL causes the compiler to permit the use of the
identifier, as requested, and then to issue a request for a global
fixup to the LOADER program.
2) Secondly, there are two REQUIRE ... LOAD!MODULE statements
in the program. A load module is a file that is loaded by the
loader, presumably the output of some compiler or assembler.
These REQUIRE statements cause the compiler to request that the
loader load modules XSCAN.REL and PARSE.REL when we load QWERT.REL.
This will hopefully satisfy the global requests: i.e., the loader
will find the two procedures in the two mentioned files, and link
the programs all together into one "system".
Second, the code for modules XSCAN and PARSE, starting with XSCAN.SAI:
ENTRY XSCAN;
BEGIN
INTERNAL STRING PROCEDURE XSCAN(STRING S);
BEGIN
..... code for XSCAN ....
RETURN (resulting string);
END;
END;
and now PARSE.SAI:
ENTRY PARSE;
BEGIN
INTERNAL STRING PROCEDURE PARSE(STRING S);
BEGIN
....code for PARSE....
RETURN(resulting string);
END;
END;
Both of these modules begin with an ENTRY declaration. This has the
effect of saying that the program to be compiled is not a "main" program
(there can be only one main program in a core image), and also says that
the named procedure (XSCAN or PARSE) is to be found as an INTERNAL within
this file. The list of
tokens after the ENTRY construction is mainly used for LIBRARYs rather
than LOAD!MODULEs, and we do not discuss the difference here, since
LIBRARYs are not much used in system building due to the difficulty in
constructing them.
A few important remarks about LOAD!MODULES:
1) The use of LOAD!MODULES depends on the loaders (LOADER and
LINK10) that are available on the system. In particular, there is
no way to associate an external symbol with a particular
LOAD!MODULE.
2) The names of identifiers are limited to six characters,
and the character set permissible is slightly less than might be
expected. The symbol "!" is, for example, mapped into "." in
global symbol requests.
3) The "semantics" of a symbol (e.g., whether the symbol
names an integer or a string procedure) is in no way checked
during loading.
Initialization routines in a LOAD!MODULE can be performed automatically
by including a REQUIRE ... INITIALIZATION procedure. For example,
suppose that INIT is a simple parameterless, valueless procedure that
does the initialization for a given module:
SIMPLE PROCEDURE INIT;
BEGIN
...initialization code...
END;
REQUIRE INIT INITIALIZATION;
will run INIT prior to the outer block of the main program. It is
difficult to control the order in which initializations are done, so it
is advisable to make initializations that do not conflict with each
other.
8.2 Source Files
In addition to the ability to compile programs separately, Sail allows a
single compilation to be made by inserting entire files into the scan
stream during compilation. The construction:
REQUIRE "FILENM.SAI" SOURCE!FILE;
inserts the text of file FILENM.SAI into the stream of characters being
scanned--having the same effect that would be obtained by copying all of
FILENM.SAI into the current file.
One pedestrian use of this is to divide a file into smaller files for
easier editing. While this can be convenient, it can also unnecessarily
fragment a program into little pieces without purpose. There are,
however, some real purposes of the SOURCE!FILE construction in systems
building. One use is to include code that is needed in several places
into one file, then "REQUIRE" that file in the places that it is needed.
Macros are a common example. For example, a file of global definitions
might be put into a file MACROS.SAI:
REQUIRE "<><>" DELIMITERS;
DEFINE ARRAYSIZE=<100>,
NUMBEROFSTUDENTS=<200>,
FILENAME=<"FIL.DAT">;
A common use of source files is to provide a SOURCE!FILE that links to a
load module: the source file contains the EXTERNAL declarations for the
procedures (and data) to be found in a module, and also requires that
file as a load module. Such a file is sometimes called a "header" file.
Consider the file XSCAN.HDR for the above XSCAN load module:
EXTERNAL STRING PROCEDURE XSCAN(STRING S);
REQUIRE "XSCAN" LOAD!MODULE;
The use of header files ameliorates some of the deficiencies of the
loader: the header file can, for example, be carefully designed to
contain the declarations of the EXTERNAL procedures and data, reducing
the likelihood of an error caused by misdeclaration. Remember, if you
declare:
INTERNAL STRING PROCEDURE XSCAN(STRING S);
BEGIN ..... END;
in one file and
EXTERNAL INTEGER PROCEDURE XSCAN(STRING S);
in another, the correct linkages will not be made, and the program may
crash quite strangely.
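With header files, the main program QWERT.SAI shown earlier could be
written as in the following sketch. PARSE.HDR is a hypothetical file,
assumed to hold the EXTERNAL declaration of PARSE and the line
REQUIRE "PARSE" LOAD!MODULE; in the same way that XSCAN.HDR does for
XSCAN.

```sail
BEGIN "QWERT"
REQUIRE "XSCAN.HDR" SOURCE!FILE;
REQUIRE "PARSE.HDR" SOURCE!FILE;
WHILE TRUE DO
  BEGIN
  PRINT ("*", PARSE (XSCAN (INCHWL)));
  END;
END "QWERT";
```

Every program that uses XSCAN then gets its declaration from the same
place, so a change to the procedure needs to be recorded in only one
file.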
8.3 Macros and Conditional Compilation
Macros, especially those contained in global macro files, can assist in
system building. Parameters, file names, and the like can be
"macroized".
Conditional compilation also assists in systems building by allowing the
same source files to do different things depending on the setting of
switches. For example, suppose a file FILE is being used for both a
debugging and a "production" version of the same module. We can include
a definition of the form:
DEFINE DEBUGGING=<FALSE>;
COMMENT false if not debugging;
and then use it
IFCR DEBUGGING THENC
PRINT("Now at PROC PR ",I," ",J,CRLF); ENDC
(See Section 7 on conditional compilation for more details.) In the
above example, the code will define the switch to be FALSE, and the
PRINT statement will not be compiled, since the condition of its
IFCR ...THENC is FALSE. In using switches, it is common that
there is a default setting that one generally wants. The following
conditional compilation checks to see if DEBUGGING has already been
defined (or declared), and if not, defines it to be false. Thus the
default is established.
IFCR NOT DECLARATION(DEBUGGING) THENC
DEFINE DEBUGGING=<FALSE>; ENDC
Then, another file, inserted prior to this one, sets the compilation
mode to get the DEBUGGING version if needed.
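The interplay of an explicit setting and the default can be sketched in a
single program. Here the first DEFINE stands in for the file that would
be inserted beforehand, and the two PRINT statements are invented so that
the effect can be seen.

```sail
BEGIN
REQUIRE "<><>" DELIMITERS;
COMMENT remove the next line to fall back on the default;
DEFINE DEBUGGING=<TRUE>;

IFCR NOT DECLARATION(DEBUGGING) THENC
  DEFINE DEBUGGING=<FALSE>; ENDC

IFCR DEBUGGING THENC
  PRINT ("debugging version");
ELSEC
  PRINT ("production version");
ENDC
END;
```

As written, DEBUGGING is already declared when the default is reached, so
the default is skipped and the debugging text should be compiled.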
Macros and conditional compilation also allow a number of complex
compile-time operations, such as building tables. These are beyond our
discussion here, except to note that complex macros are often used
(overused?) in systems building with Sail.
APPENDIX A
Sail and ALGOL W Comparison
There are many variants of ALGOL. This Appendix will cover only the
main differences between Sail and ALGOL W.
The following are differences in terminology:
ALGOL W                                           Sail
:=             Assignment operator                _
**             Exponentiation operator            ^
¬=             Not equal                          ≠ or NEQ
<=             Less than or equal                 ≤ or LEQ
>=             Greater than or equal              ≥ or GEQ
REM            Division remainder operator        MOD
END.           Program end                        END
RESULT         Procedure parameter type           REFERENCE
str(i|j)       Substrings                         str[i+1 for j]
STRING(i) s    String declarations                STRING s
arry(1)        Array subscript                    arry[1]
arry (1::10)   Array declaration                  arry[1:10]
The following are not available in Sail:
ODD   ROUND   ENTIER
TRUNCATE                 Truncation is default conversion.
WRITE, WRITEON           Use PRINT statement for both.
READON                   Use INPUT, REALIN, INTIN.
Block expressions
Procedure expressions    Use RETURN statement in procedures.
Other differences are:
1) Iteration variables and Labels must be declared in Sail, but the
iteration variable is more general since it can be tested after the
loop.
2) STEP UNTIL cannot be left out in the FOR-statement in Sail.
3) Sail strings do not have length declared and are not filled out with
blanks.
4) EQU, not =, is used to compare Sail strings.
5) The first case in the CASE statement in Sail is 0 rather than 1 as
in ALGOL W. (Note that Sail also has CASE expressions.)
6) <, =, and > will not work for alphabetizing Sail strings. They are
arithmetic operators only.
7) ALGOL W parameter passing conventions vary slightly from Sail. The
ALGOL W RESULT parameter is close to the Sail REFERENCE parameter,
but there is a difference, in that the Sail REFERENCE parameter
passes an address, whereas the ALGOL W RESULT parameter creates a
copy of the value during the execution of the procedure.
8) A FORWARD PROCEDURE declaration is needed in Sail if another
procedure calls an as yet undeclared procedure. Sail is a one-pass
compiler.
9) Sail uses SIMPLE PROCEDURE, PROCEDURE, and RECURSIVE PROCEDURE where
ALGOL has only PROCEDURE (equivalent to Sail's RECURSIVE PROCEDURE).
10) Scalar variables in Sail are not cleared on block entry in non-
RECURSIVE procedures.
11) Outer block arrays in Sail must have constant bounds.
12) The RECORD syntax is considerably different. See below.
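A few of these differences can be seen together in a short Sail sketch.
The variable names and strings are invented for the illustration.

```sail
BEGIN
STRING s;
INTEGER i;

s _ "ABCDEF";
COMMENT ALGOL W s(1|3) corresponds to Sail s[2 FOR 3];
PRINT (s[2 FOR 3]);
COMMENT strings are compared with EQU rather than with =;
IF EQU (s[2 FOR 3], "BCD") THEN PRINT (" matches");
COMMENT the first case of a CASE statement is numbered 0;
i _ 0;
CASE i OF
  BEGIN
  PRINT (" case 0");
  PRINT (" case 1")
  END;
END;
```

Under these assumptions the program should print BCD matches case 0.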
Sail features (or improvements) not in ALGOL W:
a) Better string facilities with more flexibility.
b) More complete RECORD structures.
c) Use of DONE and CONTINUE statements for easier control of loops.
d) Assignment expressions for more compact code.
e) Complete I/O facilities.
f) Easy interface to machine instructions.
The following compares Sail and ALGOL W records in several important
aspects.
Aspect              Sail                    ALGOL W
---------------------------------------------------------
Declaration         RECORD!CLASS            RECORD
of class

Declaration of      RECORD!POINTER          REFERENCE
record pointer
                    Pointers can be         pointers must
                    several classes or      be to one
                    ANY!CLASS               class

Empty record        Reserved word           Reserved word
                    NULL!RECORD             NULL

Fields of record    Use brackets            Use parens
                    Must use                Don't use
                    CLASS: before the       class name
                    field name              before field
REFERENCES
1. Reiser, John (ed.), Sail, Memo AIM-289, Stanford Artificial
Intelligence Laboratory, August 1976.
2. Frost, Martin, UUO Manual (Second Edition), Stanford Artificial
Intelligence Laboratory Operating Note 55.4, July 1975.
3. Harvey, Brian (M. Frost, ed.), Monitor Command Manual, Stanford
Artificial Intelligence Laboratory Operating Note 54.5, January
1976.
4. Feldman, J.A., Low, J.A., Swinehart, D.C., Taylor, R.H., "Recent
Developments in Sail", AFIPS FJCC 1972, p. 1193-1202.
5. DECSYSTEM10 Assembly Language Handbook (3rd Edition), Digital
Equipment Corporation, Maynard, Massachusetts, 1973.
6. DECSYSTEM10 Users Handbook (2nd Edition), Digital Equipment
Corporation, Maynard, Massachusetts, 1972.
7. Myer, Theodore and Barnaby, John, TENEX EXECUTIVE Manual (revised by
William Plummer), Bolt, Beranek and Newman, Cambridge,
Massachusetts, 1973.
8. JSYS Manual (2nd Revision), Bolt, Beranek and Newman, Cambridge,
Massachusetts, 1973.