Trailing-Edge
-
PDP-10 Archives
-
SRI_NIC_PERM_FS_1_19910112
-
c/old/nmit/overview-tech.doc
There are no other files named overview-tech.doc in the archive.
NMT V7 C Project
Technical Overview Manual
June 1985
This manual is a technical overview of the
New Mexico Tech Version 7 C project. It
describes the technical documentation for the
project and explains the project's coding
standards.
Operating System: TOPS-20
Software: V7 C v1.0
pppppppppppppppp
pp ppp
pp cccccccccc pp
pp cc cc pp
pp cc 7777 pp
pp cc 7 pp
pp cc 7 pp
pp cc 7 pp
pp cc cc pp
pp cccccccccc pp
pp ppp
ppppppppppppppppp
pp
pp
pp
pp
pp
pp
pp
pp
pp
First Printing, 1984
Copyright (C) New Mexico Tech 1984, 1985. All rights reserved.
The information in this manual is intended for internal use only
by New Mexico Tech (NMT). In no way should the information in
this manual be construed as a commitment by NMT. NMT assumes no
responsibility for any errors that may appear in this document.
DEC, DECSYSTEM-20, TOPS-20, MACRO-20, RUNOFF, VAX, and VMS are
trademarks of Digital Equipment Corporation.
PREFACE
Document Objective
This document is intended as an overview of the V7 C system
from a technical (i.e., designer's or programmer's) point of
view.
Intended Audience
This manual assumes a relatively high profiency in the C
language, and a general understanding of compiler theory, on
the part of the reader.
Other Documents
The V7 C Compiler Technical Manual and V7 C Runtime Library
-- - -------- --------- ------ -- - ------- -------
Technical Manual cover the internals of the V7 C compiler
--------- ------
and runtime library, respectively, in much greater detail.
Conventions
This manual uses the following conventions:
"<RET>" represents a carriage return/line feed (ASCII
13/10 decimal).
"^" (circumflex) represents a control character. For
example, ^C represents pressing the keyboard "CONTROL"
or "CTRL" key and "C" key simultaneously. Depending on
the context, "^" can also mean the C language's
"exclusive or" operator.
Underlining indicates information that must be typed by
-----------
the user in examples, and is also used for titles of
publications.
CHAPTER 1
TECHNICAL DOCUMENTATION STANDARDS
This chapter explains the standards to be used when writing
technical documents for the V7 C project. The chapter sections
cover file-level standards and the various parts of a document.
1.1 FILE-LEVEL CONVENTIONS
All of the technical documents should reside in the same
directory. Randomly searching a variety of directories,
each of which has a non-zero probability of containing the
desired file, is tedious and no fun.
Documents should be written in the form of RUNOFF source
files with the ".RND" extension (RUNOFF will create ".DOC"
output files automatically from ".RND" input files).
Furthermore, the file name (the name itself, independent of
the directory and extension) should be of the form "X-TECH",
where the "X" part denotes the subject matter of the
document. For example, the file I am now typing is called
"OVERVIEW-TECH.RND", from which RUNOFF will create
"OVERVIEW-TECH.DOC", which is the file you are now reading.
While the file-level standards are somewhat restrictive,
they have the advantage of allowing one to find all of the
technical documentation for the project by searching one
directory with the wild-card specification "*-TECH.RND" or
"*-TECH.DOC".
1.2 THE DOCUMENT HEADER
The header of a technical document consists of a copy of the
technical document header skeleton (see section 2.1),
modified as needed for manual-specific information.
Creating a document header is primarily drudgework. Use of
the header skeleton eliminates most of this, leaving only
TECHNICAL DOCUMENTATION STANDARDS Page 1-2
the pleasant job of writing the body. Furthermore, it
ensures that our documents contain copyright statements and
that Digital is given credit for its trademarks in any
document in which we use them.
Any document which has more than one or two chapters should
have a table of contents.
1.3 THE DOCUMENT BODY
Using RUNOFF's ".CHAPTER" and ".HEADER LEVEL n" commands
seems to section documents reasonably. Beyond this it is
probably not necessary to set standards, as long as the
document is intelligently divided into its parts, has a
meaningful indentation scheme, and contains sufficient white
space to prevent the appearance that one is reading a novel.
Technical documentation has a high information density, but
a variety of techniques can be used to help prevent
fatiguing the reader unnecessarily. Short paragraphs and a
high proportion of white space are both desirable, as is the
use of non-verbal material (charts, graphs, and other
artwork).
1.4 THE DOCUMENT TRAILER
Whether a given document needs appendices, an index, or a
glossary is left to the author's discretion. It is better
to add these appendages to a document that does not really
need them then it is to leave them off of one that does.
1.5 RUNOFF AND PRINTING
Use the "/CRETURN" switch to RUNOFF to force it to emit
carriage return/line feed pairs for blank lines, instead of
just line feeds. This makes it much easier to look at the
generated documents with an editor.
The header skeleton is set up to provide no left margin in
the document. With no further effort (additional RUNOFF
commands or switches), a document of this form can be
printed on the printer with "/FORMS:SMWH16". If the
"/RIGHT:10" switch is given to RUNOFF, the document can be
printed with "/FORMS:SMAL16" on the printer, and with
"/FORMS:TITAN0" on the Diablo. DO NOT give the "/DIABLO"
switch to RUNOFF; it will cause filling and justification
to work oddly (the reason for this is, at present, unknown).
TECHNICAL DOCUMENTATION STANDARDS Page 1-3
1.6 AN EXAMPLE
By now, an example is probably in order. The file you are
reading is (or should be) "PS:<C>OVERVIEW-TECH.DOC", which
is the result of applying RUNOFF to
"PS:<C>OVERVIEW-TECH.RND". You can refer to both of these
(and particularly the ".RND" file) as examples of technical
documentation for the V7 C project.
CHAPTER 2
AVAILABLE TECHNICAL DOCUMENTATION
This chapter explains what technical documentation is available
for the V7 C project, and where to find it.
The separate sections of this chapter cover the following
documents: the technical manual header skeleton, the technical
overview for the project, and the technical manuals for the
compiler and the runtime library. All of the documents are in
the TCC directory PS:<C> and have a ".RND" extension; the
filename for each one is given in the appropriate section.
2.1 THE TECHNICAL MANUAL HEADER SKELETON
The filename for the header skeleton is "SKELETON-TECH".
The skeleton contains basic title, copyright, and preface
pages, which are modified (correct title inserted, abstract
added, etc.) to suit the needs of the document being
written. Lines in the skeleton which begin with ">>>" mark
spots where changes and additions need to be done.
2.2 THE TECHNICAL OVERVIEW
The filename for the technical overview is "OVERVIEW-TECH".
Since you are already reading the overview, further
description is probably unnecessary.
2.3 THE COMPILER TECHNICAL MANUAL
The filename for the compiler technical manual is "CC-TECH".
This manual first describes the compiler in terms of its
phases (the preprocessor/scanner, the parser, and the code
generator), and explains how the phases communicate. At the
AVAILABLE TECHNICAL DOCUMENTATION Page 2-2
next level, the compiler is broken down into modules (which
correspond to source files); the manual gives a functional
specification of the arguments and behavior of each function
in the module, and also describes any global data structures
present in the module. Finally, any non-obvious pieces of
code or algorithmic oddities within functions are explained.
Appendices in the compiler manual include a data-access
table, which for each global data structure tells which
functions can read it and which can modify it, a table which
for each function gives the global data structures it
accesses and the functions it calls, and descriptions of all
of the compiler's "#include" files. Another appendix
documents the procedure for building the compiler from
sources and/or relocatable modules.
2.4 THE RUNTIME LIBRARY TECHNICAL MANUAL
The filename for the runtime library technical manual is
"CL-TECH".
>>>
>>> Tozzz should write this.
>>>
CHAPTER 3
CODING STANDARDS
This chapter documents coding standards for the source files of
the V7 C project. Standards for source languages, source file
formats, and coding formats are presented.
3.1 SOURCE LANGUAGES
The language of choice for the entire project is C, as
defined in Kernighan and Ritchie. MACRO-20 is the only
other allowable language. If you can make a choice (that
is, if both C and MACRO can reasonably be used as the source
language for a given module), always choose C.
The entire system is presently written in C, with the
exception of five small modules in the runtime library.
3.2 THE FILE LEVEL
There is a skeleton source file for the project in
"PS:<C>CCSKEL.C". Use it; a copy of it makes a perfect
start to a good module.
No source text of any kind may extend beyond column 80 of
the source file. This avoids wraparound on terminals and
the associated readability problems. The last section of
this chapter describes standards for breaking long source
statements onto multiple lines.
Source files (modules) are divided into six sections by
banners. A banner is a comment in the left margin which
looks like this:
/*==================================================
* there are fifty "="s here ^
*
*/
CODING STANDARDS Page 3-2
At the beginning of the file is a file banner, which has the
module's filename in its first line and a short (one- or
two-line) description of its contents starting in the second
line. Following the description is one blank line, a line
containing a copyright notice, another blank line, and then
a line which gives the name of the module's author(s).
Following the file banner are any "#include"s needed for
this module. This makes it much easier to determine what is
included by a module, since all of the "#include"s are in
the same place.
Next is the overview banner, which has the word
"O V E R V I E W" in its first line, and a description of
the module's contents in the rest. Within this banner is
(at least) a list of all of the functions and non-auto data
structures in the module, in the order in which they appear
in the file, and a short description with each one telling
what it does or is. If it seems a good idea to put anything
else in the overview (such as some general comments about
algorithms or a description of information flow among the
various functions in the module) then by all means do so.
Do not worry about making the overview banner too large;
The C compiler can strip comments out of source code very
rapidly.
After the overview comes a banner which says
"E X T E R N A L D A T A", followed by declarations for
any external data structures.
Next is a banner which says
"E X T E R N A L F U N C T I O N S", and then the
declarations for any external functions.
After the external declarations comes a banner which says
"P R I V A T E D A T A". This in turn is followed by
declarations for any non-auto data structures which are
local to the module (i.e., are declared "static ...").
Following the private data is a "P U B L I C D A T A"
banner, after which are declarations for any non-auto data
structures to which access by functions in other modules is
allowed (by declaring the data structures "extern ...").
CAUTION
Make a strong attempt to keep global data of this
sort to a minimum. A proliferation of
globally-accessible data structures in a software
system of this size will be a curse of truly
magnificent proportions on debuggers and
maintainers. Try to practice information hiding
as much as possible.
CODING STANDARDS Page 3-3
Next in the source file come the functions, divided as are
the data structures into a private section (in which
functions are declared "static ...") and a public section
(functions which are available to other modules by declaring
them "extern ..."). The banners for these sections of the
source file are "P R I V A T E F U N C T I O N S" and
"P U B L I C F U N C T I O N S", respectively.
All of the banners should be present in all of the modules,
even when the corresponding section of the module is empty,
in which case the comment
/*
* empty
*/
should appear alone after the banner.
3.3 COMMENTS
Comments, as differentiated from banners, can have three
forms:
1. A comment can appear in block form at the left margin,
preceded by a blank line:
/*
* the temporary label number
*/
static int tl_num;
/*
* the number of important characters of a
* symbolic name in MACRO-20
*/
#define MACSYM_LEN 6
2. A comment can also appear in block form either preceded
by a blank line and indented to the same position as the
surrounding source statements, or preceded by a line
containing only a "{" and indented one tab stop to the
right of the "{" (which will give it the same
indentation as the statements which follow it):
CODING STANDARDS Page 3-4
if (dbg_com)
{
/*
* emit debugging comments if desired; first
* do the tabs and generating function name
*/
fprintf(out, "%s;[%s]", tab_buf, fn_name);
/*
* output the comment if there is one
*/
if (comment != NULL)
fprintf(out, " %s", comment);
}
3. Finally, an in-line comment can be used as a note
pertaining to a single source line. In this case, the
comment must begin after column 40 (at or after the
ninth tab stop), either on the same line as the
statement being commented (in which case there must be
at least two spaces between the statement and the
comment), or on an otherwise blank line immediately
before the statement being commented. Remember that no
text of any kind can extend past column 80 of the source
file, so the maximum length of an in-line comment is
relatively limited. Here are some example in-line
comments:
{
/* print the label */
sprintf(buf, "$%05d", label_number);
return(0); /* all done */
}
3.4 PREPROCESSOR COMMANDS
The current preprocessor is relatively complete; it can
handle the "#define" (with and without arguments), "#undef",
"#ifdef", "#ifndef", #if" (restricted), "#else", "#endif",
"#line", and "#include" (with <> or "" around the filename)
statements. There are no standards for the use of
preprocessor commands, but the last section of this chapter
should be referred to for rules on breaking a long "#define"
over multiple source lines.
CODING STANDARDS Page 3-5
3.5 DECLARATIONS
Each declaration of a data object must be accompanied by a
comment which tells what the data object is.
3.6 THE FUNCTION LEVEL
Each function declaration is preceded by a function banner,
which looks much like the other banners:
/*==================================================
* type fun_name()
* function description
*
* arg1: description
* arg2: description
* . .
* . .
* . .
* argn: description
*
* return value description
*
*/
The function banner gives the type and name of the function,
the names of its arguments, a description of the function's
behavior, a description of each of the arguments (including
whether or not it is modified by the function), and a
description of the function's return value.
All functions must be typed, that is the type of the
function must be given in the function's declaration.
Functions which do not return values must be declared "void"
(the lowercase is important), and must always return
nothing.
All functions must explicitly return. In other words, no
function may exit by simply "falling off the end" of its
code.
3.7 THE STATEMENT LEVEL
Indentation is defined in terms of tab stops, which are
every four columns in the file (EMACS C Mode provides these
tab stops automatically).
The standards for statements are:
CODING STANDARDS Page 3-6
1. No more than one statement may appear on any given
source line.
2. The declaration for a function begins in the first
column.
3. When a simple statement "B" is subordinate to another
statement "A" (the body statement of an "if", for
example), "B" is indented one tab stop to the right of
"A"s indentation:
if (the_var < MAXIMUM)
the_var += INCREMENT;
Do not ever put two statements on a single line, like
this:
if (the_var < MAXIMUM) the_var++;
or this:
while (*++buf_ptr != '\0') ;
4. When a compound statement "B" (i.e., one requiring curly
braces "{" and "}" around it) is subordinate to another
statement "A", the opening and closing curly braces of
"B" appear alone on their lines, indented to the same
tab stop as "A" is. The statements within the curly
braces (the actual code of "B") are then indented one
tab stop further to the right:
if (the_var > MINIMUM)
{
the_var -= DECREMENT;
do_it_to(the_var);
}
5. The only exception to the previous rule is in the case
of the "do while" statement; in this statement, the
body is always surrounded by curly braces (even when
they are unnecessary), and the closing curly brace "}"
is followed by a space and then the "while" part of the
statement. The token sequence "} while" can then be
used by human readers to distinguish the word "while" in
a "do while" statement from the word "while" in a
"while" statement (which appears without a "}" before it
on the same line):
do
{
the_var -= DECREMENT;
do_it_to(the_var);
} while (the_var > MINIMUM);
CODING STANDARDS Page 3-7
6. The statement(s) associated with a switch case are
indented one tab stop to the right of the indentation of
the word "case". The "default" case of a "switch"
statement, if there is one, is always placed last in the
list of cases for that switch. The statement block for
each case must end with a "break" statement, unless it
is intended that the code fall through to the next case.
If the code does fall through, a comment must be
inserted where the "break" statement would have been,
stating that the fall through is on purpose. The last
case of a switch must end with a "break" statement:
switch (curr_char)
{
case '+':
switch_value = ON;
break;
case '-':
switch_value = OFF;
break;
default:
fprintf(stderr, "?what?_\n");
break;
}
3.8 THE TOKEN LEVEL
The standards in this section cover the lowest level of C
source code, including constants and symbolic names.
1. NEVER EVER use magic numbers ("cookies"). Nobody but
you (and you only for a short time) will know why that
particular funny-looking integer appears in that
particular context. Use "#define" to make the magic
number a mnemonic symbol (comment the "#define", too),
then use the symbol in place of the number.
2. Avoid short symbolic names (like "i") which have little
meaning. Use longer, mnemonic names; the underbar
character "_" is legal in symbolic names and is helpful
as a separator.
3. Remember that global symbolic names must be unique in
their first six characters in order to be unique in the
generated code. Remember also that a "_" in V7 C turns
into a "." in the generated code.
CODING STANDARDS Page 3-8
3.9 BREAKING LONG LINES
As mentioned in section 3.2, source lines cannot exceed 80
characters in length. Thus, some statements will
undoubtedly be too long to fit on one line. This section
describes standards for splitting various kinds of source
statements across line boundaries.
Where there is more than one rule for breaking a given kind
of statement, the statement must be broken using the first
rule before an attempt is made to break it with the second,
and so on. If all of the applicable rules have been used as
much as possible, and the expression still does not fit
within the 80-column constraint, rewrite the code; either
the flow of control requires too much indentation, or the
expression is just too complex. Either way, the code in its
original form is probably too complicated for human readers.
3.9.1 Breaking A Long "#define"
One feature of the C preprocessor is that the "#define"
command can be used to define a macro with a multiple-line
substitution text. Once the preprocessor begins collecting
the substitution text for a given "#define" command, it does
not stop doing so until it finds a newline in the source
which is not immediately preceded by a "\" (backslash).
When a multiple-line macro is being defined in a V7 C source
program, these backslashes must be in the same column in all
lines of the macro's substitution text. Furthermore, there
must be at least two spaces between the substitution text on
a given line and the backslash on that line:
#define FIND_SYM(sym, name, c0, c1) \
if (h0_tab[c0] > 0 && h1_tab[c1] > 0) \
sym = get_sym(name); \
else \
sym = NULL; \
3.9.2 Breaking "for" Headers
"For" statements frequently have headers that will not fit
on one line. The technique of choice here is to break the
header at one of the semicolons which separate the three
parts of the header. Leave the semicolon on the first line,
and start the next part of the header at the column to the
right of the left parenthesis which begins the header:
CODING STANDARDS Page 3-9
for (regis_idx = 0; regis_idx < NUM_REGS;
regis_idx++)
r_owner[regis_idx] = NO_OWNER;
If this fails to solve the problem, recursively apply the
statement-splitting rules to further break whichever of the
expressions is (are) too long.
3.9.3 Breaking Function Calls
When it is necessary to break a line within the argument
list of a function call, the break should be made
immediately after the comma following one of the function
arguments. The next function argument (on the following
line) should begin at the column immediately to the right of
the "(" (left parenthesis) which began the argument list.
If the argument list is sufficiently long that it must be
broken onto still more lines, the break should again be made
immediately after the comma following some function
argument, and the next argument should begin at the same
column as does the first argument on the preceding line.
Here are some examples:
/*
* here is an argument list broken over one line
*/
gg_gmstmt(store_op, val_reg, buf, sym_reg,
fn_name, symptr->sname);
/*
* here is an argument list broken over
* more than one line
*/
pp_err(TRUE, ppcol, 4, "More than ", YYLM_TEXT,
" characters of actual arguments to ",
macro_name);
If some argument is so long that it cannot fit on a line all
by itself, apply the expression-breaking rules recursively
to split it onto multiple lines.
3.9.4 Breaking Query Expressions
For the purposes of this discussion, a query expression will
be defined as consisting of a test expression and left and
right colon-expressions. When a query expression does not
fit on one line, first try moving the ":" and the right
colon-expression onto a new line, aligning the ":"
vertically with the "?":
CODING STANDARDS Page 3-10
switch_value = (ch == '+') ? SWITCH_ON
: SWITCH_OFF;
If this is not sufficient, then move the "?" and the left
colon-expression to a new line, aligning the "?" with the
test expression, and realigning the ":" with the new
position of the "?":
sym_ptr = (ho_tab[c0] > 0 && h1_tab[c1] > 0)
? get_sym(ident_buf)
: NULL;
If the expression still runs over the 80-column limit, then
recursively break whichever of the three expressions is
(are) too long.
3.9.5 Breaking Other Expressions
The basic rule for breaking expressions other than queries
is to split the expression before the lowest-precedence
binary operator available (excepting the assignment
operators; avoid splitting expressions at any of them).
Leave the left subexpression on the first line. Move the
operator and right subexpression to the new line and align
the operator vertically with the left subexpression, then
move then left subexpression (on the first line) right until
it is aligned with the right subexpression. This technique
results in expressions that look like the following:
yylval = (yylval << 4)
+ ((ppch & 0377) - 'A' + 10);
while ( nodptr != NULL
&& nodptr->nop == STATEMENT)
{
g_any_stmt(nodptr->right);
nodptr = nodptr->left;
}
If the expression needs to be split again, just apply the
expression-breaking rules recursively:
if ( (word_count = ( strlen(nodptr->nsconst)
/ NBYTES
+ 1))
> MAX_WORDS)
word_count = MAX_WORDS;
CODING STANDARDS Page 3-11
do
{
current_ch = (*targ_ptr++ = *src_ptr++);
ch_cnt++;
} while ( ( current_ch != '\0'
&& current_ch != '.')
&& ch_cnt < NAME_LEN);
Breaking expressions at unary operators is singularly
unhelpful; do not bother.