KCC USER DOCUMENTATION
<1 About KCC>
KCC is a compiler for the C language on the PDP-10. It was
begun by Kok Chen of Stanford University around 1981 (hence
the name "KCC"), improved by a number of people at Stanford and
Columbia (primarily David Eppstein, KRONJ), and then adopted by Ken
Harrenstien (with help from Ian Macky) of SRI International as the
starting point for a complete and supported implementation of C.
KCC implements C as described by the following references:
    ANSI: Draft Proposed ANSI C (as of 7-Dec-1988)
    H&S:  Harbison and Steele, "C: A Reference Manual",
          HS1: (1st edition) Prentice-Hall, 1984, ISBN 0-13-110008-4
          HS2: (2nd edition) Prentice-Hall, 1987, ISBN 0-13-109802-0
    K&R:  Kernighan and Ritchie, "The C Programming Language",
          KR1: (1st edition) Prentice-Hall, 1978, ISBN 0-13-110163-3
          KR2: (2nd edition) Prentice-Hall, 1988, ISBN 0-13-110362-8
Currently KCC is only supported for TOPS-10 and TOPS-20, although there
is no reason it cannot be used for other PDP-10 systems or processors.
The remaining discussion for the most part assumes you are on a TOPS-20
system.
<1 Using KCC>
C source files should have the extension ".C", such as PROG.C
and SUBS.C. To build a C program, whether from one or more source
files ("modules"), there are three things that must happen:
First, all modules have to be compiled with KCC to produce .REL
files (e.g. PROG.REL and SUBS.REL);
Second, the LINK loader must be invoked to load all of
the necessary modules into an executable core image;
Third, this image must be saved on disk as an .EXE file.
Every complete C program must contain one and only one module
that defines the function "main". This function is where control begins
when the program is executed, and unless otherwise specified the .EXE
file will be named after the module that "main" appears in.
You can make a C program either by using the TOPS-20 EXEC
commands COMPILE, LOAD, and SAVE, or by invoking KCC directly. For
example, suppose "main" is defined in PROG.C, and the file SUBS.C
contains auxiliary subroutines. Then,
To make:                  EXEC command          Direct KCC invocation
-------                   ------------          ---------------------
PROG.EXE from .C files:   @LOAD PROG,SUBS       @CC -q PROG SUBS
                          @SAVE PROG
Just the .REL files:      @COMPILE PROG,SUBS    @CC -q -c PROG SUBS
PROG.EXE from .RELs:      Same as 1st           @CC PROG.REL SUBS.REL
One advantage of using the EXEC commands is that they will
only compile those files which appear to require it, i.e. modules for
which the .C file is more recent than the .REL file. The EXEC can also
translate TOPS-20 directory names into a format that the DEC loader will
understand, so that commands like @COMPILE <FOO>PROG are possible.
However, KCC will do a similar form of conditional compilation
if the -q switch is set, for those modules specified without a .C
extension. (This may become the default someday.) More commonly, the
EXEC at your site may not have been modified to know about KCC, or you
may wish to specify certain options to the compilation, or you may
just come from a UNIX background and feel more used to the direct
invocation method.
<1 Direct Invocation - Compiler switches>
The KCC compiler switches are intended to resemble those of the
UN*X "cc" command as closely as possible. If you are familiar with these,
you can probably use KCC instinctively. The command line is broken up into
argument strings each separated by a space (NOT by a comma). If an argument
string starts with a "-", it is a switch, otherwise it is a filename.
Case is significant in switches!
Normally, if a file exists for a given filename, that file is
always compiled regardless of what it contains or what the name looks
like. The exception is files with a ".REL" extension, which are never
compiled but are just passed on to the linking loader. If a filename does
not exist and appears to have no extension, ".C" is added. This feature
is primarily useful with the -q switch as it requests conditional
compilation. Case is not significant in filenames.
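The filename rules above can be sketched as a small classifier. This is only an illustration for a present-day C compiler, not KCC's actual code: the names classify_spec and same_nocase are invented here, the sketch assumes a bare name with no directory or generation number, and it does not check whether the file exists.

```c
#include <ctype.h>
#include <string.h>

/* What a driver might do with one filename argument (hypothetical names). */
enum action { ALWAYS_COMPILE, COMPILE_IF_NEWER, LOAD_ONLY };

/* Case-insensitive equality, since case is not significant in filenames. */
static int same_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Classify a file spec such as "foo", "bar.c" or "arf.rel".
 * q_switch is nonzero when -q was given. */
enum action classify_spec(const char *spec, int q_switch)
{
    const char *dot = strrchr(spec, '.');

    if (dot != NULL && same_nocase(dot, ".REL"))
        return LOAD_ONLY;            /* never compiled, passed to the loader */
    if (dot == NULL && q_switch)
        return COMPILE_IF_NEWER;     /* compile only if .C is newer than .REL */
    return ALWAYS_COMPILE;           /* explicit extension, or no -q */
}
```

With -q set, "foo" is compiled conditionally, "bar.c" is always compiled, and "arf.rel" is only handed to the loader.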
If none of -c, -E, or -S are given as switches, KCC will invoke
LINK after compilation and an executable file (*.EXE) will be produced.
The ordering of switches and filenames, in general, does not
matter; all switches are processed before compiling starts. However,
note that filenames and libraries will be compiled and/or loaded in
the order given, and -I paths will also be scanned in the order given.
It is possible to specify KCC switches while giving a
COMPILE-class command to the EXEC, if your EXEC recognizes the switch
/LANGUAGE-SWITCHES. The argument to this EXEC switch should be a
double-quoted string which starts with a space. For example:
@compile foo /laNGUAGE-SWITCHES:" -m -d=sym"
------------------------------------------------------------------------
The following are the available compiler switches, in alphabetical
order. They are the same as those used by UN*X "cc", except for the
ones marked with a "*", which are mainly of interest to KCC
implementors.
* -A<file>  Specify a file name for the assembler header file (included
            at the start of all assembler output).
  -c        Compile and assemble, but don't link (produce *.REL).
  -C        Retain comments in preprocessor (only useful with -E).
* -d        Debugging output. Same as -d=all. Generates many debug files.
* -d=<fs>   Debugging fine-tuning.
            <fs> are flag names of particular kinds of debug output files.
            The names can be abbreviated. Prefixing the name with a
            '+' turns it on; '-' turns it off. All flags are initially
            assumed off. Current flags are:
                parse   Parse tree output (*.DEB)
                pho     Peep-Hole Optimizer output (*.PHO) - HUGE!!!
                sym     Symbol table output (*.CYM)
                all     All of the above
            E.g. "-d=parse+sym" == "-d=all-pho"
  -D<ident> Define following ident to "1" or string after '='.
            E.g. "-DMAXSIZE=25". Several of these may be specified.
  -E        Run source only through preprocessor, to standard output.
* -H<path>  Specify a non-standard location for <>-enclosed #include files.
  -i        Loader: same as -i=extend.
* -i=<fs>   Loader options.
            <fs> are flags selecting particular options, as follows:
                extend  Load code for extended addressing (multi-section).
                psect   Load code using DATA and CODE PSECTs.
  -I<path>  Supply a search path for doublequoted #include files.
            Several of these may be specified, and will be searched in
            that order.
* -L<path>  Loader: Specify a non-standard location for library files.
* -L=<str>  Loader: Specify an arbitrary string argument to the loader.
            Note that the syntax does not permit spaces to be included.
            Several of these may be given.
  -lnam     Loader: Specify library filename for loader. The "nam"
            argument is used to construct the filename LIBnam.REL in the
            library directory path and this is searched when encountered
            in the specifications.
* -m        Use MACRO rather than FAIL. Semi-obsolete, same as -x=macro.
  -O        Optimize (no-op, defaults on). Same as -O=all.
* -O=<fs>   Optimization fine-tuning. Mainly for debugging.
            <fs> are flag names of particular kinds of optimizations.
            The names can be abbreviated. Prefixing the name with a '+' turns
            it on; '-' turns it off. All flags are initially assumed off,
            so to ask for no optimization use -O= (same as -O=-all).
            Current flags are:
                parse   Parse tree optimization
                gen     Code generator optimizations
                object  Object code (peephole) optimizations
                all     All of the above
            E.g. "-O=parse+gen" == "-O=all-object"
  -o=<file> Specify output filename for the executable image.
            For UN*X-compatibility kicks, "-o <file>" also works.
* -P=<fs>   Portability level specifications. Several switches may be given in
            a format similar to that for -d and -O. The <fs> flags
            specify the C implementation level that the compiler should use:
                base    Base level C -- most portable and restricted
                carm    H&S CARM level -- full implementation
                ansi    H&S CARM plus some ANSI -- working compromise
                stdc    ANSI C level (per latest Draft Proposed Standard)
            Only one of the previous 4 is allowed, plus an optional:
                kcc     Permit KCC-specific extensions to the selected level.
            The default is "stdc+kcc" if -P is not given. -P alone is
            interpreted as "base".
* -q        Conditional compilation. All file specs without an extension will
            only be compiled if the .C file is more recent than the .REL file.
            For example, "cc -q foo bar.c arf.rel"
            compiles FOO.C if it is more recent than FOO.REL,
            always compiles BAR.C, and never compiles ARF.
  -S        Don't assemble (produce *.FAI or *.MAC, plus *.PRE)
  -U<ident> Undefine following identifier. All -U switches are processed
            before any -D switches. See "predefined macros".
* -v        Verbose - same as "-v=all".
* -v=<fs>   Verbosity switches, similar to -d and -O.
                fundef - print function names as they are defined (not yet).
                stats  - show statistics for run
                load   - show command string given to loader (if any)
  -w        Don't type out warnings.
* -x=<fs>   Cross-compile switches. Several switches may be given in
            a format similar to that for -d and -O. The <fs> flags
            specify an aspect of the "target machine" that the
            code should be compiled for (case is significant!):
                Target System:    tops20, tops10, waits, tenex, its
                Target CPU:       ka, ki, ks, kl0, klx
                Target Assembler: fail, macro, midas
                Target char size: ch7 (to compile with 7-bit chars)
            e.g. "-x=ka+tenex". See "Cross-compiling".
------------------------------------------------------------------------
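The <fs> flag syntax shared by -d, -O, -P, -v and -x can be illustrated with a small parser for the -d flags. This is a sketch under stated assumptions, not KCC's implementation: the function names and bit values are invented, and an ambiguous abbreviation is resolved by simply taking the first table entry that matches.

```c
#include <string.h>

#define F_PARSE 1u
#define F_PHO   2u
#define F_SYM   4u
#define F_ALL   (F_PARSE | F_PHO | F_SYM)

static const struct { const char *name; unsigned bits; } flagtab[] = {
    { "parse", F_PARSE }, { "pho", F_PHO }, { "sym", F_SYM }, { "all", F_ALL }
};

/* Look up a possibly abbreviated name of length len; 0 if unknown.
 * Assumption: the first table entry that starts with the text wins. */
static unsigned lookup_flag(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof flagtab / sizeof flagtab[0]; i++)
        if (len > 0 && len <= strlen(flagtab[i].name)
            && strncmp(s, flagtab[i].name, len) == 0)
            return flagtab[i].bits;
    return 0;
}

/* Parse the text after "-d=", e.g. "parse+sym" or "all-pho". */
unsigned parse_debug_flags(const char *fs)
{
    unsigned mask = 0;               /* all flags start off */
    int on = 1;                      /* an unprefixed first name turns on */

    while (*fs != '\0') {
        size_t len;
        if (*fs == '+' || *fs == '-')
            on = (*fs++ == '+');
        len = strcspn(fs, "+-");     /* name runs to the next '+' or '-' */
        if (on)
            mask |= lookup_flag(fs, len);
        else
            mask &= ~lookup_flag(fs, len);
        fs += len;
    }
    return mask;
}
```

Under these assumptions parse_debug_flags("parse+sym") and parse_debug_flags("all-pho") produce the same mask, matching the equivalence given for -d=<fs>.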
NOTE: <path> syntax
The -I, -H, and -L switches all take a "path" as argument.
This is interpreted as specifying both a prefix and a postfix string
which are used to sandwich a partial filename from some other source
(#include "xxx", #include <xxx>, and -lxxx respectively). The two
strings are separated by the character '+' (this is site dependent
however). Thus, for example:
Specification     => Prefix       Postfix      Sample with "xxx"
-I+[SYS,NEW]         ""           "[SYS,NEW]"  xxx[SYS,NEW]
-HNEWC:              "NEWC:"      ""           NEWC:xxx
-LPS:<C>LIB+.REL     "PS:<C>LIB"  ".REL"       PS:<C>LIBxxx.REL
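The prefix/postfix sandwich can be sketched in a few lines of C. The function name expand_path is made up for this illustration, and the separator is hard-wired to '+' even though the text notes it is site dependent.

```c
#include <stdio.h>
#include <string.h>

/* Build prefix + name + postfix from a <path> spec (the text that
 * follows -I, -H or -L).  Returns 0 on success, -1 if the result does
 * not fit in out. */
int expand_path(const char *spec, const char *name, char *out, size_t outsize)
{
    const char *plus = strchr(spec, '+');
    int n;

    if (plus == NULL)   /* no separator: the whole spec is the prefix */
        n = snprintf(out, outsize, "%s%s", spec, name);
    else
        n = snprintf(out, outsize, "%.*s%s%s",
                     (int)(plus - spec), spec, name, plus + 1);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For example, expand_path("PS:<C>LIB+.REL", "xxx", buf, sizeof buf) leaves "PS:<C>LIBxxx.REL" in buf.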
NOTE: Obsolete features
The following switches and interpretations are obsolete. They will
likely be flushed altogether, but are documented here for historical reasons:
* -n same as -O= (no optimization)
* -s same as -d=sym (output *.CYM symbol table dump)
It used to be a feature that "simple" switches, which did not
take any arguments, could be lumped together into a single switch
string. For example, "cc -mS test" is the same as the more standard
"cc -m -S test". However, use of this feature is discouraged; the
potential confusion and inconsistency don't seem to be worth it.
NOTE: Switch Portability
The following lists the switches implemented by other systems
but not by KCC. This information seems useful and this is a convenient
place to put it. Other-system switches that KCC implements are not included.
Switches which mean one thing to KCC but another thing to other systems
are included. Currently only 4.2BSD switches are listed.
-g Output additional symtab info for dbx(1), pass -lg to ld(1)
-go Ditto for sdb(1).
-p Output profiling code for prof(1).
-pg Ditto but for gprof(1).
-R Passed on to as(1) to make initialized vars shared and read-only.
-Bpath Use substitute compiler pass programs specified by <path>.
-t[p012] Use only the pass programs from -B designated by -t.
ld(1) switches:
A, D, d, e, l, M, N, n, o, r, S, s, T, t, u, X, x, y, z
<1 User Program - Command line interpretation>
The C runtime startup interprets the command line to all
C programs in the same consistent fashion, and supports:
    (1) argument string passing     @PROG arg1 arg2 arg3 ...
    (2) indirect files              @PROG @fileof.args ...
    (3) wild-card filenames         @PROG file.*
    (4) I/O redirection             @PROG arg < infile > outfile
    (5) pipes                       @PROG arg | PROG2 args
    (6) background processing       @PROG arg &
    (7) Argument quoting            @PROG "arg.*" "ar|g" "a>r<g"
There is also provision for suppressing the default command line
interpretation altogether.
(1) Command line arguments:
Command line arguments can be passed to the main() function
from the EXEC or monitor in the UN*X fashion. That is, main() is
given two arguments, the first of which is an argument count and
the second a pointer to an array of char pointers, each of which
constitutes an argument. It is conventional to declare the
parameters to main() in this way:
    main(argc, argv)
    int argc;
    char **argv;
For example, if you have a C program saved as PROG.EXE, then invoking
PROG with the command:
@PROG one two
will set argc to 3, and the strings that argv points to will
be "PROG", "one", and "two". Note that arguments are separated by
blanks (whitespace) and not by commas!
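As a minimal sketch of how a program might walk its arguments, the helper below lists them one per line. The name show_args is invented for this example, and it is written with an ANSI prototype rather than the old-style declaration shown above.

```c
#include <stdio.h>

/* Print each argument the way main() receives it and return the number
 * of arguments that follow the program name. */
int show_args(FILE *fp, int argc, char **argv)
{
    int i;

    for (i = 0; i < argc; i++)
        fprintf(fp, "argv[%d] = \"%s\"\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```

Called as show_args(stdout, argc, argv) from main() after the command "@PROG one two", it would list "PROG", "one" and "two" and return 2.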
(2) Indirect files:
If an argument begins with the character '@' it is interpreted
as an indirect file specification, and the rest of the argument is
assumed to be a filename; the contents of that file are parsed and
used as arguments. For example, if the file "two.txt" contained the
text "a b c" then the command:
@PROG one @two.txt three
would invoke PROG with the arguments "PROG", "one", "a", "b", "c", "three".
The format of the indirect file is the same as that for TOPS-20 indirect
files in general.
(3) Wild-card filename arguments
On TOPS-20, if any of the arguments contain the wild-card
characters '%' or '*' (without quoting -- see (7) below) then those
arguments are treated as wild-card filenames and are expanded into a
list of the filenames which match the pattern. For example,
@PROG foo.*
could expand into:
@PROG PS:<YOU>FOO.C.13 PS:<YOU>FOO.REL.5
(4) I/O redirection:
I/O redirection of stdin and stdout is also supported.
Thus:
1. @PROG <foo ; will take all stdin input from the file "foo".
2. @PROG >bar ; will send all stdout output to a new file "bar".
3. @PROG >>log ; will append all stdout output to the old file "log".
These can be combined:
@PROG <foo >bar ; does both 1 and 2. (from "foo", to "bar")
However,
@PROG <foo>bar ; interprets "<foo>bar" as a single argument string,
; because it looks like a <directory>filename.
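The distinction between a redirection and a <directory>filename argument might be drawn roughly as follows. This is a guess at the rule, not the runtime's actual code: the names are hypothetical, only the attached forms ("<foo", ">bar", ">>log") are handled, and any '>' after a leading '<' is assumed to mark a directory specification.

```c
#include <string.h>

enum redir { PLAIN_ARG, REDIR_IN, REDIR_OUT, REDIR_APPEND };

/* Classify one whitespace-delimited argument.  On a redirection,
 * *file points at the filename part inside arg. */
enum redir classify_redirect(const char *arg, const char **file)
{
    *file = NULL;
    if (arg[0] == '<') {
        /* "<foo>bar" looks like <directory>filename: leave it alone. */
        if (arg[1] == '\0' || strchr(arg + 1, '>') != NULL)
            return PLAIN_ARG;
        *file = arg + 1;
        return REDIR_IN;
    }
    if (arg[0] == '>') {
        int append = (arg[1] == '>');
        const char *name = arg + (append ? 2 : 1);

        if (*name == '\0')
            return PLAIN_ARG;        /* bare ">" or ">>": not handled here */
        *file = name;
        return append ? REDIR_APPEND : REDIR_OUT;
    }
    return PLAIN_ARG;
}
```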
(5) Pipes:
On TOPS-20 systems which implement the PIP: device (developed at
Stanford), pipes can also be supported, so that a command such as:
@PROG | BAZ
causes the stdout of program PROG to be redirected to the stdin of program
BAZ.
(6) Background processing:
Again, on TOPS-20 systems where the EXEC has been suitably
modified, a command line ending in an ampersand ('&') will cause the
program to be run in the background, while the user goes on to do other
things:
@PROG one two&
(7) Argument quoting:
Sometimes it is desirable to give an argument which contains
one of the special characters used by the above features. To quote
an argument string, you can surround it with either single or double
quote marks, as in "foo&bar" or 'foo&bar'. Anywhere on the line,
a backslash ('\') will quote the next character. A control-V (^V) will
also quote the next character, but will be retained in the argument
string; this is useful for filenames that have unusual characters,
because the ^V quoting must be passed along in system calls that refer
to such files.
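The quoting rules can be sketched as a tokenizer that pulls one argument at a time off a command line. This is an illustration only: next_arg is an invented name, and the sketch ignores wild cards, redirection, pipes and indirect files, so it does not record which characters were quoted.

```c
#include <ctype.h>
#include <stddef.h>

#define CTRL_V 026   /* ASCII control-V */

/* Copy the next argument from *linep into out (at most outsize-1 chars),
 * honoring '...' and "..." quoting, backslash, and ^V.
 * Returns 1 if an argument was found, 0 at end of line. */
int next_arg(const char **linep, char *out, size_t outsize)
{
    const char *p = *linep;
    size_t n = 0;
    char quote = 0;                  /* active quote mark, or 0 */

    if (outsize == 0)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *linep = p;
        return 0;
    }
    while (*p != '\0' && (quote || !isspace((unsigned char)*p))) {
        char c = *p++;

        if (c == '\\' || c == CTRL_V) {
            if (c == CTRL_V && n + 1 < outsize)
                out[n++] = c;        /* ^V stays in the argument */
            if (*p == '\0')
                break;
            c = *p++;                /* quoted char is taken literally */
        } else if (quote && c == quote) {
            quote = 0;               /* closing quote mark */
            continue;
        } else if (!quote && (c == '"' || c == '\'')) {
            quote = c;               /* opening quote mark */
            continue;
        }
        if (n + 1 < outsize)
            out[n++] = c;
    }
    out[n] = '\0';
    *linep = p;
    return 1;
}
```

Repeated calls on a line such as one "foo&bar" a\ b yield the arguments "one", "foo&bar" and "a b" in turn.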
(8) Suppressing the command line interpretation:
In certain unusual circumstances it may be necessary to suppress
the default command line interpretation, so that the user program itself
can handle it in a different way. For information on how to do this,
see the #include file <urtsud.h>.
<1 KCC Error Messages>
During normal compilation, KCC will merely announce the name of
each module as it is compiled; the assembler and linker make similar
announcements if they are invoked. However, if an error is encountered
a message will be printed in the following format:
"file.c", line 123: Undefined symbol: "bytsiz"
(getfil+6, p.3 l.45): et bytesize */ buf->st_blksize = ((512*bytsiz/
Each error message uses three lines: a Un*x-format error line, a
context line, and a blank separator line.
The first line has the same format as Un*x compiler errors. The
filename is given in quotes, followed by the line number from the start
of the file, and then the error message itself. If the error is
actually a "warning", the text "Warning - " will be at the start of the
message.
The second line is KCC-specific and is intended to provide more
context. The parenthesis-enclosed information at the beginning of the
line provides another way of locating the offending code; it gives the
name of the last function definition (plus an offset if within the
definition), and the page/line numbers within the file. Note that the
line number here is relative to the start of the indicated page, rather
than to the start of the file as for the Un*x-format line number. The
remainder of the context line is the last N characters of input seen by
KCC, with newlines converted to spaces. The very end of the line marks
the most recent character or token read by KCC. Because the compiler
does a lot of "peeking", often this is the next one after the actual
error location.
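Because the first line follows the Un*x convention, a tool can pick it apart with sscanf. The sketch below is one possible way to do so; the struct and function names are invented, and the buffer sizes are arbitrary choices for the example.

```c
#include <stdio.h>
#include <string.h>

struct kcc_error {
    char file[64];       /* filename between the quotes */
    int  line;           /* line number from the start of the file */
    char message[128];   /* rest of the line */
};

/* Parse a line of the form:  "file.c", line 123: message
 * Returns 1 on success, 0 if the text does not have that shape. */
int parse_error_line(const char *text, struct kcc_error *e)
{
    int got = sscanf(text, " \"%63[^\"]\", line %d: %127[^\n]",
                     e->file, &e->line, e->message);
    return got == 3;
}

/* A warning is marked by the text "Warning - " at the start of the message. */
int is_warning(const struct kcc_error *e)
{
    return strncmp(e->message, "Warning - ", 10) == 0;
}
```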
KCC always parses the entire source file regardless of how many errors
are encountered, in an attempt to report all possible problems. The
downside of this is that it is possible for a single error (such as a
missing '}' at the end of a function) to trigger a large number of
"spurious" following errors.
However, KCC stops producing assembly language output as soon as the
first error is seen, and will invoke neither the assembler nor the
linker at the end of compilation.
"Warning" messages:
Warning messages do not interfere with the rest of compilation;
they are for the user's information only. Normally it is best to fix
the source so that it no longer generates these warning messages, but if
necessary the warnings can be suppressed by using the -w switch.
"Internal error" messages:
If KCC is astonished by something extremely unpleasant, it may
generate an error message that starts with the words "Internal error - ".
These errors are due to some bug inside KCC and should be reported
as soon as possible, preferably with a sample source file that provokes
the error. KCC does attempt to continue.
"Out of memory" errors:
Under normal circumstances the regular-sized KCC has no
problems digesting source files; it can compile itself and the entire C
library, for example. But some programs may happen to be extremely
large, so large that KCC cannot hold all of the macro or symbol
definitions; in that case you will receive an "Out of memory" error
(which is always fatal). If this happens on TOPS-20, you can try using
CCX.EXE instead of CC.EXE; this is the same version of KCC, but built
with -i so that it runs with extended addressing and thus has much more
dynamically allocated memory available.
<1 C as implemented by KCC>
KCC is intended to conform to C as described
in the latest Draft Proposed ANSI Standard (7-Dec-1988), plus
extensions described in Harbison & Steele's "C: A Reference Manual".
The -P (portability) switch controls the exact level at which
KCC attempts to compile a C program. There are four possible levels,
and only one of these may be in effect:
STDC - compiled code must conform to the ANSI C standard (or
the latest Draft Proposed standard). This is now the
default!
ANSI - The old default. Permits many ANSI constructs
to be recognized and compiled. This is basically CARM level
plus any new ANSI features that can be added without
significantly changing the language; thus, it is a
working compromise between CARM and STDC.
Users should be cautious about using ANSI features since
other compilers may not recognize them, or the features may
change before the standard becomes official.
CARM - Disables all ANSI-added features which are not in Harbison
and Steele's CARM book. KCC fully implements this level.
BASE - The most restrictive level. This disables some extensions
so that KCC will complain about some constructs
or usages that are likely to be unimplemented by some
other compilers. Good for ensuring that code is portable.
In addition, there is a "KCC extensions" flag which is independent
of the level; when enabled, this permits a number of KCC-specific extensions
to be recognized regardless of whatever level is in effect.
KCC now uses the STDC level with KCC extensions enabled;
this corresponds to "-P=stdc+kcc".
The next several pages document KCC's implementation of C by
following the general ordering of H&S and pointing out aspects where
KCC differs or describing which of several optional behaviors KCC
implements. Any ANSI features which are implemented are also described.
<2 ANSI Changes>
The major visible changes in KCC due to the new proposed ANSI standard are:
Input:
Trigraphs are recognized. Beware of existing "??" sequences.
Preprocessor:
Directives can have whitespace prior to #.
Formal macro parameters are NOT recognized in string/char literals.
New operators in macro body: # and ##.
New macro recursion rules.
A function-like macro will not be invoked if the next char is not '('.
No pragmas are recognized. They are ignored with a warning.
Constants:
New char escapes \a, \?
New type suffixes (U, F)
New char/string literal prefix (L)
Type qualifiers:
const, volatile
Function prototypes:
This is one of the most important changes.
New linkage defaults (omitted-extern etc)
New initializations:
Unions can now be initialized. A brace-enclosed expression will
initialize the first member of the union.
Automatic structures, unions, and arrays can now be initialized.
Initializer lists for these must use only constant expressions, but
auto structures and unions can be initialized with a single non-constant
expression of the same type.
The mechanism for initializing large auto aggregates is still
primitive, however; it consists of creating a static object of the same
type and copying that into the auto object when the block is entered.
Bear that in mind before trying to initialize large auto arrays.
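The new forms of initialization look like this in source code. The type and function names are made up for the illustration; the constructs themselves are standard ANSI C.

```c
struct point { int x, y; };
union number { int i; double d; };

static struct point origin(void)
{
    struct point p = { 0, 0 };
    return p;
}

int init_demo(void)
{
    union number n = { 42 };      /* initializes n.i, the first member */
    struct point a = { 3, 4 };    /* auto struct, constant initializer list */
    struct point b = origin();    /* auto struct, one non-constant expression */
    int v[3] = { 1, 2, 3 };       /* auto array, constant initializer list */

    return n.i + a.x + a.y + b.x + b.y + v[0] + v[1] + v[2];
}
```

Here init_demo() returns 55. Given the copy-from-static mechanism described above, an array as small as v costs little, whereas a large auto array would be copied on every entry to the block.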
Internal linkage:
File-scope identifiers with internal linkage (static variables
and functions) are mapped into unique internally generated names, thus
normal C identifiers of up to 31 characters are always fully distinct.
New & operands:
The & (address-of) operator can now be applied to array and
function names. There is a difference between the ANSI meaning of
"&array" and the traditional meaning, however. Given "int arr[5];",
    Expr      Type
    "arr"     "array of 5 ints"
    "arr+1"   "pointer to int"
    "&arr"    Old:  (if legal) "pointer to int"
              ANSI: "pointer to array of 5 ints"
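One way to see the difference under the ANSI meaning is to compare how far each kind of pointer moves when 1 is added to it. The function names are invented for this sketch.

```c
#include <stddef.h>

int arr[5];

/* Distance in bytes between p and p+1 for a pointer to int. */
size_t step_of_element_pointer(void)
{
    int *p = arr;                 /* "arr" decays to pointer to int */
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Distance in bytes between q and q+1 for a pointer to the whole array. */
size_t step_of_array_pointer(void)
{
    int (*q)[5] = &arr;           /* ANSI: pointer to array of 5 ints */
    return (size_t)((char *)(q + 1) - (char *)q);
}
```

The first function returns sizeof(int) and the second returns 5 * sizeof(int), so "&arr + 1" skips the entire array while "arr + 1" skips a single element.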
<2 KCC Lexical Elements> [H&S 2, "Lexical Elements"]
KCC uses the US ASCII character set. There is provision for
using a separate target character set, different from the source set,
but currently the only such is a target set for WAITS ASCII.
KCC has no maximum line length. The context displayed in error
messages ignores line boundaries.
KCC is standard in that nested comments are not supported. If
the sequence "/*" is seen within a comment, a warning message will be
printed just in case the user neglected to terminate the previous
comment.
<2 Identifier names>
KCC adheres to the standard definition of C identifier syntax,
allowing the character "_", the letters A-Z and a-z, and the digits
0-9 as valid identifier characters. Identifiers may have any length,
but only the first 31 characters (case sensitive) are unique during
compilation, which conforms to the ANSI minimum. This applies to all
of the following name spaces (as per H&S 4.2.4):
Macro names
Statement labels
Structure, union, and enum tags
Component (member) names
Ordinary names:
Enum constants and typedef names.
Variables (see discussion of storage classes).
However, the situation is different for symbols with external
linkage, which must be exported to the PDP-10 linker. Such names are
truncated to 6 characters and case is no longer significant. The
character '_' (underscore) is transformed into '.' (period); the PDP-10
software allows the additional symbol characters '$' and '%', but there
is no way to generate these with C unless special provision is made;
see #asm and '`' under "KCC Extensions". See also the discussion of
exported symbols.
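The mapping of external names can be sketched as below, which is handy for checking whether two identifiers would collide at link time. The function names are hypothetical, and folding to upper case is an assumption made for the example; the text only says that case is no longer significant.

```c
#include <ctype.h>
#include <string.h>

/* Map a C identifier to its assumed linker form: at most 6 characters,
 * upper case, '_' replaced by '.'.  out must hold at least 7 chars. */
void export_name(const char *ident, char *out)
{
    int i;

    for (i = 0; i < 6 && ident[i] != '\0'; i++) {
        char c = ident[i];
        out[i] = (c == '_') ? '.' : (char)toupper((unsigned char)c);
    }
    out[i] = '\0';
}

/* Two external names collide if their exported forms are identical. */
int names_collide(const char *a, const char *b)
{
    char ea[7], eb[7];

    export_name(a, ea);
    export_name(b, eb);
    return strcmp(ea, eb) == 0;
}
```

For instance, "buffer_size" and "buffer_count" both become "BUFFER" and would clash as external symbols, although they are distinct within a single module.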
<2 Reserved Words>
KCC has a number of additional reserved words depending on
the portability level setting. When KCC extensions are allowed, as
is normally the case, the following keywords exist:
"asm" - used for assembly code inclusion.
"entry" - only in certain special circumstances.
See the discussion of libraries and entry points.
_* - there are a number of keywords which begin with
the character "_". The user should never invent
such symbols, as they are all reserved for C
implementation purposes.
When ANSI or STDC level is in effect, there are three additional
reserved words. All can be considered type modifiers:
"signed" Indicates integer type is signed.
"const" Constant object
"volatile" Volatile object
<2 Constants>
The types "int" and "long" are the same -- one PDP-10 word of
36 bits, with the high bit a sign bit. Thus, the largest positive integer
constant is 0377777777777, or 34,359,738,367 (2^35 - 1).
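The limit follows from having 35 value bits below the sign bit. As a quick check on a present-day machine, the sketch below computes 2^35 - 1; the function name is invented, and long long is used only so that the arithmetic also works where int is 32 bits.

```c
/* Largest positive value of a 36-bit two's-complement word: 2^35 - 1. */
long long max_positive_36bit(void)
{
    return (1LL << 35) - 1;
}
```

The result equals the octal constant 0377777777777, which is 34359738367 in decimal.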
The type "double" is represented by a PDP-10 hardware format
standard range double precision number (two words). On KA processors
the format is slightly different. The decimal range is from 1.5e-39
to 1.7e38, with eighteen digits of precision.
Character constants have type "int". Multicharacter constants
of up to 4 chars are supported; they are right-justified in the word.
Because characters are 9-bit bytes, numeric escape code values can
range from '\0' to '\777'. Hexadecimal character constants are
permitted.
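The layout of a multicharacter constant can be modelled by packing 9-bit bytes into a word. This is a sketch on a modern machine, not KCC code: the name pack_multichar is invented, and the ordering (first character in the most significant used byte) is an assumption, since the text only states that the value is right-justified.

```c
/* Pack up to 4 characters into one 36-bit word, 9 bits per character,
 * right-justified.  Assumption: 'ab' is laid out as ('a' << 9) | 'b'. */
long long pack_multichar(const char *chars)
{
    long long word = 0;
    int i;

    for (i = 0; i < 4 && chars[i] != '\0'; i++)
        word = (word << 9) | ((unsigned char)chars[i] & 0777);
    return word;
}
```

Four characters of 9 bits each fill the 36-bit word exactly, which is why the limit is 4.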
String constants are stored as 9-bit byte strings, and do not
share storage. That is, two instances of the constant string "foo"
will be stored in two distinct places. String constants are put in the
"pure" segment of a program along with the machine code, but this does
not actually enforce any read-only restrictions unless the user
executes a system call to protect that region.
If the portability level is ANSI or STDC then adjacent string
constants are concatenated into a single string. Thus, "foo" "bar" is
the same as "foobar".
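A short check of the concatenation rule, with names chosen only for this example:

```c
#include <string.h>

/* Adjacent literals are joined by the compiler before the program runs. */
static const char joined[] = "foo" "bar";

int concat_matches(void)
{
    /* 6 characters plus the terminating null */
    return strcmp(joined, "foobar") == 0 && sizeof joined == 7;
}
```

This is mostly useful for splitting a long string across source lines without resorting to backslash continuation.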
<2 Preprocessor directives> [H&S 3, "The C Preprocessor"]
All standard C preprocessor directives are supported as described in
Harbison and Steele, including the new ANSI directives and operators.
This page specifies how KCC behaves for situations which are
implementation dependent.
Lexical Conventions: [H&S 3.2]
The '#' that starts a preprocessor directive can now be
preceded and followed by whitespace. Formal parameter names are no
longer recognized within character and string constants in macro body
definitions. Comments are treated as whitespace and not passed on to
anything else; however, KCC will print a "Nested comment" warning if it
encounters a comment which contains "/*". This serves both to catch
slightly non-portable usage (see H&S 2.2) and to detect places where
the user may have accidentally omitted a "*/".
Defining Macros: [H&S 3.3]
When defining a macro, formal parameter names are NOT recognized
within string and character constants; use the new '#' operator to stringize
macro parameters. Any comments and whitespace in the macro body
are replaced by a single space. KCC permits an argument token list
(arguments to a macro call) to extend over multiple lines. Arguments
to a call are converted in a fashion similar to that for macro bodies
-- comments and whitespace are replaced by a single space. Newlines
within an argument list are also considered whitespace. However,
string and character constants in arguments are treated as tokens, and
their contents are not scanned for macro names.
Predefined Macros: [H&S 3.3.4]
__LINE__ expands into the current decimal line number. (BSD)
__FILE__ expands into the current source filename. (BSD)
__DATE__ expands into the date of compilation, "Mmm dd yyyy".
__TIME__ expands into the time of compilation, "hh:mm:ss".
The date/time of compilation is cleared at the start of
compilation for each source file, and is set by the first
occurrence of __DATE__ or __TIME__ within that source file.
__STDC__ expands into the ANSI standard level #, currently 1.
__COMPILER_KCC__ expands into a string literal containing KCC
version information.
All macros but the last are specified by the ANSI standard. The first
two (__LINE__ and __FILE__) also exist in BSD Un*x; the next two
(__DATE__ and __TIME__) are also described in H&S. __STDC__ is only
defined when -P=stdc is in effect.
__COMPILER_KCC__ is the only non-standard predefined macro. If it is
defined, it implies that the file <c-env.h> also exists, which contains
standard KCC environment definitions. There are no other predefined
macros.
Undefining and Redefining Macros: [H&S 3.3.5]
It is not an error to redefine an already defined macro, but a
warning message will be output unless the new macro definition is the
same as the old definition; i.e. redundant definitions are allowed.
There is no macro definition stack, i.e. definitions are not
pushed/popped by #define/#undef. Attempting to define a macro named
"defined" will cause an error, since otherwise it would conflict with
the "defined" operator.
Converting Tokens to Strings: [HS2 3.3.8]
As per the ANSI standard, KCC no longer recognizes formal
parameter names within string and character constants. The '#'
operator should be used within macro bodies to convert macro arguments
into strings.
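The following sketch contrasts the old habit of putting a parameter name inside a literal with the ANSI '#' and '##' operators. The macro and variable names are invented for the illustration.

```c
#include <string.h>

#define STR(x)      #x        /* '#' turns the argument's tokens into a string */
#define OLD(x)      "x"       /* parameter is NOT replaced inside a literal */
#define GLUE(a, b)  a##b      /* '##' pastes two tokens into one */

int counter1 = 7;

int stringize_demo(void)
{
    return strcmp(STR(a + b), "a + b") == 0
        && strcmp(OLD(hello), "x") == 0
        && GLUE(counter, 1) == 7;     /* expands to the identifier counter1 */
}
```

Code that relied on the old substitution inside string constants needs to be rewritten in the STR() style to keep working at the ANSI levels.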
File Inclusion: [H&S 3.4]
Included files may be nested to 10 levels. Macro expansion
is done on the line if the filename does not start with '<' or '"'.
Filenames may contain '>' or '"' characters.
#include <filename> looks only in the standard directory.
#include "filename" looks first in DSK:,
then in the -I paths in order of specification (left to right),
then in the standard directory.
The standard directory for include files is C: on TOPS-10 and TOPS-20,
<KC> on TENEX, and [SYS,KCC] on WAITS, but this is site dependent in
any case.
Conditional Compilation: [H&S 3.5] #if,#else,#endif,#elif,#ifdef,#ifndef
The "defined" operator is recognized only within #if and #elif
expressions. Neither #elif nor "defined" will be recognized unless
the portability level is at least "carm". Within the body of a failing
conditional, only other conditional commands are recognized; all others,
even illegal commands, are ignored.
Explicit Line Numbering: [H&S 3.6] #line
The information from #line will be used in KCC error messages.
Macro expansion is performed on the line. Like all other
preprocessor commands, #line is eliminated and not passed on when
using the -E switch. With regard to "#" alone at the start of a line,
if there is no command name, the line is simply ignored without error
(as per ANSI).
KCC-specific Commands:
#asm, #endasm
These two commands cause the text delimited by them to be
macro-expanded (as for -E) and converted into an "asm()" expression
for direct inclusion in the output assembly language file. This
currently only works inside functions. This feature is very likely to
change, and should only be used where absolutely necessary. Keep the
code simple, as someday KCC may want to parse it.
See "KCC Extensions" for additional details.
<2 Storage classes> [H&S 4.3 "Storage Class Specifiers"]
KCC implements the ANSI standard storage classes of auto, extern, register,
static, and typedef, with the following notes:
REGISTER declarations are currently equivalent to AUTO. KCC does not
assign variables to registers, and optimizations are performed without
using the "hint" given by REGISTER. AUTO variables are almost always
more efficient, and in any case they are easier to implement. This
may someday change.
KCC now uses the ANSI concepts of linkage and definition to deal with
the question of top-level (file scope) definitions versus references.
This is similar to, but not the same as, the "omitted-extern" solution
in H&S sec 4.8 which KCC previously implemented. See the discussion of
exported symbols farther on.
Duplicate Declarations:
As per H&S 4.2.5, KCC permits any number of external
referencing declarations, if the types are the same. An external
reference may be later followed by a defining declaration.
<2 Initializers> [H&S 4.6 "Initializers"]
KCC adheres to ANSI and H&S in all required respects. The
following notes cover points which H&S describes as implementation
dependent:
Optional braces (as per ANSI) are allowed for all non-aggregate
initializers. It is permitted to drop braces from initializer lists
under the rules described in H&S 4.6.8 (HS1 4.6.9), but KCC attempts to
perform extremely stringent checking on the "shape" of initializers,
and will complain about too many or too few braces.
FLOATING-POINT initializers may be of any arithmetic type. KCC performs
compile-time floating-point arithmetic, so initializers for static and
external variables may use any constant arithmetic expression.
POINTER initializers, as described in H&S, must evaluate to an integer or
to an address plus (or minus) an integer constant.
ARRAY initializers are now allowed for automatic arrays, for level
ANSI or STDC.
ENUMERATION initializers may use any integer (as well as enum) expression.
STRUCTURE initializers can initialize bit-fields with any integer expression.
Automatic and register structures can now be initialized.
UNIONS can now be initialized. A brace-enclosed list will initialize
the first member of the union.
For automatic structures or unions, any expression with the type
of that struct/union can also be used as an initializer.
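The following sketch shows the automatic-aggregate cases listed above in
ordinary ANSI C. The type and function names are invented for the
example; nothing here is specific to KCC.

```c
struct point { int x, y; };
union number { int i; double d; };

/* Automatic aggregates with brace-enclosed initializers. */
int sum_auto_aggregates(void)
{
    int          a[3] = { 1, 2, 3 };    /* automatic array */
    struct point p    = { 10, 20 };     /* automatic structure */
    union number n    = { 7 };          /* initializes the first member, n.i */

    return a[0] + a[1] + a[2] + p.x + p.y + n.i;
}

/* An automatic structure initialized from an expression of its own type. */
int copy_of_x(struct point src)
{
    struct point q = src;
    return q.x;
}
```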
<2 Exported symbols> [H&S 4.8 "External Names"]
Symbols which are exported to the assembler file have special restrictions
imposed by current PDP-10 software, which only recognizes 6-character
symbols from the set A-Z, 0-9, '.', '$', and '%'. In particular, case
is not significant.
Also, there is a distinction between "local" symbols exported only to
the assembler and "global" symbols exported to both the assembler and
the linker. While there is technically no reason that any symbol has
to be given to the assembler if it is not also meant for the linker, in
practice it is convenient for debugging to have some "local" symbol
definitions available so that DDT can access them.
Here is a breakdown of export status by storage class:
typedef = Exports nothing. (Not a real storage class)
auto = Exports nothing. (Local stack variables use an internal offset)
register = Exports nothing. (Same as auto)
static = If not file scope (i.e. is within a block) then nothing exported;
an internally-generated label is used.
If file scope (within no block) then exported to assembler only.
A unique label is made, but no INTERN or ENTRY statement.
extern = May or may not be global, depending on previous declarations.
If not static (internal linkage), then it has external linkage, and is
always exported to both assembler and linker.
External DEFINITION: A label, INTERN, and ENTRY are output.
External REFERENCE: An EXTERN statement is output, but only
if the symbol is actually referenced by the code.
Tentative definitions:
External declarations seen for the first time, with or without
an explicit "extern" storage class, are assumed to be external
TENTATIVE DEFINITIONS. If no explicit reference or definition is seen
by the time the end of the file is reached, a definition with a zero
initializer is generated.
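A minimal sketch of a tentative definition, written in standard C with
invented names. No initializer ever appears for "hits", so by the end
of the file it is defined with a zero initializer:

```c
int hits;               /* tentative definition */
int hits;               /* repeating it is harmless; the type is the same */
extern int hits;        /* an explicit "extern" declaration is also permitted */

/* No initializer was ever seen, so hits starts out as zero. */
int bump_hits(void)
{
    return ++hits;
}
```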
EXTERNAL LINKAGE symbols:
A defined external linkage symbol will have its own label, plus
an INTERN statement telling the assembler that this is an externally
visible symbol, plus an ENTRY statement which allows library routine
search to find this symbol. ENTRY statements will be put into the .PRE
output file rather than the main output file, since the assembler will
need to scan them prior to anything else.
A referenced external linkage symbol causes no output unless
the symbol is actually referenced by the code, in which case an EXTERN
line will be generated in the assembler output for that file. The
reason for the reference count check is that each assembler EXTERN
constitutes a library search request which must be satisfied by a
module with the corresponding symbol declared as an ENTRY. Unless this
is only done for actual references, the many superfluous declarations
found in *.h files will tend to cause many unneeded library modules to
be loaded.
INTERNAL LINKAGE symbols:
Note that global static symbols, which are internally generated,
are passed on to the assembler even though this is not necessary for
linkage purposes. The main reason this is done is to facilitate
debugging with DDT; otherwise it could be difficult to identify static
functions when looking at the machine instructions.
For identifiers with internal linkage, KCC maps each identifier into
a unique 6-character symbol. This mapping is done as follows:
(1) '%' is prefixed to the first 5 chars. If unique, done.
(2) Vowels and '_' are removed from the identifier, and (1) repeated.
(3) If the symbol resulting from (2) is still not unique, digits are added
at the end to fill it out to 6 characters, and then incremented
until a unique symbol is obtained.
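The three steps can be sketched as follows. This is NOT the KCC source;
it is a rough model written for illustration. The table "used", the
function names, the upper-casing, the '_' to '.' translation in step
(1), and the exact way digits are appended in step (3) (zero-filled, and
overwriting the sixth character when the symbol is already full) are
all assumptions made for the example.

```c
#include <ctype.h>
#include <string.h>

#define MAXSYMS 64

static char used[MAXSYMS][7];   /* hypothetical table of symbols already taken */
static int  nused;

static int is_unique(const char *sym)
{
    int i;
    for (i = 0; i < nused; i++)
        if (strcmp(used[i], sym) == 0)
            return 0;
    return 1;
}

/* Step (1): '%' followed by the first 5 chars, upper-cased, '_' -> '.'. */
static void prefix5(const char *src, char out[7])
{
    int n = 0;
    out[n++] = '%';
    for (; *src != '\0' && n < 6; src++)
        out[n++] = (*src == '_') ? '.' : (char)toupper((unsigned char)*src);
    out[n] = '\0';
}

/* Returns the assigned symbol, or NULL if the table is full or no
 * unique symbol could be formed. */
const char *map_static(const char *ident)
{
    char sym[7], stripped[6];
    int n = 0;

    if (nused == MAXSYMS)
        return NULL;

    prefix5(ident, sym);                            /* step (1) */
    if (!is_unique(sym)) {
        const char *p;
        for (p = ident; *p != '\0' && n < 5; p++)   /* step (2) */
            if (strchr("aeiouAEIOU_", *p) == NULL)
                stripped[n++] = *p;
        stripped[n] = '\0';
        prefix5(stripped, sym);
    }
    if (!is_unique(sym)) {                          /* step (3) */
        size_t len  = strlen(sym);
        int    base = (len < 6) ? (int)len : 5;     /* leave room for a digit */
        int    limit = 1, k, j, v;

        for (j = base; j < 6; j++)
            limit *= 10;
        for (k = 0; k < limit; k++) {
            for (v = k, j = 5; j >= base; j--) {    /* zero-filled counter */
                sym[j] = (char)('0' + v % 10);
                v /= 10;
            }
            sym[6] = '\0';
            if (is_unique(sym))
                break;
        }
        if (k == limit)
            return NULL;
    }
    strcpy(used[nused], sym);
    return used[nused++];
}
```

Under these assumptions, mapping "counter", "counters", and "countersx"
in that order yields %COUNT, %CNTRS, and %CNTR0 respectively. The
symbols a real KCC produces may differ in detail.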
Note that a symbol declared static within a given source file will
never be visible from another file that you may link later with it.
For example, a function declared as
static char *function()
{
...
}
will only be visible from other functions within the same source file.
This allows several modules to have functions with the same name, as
long as no two of the functions both have external linkage. It is
STRONGLY recommended for multi-module programs that you declare as many
functions as possible to be "static".
<2 Libraries and Entries>
REL files to be converted by MAKLIB into object libraries must have
any external symbols declared with ENTRY rather than merely INTERNing
them, and this declaration must be at the start of the REL file. In
order to do this, KCC generates a *.PRE "prefix" output file in
addition to the *.FAI or *.MAC output file, and invokes the assembler
in such a way that the PRE file is assembled before the main file.
This file contains ENTRY statements and any other predeclarations that
are needed before the assembler sees the actual code. Normally the
user will never see this file, but if the -S switch is used then it
will be left around as well as the FAI/MAC file. Note that if running
the assembler manually on the FAI/MAC file, you must invoke it with
a command line like this:
    [@]FAIL                         [@]MACRO
    [*]FOO=FOO.PRE,FOO.FAI          [*]FOO=FOO.PRE,FOO.MAC
COMPATIBILITY INFO:
For compatibility, KCC will continue to recognize an "entry"
keyword for some time to come. The following describes the obsolete
syntax:
To declare an entry, use the "entry" keyword at the start of the source,
before any other declarations:
"entry" ident ["," ident ...] ";"
i.e., the keyword "entry", followed by a list of identifiers separated
by commas, followed by a semicolon. This is passed on essentially
verbatim to the assembler, and has no other effect on compilation. It
should be used at the start of any runtimes or other file intended for
a library, on all variables and functions that should be visible as
entries in the library.
Note that it should still be safe to use "entry" as a non-keyword; if
used other than at the start of the file it will be treated like any
other normal identifier.
To repeat: the "entry" statement is no longer necessary. It should not
be used in new code, and should be removed from old code.
<2 Types> [H&S 5 "Types"]
STORAGE UNITS:
A KCC storage unit (what "sizeof" returns) is a 9-bit byte, and
there are 4 of these in each 36-bit PDP-10 word, ordered left to right
from most significant to least significant.
INTEGERS:
KCC's integer types have the following sizes:
    Type    Bits    "sizeof" value
    char      9     1
    short    18     2 (PDP-10 halfword)
    int      36     4 (PDP-10 word)
    long     36     4 (PDP-10 word)
All of these types may be explicitly declared as "signed". Single
variables declared as "char" or "short" are stored right-justified into
a full word; only when packed into an array or structure are they
stored as 9-bit (or 18-bit) bytes, left to right within each word.
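Since "sizeof" counts 9-bit bytes and four of them make a word, the
number of words occupied by a packed object follows by rounding up.
The helper names below are hypothetical; the constants come from the
table above.

```c
#include <stddef.h>

#define KCC_BYTES_PER_WORD 4    /* four 9-bit bytes in a 36-bit word */

/* Number of 36-bit words needed to hold nbytes 9-bit bytes. */
size_t kcc_words_for_bytes(size_t nbytes)
{
    return (nbytes + KCC_BYTES_PER_WORD - 1) / KCC_BYTES_PER_WORD;
}

/* Words occupied by an array of n shorts (two 18-bit halfwords per word). */
size_t kcc_words_for_shorts(size_t n)
{
    return kcc_words_for_bytes(n * 2);  /* sizeof(short) is 2 under KCC */
}
```

For example, a 5-element char array needs two words, and a 3-element
short array also needs two.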
UNSIGNED INTEGERS:
Unsigned integers are fully implemented; any integer object
may be either "signed" or "unsigned", and both forms use exactly the
same amount of storage, with the high order bit considered the sign
bit (if the object is signed). However, because the PDP-10 has
no instructions specifically for unsigned data, some operations are
slower for unsigned ints.
Addition (+) and subtraction (-) are the same.
== and != are the same.
Left shift (<<) always uses the LSH instruction (logical shift).
Right shift (>>) uses LSH for unsigned, ASH for signed operands.
ASH is an arithmetic shift which propagates the sign bit.
<,<=,>,>= are slightly slower for unsigned operands.
Casts to floating-point are slower.
Multiply (*) is also slightly slower.
Divide (/) and remainder (%) are much slower.
CHARACTER:
The plain "char" type is "unsigned char". Sign extension is
done only if chars are explicitly declared as "signed char". Normally
a char is 9 bits, although it is possible to compile code using a
7-bit assumption (see the section on char pointer hints).
An extension to KCC provides five additional types of "char"
objects, specified as "_KCCtype_charN", where N is the number of bits
in the char and may be one of 6, 7, 8, 9, or 18. All may be signed
or unsigned; their "plain" form is unsigned. See the "KCC Extensions"
section for additional details.
FLOATING-POINT:
The "float" type is represented by one word in the PDP-10
single precision floating point format; there is one bit of sign, 8
bits of exponent, and 27 bits of mantissa.
The "double" type uses two words in the PDP-10 double
precision format; there is one bit of sign, 8 bits of exponent, and
62 bits of mantissa. (Note that on the KA-10 this is a software format
with 54 mantissa bits, rather than the more usual hardware format.)
The exponent range is approximately 1.5e-39 to 1.7e38 in both
formats; single precision has about 8 significant digits and double
precision has 18. See a PDP-10 hardware reference manual for more details.
KCC also supports the new ANSI "long double" type. Currently
this is the same as "double" but this might someday change on KL-10s to
use "G" format floating point, which has an exponent range of 2.8e-309
to 9.0e307 but only 17 significant digits.
The (double) type can represent all values of (long). That
is, conversion of a (long) to a (double) and back to (long) results in
exactly the original value.
POINTERS:
Pointers are always a single word, but can have two different
internal formats. Pointers to void, chars, shorts, or bit-fields, are
PDP-10 byte pointers (local or one-word global); pointers to all other
objects and functions are PDP-10 global word addresses. Byte pointers
point to the byte itself rather than to the preceding byte, thus LDB
instead of ILDB is done to fetch the byte.
It is very important to ensure that functions which return byte
pointer values, typically (char *), be properly declared; likewise, any
arguments which a function expects to be a byte pointer must in fact be
byte pointers, using a cast if necessary. Operations which expect a
byte pointer will not work properly when given a word pointer, and vice
versa. See the section on "pointer hints" near the end of this file
for additional information.
The "NULL" pointer is represented internally as a zero word,
i.e. the same representation as the integer value 0, regardless of
the type of the pointer. The PDP-10 address 0 (AC 0) is zeroed and
never used by KCC, in order to help catch any use of NULL pointers.
ARRAYS:
The only special thing about arrays is that arrays of chars
consist of 9-bit bytes packed 4 to a word, and arrays of shorts have
18-bit halfwords packed 2 to a word; all other objects occupy at least
one word.
ENUMERATIONS:
KCC treats enumeration types simply as integers. In the words
of H&S 5.5 (HS1 5.6.1), KCC uses the "integer model" of enumerations,
which is what ANSI has adopted.
STRUCTURES and UNIONS:
Structures and unions are always word-aligned and occupy a
whole number of words. Unlike the case for other declarations of type
"char" or "short", adjacent "char" and "short" members in a structure
are packed together as for arrays. Structures and unions may be
assigned, passed as function parameters, and returned as function
values.
Bit-fields are implemented; the maximum size of a bit-field is
36 bits. They may be declared as "int", "signed int", or "unsigned
int"; plain "int" bit-fields are unsigned. Fields are packed left to
right, conforming to the PDP-10 byte ordering convention. Bit-fields
are not compacted with anything else; a word in a structure will never
have both bit-fields and another kind of object, not even a char or
short.
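A short bit-field sketch in portable C (the structure and function
names are invented). The fields are declared "unsigned int" explicitly,
so the code does not depend on KCC's choice that plain "int" bit-fields
are unsigned:

```c
struct status {
    unsigned int ready : 1;
    unsigned int mode  : 3;
    unsigned int count : 12;
};

/* Initialize all three fields and read them back. */
unsigned int status_total(void)
{
    struct status s = { 1, 5, 4095 };
    return s.ready + s.mode + s.count;
}

/* Storing into a 3-bit unsigned field keeps only the low 3 bits. */
unsigned int store_mode(unsigned int v)
{
    struct status s = { 0, 0, 0 };
    s.mode = v;
    return s.mode;
}
```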
It's too bad that C does not allow pointers to bit-fields,
because the PDP-10 byte pointer instructions are perfectly suited to
this application!
FUNCTIONS:
As per H&S. A pointer to a function is simply a word address.
For the gory details of function calls and stack usage, see the
"Internals" section.
TYPEDEFS:
As per H&S. With regard to 5.10.2 (HS1 5.11.1), KCC has no
problems with redefining typedef names in inner blocks; ANSI allows this.
<2 Type Conversions> [H&S 6 "Conversions and Representations"]
Integer conversions:
There are no representation changes when converting any
integer type to any other integer type of the same size. Sign
extension and truncation are performed when necessary to convert from
one size to another. Conversions from pointers are done as per H&S
6.2.3 (V1 6.3.4); a pointer is treated as an unsigned int and then
converted to the destination type using the integral conversion rules.
Floating-point conversions:
Casting (float) to (double) or (long double) retains the
same value. However, (double) to (long double) may lose one digit
of precision, depending on the implementation chosen for (long double).
A cast to (float) of an int may lose some precision,
although a char or short can always be fully transformed. (double)
can retain the exact value of an int or long int, which can be
restored to its original value by converting back to int.
Casting an unsigned integer to a floating-point value always
results in a positive number.
Pointer conversions:
There is a great variety of pointer conversions possible; however,
you can make sense of them if you simply note the following Three Laws of
C Pointers:
(1) Nihil ex nihilis -- a NULL pointer always remains NULL.
(2) Smaller is pointier -- a pointer to any object can always
be converted into a pointer to a SMALLER (or equal-sized)
object, without losing any information. Converting it back
to the original type restores the original value.
(3) Bigger is blunter -- converting a pointer to any object to
a pointer to a LARGER object will force the pointer to
have an alignment suitable for that of the larger type;
any fine details of positioning within the new type are lost,
and the original pointer cannot be recovered (unless it
was already properly aligned to begin with). The new
object pointed to will completely enclose the smaller
object.
These rules apply to all of the standard C data types and should be all
you need to know (unless you use the non-standard _KCCtype_charN data
types). Chars are aligned on 9-bit byte boundaries, shorts on halfword
boundaries, and all other data types on word boundaries. Converting
any pointer to a (char *) and back is always possible, as a char is the
smallest possible object. If the original object was larger than a
char, the char pointer will point to the first byte of the object.
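The first two laws can be demonstrated in portable C; the function
names are invented for the example. The third law is deliberately not
exercised, since converting a misaligned pointer to a larger type is
undefined in portable code.

```c
/* Law 2: object pointer -> (char *) -> back recovers the original. */
int round_trip_is_exact(void)
{
    int   word = 42;
    int  *wp   = &word;
    char *cp   = (char *)wp;    /* points at the first byte of word */
    int  *back = (int *)cp;

    return back == wp && *back == 42;
}

/* Law 1: a null pointer stays null through a conversion. */
int null_stays_null(void)
{
    int  *wp = 0;
    char *cp = (char *)wp;
    return cp == 0;
}
```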
Pointer conversion details:
Pointers on the PDP-10 are classified as either word pointers
(word addresses) or byte pointers.
Word pointer to Word pointer:
No representation change.
Word pointer to Byte pointer:
The result points to the leftmost byte in the word.
Byte pointer to Word pointer:
The result points to the word that the byte was located in. All
information about the position of the byte within that word is lost.
Byte pointer to Byte pointer:
The result depends on the relative byte sizes, as follows:
18-bit to 9-bit: alignment preserved. Same as (short *) to (char *).
9-bit to 18-bit: halfword-aligned. Same as (char *) to (short *).
M-bit to N-bit: If M == N, there is no change.
Otherwise, the resulting pointer is word-aligned so that
it points to the leftmost N bits of the word.
Pointer to void:
ANSI has introduced the notion of (void *), i.e. a pointer to
void. Any pointer can be converted to (void *) and back without any
loss of information. Internally, (void *) is always a byte pointer of
some kind, normally the same as (char *).
When converting to and from normal object pointers, (void *) is
the same as (char *). HOWEVER, when converting to and from pointers to
any kind of "_KCCtype_charN", there is never any representation change.
Assignment conversions:
KCC permits any meaningful cast conversion during an
assignment, but will complain about an implied cast if the conversion
is not one of the legal assignment conversions.
Unary conversions:
The "Usual Unary Conversions" are different for CARM and ANSI:
    Original operand type           Converted type
                                    CARM            ANSI (default)
    float                           double          float
    signed char/short/bit-field     int             int
    unsigned char/short             unsigned int    int
    unsigned bit-field              unsigned int    *int or @unsigned int
        * = if bit-field has fewer bits than an int.
        @ = if bit-field has more (or same #) bits than an int.
The first difference is (float) to (double). What H&S
describes as an "optional compilation mode" to suppress the unary
conversion of (float) to (double) is always in effect for ANSI level,
as ANSI is allowing this feature as part of the standard conversions,
and the resulting PDP-10 code is much more efficient. If ANSI level
is not selected, then all (float) values will be implicitly converted
into (double) as per the old C standard. Note that all portability
levels require that (float) values always be promoted to (double) in
function arguments, so this particular implicit conversion is always
in effect.
The second difference is in the integer promotions. CARM uses
what ANSI calls "unsigned preserving" rules; ANSI uses "value preserving"
rules, meaning that a conversion to a wider type should always result in
a signed integer type regardless of whether the shorter type was unsigned
or not, as long as the new type can represent all values of the old type.
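The practical effect of the integer promotion rules shows up when a
narrow unsigned value takes part in a subtraction. The sketch below
(invented function names, standard C) behaves as described for the ANSI
level; the second function uses an explicit cast to reproduce what the
"unsigned preserving" rules would compute.

```c
/* Value-preserving rules: c promotes to int, so the subtraction is
 * done in signed arithmetic and the result can be negative. */
int difference_is_negative(void)
{
    unsigned char c = 200;
    return (c - 201) < 0;
}

/* With unsigned arithmetic forced by a cast, the same subtraction
 * wraps around to a very large positive value instead. */
int unsigned_difference_is_large(void)
{
    unsigned char c = 200;
    return ((unsigned int)c - 201u) > 1000u;
}
```

Code that must behave identically at both portability levels can use
explicit casts of this kind rather than rely on the default promotion.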
Binary conversions:
As already noted, (float) values are not always implicitly
converted to (double) before being operated on, if ANSI level is in
effect. There is one other difference between ANSI and CARM
with respect to the usual binary conversions:
If one operand is "long" and the other is "unsigned int",
CARM: makes both "unsigned long".
ANSI: makes both "long".
<2 Expressions> [H&S 7 "Expressions"]
As per H&S, with the following notes:
[7.2.2] (V1 7.2.3) Overflow and underflow are neither noticed nor
handled. The result is whatever the PDP-10 hardware gives in those
cases.
[7.3.3] KCC correctly does not use parentheses to force the usual unary
conversions. However, ANSI introduces a new unary operator, '+', which
takes an operand of any arithmetic type; KCC merely applies the unary
conversions to this operand.
[7.4.2] (V1 7.3.5) KCC permits component selection for structures
returned from functions, except when the component is an array. That
is, "f().a" will work and will select component "a" of the returned
structure, but it is not legal to do "f().array[i]". The ANSI draft
standard has clarified that this behavior is correct.
[7.4.3] (V1 7.3.6) KCC now allows formal parameters of type "function",
converting them invisibly to type "pointer to function".
When ANSI/STDC level is in effect, KCC will check to see if the
types of the arguments match the types of the parameters for the called
function.
KCC does not issue any warnings about discarded function
return values.
[7.5.1] (V1 7.4.1) Casts - KCC correctly implements "narrowing" casts
for floating point and for integers.
[7.5.2] (V1 7.4.2) "sizeof" - the result of "sizeof" is unfortunately
mandated by ANSI to be type (unsigned) rather than (int), which means
that some computations on size_t values will use slow unsigned
arithmetic instead of fast signed arithmetic, and some comparisons will
be meaningless -- for example, "n < 0" will never be true. Any
expression using a sizeof or size_t value is suspect.
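A sketch of the pitfall and one way around it, in standard C with
invented function names. Subtracting two sizes never yields a negative
number, so the safer habit is to compare the sizes directly:

```c
#include <stddef.h>

/* Fragile: have - need is computed in unsigned arithmetic, so it wraps
 * around to a huge value instead of going negative.  The result here is
 * nonzero exactly when need > have. */
int wraps_instead_of_negative(size_t have, size_t need)
{
    return (have - need) > have;
}

/* Robust: compare the two sizes directly, with no subtraction. */
int room_for(size_t have, size_t need)
{
    return need <= have;
}
```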
The result of sizeof is always in terms of 9-bit bytes,
regardless of the setting of -x=ch7, with two exceptions: the size of a
char is always 1, and the size of a char array is the # of elements
(chars) in the array. This is true no matter how many bits are in a
char, and applies to _KCCtype_charN chars as well.
[7.5.6] (V1 7.4.6) '&' - Attempting to apply '&' to a "register"
variable simply causes KCC to issue a warning message and force the
variable to class "auto". KCC now permits '&' to be applied to array or
function names, as per ANSI.
[7.5.7] (V1 7.4.7) '*' - Applying the indirection operator to
a null pointer (0) simply retrieves (or sets) the contents of AC 0,
which should always be zero if nothing accidentally sets it. Treating
the null pointer as a char pointer will always retrieve zeroes and set
nothing. This behavior should not be relied upon, as the result is
always undefined and unportable.
[7.6.1] (V1 7.5.1) '*','/','%' -
Division by zero is a no-op; the value will be that of the dividend.
Truncation is always toward zero whether the operands are negative or
not:
5/2 == (-5)/(-2) == 2
(-5)/2 == 5/(-2) == -2
For the remainder operator, (x)%0 gives unpredictable garbage.
The sign of the remainder will be the same as that of the dividend:
5%2 == 5%(-2) == 1
(-5)%2 == (-5)%(-2) == -1
These operations are slower for unsigned than for signed operands.
Unsigned division in particular is cumbersome and should be avoided if you
care about efficiency.
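The quotient and remainder rules above can be checked with a few lines
of C. Later C standards (C99 onward) require the same truncation toward
zero, so the first two functions are portable to such compilers.
Division by zero is undefined in portable C; the last function is a
hypothetical helper that spells out the KCC result (the dividend)
explicitly for code that wants it everywhere.

```c
/* Quotient truncated toward zero; b must be nonzero. */
int trunc_quot(int a, int b)
{
    return a / b;
}

/* Remainder with the sign of the dividend; b must be nonzero. */
int trunc_rem(int a, int b)
{
    return a % b;
}

/* Hypothetical helper: yields the dividend when the divisor is zero. */
int div_or_dividend(int a, int b)
{
    return (b == 0) ? a : a / b;
}
```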
[7.6.2] (V1 7.5.2) '-' - The type of the difference between two
pointers is (int). This is defined as ptrdiff_t by <stddef.h>.
[7.6.3] (V1 7.5.3) '<<','>>' - Left shift (<<) always uses logical
shifting; bits can be shifted into the sign bit. Right shift uses
logical shifting for unsigned integer types (the sign bit is shifted
out, and 0-bits shifted in), but uses ARITHMETIC shifting for signed
integer types (the sign bit is propagated).
Using a negative value for the right operand reverses the
direction of the shift. Portable code cannot rely on this, however, as
other implementations may not behave this way. Using a large number (36
or greater) simply shifts everything to oblivion as you might expect.
Note that it is possible to use left-shift arithmetic shifting (the ASH
instruction) by giving a negative shift distance to >>; of course this
is very non-portable.
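For code that must also run elsewhere, the shift behaviour described
above can be written out explicitly. The sketch below uses invented
names; shift_either_way() models "x << n" with KCC's treatment of
negative and oversized counts, but only for unsigned operands.

```c
#include <limits.h>

/* Logical right shift: zero bits enter from the left. */
unsigned int logical_half(unsigned int x)
{
    return x >> 1;
}

/* Nonzero if >> on a negative signed int propagates the sign bit.
 * This is implementation-defined in portable C; KCC documents that
 * it does (the ASH instruction). */
int right_shift_is_arithmetic(void)
{
    int minus_one = -1;
    return (minus_one >> 1) < 0;
}

/* Portable substitute for "a negative count reverses the direction". */
unsigned int shift_either_way(unsigned int x, int n)
{
    unsigned int width = (unsigned int)(sizeof x * CHAR_BIT);
    unsigned int count = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    if (count >= width)
        return 0;                   /* everything shifted to oblivion */
    return (n < 0) ? x >> count : x << count;
}
```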
[7.8] (V1 7.7) '?' - KCC correctly permits the result of a conditional
expression to have structure, union, enumeration, or void types.
[7.9.1] (V1 7.8.1) Structure and union assignment is (of course) permitted.
[7.9.2] (V1 7.8.2) 'op=' Compound assignment -
KCC does not support the obsolete "=+" compound assignment forms.
[7.11] (V1 7.10) Constant expressions -
KCC can and does evaluate constant floating-point expressions at
compile time. Almost all casts are also allowed, except certain
pointer-pointer conversions where the result would depend on whether
the program was running multi-section.
KCC is still somewhat too liberal about the constant
expressions in preprocessor #if statements; it permits any operator,
but non-macro identifiers (such as "sizeof" and enum constants) now
cause warnings.
[7.12] (V1 7.11) KCC correctly does not interleave expression
computations.
[7.13] (V1 7.12) KCC tries to issue warnings about discarded values.
This may change with time if the warnings are judged too obnoxious.
[7.14] (V1 7.13) KCC does some optimization of memory accesses, but not
much. "volatile"-qualified objects are treated specially; any fetch
from or store into such an object is considered to have side effects,
and such expressions are never optimized away (except if execution would
never reach them). It is best to only use "volatile" if you really need
it.
<2 Statements> [H&S 8 "Statements"]
As per H&S, with the following notes:
[8.7] switch statement - KCC permits the control expression of a switch
statement to be of any integral or enumeration type. The maximum possible
number of case statements for a switch is 512; however, this is a simple
KCC parameter and can be changed.
<2 Functions> [H&S 9 "Functions"]
[9.4] Adjustments to Parameter Types
In the absence of a function prototype, parameters which are
declared as "char" or "short" are really passed on the stack as type
"int", and "float" is passed as "double". No narrowing is done when
the parameters are referenced within the function.
With a function prototype, "float" is passed as "float" and not
"double". "char" and "short" are still passed right-justified within
an "int", but narrowing is done properly if those parameters are
referenced.
KCC now follows ANSI and permits formal parameters of type "function
returning...", which are converted into "pointer to function returning...".
<1 The C Libraries> [H&S Part II (V1 11: "The Run-time Library")]
All of the ANSI C library facilities are provided, as well as
all of the facilities described in H&S part II. In addition, various
UN*X system call emulations and traditional library routines are also
supported.
The file LIBC.DOC furnishes a complete summary of the
implemented library routines, and the file USYS.DOC summarizes
the UN*X system-call simulations. In general, users are advised to read
H&S or a UPM (Unix Programmer's Manual) for complete descriptions of
library functions, as these files are primarily intended to document
KCC-specific differences rather than to provide a user guide.
<1 C Library - UN*X System Calls>
The KCC runtime environment is intended to resemble that of
UN*X to the maximum practicable extent. Many system calls are
emulated, such as open(), close(), read(), write(), signal(), ioctl(),
and the like. This emulation does not pretend to be complete, and the
calls exist primarily to help transport software to and from UN*X
systems. Whenever possible, the ANSI library facilities, or the
standard portable routines as described in H&S, should be used instead
of these "system calls".
The file USYS.DOC summarizes the calls which KCC supports, and
describes how they differ from the UN*X versions. A UPM (Unix
Programmer's Manual, preferably a 4.2 or 4.3BSD version) should be
consulted for descriptions of how these calls should behave on UN*X
itself.
<1 KCC Language Extensions>
KCC implements a number of extensions to the C language which
are intended to allow for better integration with other PDP-10 software.
It is possible to disable these extensions by means of the -P switch.
These extensions are:
[1] The "entry" keyword (obsolete).
[2] The '`' identifier quoting mechanism.
[3] The #asm and asm() assembly language mechanism.
[4] The "_KCCtype_charN" data types.
[5] The "_KCCsymval" and "_KCCsymfnd" built-in functions.
<2 Extension [1] - The "entry" keyword>
The use of this statement has been described earlier in the
discussion of library entry points. However, it is an obsolete feature
and should no longer be needed for any purpose. Future versions of KCC
will flush it if no one objects.
<2 Extension [2] - Identifier Quoting>
The current PDP-10 software allows symbols to have 6 characters
from the set A-Z, 0-9, ., %, $. KCC maps 0-9 to 0-9, a-z and A-Z to A-Z,
and '_' to '.'.
KCC supports a non-standard extension to C whereby any characters
enclosed within accent-grave ('`') marks are treated as a valid C identifier.
This allows the user to specify identifiers containing the characters '$'
and '%', as well as any arbitrary character, although KCC will print a
warning if a character not in the PDP-10 set is seen. An underscore
within a quoted identifier is still translated to a period, however.
Examples: `$FOO`, `OPENF%`, `$$BP`, `switch`
Identifiers with internal linkage (file-scope static) are
normally mapped into unique internal symbols; however, this mapping can
be suppressed by quoting the identifier. This is primarily useful for
combining asm() code with C code in the same module, since the asm()
code doesn't have to guess what symbol KCC will map an identifier into.
This is unnecessary for external linkage identifiers, which are not
mapped.
This quoting mechanism should be used ONLY where necessary. It
is not portable and should be conditionalized if used in portable code.
Identifiers defined in this way should be CONSISTENTLY quoted in this
way, because they are stored internally with '`' as their first
character to distinguish them from normal unquoted identifiers and
keywords. This avoids potential confusion and allows one to specify an
identifier which is otherwise a reserved keyword, such as `if`. Any
mention of a quoted identifier in error messages will show a '`' as the
first character of the identifier.
<2 Extension [3] - #asm and asm()>
Many C compilers have an escape mechanism which allows the
programmer to specify a series of assembly language instructions within
a C program. KCC's means of doing this is with the "asm()" expression,
which looks and is treated exactly like a function call.
Currently only one argument is allowed to asm() and this must
be a string literal. The text of the string is simply passed directly to
the assembler output file at that point in the compilation.
There is also a preprocessor command called #asm, which
converts everything up to an #endasm into a series of asm()
expressions. This is convenient for very long stretches of assembler
code, or where the enclosed text must be macro-expanded (remember
macros are not recognized within string literals).
Within assembler code, you must invoke the macros %%CODE or
%%DATA to switch between assembling pure and impure (variable)
code/data. #asm inclusions will always begin in the code segment, and
must always end in the code segment. Never use %%CODE when already in
the code segment, or %%DATA when already in the data segment.
Because asm() is syntactically an expression, it can only
appear where an expression is legal. However, any attempt to use it
anywhere but as the sole contents of a function body is highly fraught
with peril. If it is necessary to specify some assembler directives
separate from any function, an acceptable way of doing this is by
means of a static dummy function, such as:
static void
dummyfunct(){
asm("%%DATA\n STUFF: ASCIZ/foo/ \n %%CODE\n");
}
It cannot be repeated too often that use of asm() is strongly
discouraged. It is possible that someday its functionality will be
extended to the point that KCC can parse and understand the contents
(thus, for example, references to C auto variables would be allowed);
however, this would primarily be for the purpose of allowing KCC to
generate .REL files directly rather than to encourage wider use of asm().
At the start of the assembler file, a PURGE is done of all the
assembler IF pseudos. Thus, assembler code cannot use any IF pseudo
tests, nor macros which use them. Incidentally, attempting to use a
SEARCH MONSYM will cause FAIL to barf several times with a "FAIL BUG
IN SEARCH" message, due to the lack of the IF pseudos; this is
annoying but harmless. MACRO does not have this problem.
<2 Extension [4] - "_KCCtype_charN" data types>
Normally the "char" data type is 9 bits. In the PDP-10 world
much existing software depends on 7-bit characters, and to make it
easier to write the necessary system-dependent code a 7-bit char data
type was introduced and generalized. The 5 possible char sizes (6, 7,
8, 9, and 18) were chosen because it is only for those sizes that
OWGBPs exist (one-word global byte pointers), and thus only those sizes
can be guaranteed to work when using extended addressing.
Any of the char types can be signed or unsigned; if the plain
form is used, unsigned is assumed. Narrowing and widening are done
properly whatever the size. Note that the 18-bit size is similar to
but not the same as "short"; it is included mainly for completeness
rather than in the expectation that someone would actually use it. The
9-bit size is the same as regular "char", unless the -x=ch7 option is
in effect, in which case "char" is the same as the 7-bit size.
These types can normally be used just as for "char". However,
there are some special effects associated with certain operations:
(1) "sizeof" of an N-bit char array returns the number of N-bit
chars (elements) in the array. Usually this is what you
want. Giving this number to malloc will cause problems
only for chars of 18 bits, since only they occupy more
storage than the same number of 9-bit chars.
(2) A cast (explicit or implicit) of a string literal to an
N-bit char pointer will cause the string literal to be
stored as N-bit bytes. This is NOT strict C, which would
merely convert the char pointer; however, this is the
most useful interpretation. This permits the somewhat
bizarre construct of using a string literal to make
an array of 18-bit bytes (this is the only aspect where
"_KCCtype_char18" differs from "short").
(3) 6-bit string literals are stored as SIXBIT rather than using
the low 6 bits of the ASCII char values. Note that while
such strings are null-terminated, null is also a valid
SIXBIT character (meaning space). The value of
characters which cannot be represented in SIXBIT
is undefined.
(4) Function parameters cannot be declared to have a type of
char size 7 or 8. The reason is complicated; see
the last part of this section.
Some examples:
_KCCtype_char6 tmp[] = "tmp"; /* A 4-element array of SIXBIT chars */
_KCCtype_char7 wd[5] = "word"; /* A 5-element array of 7-bit chars */
_KCCtype_char8 packet[40]; /* A 40-element array of 8-bit chars */
_KCCtype_char18 useless; /* Same as "unsigned short useless;" */
_KCCtype_char7 *arg = "text"; /* A pointer to an ASCIZ string */
_KCCtype_char6 *pt6; /* A pointer to a 6-bit char string */
arg = "othertext"; /* Implicit conversion to ASCIZ */
pt6 = "dskdmp"; /* Implicit conversion to SIXBIT */
pkg_call((_KCCtype_char7 *)"argtext"); /* Explicit cast to ASCIZ */
/* Macro to generate sixbit constant values from literals */
#define const6(s) (*(int *)((_KCCtype_char6 *)(s)))
uname = const6("UNAME"); /* Typical usage */
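The SIXBIT packing of item (3) can be imitated in portable C, which may help when checking what value a construct like const6() is expected to yield. This is only a sketch, not KCC code: the names to_sixbit and sixbit_word are invented for the example, a 36-bit word is modelled in the low bits of a uint64_t, the host is assumed to use ASCII, and folding lower case to upper case is an assumption (the manual only says unrepresentable characters have an undefined value).

```c
#include <assert.h>
#include <stdint.h>

/* Convert one ASCII character to its SIXBIT code: ASCII 040..0137 map
 * to 00..077.  Folding lower case to upper case is an assumption of
 * this sketch; any other unrepresentable character becomes 0 here,
 * whereas the manual leaves that value undefined. */
static unsigned to_sixbit(int c)
{
    if (c >= 'a' && c <= 'z')
        c -= 'a' - 'A';
    if (c < 040 || c > 0137)
        return 0;
    return (unsigned)(c - 040);
}

/* Pack up to six characters, left-justified, into one 36-bit word
 * (held in the low 36 bits of a uint64_t).  Unused positions are 0,
 * which is also the SIXBIT code for space. */
static uint64_t sixbit_word(const char *s)
{
    uint64_t word = 0;
    int i;

    assert(s != 0);
    for (i = 0; i < 6; i++) {
        unsigned code = (*s != '\0') ? to_sixbit(*s++) : 0;
        word = (word << 6) | code;
    }
    return word;
}
```

For instance, sixbit_word("DSK") packs the codes 44, 63 and 53 (octal) into the three leftmost bytes of the word and leaves the rest zero.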
Portability issues:
The long names for these types were deliberately chosen so as to
minimize the chances of possible conflict with identifiers in software
imported from elsewhere, and to discourage the indiscriminate (non-portable)
use of the types. Note that users who must make heavy use of them (for
good reasons, we hope) can simply use typedefs or #defines at the start
of their code in order to equate them with simpler names; e.g.
#define char7 _KCCtype_char7 /* Use shorter typename */
This method also has the advantage of localizing non-portable
constructs in a way that gives others a fighting chance to port the
software elsewhere by changing the initial definitions.
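As a sketch of this localization, the short names can be defined in one place with a fallback for other compilers. USE_KCC_CHARN is a hypothetical switch invented for this example (it would be defined, e.g. with -D, only when compiling with KCC), and char7_len is likewise just an illustration.

```c
#include <assert.h>

/* One place for the non-portable type names.  USE_KCC_CHARN is a
 * hypothetical switch for this sketch, not something KCC defines. */
#ifdef USE_KCC_CHARN
typedef _KCCtype_char7 char7;
typedef _KCCtype_char8 char8;
#else
/* Elsewhere: the plain charN types are unsigned, so fall back to
 * unsigned char (which of course loses the 7- or 8-bit packing). */
typedef unsigned char char7;
typedef unsigned char char8;
#endif

/* Code below this point uses only the short names. */
static int char7_len(const char7 *s)
{
    int n = 0;

    assert(s != 0);
    while (s[n] != 0)
        n++;
    return n;
}
```

Porting then means changing only the typedefs at the top, which is the "fighting chance" mentioned above.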
Storage:
There are a few aspects of the way N-bit char objects are stored
which may be surprising at first. Char arrays are always packed starting
with the leftmost byte in a word; however, single-char objects (such as
"char c;") have their value stored in the rightmost ALIGNED byte.
This is a necessary consequence of the fact that the '&'
operator applied to a char object must result in a valid char pointer,
and the very strong desire that all C code work with extended addressing.
There are only a few possible kinds of OWGBPs and they all require this
alignment. For 6, 9, and 18 bits this causes no difficulty since bytes
of those sizes completely fill a word, and there are no unused low-order
bits; thus char values may be stored completely right-justified, and in
some cases full-word operations can be performed on them.
However, for 7 and 8 bit bytes the rightmost byte will leave 1
and 4 unused low-order bits, respectively, and this is where KCC
stores the values for such objects. Debuggers examining a program with
DDT may be surprised that "_KCCtype_char8 foo = 1;" results in a
word labelled FOO with its value 020 instead of 1.
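The arithmetic behind the FOO example can be sketched in portable C. This is a model for illustration only: single_char_word is a made-up name, and the 36-bit word is held in a uint64_t. The number of unused low-order bits is simply the remainder of 36 divided by the byte size.

```c
#include <assert.h>
#include <stdint.h>

/* Word image of a single N-bit char object holding `value`.  The value
 * sits in the rightmost aligned byte, so it is shifted left past the
 * (36 % nbits) low-order bits that no aligned byte can occupy. */
static uint64_t single_char_word(unsigned long value, int nbits)
{
    assert(nbits == 6 || nbits == 7 || nbits == 8 ||
           nbits == 9 || nbits == 18);
    value &= (1UL << nbits) - 1UL;      /* keep only nbits of the value */
    return (uint64_t)value << (36 % nbits);
}
```

With this model single_char_word(1, 8) is 020 (four unused bits), single_char_word(1, 7) is 2 (one unused bit), and the 6-, 9- and 18-bit sizes leave the value right-justified.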
This alignment restriction causes no real problems except for
the obscure case of function parameter declarations. In the absence
of ANSI function prototypes, the default "function argument
promotions" are performed when a call is made; all integers shorter
than (int) are converted to (int) and passed as such. But this means
that the integer value is right-justified; if the function parameter
was declared to match the promoted type (int) then all is well, but
attempts to declare it as a 7 or 8 bit char will just result in a
confused function (attempts to read the parameter value or take its
address will fail since the value is not properly aligned). This
could be fixed by having KCC do an implicit conversion upon function
entry, but it is far simpler and much, much more efficient to simply declare
such parameters as (int) in the first place.
Conversion of pointers to charN data types:
This was touched on in the general discussion of type conversions.
Byte pointers are really divided into two classes depending on whether
they point to a "char" type or just a "smaller-than-int" type. The former
are "char pointers" (of 6, 7, 8, 9, or 18 bits) as well as "byte pointers";
the latter are simply byte pointers. The distinction is not significant
except when doing conversions to and from (void *); in that case, a char
pointer undergoes no change at all, whereas a non-char byte pointer will
always be crunched into an aligned 9-bit byte pointer.
<2 Extension [5] - "_KCCsymval" and "_KCCsymfnd" symbol lookup mechanism>
The _KCCsymfnd and _KCCsymval constructs are macro-type
facilities recognized by the preprocessor, since they are intended to
be used in #if expressions. They are heavily used by the "monsym"
macro in <monsym.h> to access the TOPS-20 system's MONSYM symbol table,
but they can be applied to any other symbol table, such as TOPS-10's
UUOSYM. Both _KCCsymval and _KCCsymfnd have the same syntax and almost
the same actions; the following discussion will focus on _KCCsymval.
Usage: int _KCCsymval(filename, symbol);
char *filename, *symbol;
"_KCCsymval" is a macro predefined in the preprocessor which
takes two arguments like a function call; both must evaluate to string
literal constants. KCC looks for the definition of <symbol> in
<filename>; if found, its value is used as the resulting constant
value of the expression, with type (int).
The file must exist and it must be in MACRO UNV (universal)
format. The file is only read once, and it is then remembered for the
entire duration of the KCC run, even when compiling different modules.
This strategy, based on the desire for more efficiency, may possibly
cause a non-extended KCC to run out of storage if _KCCsymval is used
with many different UNV files, but this is unlikely.
If the symbol is not found, or is not a constant, _KCCsymval
generates a compilation error and returns 0. Because it is evaluated
at preprocessing time, _KCCsymval can be used for initializations and
in #if preprocessor conditionals.
_KCCsymfnd, by contrast, will always return either 1 or 0 depending
on whether the symbol has a constant definition or not. The only time
it will cause a compilation error is if the specified file cannot be read
the first time the name is encountered; further attempts to query that
file will fail silently as if the symbol was not found.
Normally users will not use these constructs directly. The
<monsym.h> file is a good example of how a simplified facility can be
provided.
<1 KCC Internals>
<2 KCC Internals - Memory organization>
A C program compiled by KCC has four distinct memory regions:
data, text (code), stack, and free.
DATA - This contains all user-declared data variables, both
initialized (set to user's specification) and
un-initialized (set to zero).
The first address following this region is stored in "_edata".
TEXT - This is the UNIX terminology for program code.
The first address following this region is stored in "_etext".
STACK - The program stack. This grows upwards in memory.
FREE - The region of memory that malloc() can dynamically allocate.
This starts at the address stored in "_end" and can allocate
memory up to (but not including) the address stored in
"_ealloc".
In addition, there may be small unused areas of memory.
The normal layout on TOPS-20 for a single-section program:
    Start addr   End addr    Region Name
    LOW          _edata-1    DATA
    _edata       <??>        STACK
    <??>         HIGH-1      - (unused)
    HIGH         _etext-1    TEXT
    _etext       _ealloc-1   FREE
    _ealloc      777777      - (unused, reserved)
Normally LOW == 0 and HIGH == 400000. These correspond to the normal
addresses for low and high segments. Also, normally _ealloc is set to
770000, so that pages 770-777 can be reserved for mapping DDT (some people
seem to prefer that to IDDT).
The normal layout on TOPS-20 for a MULTI-section program:
    Start        End          Region Name
  Section 0                   - (unused)
  Section 1
    1,,LOW       _edata-1     DATA
    _edata       1,,HIGH-1    - (unused)
    1,,HIGH      _etext-1     TEXT
    _etext       1,,777777    - (unused)
  Section 2
    2,,0         <??>         STACK
    <??>         2,,777777    - (unused)
  Sections 3-37
    3,,0         _ealloc-1    FREE (all sections up to 37)
    _ealloc      37,,777777   - (unused, reserved)
Normally _ealloc is set to 37,,700000 so that pages 700-777 of section 37
are reserved for mapping XDDT (again, for those people who don't know about
IDDT).
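The page numbers quoted for the reserved DDT areas follow from the address format: 18 bits of in-section address, divided into 512-word (octal 1000) pages. A small portable sketch, with invented helper names and the address held in a uint64_t, makes the arithmetic explicit.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 18          /* in-section address bits */
#define PAGE_BITS    9          /* a page is 512 (01000) words */

/* Build a global address from "section,,offset". */
static uint64_t global_addr(unsigned section, unsigned long offset)
{
    assert(offset <= 0777777UL);
    return ((uint64_t)section << OFFSET_BITS) | offset;
}

/* Section number of a global address. */
static unsigned section_of(uint64_t addr)
{
    return (unsigned)(addr >> OFFSET_BITS);
}

/* Page number (within its section) of a global address. */
static unsigned page_of(uint64_t addr)
{
    return (unsigned)((addr & 0777777UL) >> PAGE_BITS);
}
```

Thus an _ealloc of 770000 falls on page 770 of section 0, and 37,,700000 on page 700 of section 37.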
The normal layout on TOPS-10 for a program:
    Start addr         End addr        Region Name
    LOW                _edata-1        DATA
    _edata             _edata+$STKSZ   STACK
    _end               HIGH-1          FREE
    HIGH (= _ealloc)   _etext-1        TEXT
    _etext             777777          - (unavailable, sigh)
Normally LOW == 0 and HIGH == 400000. These correspond to the normal
addresses for low and high segments. However, the high segment location
may need to be changed in order to provide enough dynamic memory space.
This can be done by inserting the command -L=/SET:.HIGH.:n in the KCC
command line before any of the filenames. A "n" of 500000 will provide
64 additional pages of dynamically allocated memory, for example.
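The figure of 64 pages is just the distance the high segment moves, divided by the page size. As a sketch (extra_free_pages is a made-up helper, and the constants are the normal values quoted above):

```c
#include <assert.h>

#define DEFAULT_HIGH 0400000L   /* normal high segment origin */
#define PAGE_WORDS   01000L     /* 512 words per page */

/* Pages of dynamic memory gained by moving the high segment origin
 * from its normal location up to new_high. */
static long extra_free_pages(long new_high)
{
    assert(new_high >= DEFAULT_HIGH && new_high <= 0777777L);
    return (new_high - DEFAULT_HIGH) / PAGE_WORDS;
}
```

Here extra_free_pages(0500000L) gives 64, matching the example.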
<2 KCC Internals - Stack structure>
The organization of the portion of the stack seen by a C routine is
shown in the following diagram. The top of the stack is drawn at the
top of the diagram, with the stack pointer at the very top:
SP-->________________________________________________________________
    |  Spilled registers                                             |
    |    generated when we need more intermediate values than        |
    |    there are available PDP-10 registers                        |
    |________________________________________________________________|
    |             |                                                  |
    | (as many    |  Arguments being stacked for the next call       |
    | repetitions |    These are generated in the reverse of         |
    | of these    |    lexical order; thus the first argument        |
    | two areas   |    appears at the top of the stack. This is      |
    | as levels   |    so that functions like printf which take a    |
    | of nesting  |    variable number of arguments can work.        |
    | in function |__________________________________________________|
    | calls)      |                                                  |
    |             |  Values to be saved over the call                |
    |             |    e.g. if we do foo()+bar() then one function   |
    |             |    has to be called first, and we save its       |
    |             |    value here so we can add it to the other      |
    |             |    result once the second call returns           |
    |_____________|__________________________________________________|
    |                                                                |
    |  Local variables                                               |
    |    stored in lexical order, i.e. the first declared            |
    |    variable is lowest on the stack                             |
    |________________________________________________________________|
    |                                                                |
    |  Return address for calling function                           |
    |________________________________________________________________|
    |  Pointer for return value                                      |
    |    this only exists if the function returns a struct           |
    |    that takes more than two words; otherwise the result        |
    |    is returned in registers 1 and (if two words) 2             |
    |________________________________________________________________|
    |                                                                |
    |  Arguments to this call                                        |
    |    in reverse lexical order as described above                 |
    |________________________________________________________________|
Of course, not all of these areas are likely to appear at once.
There is no frame pointer, only a stack pointer; generated code always
knows the location of the stack pointer in relation to changes in the
above structure (as arguments get pushed and popped, registers get
spilled and despilled, etc). Thus code to access an argument or local variable
will use a different offset from the stack pointer depending on where
it is generated.
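The bookkeeping this implies can be modelled in a few lines of portable C. This is a simplified illustration, not KCC's actual code generator: the function names are invented, "depth" stands for the number of words the function currently has on the stack above its return address, and the struct-return pointer is assumed to be absent.

```c
#include <assert.h>

/* `depth` counts the words the function has on the stack above its
 * return address at the point where code is being generated (locals,
 * saved values, arguments for a nested call, spilled registers).  The
 * stack pointer addresses the top word, so at function entry depth is
 * 0 and the stack pointer addresses the return address. */

/* Offset from the stack pointer of the local in slot `slot`
 * (1 = first declared, i.e. lowest on the stack). */
static int local_offset(int slot, int depth)
{
    assert(slot >= 1 && slot <= depth);
    return slot - depth;
}

/* Offset from the stack pointer of argument `argno` (1 = first),
 * assuming no struct-return pointer was pushed. */
static int arg_offset(int argno, int depth)
{
    assert(argno >= 1 && depth >= 0);
    return -argno - depth;
}
```

In this model the first argument is at offset -1 on entry, at -3 once two words of locals have been allocated, and one word further away for every additional word pushed.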
<2 KCC Internals - Calling conventions and register use>
Arguments to KCC C functions are passed on the stack, and results are
returned in the registers. Functions are not expected to save
any registers upon entry, and in fact are assumed to clobber all
of ACs 1-16 inclusive.
Caller conventions - argument passing:
Since all function calls are assumed to clobber the registers,
it is up to the caller to save on the stack any register values which
it wishes to preserve over the function call.
As described in the section on stack structure, function
arguments are then pushed in reverse order onto the stack; the last
argument is pushed first, and the first argument is pushed last.
Passing a structure as argument consists of copying it whole onto the
stack. If the function is expected to return a structure or union
longer than two words, a "zeroth arg" must also be pushed, which is
the address of a location that the function should copy the returned
structure into. The function is then called with a PUSHJ 17,
instruction which adds the return address onto the stack.
Callee conventions - result returning:
All accumulators (except AC17) are at the callee's disposal.
However, AC0 is never used by generated code, as some old programs
assume NULL always points to zero, and as the hardware imposes several
restrictions on its use. AC15 and AC16 are also reserved for minor
KCC runtime functions.
Single word function return values are left in AC1; double
word returns go in AC1 and AC2. Return values larger than that are
copied into the location specified by the struct-return pointer, which
is provided by the caller as the "zeroth" argument.
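The effect of pushing arguments in reverse order can be seen in a toy simulation of the stack, written in portable C. Everything here is illustrative: the struct, the helper names and the RETURN_ADDR_MARK placeholder are invented, no struct-return pointer is involved, and the model only assumes what is stated above (an upward-growing stack, arguments pushed last-to-first, and a return address pushed by the call).

```c
#include <assert.h>

#define STACK_WORDS 64
#define RETURN_ADDR_MARK (-1L)  /* stands in for the word PUSHJ pushes */

struct stack {
    long word[STACK_WORDS];
    int sp;                     /* index of the top word; grows upward */
};

static void push(struct stack *s, long w)
{
    assert(s->sp + 1 < STACK_WORDS);
    s->word[++s->sp] = w;
}

/* Caller side: push the arguments last-to-first, then the return
 * address, as the call instruction would. */
static void do_call(struct stack *s, const long *args, int nargs)
{
    int i;

    for (i = nargs - 1; i >= 0; i--)
        push(s, args[i]);
    push(s, RETURN_ADDR_MARK);
}

/* Callee side, at entry: argument n (1 = first) is n words below the
 * word addressed by the stack pointer. */
static long arg_at_entry(const struct stack *s, int n)
{
    assert(n >= 1 && n <= s->sp);
    return s->word[s->sp - n];
}

/* Demonstration: call with the arguments (10, 20, 30), fetch the nth. */
static long demo_nth_arg(int n)
{
    static const long args[3] = { 10, 20, 30 };
    struct stack s;

    s.sp = -1;                  /* empty stack */
    do_call(&s, args, 3);
    return arg_at_entry(&s, n);
}
```

Because the first argument always ends up at the same distance from the stack pointer regardless of how many arguments follow it, a function such as printf can find its format string without knowing the argument count.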
<2 KCC Internals - Extended addressing>
A C program can be run in an extended section by specifying
this in either of two ways at load time, depending on whether you are
using KCC or the EXEC to do the loading.
(a) KCC: Use the "-i" switch.
e.g. @cc -i prog.c
or: @cc -i=extend prog.c
(b) LOAD (or LINK): The first module should be C:LIBCKX.
e.g. @load c:libckx,prog
No special switches need be given to KCC for the generated code to be
suitable for extended addressing - the same code will always run
either extended or non-extended.
In extended sections, code and permanently allocated data
(i.e. global variables) live in section N, the stack lives in section
N+1, and allocated memory begins in section N+2, expanding to fill all
higher sections. Normally N==1; this can be changed if really
necessary. All byte pointers not intended for immediate use (e.g. not
literal arguments to a LDB or DPB instruction) are constructed as
OWGBPs (One-Word Global Byte Pointer) so they can be stored in and
referenced from any section.
<1 Cross-compiling>
The -x, -L, -H, and -A switches allow some degree of cross-compilation.
The effects of the various -x specifications are listed below:
CPU: ka, ki, ks, kl0, klx
KCC can compile code to run on any CPU type; this is done both
by means of different code generation sequences and by assembler
macros which KCC also generates as needed. "ka" specifies a KA-10
using software format floating point doubles (all other types use
hardware format). "ki" specifies a KI-10, and "ks" both a KS-10 and a
KL-10A without extended addressing. "kl0" specifies a KL-10B capable
of extended addressing, but restricts the code to section 0; "klx"
specifies a KL-10B non-zero section environment.
It is possible to specify more than one CPU type; the intent is
to allow for producing code that will run on all specified machines.
As distributed, KCC code is compiled for "ks+kl0+klx". Not all
combinations have been tested or are even meaningful ("ka+klx" would
not work, for example).
SYSTEM: tops20, tenex, tops10, waits, its
Currently there are only two things affected by this setting:
character and string constant values, and ERJMP.
[1] If compiling for WAITS (or for anything else if on WAITS),
character values are mapped to and from WAITS ASCII and standard US
ASCII.
[2] If compiling for TOPS20 or TENEX, the proper value of
ERJMP and an auxiliary definition called ERJMPA are generated.
There may be more distinctions in the future.
ASSEMBLER: fail, macro, midas
The assembler selection is independent of the system or CPU.
Currently either FAIL or MACRO can be selected and both will work.
Selecting MIDAS also works, barring a few obscure situations.
CHARSIZE: ch7
It is possible to request that KCC generate code which assumes
that chars are 7 bits, and char pointers are 7-bit byte pointers.
Thus, arrays of chars will have 5 chars per word, instead of 4. This
feature, invoked by the "-x=ch7" switch, is mainly of use to people
who must integrate C code with old software that cannot deal with
anything but 7-bit bytes. It is not really guaranteed to work in all
conceivable cases. In particular, you should be aware that many of
the normally-compiled library routines (such as malloc) will continue
to return 9-bit char pointers, although the str- and mem- functions
should work with either 9-bit or 7-bit strings.
The values returned by "sizeof" will not change. As explained
in the discussion of the sizeof operator, sizes are always in terms of
9-bit bytes, except that the size of a char array is always the number of
elements (chars) in the array. sizeof(char) is always 1.
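The relation between element counts and actual storage can be worked out with a little arithmetic, sketched here in portable C. The helper names are invented for the example; the only facts used are the 36-bit word and the rule that sizes are counted in 9-bit bytes (four per word).

```c
#include <assert.h>

#define WORD_BITS 36

/* Number of N-bit chars packed into one word. */
static int chars_per_word(int nbits)
{
    assert(nbits > 0 && nbits <= WORD_BITS);
    return WORD_BITS / nbits;
}

/* Words occupied by an array of nchars N-bit chars. */
static long words_needed(long nchars, int nbits)
{
    int per = chars_per_word(nbits);

    assert(nchars >= 0);
    return (nchars + per - 1) / per;
}

/* The same storage expressed in 9-bit bytes (4 per word). */
static long bytes9_needed(long nchars, int nbits)
{
    return words_needed(nchars, nbits) * 4;
}
```

For example, 10 seven-bit chars fit in 2 words (8 nine-bit bytes), so a request sized by the element count is more than enough, whereas 10 eighteen-bit chars need 5 words (20 nine-bit bytes).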
General comments:
Ideally KCC (on any system) should be able to generate code
for any other PDP-10 system. To actually do this requires some
understanding of how the various parts of a program come together. It
is not enough just to specify some -x switches; you must take care of
the following:
1. #include files. You may need to use an alternate standard
include-file directory to satisfy <>-type includes. -H can be
used to specify an alternate location.
2. Switches. You should use -D to predefine any parameters
from <c-env.h> which are not properly defaulted.
Alternatively you can put a different version of c-env.h in
a non-standard location pointed to by -H (as above). This is
more convenient for everything except one-shot compiles.
3. Library. The C runtime library loaded with the program must
be the correct one (already cross-compiled for the target). KCC
always generates a default "-lc" request for the C runtime library;
the location searched for this can be specified by the -L switch.
For details on porting the C library and KCC itself, see the file PORT.DOC
in the KCC source directory.
<1 Char Pointer Hints>
The code generated for handling char pointers always uses
byte-pointer instructions, and so will work for any byte size (at
least on machines implementing the ADJBP instruction). This can
sometimes be useful when dealing with PDP-10 based data structures.
However, such pointers will generally have to be constructed "by hand"
since all char pointers that KCC itself generates are aligned versions
of the (_KCCtype_charN *) data types. See also the -x=ch7 option in
"Cross-compiling".
In general, when char pointers are involved, constructs like
*++ptr are faster than *ptr++. This is because *++ptr can usually be
folded by the optimizer into an ILDB (or IDPB) instruction. There is
no equivalent on the PDP-10 to a *ptr++ construct; this must always
be done as at least two instructions.
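As an illustration of the preferred idiom, here is a string-length loop written so that every access after the first is of the *++ptr form. It is ordinary portable C (length_preinc is just a name chosen for the example); whether a given compiler version actually folds each step into a single instruction is not something this sketch can promise.

```c
#include <assert.h>
#include <stddef.h>

/* Count the chars in a NUL-terminated string.  The first char is read
 * through the pointer as given; every later char is read with a
 * pre-increment dereference. */
static size_t length_preinc(const char *s)
{
    size_t n = 0;

    assert(s != 0);
    if (*s == '\0')
        return 0;
    do {
        n++;
    } while (*++s != '\0');
    return n;
}
```

Keeping the count in an integer, rather than subtracting the start pointer from the end pointer afterwards, also sidesteps the cost of char pointer subtraction.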
Whenever possible, try to avoid using two char pointers in
subtraction, as in (ptr1-ptr2). Several instructions have to be
executed to find the difference between two char pointers, due to the
strange internal format. For the same reason, try to avoid less-than
(<, <=) or greater-than (>, >=) comparison of char pointers; the code
generated for subtraction and comparison is extremely clever (much
better than an assembler programmer might expect), but is still not a
single instruction.
Tests for equality (== and !=) are fine, however. Finally, on
machines which do not implement the ADJBP instruction (KA, KI), it is
also helpful to avoid addition or subtraction of integers to char
pointers.
None of this applies to other types of pointers, such as (int *),
which are simple addresses and can be manipulated very efficiently.
Pointer casts between word pointers and byte pointers are not
always trivial, but they are reasonably fast (from 1 to 4 instructions
depending on the alignment requirements).
<1 Portable Math Library>
* Menu:
* PML: (KCC-PML) Portable Math Library
<1 Local library additions>
* Menu:
* LIBLCL: (KCC-LIBLCL) Local library additions
* LIBT20: (KCC-LIBT20) Frank Wancho's TOPS-20 library