PDP-10 Archive: c/kcc5/kcc.vmshelp from SRI_NIC_PERM_FS_1

                        KCC USER DOCUMENTATION
1 About KCC
<About KCC>

       KCC is a compiler for the C language on the PDP-10.  It was
originally begun by Kok Chen of Stanford University around 1981 (hence
the name "KCC"), and has had many improvements made to it since then
by a number of people at Stanford, Columbia, and SRI.  It implements C
as described by the following references:
       H&S: Harbison and Steele, "C: A Reference Manual",
               Prentice-Hall, 1984, ISBN 0-13-110008-4
       K&R: Kernighan and Ritchie, "The C Programming Language",
               Prentice-Hall, 1978, ISBN 0-13-110163-3

       Currently KCC is only supported for TOPS-20, although there is
no reason it cannot be used for other PDP-10 systems or processors, if
the need arises.  The remaining discussion assumes you are on a
TOPS-20 system.
1 Using KCC
<Using KCC>

       C source files should have the extension ".C", such as PROG.C
and SUBS.C.  To build a C program, whether from one or more source
files ("modules"), there are three things that must be done.  First,
all modules have to be compiled with KCC to produce .REL files (e.g.
PROG.REL and SUBS.REL); second, the LINK loader must be invoked to
load all of the necessary modules into an executable core image; and
third, this image must be saved on disk as an .EXE file.

       Every complete C program must contain one and only one module
that defines the function "main".  This function is where control begins
when the program is executed, and unless otherwise specified the .EXE
file will be named after the module that "main" appears in.

       You can make a C program either by using the EXEC commands
COMPILE, LOAD, and SAVE, or by invoking KCC directly.  For example,
suppose "main" is defined in PROG.C, and the file SUBS.C contains
auxiliary subroutines.  Then,

To make:                 EXEC command            Direct KCC invocation
--------                 ------------            ---------------------
PROG.EXE from .C files:  @LOAD PROG,SUBS         @CC -q PROG SUBS
                         @SAVE PROG

Just the .REL files:     @COMPILE PROG,SUBS      @CC -q -c PROG SUBS

PROG.EXE from .RELs:     Same as 1st             @CC PROG.REL SUBS.REL

       One advantage of using the EXEC commands is that they will
only compile those files which appear to require it, i.e. modules for
which the .C file is more recent than the .REL file.  The EXEC can also
translate TOPS-20 directory names into a format that the DEC loader will
understand, so that commands like @COMPILE <FOO>PROG are possible.
       However, KCC will do a similar form of conditional compilation
if the -q switch is set, for those modules specified without a .C
extension. (This may become the default someday.)  More commonly, the
EXEC at your site may not have been modified to know about KCC, or you
may wish to specify certain options to the compilation, or you may
just come from a UNIX background and feel more used to the direct
invocation method.
1 Direct Invocation - Compiler switches
<Direct Invocation - Compiler switches>

       The KCC compiler switches are intended to resemble those of the
UN*X "cc" command as closely as possible.  If you are familiar with these,
you can probably use KCC instinctively.  The command line is broken up into
argument strings each separated by a space (NOT by a comma).  If an argument
string starts with a "-", it is a switch, otherwise it is a filename.
Case is significant in switches!
       Normally, if the filename as given exists, it is used
regardless of its form.  The exception is files with a ".REL"
extension, which are never compiled but are passed on to the linking
loader.  If a filename does not exist and appears to have no
extension, ".C" is added.  This feature is primarily useful with the
-q switch as it requests conditional compilation.  Case is not
significant in filenames.
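
       The filename rules above can be modeled in a few lines of C.  This
is only an illustrative sketch, not KCC's own code: the names
classify_file, COMPILE_ALWAYS, COMPILE_IF_NEWER, and LINK_ONLY are
hypothetical, the extension is assumed to be whatever follows the last
'.' in the argument (directory syntax is ignored), and no check is made
for whether the file exists.

```c
#include <ctype.h>
#include <string.h>

enum file_action { COMPILE_ALWAYS, COMPILE_IF_NEWER, LINK_ONLY };

/* Case-insensitive string equality; case is not significant in filenames. */
static int same_text(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* true only if both strings ended together */
}

/* Decide what happens to one filename argument; q_switch is nonzero for -q. */
enum file_action classify_file(const char *name, int q_switch)
{
    const char *dot = strrchr(name, '.');

    if (dot == NULL)                /* no extension: ".C" is assumed */
        return q_switch ? COMPILE_IF_NEWER : COMPILE_ALWAYS;
    if (same_text(dot, ".REL"))     /* never compiled, handed to the loader */
        return LINK_ONLY;
    return COMPILE_ALWAYS;          /* any other name is compiled as given */
}
```

For the arguments of "cc -q foo bar.c arf.rel", this model classifies
foo as compile-if-newer, bar.c as always compiled, and arf.rel as
link-only.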

       If none of -c, -E, or -S is given as a switch, KCC will invoke
LINK after compilation and an executable file (*.EXE) will be produced.

       The ordering of switches and filenames, in general, does not
matter; all switches are processed before compiling starts.  However,
note that filenames and libraries will be compiled and/or loaded in
the order given, and -I paths will also be scanned in the order given.

       It is possible to specify KCC switches while giving a
COMPILE-class command to the EXEC, if your EXEC recognizes the switch
/LANGUAGE-SWITCHES.  The argument to this EXEC switch should be a
double-quoted string which starts with a space.  For example:
       @compile foo /laNGUAGE-SWITCHES:" -m -d=sym"

------------------------------------------------------------------------
The following are the available compiler switches, in alphabetical order.
They are the same as those used by UN*X "cc", except where marked with
a "*" -- these are mainly of interest to KCC implementors.

* -A<file> Specify a file name for the assembler header file (included
       at the start of all assembler output).
  -c   Compile and assemble, but don't link (produce *.REL).
  -C   Retain comments in preprocessor (only useful with -E).
* -d   Debugging output.  Same as -d=all.  Generates many debug files.
* -d=<fs>      Debugging fine-tuning.
       <fs> are flag names of particular kinds of debug output files.
       The names can be abbreviated.  Prefixing the name with a
       '+' turns it on; '-' turns it off.  All flags are initially
       assumed off.  Current flags are:
               parse   Parse tree output (*.DEB)
               pho     Peep-Hole Optimizer output (*.PHO) - HUGE!!!
               sym     Symbol table output (*.CYM)
               all     All of the above
       E.g. "-d=parse+sym" == "-d=all-pho"
  -D<ident> Define <ident> as "1", or as the string following '=' if given.
       E.g. "-DMAXSIZE=25".  Several of these may be specified.
  -E   Run source only through preprocessor, to standard output.
* -H<path> Specify a non-standard location for <>-enclosed #include files.
  -i   Loader: load code for multi-section (extended addressing) operation.
  -I<path> Supply a search path for doublequoted #include files.
       Several of these may be specified, and will be searched in
       that order.
* -L<path> Loader: Specify a non-standard location for library files.
* -L=<str> Loader: Specify an arbitrary string argument to the loader.
       Note that the syntax does not permit spaces to be included.
       Several of these may be given.
  -lnam        Loader: Specify library filename for loader.  The "nam"
       argument is used to construct the filename LIBnam.REL in the
       library directory path and this is searched when encountered
       in the specifications.
* -m   Use MACRO rather than FAIL.  Semi-obsolete, same as -x=macro.
  -O   Optimize (no-op, defaults on).  Same as -O=all.
* -O=<fs>      Optimization fine-tuning.  Mainly for debugging.
       <fs> are flag names of particular kinds of optimizations.
       The names can be abbreviated.  Prefixing the name with a '+' turns
       it on; '-' turns it off.  All flags are initially assumed off,
       so to ask for no optimization use -O= (same as -O=-all).
       Current flags are:
               parse   Parse tree optimization
               gen     Code generator optimizations
               object  Object code (peephole) optimizations
               all     All of the above
       E.g. "-O=parse+gen" == "-O=all-object"
  -o=<file>    Specify output filename for the executable image.
       For UN*X-compatibility kicks, "-o <file>" also works.
* -P=<fs> Portability level specifications.  Several switches may be given in
       a format similar to that for -d and -O.  The <fs> flags
       specify the C implementation level that the compiler should use:
               base    Base level C -- most portable and restricted
               carm    H&S CARM level -- full implementation
               ansi    ANSI C draft level (only partly effective)
       Only one of the previous 3 is allowed, plus an optional:
               kcc     Permit KCC-specific extensions to the selected level.
       The default is "ansi+kcc" if -P is not given.  -P alone is
       interpreted as "base".
* -q   Conditional compilation.  All file specs without an extension will
       only be compiled if the .C file is more recent than the .REL file.
       For example, "cc -q foo bar.c arf.rel"
               compiles FOO.C if it is more recent than FOO.REL,
               always compiles BAR.C, and never compiles ARF.
  -S   Don't assemble (produce *.FAI or *.MAC, plus *.PRE)
  -U<ident> Undefine following identifier.  All -U switches are processed
       before any -D switches.  Only __FILE__ and __LINE__ are predefined.
* -v   Verbose - same as "-v=all".
* -v=<fs> Verbosity switches, similar to -d and -O.
               fundef  - print function names as they are defined (not yet).
               stats   - show statistics for run
               load    - show command string given to loader (if any)
  -w   Don't type out warnings.
* -x=<fs>      Cross-compile switches.  Several switches may be given in
       a format similar to that for -d and -O.  The <fs> flags
       specify an aspect of the "target machine" that the
       code should be compiled for (case is significant!):
               Target System:  tops20, tops10, waits, tenex, its
               Target CPU:     ka, ki, ks, kl0, klx
               Target Assembler: fail, macro, midas
               Target char size: ch7   (to compile with 7-bit chars)
       e.g. "-x=ka+tenex".  See "Cross-compiling".
------------------------------------------------------------------------
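
The '+' and '-' flag syntax shared by -d, -O, -P, -v, and -x can be
illustrated with a small parser for the -d flag names.  This is a sketch
under stated assumptions rather than KCC's implementation: the names
parse_debug_flags and DBG_* are hypothetical, an unprefixed first name is
taken to mean '+', unknown names are silently ignored, and an ambiguous
abbreviation (such as "p") is resolved to the first match in the table.

```c
#include <string.h>

enum { DBG_PARSE = 1, DBG_PHO = 2, DBG_SYM = 4, DBG_ALL = 7 };

static const struct {
    const char *name;
    unsigned bits;
} debug_flags[] = {
    { "parse", DBG_PARSE },
    { "pho",   DBG_PHO },
    { "sym",   DBG_SYM },
    { "all",   DBG_ALL },
};

/* Return the bits for a (possibly abbreviated) flag name, or 0 if unknown. */
static unsigned lookup_flag(const char *name, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < sizeof debug_flags / sizeof debug_flags[0]; i++)
        if (len <= strlen(debug_flags[i].name)
            && strncmp(name, debug_flags[i].name, len) == 0)
            return debug_flags[i].bits;
    return 0;
}

/* Parse the text after "-d=", e.g. "parse+sym" or "all-pho". */
unsigned parse_debug_flags(const char *s)
{
    unsigned mask = 0;              /* all flags start off */

    while (*s != '\0') {
        int on = 1;
        size_t len;

        if (*s == '+' || *s == '-') {
            on = (*s == '+');
            s++;
        }
        len = strcspn(s, "+-");     /* flag name runs to the next sign */
        if (on)
            mask |= lookup_flag(s, len);
        else
            mask &= ~lookup_flag(s, len);
        s += len;
    }
    return mask;
}
```

With this model, "parse+sym" and "all-pho" produce the same mask, which
matches the equivalence given for -d above.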

NOTE: <path> syntax
       The -I, -H, and -L switches all take a "path" as argument.
This is interpreted as specifying both a prefix and a postfix string
which are used to sandwich a partial filename from some other source
(#include "xxx", #include <xxx>, and -lxxx respectively).  The two
strings are separated by the character '+' (this is site dependent
however).  Thus, for example:
       Specification        Prefix        Postfix        Sample with "xxx"
       -I+[SYS,NEW]         ""            "[SYS,NEW]"    xxx[SYS,NEW]
       -HNEWC:              "NEWC:"       ""             NEWC:xxx
       -LPS:<C>LIB+.REL     "PS:<C>LIB"   ".REL"         PS:<C>LIBxxx.REL
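
       The prefix/postfix rule can be expressed as a short C function.
This is a sketch for illustration, not code taken from KCC; the name
expand_path is hypothetical, the separator is assumed to be '+', and the
switch letter (-I, -H, or -L) is assumed to have been stripped already.

```c
#include <stdio.h>
#include <string.h>

/* Build prefix + name + postfix from a path spec such as "PS:<C>LIB+.REL".
 * Returns 0 on success, or -1 if the result does not fit in out. */
int expand_path(const char *spec, const char *name, char *out, size_t size)
{
    const char *plus = strchr(spec, '+');
    size_t prefix_len = plus ? (size_t)(plus - spec) : strlen(spec);
    const char *postfix = plus ? plus + 1 : "";    /* no '+': empty postfix */
    int n = snprintf(out, size, "%.*s%s%s",
                     (int)prefix_len, spec, name, postfix);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Given the spec "PS:<C>LIB+.REL" and the name "xxx", the function writes
"PS:<C>LIBxxx.REL", as in the last row of the table.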

NOTE: Obsolete features

       The following switches and interpretations are obsolete.  They will
likely be flushed altogether, but are documented here for historical reasons:

       * -n    same as -O= (no optimization)
       * -s    same as -d=sym (output *.CYM symbol table dump)

       It used to be a feature that "simple" switches, which did not
take any arguments, could be lumped together into a single switch
string.  For example, "cc -mS test" is the same as the more standard
"cc -m -S test".  However, use of this feature is discouraged; the
potential confusion and inconsistency don't seem to be worth it.

NOTE: Switch Portability

       The following lists the switches implemented by other systems
but not by KCC.  This information seems useful and this is a convenient
place to put it.  Other-system switches that KCC implements are not included.
Switches which mean one thing to KCC but another thing to other systems
are included.  Currently only 4.2BSD switches are listed.
       -g   Output additional symtab info for dbx(1), pass -lg to ld(1)
       -go  Ditto for sdb(1).
       -p   Output profiling code for prof(1).
       -pg  Ditto but for gprof(1).
       -R   Passed on to as(1) to make initialized vars shared and read-only.
       -Bpath Use substitute compiler pass programs specified by <path>.
       -t[p012] Use only the pass programs from -B designated by -t.
ld(1) switches:
       A, D, d, e, l, M, N, n, o, r, S, s, T, t, u, X, x, y, z
1 User Program - Command line interpretation
<User Program - Command line interpretation>

       The C runtime startup interprets the command line to a C program
in a consistent fashion, and supports (1) argument string passing,
(2) I/O redirection, (3) pipes, and (4) background processing.

(1) Command line arguments:
       Command line arguments can be passed to the main() function
from the EXEC or monitor in the UN*X fashion.  That is, main() is
given two arguments, the first of which is an argument count and
the second a pointer to an array of char pointers, each of which
constitutes an argument.  Thus it is conventional to declare the
parameters to main() in this way:
               main(argc, argv)
               int argc;
               char **argv;
For example, if you have a C program saved as PROG.EXE, then invoking
PROG with the command:
               @PROG one two
will set argc to 3, and the strings that argv points to will
be "PROG", "one", and "two".  Note that arguments are separated by
blanks and not by commas!
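
       As a small illustration, the function below lists the argument
strings it is handed.  The name list_args is hypothetical, and the code
uses prototype-style parameter lists so that it builds with current
compilers; the declarations shown above use the older form.

```c
#include <stdio.h>

/* Write one "argv[i] = string" line per argument to the given stream. */
void list_args(FILE *out, int argc, char **argv)
{
    int i;

    for (i = 0; i < argc; i++)
        fprintf(out, "argv[%d] = %s\n", i, argv[i]);
}
```

Called from main() as list_args(stdout, argc, argv) in a program saved as
PROG.EXE, the command "@PROG one two" would be expected to list PROG,
one, and two, following the description above.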

(2) I/O redirection:
       I/O redirection of stdin and stdout is also supported.
Thus:
  1.  @PROG <foo       ; will take all stdin input from the file "foo".
  2.  @PROG >bar       ; will send all stdout output to a new file "bar".
  3.  @PROG >>log      ; will append all stdout output to the old file "log".
These can be combined:
      @PROG <foo >bar  ; does both 1 and 2. (from "foo", to "bar")
However,
      @PROG <foo>bar   ; interprets "<foo>bar" as a single argument string,
                       ; because it looks like a <directory>filename.
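
       Because redirection is handled by the runtime startup, a program
needs no file-handling code of its own to benefit from it.  The filter
below is a minimal sketch of that idea; the name upcase_stream is
hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Copy in to out, converting lower case letters to upper case.
 * Returns the number of characters copied. */
long upcase_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        putc(toupper(c), out);
        count++;
    }
    return count;
}
```

A main() that simply calls upcase_stream(stdin, stdout) could then be run
as "@PROG <foo >bar" to read from "foo" and write to "bar".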

(3) Pipes:
       On TOPS-20 systems which implement the PIP: device (developed at
Stanford), pipes can also be supported, so that a command such as:
       @PROG | BAZ
causes the stdout of PROG to be redirected to the stdin of BAZ.

(4) Background processing:
       Again, provided the EXEC has been suitably modified, a
command line ending in an ampersand ('&') will cause the program
to be run in the background, while the user goes on to do other
things:
       @PROG one two&
1 C as implemented by KCC
<C as implemented by KCC>

       KCC is intended to conform to the description of C as
specified by Harbison & Steele's "C: A Reference Manual".  It is
strongly recommended that all C programmers use this book in preference
to Kernighan & Ritchie.  As the ANSI C standard becomes more concrete,
KCC will likewise evolve to conform to this standard; some of the
proposed ANSI features are already implemented.

       The -P (portability) switch controls the exact level at which
KCC attempts to compile a C program.  There are three possible levels,
and only one of these may be in effect:
       ANSI - permits all currently implemented ANSI constructs to be
               recognized and compiled.  This is basically CARM level
               plus some new things; KCC does not yet fully
               implement the ANSI draft standard, as it keeps changing.
               Users should be cautious about using ANSI features.
       CARM - Disables all ANSI-added features which are not in Harbison
               and Steele's CARM book.  KCC fully implements this level.
       BASE - The most restrictive level.  This is basically the same as
               CARM, but will make KCC complain about some constructs
               or usages that are likely to be unimplemented by some
               other compilers.
       In addition, there is a "KCC extensions" flag which is independent
of the level; when enabled, this permits a number of KCC-specific extensions
to be recognized regardless of whatever level is in effect.
       Normally KCC uses the ANSI level with KCC extensions enabled;
this corresponds to "-P=ansi+kcc".

       The next several pages document KCC's implementation of C by
following the general ordering of H&S and pointing out aspects where
KCC differs or describing which of several optional behaviors KCC
implements.  Any ANSI features which are implemented are also described.
2 Lexical Elements
<KCC Lexical Elements>         [H&S 2, "Lexical Elements"]

       KCC uses the US ASCII character set.  There is provision for
using a separate target character set, different from the source set,
but currently the only such is a target set for WAITS ASCII.

       KCC has no maximum line length.  Error messages will quote
only the most recent part of an offending line if it is longer than 80
characters.

       KCC is standard in that nested comments are not supported.  If
the sequence "/*" is seen within a comment, a warning message will be
printed just in case the user neglected to terminate the previous
comment.

2 Identifier names
<Identifier names>

       KCC adheres to the standard definition of C identifier syntax,
allowing the character "_", the letters A-Z and a-z, and the digits
0-9 as valid identifier characters.  Identifiers may have any length,
but only the first 31 characters (case sensitive) are unique during
compilation, which conforms to the ANSI minimum.  This applies to all
of the following name spaces (as per H&S 4.2.4):
       Macro names
       Statement labels
       Structure, union, and enum tags
       Component (member) names
       Ordinary names:
               Enum constants and typedef names.
               Variables (see discussion of storage classes).

       However, the situation is different for symbols which must be
exported to the PDP-10 linker.  Such names are truncated to 6
characters and case is no longer significant.  The character '_'
(underscore) is transformed into '.' (period); the PDP-10 software
allows the additional symbol characters '$' and '%', but there is no
way to generate these with C unless special provision is made; see
#asm and '`' under "KCC Extensions".  See also the discussion of
exported symbols.
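
       The effect of this mapping on exported names can be modeled as
follows.  This is a sketch for illustration only; the name export_name is
hypothetical, and folding to upper case is an assumption made to
represent "case is no longer significant".

```c
#include <ctype.h>
#include <stddef.h>

/* Map a C identifier to a 6-character, single-case linker symbol:
 * keep at most six characters, fold case, and turn '_' into '.'. */
void export_name(const char *ident, char out[7])
{
    size_t i;

    for (i = 0; i < 6 && ident[i] != '\0'; i++) {
        char c = ident[i];
        out[i] = (c == '_') ? '.' : (char)toupper((unsigned char)c);
    }
    out[i] = '\0';
}
```

Under this model "read_buffer" and "Read_Bytes" both become "READ.B", so
two external functions with those names would collide at link time even
though the compiler treats them as distinct identifiers.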

2 Reserved Words
<Reserved Words>
       KCC has a number of additional reserved words depending on
the portability level setting.  When KCC extensions are allowed, as
is normally the case, the following keywords exist:
               "asm"   - used for assembly code inclusion.
               "entry" - only in certain special circumstances.
                       See the discussion of libraries and entry points.
        When ANSI level is in effect (again, the normal case), there
are three additional reserved words.  All can be considered type
modifiers:
               "signed"        Indicates integer type is signed.  Implemented.
               "const"         Constant object (recognized but unimplemented)
               "volatile"      Volatile object (recognized but unimplemented)

2 Constants
<Constants>

       The types "int" and "long" are the same -- one PDP-10 word of
36 bits, with the high bit a sign bit.  Thus, the largest positive integer
constant is 0377777777777, or 34,359,738,367 (2^35 - 1).
       The type "double" is represented by a PDP-10 hardware format
standard range double precision number (two words).  On KA processors
the format is slightly different.  The decimal range is from 1.5e-39
to 1.7e38, with eighteen digits of precision.
       Character constants have type "int".  Multicharacter constants
are non-standard and not supported.  Because characters are 9-bit bytes,
numeric escape code values can range from '\0' to '\777'.  Hexadecimal
character constants are not permitted.
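
       The limits quoted above follow directly from the field widths.
The sketch below computes them on a host with 64-bit integers; the
function names are hypothetical, and the PDP-10 is assumed to use two's
complement representation for signed integers.

```c
#include <stdint.h>

/* Largest value of a signed two's complement integer of the given width
 * (valid for widths from 1 to 63 bits). */
int64_t max_signed(unsigned bits)
{
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* Largest value of an unsigned field of the given width (1 to 63 bits). */
uint64_t max_unsigned(unsigned bits)
{
    return (UINT64_C(1) << bits) - 1;
}
```

For a 36-bit word max_signed(36) is 2^35 - 1, which is 0377777777777 in
octal, and for a 9-bit character max_unsigned(9) is 0777, the largest
numeric escape value.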
2 Preprocessor directives
<Preprocessor directives>      [H&S 3, "The C Preprocessor"]

All standard C preprocessor directives are supported as described in
Harbison and Steele, including #elif and the "defined" operator.  This
page specifies how KCC behaves for situations which are implementation
dependent.

Lexical Conventions: [H&S 3.2]
       Preprocessor commands must have '#' as the first character on
the line; whitespace cannot precede it.  KCC allows whitespace between
the '#' and the command name (this is non-portable).  Formal parameter
names ARE recognized within character and string constants in macro
body definitions.  Comments are treated as whitespace and not passed
on to anything else; however, KCC will print a "Nested comment"
warning if it encounters a comment which contains "/*".  This serves both
to catch slightly non-portable usage (see H&S 2.2 p.12) and to detect
places where the user may have accidentally omitted a "*/".

Defining Macros: [H&S 3.3]
       When defining a macro, formal parameter names are recognized
within string and character constants, and therefore no check is made
for lexical correctness of such constants; this will change when the
ANSI standard firms up.  Any comments and whitespace in the macro body
are replaced by a single space.  KCC permits an argument token list
(arguments to a macro call) to extend over multiple lines.  Arguments
to a call are converted in a fashion similar to that for macro bodies
-- comments and whitespace are replaced by a single space.  Newlines
within an argument list are also considered whitespace.  However,
string and character constants in arguments are treated as tokens, and
their contents are not scanned for macro names.

Predefined Macros: [H&S 3.3.4]
       __LINE__ expands into the current decimal line number.
       __FILE__ expands into the current source filename.
These macros are furnished for compatibility with 4.2BSD.
Eventually ANSI will add __DATE__, __TIME__, and __STDC__, but not yet.
There are no other predefined macros.  Use the file <c-env.h> for
standard KCC environment definitions.

Undefining and Redefining Macros: [H&S 3.3.5]
       It is not an error to redefine an already defined macro, but a
warning message will be output unless the new macro definition is the
same as the old definition; i.e. redundant definitions are allowed.
There is no macro definition stack, i.e. definitions are not
pushed/popped by #define/#undef.  Attempting to define a macro named
"defined" will cause an error, since otherwise it would conflict with
the "defined" operator.

File Inclusion: [H&S 3.4]
       Included files may be nested to 10 levels.  Macro expansion
is done on the line if the filename does not start with '<' or '"'.
Filenames may contain '>' or '"' characters.
       #include <filename> looks only in the standard directory.
       #include "filename" looks first in DSK:,
               then in the -I paths in order of specification (left to right),
               then in the standard directory.
The standard directory for include files is C: on TOPS-20, <KC> on
TENEX, and [SYS,KCC] on WAITS, but this is site dependent in any case.

Conditional Compilation: [H&S 3.5] #if,#else,#endif,#elif,#ifdef,#ifndef
       The "defined" operator is recognized only within #if and #elif
expressions; note that neither #elif nor "defined" is in K&R, and
H&S is used as the reference here.  Within the body of a failing
conditional, only other conditional commands are recognized; all others,
even illegal commands, are ignored.
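
       A short example of #elif and "defined" used together follows.  The
macro names TARGET_TOPS20, TARGET_WAITS, and TARGET_TENEX are
hypothetical (they are not defined by KCC) and would have to be supplied
by the user, for instance with -D; the directory strings are the standard
include locations listed under File Inclusion.

```c
/* TARGET_* are hypothetical user-supplied macros, e.g. -DTARGET_WAITS. */
#if defined(TARGET_TOPS20)
#define INCLUDE_DIR "C:"
#elif defined(TARGET_WAITS)
#define INCLUDE_DIR "[SYS,KCC]"
#elif defined(TARGET_TENEX)
#define INCLUDE_DIR "<KC>"
#else
#define INCLUDE_DIR "C:"        /* assume TOPS-20 when nothing is defined */
#endif

/* Return the include directory selected at compile time. */
const char *include_dir(void)
{
    return INCLUDE_DIR;
}
```

Only the first branch whose condition holds is compiled; the remaining
branches are skipped as failing conditionals.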

Explicit Line Numbering: [H&S 3.6] #line
       The information from #line will be used in KCC error messages.
Macro expansion is performed on the line.  Like all other
preprocessor commands, #line is eliminated and not passed on when
using the -E switch.  With regard to "#" alone at the start of a line,
remember that whitespace is allowed between the "#" and the command
name, thus KCC will not recognize a "#" alone as a synonym for "#line".
If there is no command name, the line is simply ignored without error.

KCC-specific Commands:
       #asm, #endasm
       These two commands cause the text delimited by them to be
macro-expanded (as for -E) and converted into an "asm()" expression
for direct inclusion in the output assembly language file.  This
currently only works inside functions.  This feature is very likely to
change, and should only be used where absolutely necessary.  Keep the
code simple, as someday KCC may want to parse it.
See "KCC Extensions" for additional details.
2 Storage classes
<Storage classes>              [H&S 4.3  "Storage Class Specifiers"]

KCC implements the standard storage classes of auto, extern, register,
static, and typedef (H&S sec 4.3), with the following notes:

REGISTER declarations are currently equivalent to AUTO.  KCC does not
assign variables to registers, and optimizations are performed without
using the "hint" given by REGISTER.  AUTO variables are almost always
more efficient, and in any case they are easier to implement.

KCC uses the "omitted-EXTERN" solution to deal with the question of
top-level definitions versus references (H&S sec 4.8.1).  That is,
omitting "extern" from a top-level declaration has the effect of
indicating that this is a defining declaration rather than a referencing
declaration.

Duplicate Declarations:
       As per H&S 4.2.5, KCC permits any number of external
referencing declarations, if the types are the same.  However, because
KCC treats omitted-extern declarations as defining declarations, these
references must all have an explicit "extern".  Likewise, an external
reference may be later followed by a defining declaration.
       KCC has additional special handling for declarations of
functions, because it can always be determined whether a function
declaration is a reference or a definition.  Any number of "static"
referencing declarations are allowed.  Conflicts are resolved as
follows: If an implicit external reference is followed by a static
reference or definition, KCC will assume the function is static.  It
is an error if the first reference has an explicit "extern".  It is
also an error if a static reference is followed by an external
reference or definition.  In either case compilation proceeds as if
the function were static.
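
       The declaration rules above are illustrated by the fragment below.
All names are hypothetical, and the code uses prototype-style parameter
lists so that it builds with current compilers.

```c
extern int call_count;      /* referencing declaration: explicit "extern" */
extern int call_count;      /* repeated references of the same type are fine */

static int twice(int n);    /* static referencing declaration */

int call_count = 0;         /* defining declaration: "extern" omitted */

/* Count the call, then use the static helper declared above. */
int count_and_double(int n)
{
    call_count++;
    return twice(n);
}

static int twice(int n)     /* static definition matching the reference */
{
    return 2 * n;
}
```

Here the two explicit "extern" declarations are references, the
declaration without "extern" is the single definition, and twice() is
consistently static in both its reference and its definition.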
2 Initializers
<Initializers>                         [H&S 4.6 "Initializers"]

       KCC adheres to H&S in all required respects.  The following
notes cover points which H&S describes as implementation dependent:

Optional braces are allowed for all non-aggregate initializers.  It is
permitted to drop braces from initializer lists under the rules described
in H&S 4.6.9, but KCC attempts to perform extremely stringent checking on
the "shape" of initializers, and will complain about too many or too few
braces.

FLOATING-POINT initializers may be of any arithmetic type.  KCC performs
compile-time floating-point arithmetic, so initializers for static and
external variables may use any constant arithmetic expression.

POINTER initializers, as described in H&S, must evaluate to an integer or
to an address plus (or minus) an integer constant.

ARRAY initializers are currently not allowed for automatic arrays.
This will change as ANSI permits it.

ENUMERATION initializers may use any integer (as well as enum) expression.

STRUCTURE initializers can initialize bit fields with any integer expression.
As for arrays, automatic and register structures cannot be initialized.
This will change as ANSI permits it.

UNIONS currently cannot be initialized.  This will change as ANSI
permits it.
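
The following fragment sketches initializers that stay within the rules
listed above.  All names are hypothetical.  Aggregates are initialized
only at top level, and the automatic array is filled at run time instead
of being given an initializer.

```c
int table[4] = { 1, 2, 3, 4 };      /* external array: initializer allowed */
double ratio = 9.0 / 4 + 0.25;      /* constant floating-point expression */
int *third = table + 2;             /* address plus an integer constant */

enum level { LOW = 1, HIGH = LOW + 2 * 3 };    /* integer expressions */

struct flags {
    unsigned ready : 1;
    unsigned mode  : 3;
};
struct flags startup = { 1, 2 + 3 };    /* bit fields from integer expressions */

/* Automatic arrays get no initializer here; fill them with assignments. */
int sum_squares(void)
{
    int squares[3];
    int i, sum = 0;

    for (i = 0; i < 3; i++) {
        squares[i] = (i + 1) * (i + 1);
        sum += squares[i];
    }
    return sum;
}
```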
2 Exported symbols
<Exported symbols>                     [H&S 4.8 "External Names"]

Symbols which are exported to the assembler file have special restrictions
imposed by current PDP-10 software, which only recognizes 6-character
symbols from the set A-Z, 0-9, '.', '$', and '%'.  In particular, case
is not significant.

Also, there is a distinction between symbols exported only to the assembler
and those exported both to the assembler and the linker.  While there is
technically no reason that any symbol has to be given to the assembler if
it is not also meant for the linker, in practice it is convenient for
debugging to have some "local" symbol definitions available so that DDT
can access them.

Here is a breakdown of export status by storage class:

typedef  = Exports nothing.  (Not a real storage class)
auto     = Exports nothing.  (Local stack variables use an internal offset)
register = Exports nothing.
static   = If not global scope (i.e. within a block), nothing is exported;
                an internally-generated label is used.
           If global (top-level, within no block), exported to assembler only.
                A label is made, but no INTERN or ENTRY statement.
extern   = Always exported to both assembler and linker.
           Omitted-extern:  a DEFINITION.  A label, INTERN, and ENTRY are output.
           Explicit-extern: a REFERENCE.  An EXTERN statement is output, but only
                if the symbol is actually referenced by the code.

Omitted-Extern:
       External declarations with no "extern" storage class
explicitly given are assumed to be external DEFINITIONS.  A defined
extern symbol will have its own label, plus an INTERN statement
telling the assembler that this is an externally visible symbol, plus
an ENTRY statement which allows library routine search to find this
symbol.  ENTRY statements will be put into the .PRE output file rather
than the main output file, since the assembler will need to scan them
prior to anything else.

Explicit-Extern:
       If an "extern" is explicitly given, the compiler assumes that
it is simply a REFERENCE.  Nothing will be done unless the symbol is
actually referenced by the code, in which case an EXTERN line will be
generated in the assembler output for that file.  The reason for the
reference count check is that each assembler EXTERN constitutes a
library search request which must be satisfied by a module with the
corresponding symbol declared as an ENTRY.  Unless this is only done
for actual references, the many superfluous declarations found in *.h
files will tend to cause many unneeded library modules to be loaded.

Static symbols:
       Note that global static symbols are passed on to the assembler
even though this is not necessary; an internally-generated label could
be used just as well.  The main reason this is done is to facilitate
debugging with DDT; otherwise it could be difficult to identify static
functions when looking at the machine instructions.  This may cause
problems if identifiers which are otherwise distinct become identical
as a result of the conversion to a 6-char PDP-10 symbol.

However, a symbol declared static within a given source file will
never be visible from another file that you may link later with it.  For
example, a function declared as

       static char *function()
       {
          ...
       }

will only be visible from other functions within the same source file.
This allows several modules to have functions with the same name
modulo the six character limit, as long as no two of the functions are
both extern.  It is STRONGLY recommended for multi-module programs
that you declare as many functions as possible to be "static".
2 Libraries and Entries
<Libraries and Entries>

REL files to be converted by MAKLIB into object libraries must have
any external symbols declared with ENTRY rather than merely INTERNing
them, and this declaration must be at the start of the REL file.  In
order to do this, KCC generates a *.PRE "prefix" output file in
addition to the *.FAI or *.MAC output file, and invokes the assembler
in such a way that the PRE file is assembled before the main file.
This file contains ENTRY statements and any other predeclarations that
are needed before the assembler sees the actual code.  Normally the
user will never see this file, but if the -S switch is used then it
will be left around as well as the FAI/MAC file.  Note that if running
the assembler manually on the FAI/MAC file, you must invoke it with
a command line like this:
       [@]FAIL                         [@]MACRO
       [*]FOO=FOO.PRE,FOO.FAI          [*]FOO=FOO.PRE,FOO.MAC


COMPATIBILITY INFO:
       For compatibility, KCC will continue to recognize an "entry"
keyword for some time to come.  The following describes the obsolete
syntax:

To declare an entry, use the "entry" keyword at the start of the source,
before any other declarations:

    "entry" ident ["," ident ...] ";"

i.e., the keyword "entry", followed by a list of identifiers separated
by commas, followed by a semicolon.  This is passed on essentially
verbatim to the assembler, and has no other effect on compilation.  It
should be used at the start of any runtimes or other file intended for
a library, on all variables and functions that should be visible as
entries in the library.

Note that it should still be safe to use "entry" as a non-keyword; if
used other than at the start of the file it will be treated like any
other normal identifier.

To repeat: the "entry" statement is no longer necessary.  It should not
be used in new code, and should be removed from old code.
2 Types
<KCC Types>                            [H&S 5 "Types"]

STORAGE UNITS:
       A KCC storage unit (what "sizeof" returns) is a 9-bit byte, and
there are 4 of these in each 36-bit PDP-10 word, ordered left to right
from most significant to least significant.

INTEGERS:
       KCC's integer types have the following sizes:
               Type    Bits    "sizeof" value
               char    9       1
               short   18      2       (PDP-10 halfword)
               int     36      4       (PDP-10 word)
               long    36      4       (PDP-10 word)

All of these types may be explicitly declared as "signed" if ANSI
level is in effect.  Single variables declared as "char" or "short"
are stored right-justified into a full word; only when packed into an
array or structure are they stored as 9-bit (or 18-bit) bytes, left to
right within each word.
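
       The arithmetic behind the table can be sketched as follows.
The function name bits_in is hypothetical, the code assumes that
<limits.h> supplies CHAR_BIT (this file does not confirm that for
KCC), and it uses ANSI prototype syntax for a modern compiler.

```c
#include <limits.h>   /* CHAR_BIT: bits per storage unit */
#include <stddef.h>   /* size_t */

/* Width in bits of an object that occupies "units" storage units. */
size_t bits_in(size_t units)
{
    return units * CHAR_BIT;
}
```

With 9-bit storage units, bits_in(sizeof(short)) would come to 18 and
bits_in(sizeof(int)) to 36, matching the table; on a machine with
8-bit bytes the same calls give different answers.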

UNSIGNED INTEGERS:
       Unsigned integers are fully implemented; any integer object
may be either "signed" or "unsigned", and both forms use exactly the
same amount of storage, with the high order bit considered the sign
bit (if the object is signed).  However, because the PDP-10 has
no instructions specifically for unsigned data, some operations are
slower for unsigned ints.
       Addition (+) and subtraction (-) are the same.
       == and != are the same.
       Left shift (<<) always uses the LSH instruction (logical shift).
       Right shift (>>) uses LSH for unsigned, ASH for signed operands.
               ASH is an arithmetic shift which propagates the sign bit.
       <,<=,>,>= are slightly slower for unsigned operands.
       Casts to floating-point are slower.
       Multiply (*) is also slightly slower.
       Divide (/) and remainder (%) are much slower.
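
       The difference between the two right shifts can be sketched
as below.  The function names are hypothetical and the code uses ANSI
prototype syntax for a modern compiler.  Note that the C language
leaves the result of right-shifting a negative signed value up to the
implementation; the sign propagation described above is KCC's choice.

```c
/* Logical right shift: zero bits enter from the left.
   KCC is described as using LSH for this case. */
unsigned int shift_right_unsigned(unsigned int x, int n)
{
    return x >> n;
}

/* Right shift of a signed value.  KCC is described as using ASH,
   which propagates the sign bit; other compilers may differ when
   x is negative, so portable code should not rely on it. */
int shift_right_signed(int x, int n)
{
    return x >> n;
}
```

For non-negative values the two functions agree; they can differ only
when the sign bit of x is set.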

CHARACTER:
       The plain "char" type is "unsigned char".  Sign extension is
done only if chars are explicitly declared as "signed char".  Normally
a char is 9 bits, although it is possible to compile code using a
7-bit assumption (see the section on char pointer hints).
       Old versions of KCC used to store the chars of a string
constant in 7-bit form, packed 5 to a word (ASCIZ format); this is no
longer the case and string constants are normally now full 9-bit char
strings.
       An extension to KCC provides five additional types of "char"
objects, specified as "_KCCtype_charN", where N is the number of bits
in the char and may be one of 6, 7, 8, 9, or 18.  All may be signed
or unsigned; their "plain" form is unsigned.  See the "KCC Extensions"
section for additional details.
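
       Code that must also run on compilers where plain "char" is
signed can avoid depending on the default.  The sketch below is one
way to do that; the function names are hypothetical, CHAR_MIN is
assumed to come from <limits.h>, and ANSI prototype syntax for a
modern compiler is used.

```c
#include <limits.h>

/* Returns 1 if plain "char" is unsigned on this compiler, else 0. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Converts a stored char to a non-negative int, whatever the
   signedness of plain "char" happens to be. */
int char_value(char c)
{
    return (unsigned char)c;
}
```

Given the description above, plain_char_is_unsigned() would be
expected to return 1 under KCC; many other compilers return 0.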


FLOATING-POINT:
       The "float" type is represented by one word in the PDP-10
single precision floating point format; there is one bit of sign, 8
bits of exponent, and 27 bits of mantissa.
       The "double" type uses two words in the PDP-10 double
precision format.  (Note that for the KA-10 this is a software format
rather than the more usual hardware format.)  The exponent range is
approximately 1.5e-39 to 1.7e38 in both formats; single precision has
about 8 significant digits and double precision has 18.  See a PDP-10
hardware reference manual for details.
       KCC also supports the new ANSI "long double" type when ANSI
level is in effect.  Currently this is the same as "double" but this
will probably change on KL-10s to use "G" format floating point, which
has an exponent range of 2.8e-309 to 9.0e307 but only 17 significant
digits.
       The (double) type can represent all values of (long).  That
is, conversion of a (long) to a (double) and back to (long) results in
exactly the original value.

POINTERS:
       Pointers are always a single word, but can have two different
internal formats.  Pointers to chars, shorts, or bit fields are PDP-10 byte
pointers (local or one-word global); pointers to all other objects are
PDP-10 global word addresses.  Byte pointers point to the byte itself
rather than to the preceding byte, thus LDB instead of ILDB is done
to fetch the byte.
       It is very important to ensure that functions which return
values of (char *) be properly declared; likewise, any function
arguments which are expected to be (char *) must be cast to this if
necessary.  Operations which expect a char pointer will not work
properly when given a word pointer, and vice versa.  See the section
on "pointer hints" near the end of this file for additional information.
       The "NULL" pointer is represented internally as a zero word,
i.e. the same representation as the integer value 0, regardless of
the type of the pointer.  The PDP-10 address 0 (AC 0) is zeroed and
never used by KCC, in order to help catch any use of NULL pointers.

ARRAYS:
       The only special thing about arrays is that arrays of chars
consist of 9-bit bytes packed 4 to a word, and arrays of shorts have
18-bit halfwords packed 2 to a word; all other objects occupy at least
one word.

ENUMERATIONS:
       KCC treats enumeration types simply as integers.  In the words
of H&S 5.6.1, KCC uses the "integer model" of enumerations, which is
what ANSI has adopted.

STRUCTURES and UNIONS:
       Structures and unions are always word-aligned and occupy a
whole number of words.  Unlike the case for other declarations of type
"char" or "short", adjacent "char" and "short" members in a structure
are packed together as for arrays.  Structures and unions may be
assigned, passed as function parameters, and returned as function
values.
       Bit fields are implemented; the maximum size of a bit field is
36 bits.  They may be declared as "int", "signed int", or "unsigned
int"; plain "int" bitfields are unsigned.  Fields are packed left to
right, conforming to the PDP-10 byte ordering convention.  It's too
bad that C does not allow pointers to bit fields, because the PDP-10
byte pointer instructions are perfectly suited to this application!
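
       A small illustration of bit fields follows.  The structure
layout, field widths, and names are invented for this sketch, and the
fields are declared "unsigned int" explicitly so that the result does
not depend on how a compiler treats plain "int" bit fields.  ANSI
prototype syntax for a modern compiler is used.

```c
/* Hypothetical record; the widths are for illustration only. */
struct flags {
    unsigned int opcode : 9;
    unsigned int ac     : 4;
    unsigned int ind    : 1;
};

/* Stores the fields and reads one back.  Values too large for a
   field are reduced modulo 2^width, as for any unsigned object. */
unsigned int pack_and_read_ac(unsigned int opcode, unsigned int ac)
{
    struct flags f;

    f.opcode = opcode;
    f.ac = ac;
    f.ind = 0;
    return f.ac;
}
```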

FUNCTIONS:
       As per H&S.  A pointer to a function is simply a word address.
For the gory details of function calls and stack usage, see the
"Internals" section.

TYPEDEFS:
       As per H&S.  With regard to 5.11.1, KCC has no problems with
redefining typedef names in inner blocks.
2 Type Conversions
<KCC Type Conversions>                 [H&S 6 "Type Conversions"]

Integer conversions:

       There are no representation changes when converting any
integer type to any other integer type of the same size.  Sign
extension and truncation are performed when necessary to convert from
one size to another.  Conversions from pointers are done as per H&S
6.3.4; a pointer is treated as an unsigned int and then converted to
the destination type using the integral conversion rules.

Floating-point conversions:

       Casting (float) to (double) or (long double) retains the
same value.  However, (double) to (long double) may lose one digit
of precision, depending on the implementation chosen for (long double).
       A cast to (float) of an int may lose some precision,
although a char or short can always be fully transformed.  (double)
can retain the exact value of an int or long int, which can be
restored to its original value by converting back to int.
       Casting an unsigned integer to a floating-point value always
results in a positive number.
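
       The precision remarks above can be checked with a round trip.
This is only a sketch: the function names are hypothetical, ANSI
prototype syntax for a modern compiler is used, and the "volatile"
qualifier is there to force each value to be stored in its declared
type rather than held at higher precision.

```c
#include <float.h>

/* Returns 1 if converting v to float and back yields v again. */
int survives_float(long v)
{
    volatile float f = (float)v;
    return (long)f == v;
}

/* The same round trip through double. */
int survives_double(long v)
{
    volatile double d = (double)v;
    return (long)d == v;
}
```

Small values pass both tests.  An integer that needs more significant
bits than the float mantissa holds (FLT_MANT_DIG in <float.h>; 27 bits
in the PDP-10 format described earlier) may fail the float round trip.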

Pointer conversions:

       There are a great variety of pointer conversions possible; however,
you can make sense of them if you simply note the following Three Laws of
Pointers:
       (1) Nihil ex nihilis -- a NULL pointer always remains NULL.
       (2) Smaller is finer -- a pointer to any object can always
               be converted into a pointer to a SMALLER (or equal-sized)
               object, without losing any information.  Converting it back
               to the original type restores the original value.
       (3) Bigger is blunter -- converting a pointer to any object to
               a pointer to a LARGER object will force the pointer to
               have an alignment suitable for that of the larger type;
               any fine details of positioning within the new type are lost,
               and the original pointer cannot be recovered (unless it
               was already properly aligned to begin with).  The new
               object pointed to will completely enclose the smaller
               object.

Specifically:
       Chars are aligned on 9-bit byte boundaries, shorts on halfword
boundaries, and all other data types on word boundaries (with the
exception of bitfields and the _KCCtype_charN types).  Converting any
pointer to a (char *) and back is always possible, as a char is the
smallest possible object.  If the original object was larger than a
char, the char pointer will point to the first byte of the object; this
is the leftmost 9-bit byte in a word (if word-aligned) or in the halfword
(if a short).

       A cast to (int *) of a char pointer produces an address that
points to the word that the char pointer indicates, regardless of
which byte in the word was being pointed at.
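
       Laws (1) and (2) can be exercised portably, as in the sketch
below.  The function names are hypothetical and ANSI prototype syntax
for a modern compiler is used.  Law (3) is deliberately not exercised,
since converting a misaligned char pointer to (int *) is not portable.

```c
/* Law 1: a null pointer stays null through pointer casts. */
int null_stays_null(void)
{
    int *ip = 0;
    char *cp = (char *)ip;

    return cp == 0 && (int *)cp == 0;
}

/* Law 2: converting to a pointer to a smaller object and back
   restores the original pointer. */
int char_roundtrip_ok(void)
{
    int word = 0;
    char *cp = (char *)&word;   /* points at the first byte of word */
    int *ip = (int *)cp;        /* back to the original type */

    return ip == &word;
}
```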

       Pointer casts are not always trivial, but they are reasonably
fast (from 1 to 4 instructions depending on the alignment requirements).

       The only exception to the 3 rules is the case of pointers to
objects of _KCCtype_charN types (see the KCC extensions section).
Casting any pointer to or from those types is performed by first
converting the original pointer into a word pointer (thus forcing
alignment to a word boundary) and then applying the desired
conversion.

Assignment conversions:

       KCC permits any casting conversion during an assignment, but
will complain about an implied cast if the conversion is not one of
the legal assignment conversions.

Unary conversions:

       The "Usual Unary Conversions" are different for CARM and ANSI:
       Original operand type               Converted type
                                       CARM            ANSI (default)
       float                           double          float
       signed char/short/bitfield      int             int
       unsigned char/short             unsigned int    int
       unsigned bitfield               unsigned int    *int or @unsigned int
                       * = if bitfield has fewer bits than an int.
                       @ = if bitfield has more (or same #) bits than an int.

       The first difference is (float) to (double).  What H&S
describes as an "optional compilation mode" to suppress the unary
conversion of (float) to (double) is always in effect for ANSI level,
as ANSI is allowing this feature as part of the standard conversions,
and the resulting PDP-10 code is much more efficient.  If ANSI level
is not selected, then all (float) values will be implicitly converted
into (double) as per the old C standard.  Note that all portability
levels require that (float) values always be promoted to (double) in
function arguments, so this particular implicit conversion is always
in effect.
       The second difference is in the integer promotions.  CARM uses
what ANSI calls "unsigned preserving" rules; ANSI uses "value preserving"
rules, meaning that a conversion to a wider type should always result in
a signed integer type regardless of whether the shorter type was unsigned
or not, as long as the new type can represent all values of the old type.
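
       The two promotion rules can be told apart with one expression.
In this sketch the function name is hypothetical and ANSI prototype
syntax for a modern compiler is used.

```c
/* Returns 1 under "value preserving" rules (ANSI) and 0 under
   "unsigned preserving" rules (CARM). */
int promotion_is_value_preserving(void)
{
    unsigned char small = 1;

    /* small is promoted before the subtraction.  Promoted to int,
       the result is -1; promoted to unsigned int, the result is a
       very large positive value. */
    return (small - 2) < 0;
}
```

Code that must behave the same at both levels can cast the operand
explicitly, for example (int)small - 2.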

Binary conversions:

       As already noted, (float) values are not always implicitly
converted to (double) before being operated on, if ANSI level is in
effect.  There is one other difference between ANSI and CARM
with respect to the usual binary conversions:
       If one operand is "long" and the other is "unsigned int",
               CARM: makes both "unsigned long".
               ANSI: makes both "long".
2 Expressions
<Expressions>                          [H&S 7 "Expressions"]

As per H&S, with the following notes:

[7.2.3] Overflow and underflow are neither noticed nor handled.  The result is
whatever the PDP-10 hardware gives in those cases.

[7.3.3] KCC correctly does not use parentheses to force the usual
unary conversions.

[7.3.5] KCC permits component selection for structures returned from
functions, except when the component is an array.  That is, "f().a"
will work and will select component "a" of the returned structure, but
it is not legal to do "f().array[i]".  This point may be clarified in
the future by the ANSI draft standard.
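
       For illustration, the permitted form looks like the sketch
below.  The structure and function names are hypothetical and ANSI
prototype syntax for a modern compiler is used.

```c
struct pair {
    int a;
    int b;
};

/* Returns a structure by value. */
struct pair make_pair(int a, int b)
{
    struct pair p;

    p.a = a;
    p.b = b;
    return p;
}

/* Selects a component directly from the returned structure. */
int first_of(int a, int b)
{
    return make_pair(a, b).a;
}
```

If the member were an array, assigning the returned structure to a
local variable first and indexing that variable avoids the
restriction.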

[7.3.6] KCC correctly does not allow formal parameters of type "function",
so the issue of converting this type does not arise.
       KCC does not currently do any checking to see if the types of
the arguments match the types of the parameters for the called
function.  When ANSI function prototypes are implemented, this will
change.  KCC does not issue any warnings about discarded function
return values.

[7.4.1] Casts - KCC correctly implements "narrowing" casts for floating point
and for integers.

[7.4.2] "sizeof" - the result of "sizeof" currently has type (int).
This is far more than adequate for any possible size value.  The
result of sizeof is ALWAYS in terms of 9-bit bytes, regardless of the
setting of -x=ch7, with two exceptions: the size of a char is always
1, and the size of a char array is the # of elements (chars) in the
array.  This is true no matter how many bits are in a char.

[7.4.6] '&' - Attempting to apply '&' to a "register" variable
simply causes KCC to issue a warning message and force the variable to
class "auto".  KCC does not permit '&' to be applied to array or
function names; this will change as ANSI permits it.

[7.4.7] '*' - Applying the indirection operator to a null pointer (0)
simply retrieves (or sets) the contents of AC 0, which should always
be zero if nothing accidentally sets it.  Treating the null pointer as
a char pointer will always retrieve zeroes and set nothing.

[7.5.1] '*','/','%' -
       Division by zero is a no-op; the value will be that of the dividend.
Truncation is always toward zero whether the operands are negative or
not:
               5/2 == (-5)/(-2) == 2
               (-5)/2 == 5/(-2) == -2
       For the remainder operator, (x)%0 gives unpredictable garbage.
The sign of the remainder will be the same as that of the dividend:
               5%2 == 5%(-2) == 1
               (-5)%2 == (-5)%(-2) == -1
       These operations are slower for unsigned than for signed operands.
Division in particular is slow.
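
       The rules above can be written out as a small sketch.  The
function names are hypothetical and ANSI prototype syntax for a modern
compiler is used.  The functions must not be called with a zero
divisor: the behavior described above for that case is specific to
KCC, and other compilers may trap.

```c
/* Quotient, truncated toward zero.  y must be nonzero. */
int int_quotient(int x, int y)
{
    return x / y;
}

/* Remainder, with the sign of the dividend.  y must be nonzero. */
int int_remainder(int x, int y)
{
    return x % y;
}

/* The identity that ties the two operators together. */
int division_identity_holds(int x, int y)
{
    return (x / y) * y + x % y == x;
}
```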

[7.5.2] '-' - The type of the difference between two pointers is (int).

[7.5.3] '<<','>>' - Left shift (<<) always uses logical shifting; bits
can be shifted into the sign bit.  Right shift uses logical
shifting for unsigned integer types (the sign bit is shifted out, and
0-bits shifted in), but uses ARITHMETIC shifting for signed integer
types (the sign bit is propagated).
       Using a negative value for the right operand reverses the
direction of the shift.  Using a large number (36 or greater) simply
shifts everything to oblivion as expected.  Note that it is possible
to use left-shift arithmetic shifting (the ASH instruction) by giving
a negative shift distance to >>; of course this is very non-portable.

[7.7] '?' - KCC correctly permits the result of a conditional expression
to have structure, union, enumeration, or void types.

[7.8.1] Structure and union assignment is (of course) permitted.

[7.8.2] 'op=' Compound assignment -
       KCC does not support the obsolete "=+" compound assignment forms.

[7.10] Constant expressions -
       KCC can and does evaluate constant floating-point expressions at
compile time.  Almost all casts are also allowed, except certain
pointer-pointer conversions where the result would depend on whether
the program was running multi-section.
       KCC is currently somewhat too liberal about the constant
expressions in preprocessor #if statements; it allows the use of any
integral constant expression, including enum constants and sizeof
operators.  This is possible because the preprocessor is integrated
with the compiler.  The eventual fix for this will probably issue a
warning but permit the usage.

[7.11] KCC correctly does not interleave expression computations.

[7.12] KCC currently does not issue any warnings about discarded values.
This may change.

[7.13] KCC does some optimization of memory accesses, but not much.
This may change with the coming of ANSI's "volatile" type modifier.
2 Statements
<Statements>                           [H&S 8 "Statements"]

As per H&S, with the following notes:

[8.7] switch statement - KCC permits the control expression of a switch
statement to be of any integral or enumeration type.
2 Functions
<Functions>                            [H&S 9 "Functions"]

[9.4] Adjustments to Parameter Types
       Parameters which are declared as "char" or "short" are really
handled as type "int", and "float" is really "double"; however, KCC
does not implement narrowing as per 9.4, because the description of
this is too unclear -- what happens if such a parameter is used as
an lvalue?
       The situation will improve with ANSI function prototypes.

       KCC follows the language strictly and does not permit formal
parameters of type "function returning...".
1 Run-time C Library
<KCC Run-time C Library>               [H&S 11 "The Run-time Library"]

       ALL of the facilities described in H&S chapter 11 are
implemented as described.  In addition, various UN*X system call
emulations and standard library routines are also supported.  Users
are advised to read H&S or a UPM (Unix Programmer's Manual) for
complete descriptions; only KCC-specific differences are documented
here.  If the file LIBC.DOC exists (unfinished as of this writing) it
will furnish more details on library routines.

[11.1] Character Processing
       All CARM facilities supported.  <ctype.h> must be included.
All work with any 9-bit character value and EOF; most are macros and very
fast.  None evaluate their argument more than once.
       The ispunct() function differs from the CARM description,
which claims that "space" is included in the set.  Neither the BSD nor
the ANSI draft version of ispunct() does this however, so we have
assumed that H&S made a mistake here, and the KCC version excludes
"space".
       Neither BSD nor ANSI has these facilities (KCC and CARM do):
               iscsym, iscsymf, isodigit, toint, _tolower, _toupper.
       BSD has isascii and toascii; ANSI doesn't.  KCC and CARM do.
       BSD's implementation of tolower and toupper is incorrect (corresponds
               to _tolower and _toupper).  KCC's corresponds to CARM.
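
       Two of the points above are sketched here: an argument with a
side effect is safe because the argument is evaluated only once, and
"space" is not punctuation.  The function names are hypothetical and
ANSI prototype syntax for a modern compiler is used.

```c
#include <ctype.h>

/* Counts the decimal digits in s.  The pointer is advanced inside
   the argument, which is safe only because isdigit() evaluates its
   argument exactly once. */
int count_digits(const char *s)
{
    int n = 0;

    while (*s != '\0') {
        if (isdigit((unsigned char)*s++))
            n++;
    }
    return n;
}

/* Returns nonzero if this library classes ' ' as punctuation. */
int space_is_punct(void)
{
    return ispunct(' ') != 0;
}
```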

[11.2] String Processing
       All CARM facilities supported.  <string.h> must be included.
This includes the functions "strpos", "strrpbrk", and "strrpos", which are
not in either BSD or ANSI.
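
       Where portability matters, character positions can be obtained
from the standard "strchr" and "strrchr" instead.  The sketch below is
in the spirit of the position-returning functions but is not their
definition; the names first_offset and last_offset are hypothetical,
and ANSI prototype syntax for a modern compiler is used.

```c
#include <string.h>

/* Offset of the first occurrence of c in s, or -1 if absent. */
long first_offset(const char *s, int c)
{
    const char *p = strchr(s, c);

    return p != NULL ? (long)(p - s) : -1L;
}

/* Offset of the last occurrence of c in s, or -1 if absent. */
long last_offset(const char *s, int c)
{
    const char *p = strrchr(s, c);

    return p != NULL ? (long)(p - s) : -1L;
}
```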
"index" and "rindex" are recognized as synonyms for "strchr" and "strrchr".


[11.3] Mathematical Functions
       All CARM facilities supported.  <math.h> must be included.
These are mostly derived from the Portable Math Library.

[11.3.5] For atan2(0, 0), the value 0 is returned and errno set to EDOM.

[11.3.22] sinh() of a negative argument that is too large returns the
       largest representable negative floating-point number.

[11.3.25] According to CARM, "If the argument is so close to an odd multiple
       of pi/2 that the correct result value is too large to be represented,
       then the largest representable positive floating-point number is
       returned and the error code ERANGE is stored into the external
       variable errno".  The actual error check done is to see if for tan(x),
       cos(x) == 0.  If so, the error behavior above is done.

[11.4] Storage Allocation
       All CARM facilities supported.  <stdlib.h> can be included.
This includes "cfree", "clalloc", "mlalloc", and "relalloc", which are
not in ANSI (and not needed); nor are they in BSD except for "cfree".
Since "long int" is the same size as "int" for KCC, the long and int
forms of calls are functionally identical.
       Despite CARM's claim that "the facilities described in this
section are predeclared in the C compiler, and so their use does not
require the inclusion of a library header file", that does not appear
in general to be the case.  So you should either include <stdlib.h>
which is an ANSI invention declaring these functions (among others),
or you should be VERY careful about pre-declaring these functions
properly, and be SURE that routines which expect a char pointer
argument are given one.
       A common mistake is failing to declare malloc(), so that the
compiler is unaware of the proper conversions that must be applied to
the return value (which is a PDP-10 byte pointer).  This sort of type
mismatch error can go undetected on some machines but will cause you
all kinds of mysterious grief on the PDP-10.
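
       A properly declared use of malloc() might look like the
following sketch.  The function name copy_string is hypothetical, and
ANSI prototype syntax for a modern compiler is used; the essential
point is that <stdlib.h> is included, so the compiler knows that
malloc() returns a pointer.

```c
#include <stdlib.h>   /* declares malloc() and free() */
#include <string.h>   /* declares strlen() and memcpy() */

/* Returns a freshly allocated copy of src, or NULL if no memory
   is available.  The caller releases it with free(). */
char *copy_string(const char *src)
{
    size_t n = strlen(src) + 1;      /* include the terminating NUL */
    char *dst = (char *)malloc(n);

    if (dst != NULL)
        memcpy(dst, src, n);
    return dst;
}
```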
       Using brk() and sbrk() is not prohibited, but doing so is
guaranteed to confuse the storage allocator and cause problems if you
also use malloc() and friends.

[11.4.1] calloc() will return NULL if either argument is zero (as per ANSI).
[11.4.2] cfree() and free() are identical and interchangeable.
       CARM claims that for maximum portability it is best to use
cfree() only to deallocate memory allocated by calloc(), and free()
only to deallocate memory allocated by malloc().  However, the ANSI draft
has flushed cfree() altogether.

[11.4.3] clalloc() == calloc() on the PDP-10.
[11.4.4] free() does nothing if given a NULL argument (as per ANSI).
If given a bad pointer, free() calls abort() after sending the following
message to stderr: "free(): tried to free invalid block"

[11.4.5] malloc() also returns NULL if given a zero argument (as per ANSI).
[11.4.6] mlalloc() == malloc() on the PDP-10.

[11.4.7] realloc() behaves as per ANSI for unusual arguments: if the
pointer is NULL, it acts like malloc(); if the size is zero, it acts
like free() and returns NULL.  If given a bad pointer, realloc() calls
abort() after sending the following message to stderr:
       "realloc(): tried to reallocate invalid block".
[11.4.8] relalloc() == realloc() on the PDP-10.
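
       Because realloc() accepts a NULL pointer, a growable array
needs no special case for its first element.  The following is a
sketch only; the function name append_int is hypothetical and ANSI
prototype syntax for a modern compiler is used.

```c
#include <stdlib.h>

/* Appends value to the array at *items, which holds *count ints
   and may start out as NULL with *count == 0.  Returns 1 on
   success; on failure returns 0 and leaves the array untouched. */
int append_int(int **items, size_t *count, int value)
{
    int *grown = (int *)realloc(*items, (*count + 1) * sizeof(int));

    if (grown == NULL)
        return 0;
    grown[*count] = value;
    *items = grown;
    *count += 1;
    return 1;
}
```

The result of realloc() is kept in a separate variable so that the
original block is not lost if the call fails.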


[11.5] Standard I/O
       All supported.  <stdio.h> must be included.
In addition, all 4.2BSD functions are implemented.  The additional
facilities provided are:
       fdopen, fileno, getw, putw, setbuf, setbuffer, and setlinebuf.

       Note that some facilities, in particular putc and getc, are
implemented as macros.

       In general, the sequence CR-LF is converted to LF on 7-bit
input, and LF converted to CR-LF on 7-bit output.  This conversion is
performed by the system call read/write functions and not by STDIO,
however.  See the notes on fopen [11.5.10] below for details.

[11.5.7] fflush() called on an input stream flushes any buffered but
unread data.

[11.5.10] fopen() implements all the H&S type specification characters,
with certain defaults and settings appropriate for the PDP-10 world:

String    Mode Start   Description
"r", "rb"  R   0       Open existing file for reading.  Error if not found.
"w", "wb"  W   0       Create a new file for writing.
"a", "ab"  W   EOF     Append to existing file (create new if necessary).
"r+","r+b" R/W 0       Open existing file for updating.  Error if not found.
"w+","w+b" R/W 0       Create a new file for updating.
"a+","a+b" R/W EOF     Append to existing file (create new if necessary).

       Note that on TOPS-20 and TENEX, files have version numbers, and
writing a file never truncates an existing one; "w" and "w+" always create
new versions.
       A stream can be either "text" or "binary", as per the ANSI
draft description; a "b" in the string specifies binary.  The
characteristics of the two types of streams are:
               Bytesize(old)   Bytesize(new)   LF-conversion
       TEXT    <file's> or 7      7            yes if size 7
       BINARY  <file's> or 9      9            no, never

       When an OLD, existing file is opened (for reading, appending,
or updating), normally the bytesize of the file is used as the
bytesize of the stream.  If the file bytesize is 0 or 36 then the
default (7 or 9) is used instead.  If the file bytesize is anything
other than 0, 7, 8, 9, or 36 then the behavior is undefined.
       When a NEW file is created, its bytesize will be that of the
stream, which is normally 7 for text, 9 for binary.  Note that older
versions of a file may have a different bytesize -- the notion of
checking these to set the bytesize was considered, but rejected in the
interest of simplicity.
       Whether LF conversion is performed on the stream is a little
more complicated.  Normally a text stream is always converted, but if
the stream bytesize is anything but 7 then conversion does NOT happen.
A binary stream is never converted, regardless of the bytesize.

       The user can override either the bytesize or the conversion
by adding explicit specification characters:
       "C"     Force LF-conversion.
       "C-"    Force NO LF-conversion.
       "7"     Force 7-bit bytesize.
       "8"     Force 8-bit bytesize.
       "9"     Force 9-bit bytesize.

       These are KCC-specific however, and are not portable to other
systems.

[11.5.11] fprintf(): see the notes on printf() [11.5.23].

[11.5.14] fread() is implemented assuming that the input stream is open
in 9-bit binary mode, such that all 36 bits of an int can be read with four
successive bytes.  No byte-size or mode checking is done by fread(), so
it is the user's responsibility to make sure the stream is open correctly.
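
       A sketch of opening the stream correctly before using fwrite()
and fread() follows.  The function name binary_roundtrip and its
16-element limit are invented for illustration, and ANSI prototype
syntax for a modern compiler is used.  The "wb" and "rb" strings
request binary streams, as required above.

```c
#include <stdio.h>

/* Writes n ints (at most 16) to path in binary mode, reads them
   back, and returns how many matched, or -1 on any I/O error. */
int binary_roundtrip(const char *path, const int *vals, int n)
{
    int back[16];
    int i, same = 0;
    FILE *fp;

    if (n < 0 || n > 16)
        return -1;

    fp = fopen(path, "wb");              /* binary, not text */
    if (fp == NULL)
        return -1;
    if (fwrite(vals, sizeof(int), (size_t)n, fp) != (size_t)n) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fread(back, sizeof(int), (size_t)n, fp) != (size_t)n) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    for (i = 0; i < n; i++)
        if (back[i] == vals[i])
            same++;
    return same;
}
```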

[11.5.15] freopen(): see fopen() [11.5.10]
[11.5.16] fscanf(): see scanf() [11.5.28]
[11.5.19] fwrite(): see fread() [11.5.14]

[11.5.23] printf(): An additional facility has been provided for the
user to assign his own conversion specification character to arbitrary
functions.  Until this is documented, see the printf source for details.

[11.5.28] scanf(): Common sense was used in implementing the various
conversion routines when there was doubt about CARM's description:

       For numeric conversions ('d', 'u', 'o', 'x', 'f'), there must
be at least one digit present for the parse to succeed, despite CARM's
claim that "some number" of digits, "possibly none" are allowed.  For
string scanners ('s' and '['), at least one character must be read.
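
       In practice this means the return value is the thing to test.
The sketch below uses sscanf(), the string form of scanf(); the
function name parse_decimal is hypothetical and ANSI prototype syntax
for a modern compiler is used.

```c
#include <stdio.h>

/* Returns 1 and stores the value in *out if s begins with a
   decimal integer; otherwise returns 0 and leaves *out alone. */
int parse_decimal(const char *s, int *out)
{
    int v;

    if (sscanf(s, "%d", &v) != 1)   /* no digit seen: no conversion */
        return 0;
    *out = v;
    return 1;
}
```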

[11.5.29] See printf() [11.5.23]
[11.5.30] See scanf() [11.5.28]

[11.5.34] The number of characters that can be pushed back with ungetc()
is a site-dependent option set when the library is compiled.
_SIO_NPBC in STDIO.H defaults to 1.
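
       Portable code should therefore rely on no more than one
character of pushback.  A one-character "peek" is sketched below; the
function name peek_char is hypothetical and ANSI prototype syntax for
a modern compiler is used.

```c
#include <stdio.h>

/* Returns the next character on fp without consuming it, or EOF.
   Only a single character of pushback is ever outstanding. */
int peek_char(FILE *fp)
{
    int c = getc(fp);

    if (c != EOF)
        ungetc(c, fp);
    return c;
}
```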

[11.6] Error Codes
       Somewhat disorganized at present.  EDOM and ERANGE exist.  The
standard set of UN*X errors is present now, and is used by the UN*X
simulation calls, but are not very useful.

Additional STDIO routines:

       sopen(): opens a string as a source or destination for I/O.
The first arg is a string pointer, second is a standard fopen type
specification.  The implementation of this is not yet complete: 'a'
(append) mode does NOT do the obvious thing; place has been kept for
'w+' to automatically expand the given string if the end is reached
(assuming it was allocated by malloc); the file pointer cannot be
repositioned (e.g. a string can be scanned only once).  These things
will be finished some day.
1 C Library - UN*X System Calls
<KCC C Library - UN*X System Calls>

       The KCC runtime environment is intended to resemble that of UN*X
to a limited extent.  For example, main() is invoked with "argc, argv"
arguments parsed from the command line, and several system calls are
emulated.  This emulation is not intended to be complete, and the calls
exist primarily to help transport software to and from UN*X systems.
Whenever possible, the standard portable routines as described in
the previous page (H&S 11) should be used instead of these "system calls".

Emulated UN*X System Calls:
       access
       close
       dup, dup2
       execl, execle, execlp, execv, execve, execvp    (no real envp)
       fork, vfork
       getpid          T20 returns ((job#)<<9)+(fork)
       gettimeofday
       lseek, tell
       open, creat             (flags as for BSD, mode not supported)
       pipe                    (monitor must have PIP: device)
       read
       rename
       sbrk, brk
       signal, sigsys, kill    - (crude, not really implemented)
       sleep, pause
       stat, fstat
       time
       unlink
       wait
       write
1 C Library - UPM(3) Library Functions
<KCC C Library - UPM(3) Library Functions>

       Other facilities exist which are mostly derived from section 3 of
the UPM; some are slated for inclusion in the ANSI C standard, some aren't.

       abort           (ANSI and BSD)
       assert          (ANSI and BSD)
       atoi,atol,atof  (ANSI and BSD)
       bcopy,bzero,bcmp        (BSD)
       ctime,localtime (ANSI and BSD)
       memchr,memcmp,memcpy,memset     (ANSI and BSD)
       mktemp                  (BSD)
       perror          (ANSI and BSD)
       qsort           (ANSI and BSD)
       setjmp,longjmp  (ANSI and BSD)
       system          (ANSI and BSD)
       varargs                 (BSD)

KCC-specific stuff:
       pfork           It's best if only the compiler uses this.
       regex                   (BSD) - from GNU, needs more work
1 Language Extensions
<KCC: Language Extensions>

       KCC implements a number of extensions to the C language which
are intended to allow for better integration with other PDP-10 software.
It is possible to disable these extensions by means of the -P switch.
These extensions are:
       [1] The "entry" keyword (obsolete).
       [2] The '`' identifier quoting mechanism.
       [3] The #asm and asm() assembly language mechanism.
       [4] The "_KCCtype_charN" data types.


[1] The "entry" keyword.

       The use of this statement has been described earlier in the
discussion of library entry points.  However, it is an obsolete feature
and should no longer be needed for any purpose.  Future versions of KCC
will flush it if no one objects.


[2] Identifier Quoting

       The current PDP-10 software allows symbols to have 6 characters
from the set A-Z, 0-9, ., %, $.  KCC maps 0-9 to 0-9, a-z and A-Z to A-Z,
and '_' to '.'.
       KCC supports a non-standard extension to C whereby any characters
enclosed within accent-grave ('`') marks are treated as a valid C identifier.
This allows the user to specify identifiers containing the characters '$'
and '%', as well as any arbitrary character, although KCC will print a
warning if a character not in the PDP-10 set is seen.
               Examples: `$FOO`, `OPENF%`, `$$BP`, `switch`

       This mechanism should be used ONLY where necessary.  It is not
portable and should be conditionalized if used in portable code.
Identifiers defined in this way should be CONSISTENTLY quoted in this
way, because they are stored internally with '`' as their first
character to distinguish them from normal unquoted identifiers and
keywords.  This avoids potential confusion and allows one to specify
an identifier which is otherwise a reserved keyword, such as `if`.


[3] #asm and asm() - Assembly language inclusion

       Many C compilers have an escape mechanism which allows the
programmer to specify a series of assembly language instructions within
a C program.  KCC's means of doing this is with the "asm()" expression,
which looks exactly like a function call.
       Currently only one argument is allowed to asm() and this must
be a string literal.  The text of the string is simply passed directly to
the assembler output file at that point in the compilation.
       There is also a preprocessor command called #asm, which converts
everything up to an #endasm into an asm() expression.  This is convenient
for very long stretches of assembler code, or where the enclosed text
must be macro-expanded.

       Invoke %%CODE or %%DATA to switch between assembling pure and
impure (variable) code/data.  #asm inclusions will always begin in the
code segment, and must always end in the code segment.  Never use
%%CODE when already in the code segment, or %%DATA when already in the
data segment.

       Because asm() is syntactically an expression, it can only
appear where an expression is legal.  However, any attempt to use it
anywhere but as the sole contents of a function body is highly fraught
with peril.  If it is necessary to specify some assembler directives
separate from any function, an acceptable way of doing this is by
means of a static dummy function, such as:
       static void
       dummyfunct(){
               asm("%%DATA\n STUFF: ASCIZ/foo/ \n %%CODE\n");
       }

       It cannot be repeated too often that use of asm() is strongly
discouraged.  It is possible that someday its functionality will be
extended to the point that KCC can parse and understand the contents
(thus, for example, references to C auto variables would be allowed);
however, this would primarily be for the purpose of allowing KCC to
generate .REL files directly rather than to encourage wider use of asm().

       At the start of the assembler file, a PURGE is done of all the
assembler IF pseudos.  Thus, assembler code cannot use any IF pseudo
tests, nor macros which use them.  Incidentally, attempting to use a
SEARCH MONSYM will cause FAIL to barf several times with a "FAIL BUG
IN SEARCH" message, due to the lack of the IF pseudos; this is
harmless, but annoying.  MACRO does not have this problem.

       Be very careful about using apostrophes (') or double-quotes
(") within #asm code, because they will be interpreted as the start of
character or string literals.  Within asm(), these can be quoted with
backslash.


[4] "_KCCtype_charN" data types

       Normally the "char" data type is 9 bits.  In the PDP-10 world
much existing software depends on 7-bit characters, and to make it
easier to write the necessary system-dependent code a 7-bit char data
type was introduced and generalized.  The 5 possible char sizes (6, 7,
8, 9, and 18) were chosen because it is only for those sizes that
OWGBPs exist (one-word global byte pointers), and thus only those sizes
can be guaranteed to work when using extended addressing.

       Any of the char types can be signed or unsigned; if the plain
form is used, unsigned is assumed.  Narrowing and widening is done
properly whatever the size.  Note that the 18-bit size corresponds
to "short"; it is included mainly for completeness rather than in the
expectation that someone would actually use it.  The 9-bit size
is the same as regular "char", unless the -x=ch7 option is in effect,
in which case "char" is the same as the 7-bit size.

       These types can normally be used just as for "char".  However,
there are some special effects associated with certain operations:
       (1) "sizeof" of an N-bit char array returns the number of N-bit
               chars (elements) in the array.  Usually this is what you
               want.  Giving this number to malloc will cause problems
               only for chars of 18 bits.
       (2) A cast (explicit or implicit) of a string literal to an
               N-bit char pointer will cause the string literal to be
               stored as N-bit bytes.  This is NOT strict C, which would
               merely convert the char pointer; however, this is the
               most useful interpretation.  This permits the somewhat
               bizarre construct of using a string literal to make
               an array of 18-bit bytes (this is the only aspect where
               "_KCCtype_char18" differs from "short").
       (3) 6-bit string literals are stored as SIXBIT rather than using
               the low 6 bits of the ASCII char values.  Note that while
               such strings are null-terminated, null is a valid
               SIXBIT character (meaning space).  The value of invalid
               SIXBIT characters is undefined.
       (4) Function parameters cannot be declared to have a type of
               char size 7 or 8.  The reason is complicated; see
               the last part of this section.
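
       The remark in (1) about malloc can be checked with a little
arithmetic.  The sketch below is portable C that only models the
word counts; it assumes that the allocator counts in 9-bit bytes and
rounds each request up to whole 36-bit words.  The function names
are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BITS       36
#define BYTES9_PER_WORD 4       /* 9-bit bytes in one 36-bit word */

/* Words occupied by an array of n N-bit chars. */
size_t words_needed(size_t n, unsigned nbits)
{
    size_t per;

    assert(nbits > 0 && nbits <= WORD_BITS);
    per = WORD_BITS / nbits;            /* chars packed per word */
    return (n + per - 1) / per;
}

/* Words obtained from an allocator given a count of 9-bit bytes.
 * ASSUMPTION: the request is rounded up to whole words. */
size_t words_granted(size_t request)
{
    return (request + BYTES9_PER_WORD - 1) / BYTES9_PER_WORD;
}

/* Nonzero if handing the element count n (what "sizeof" reports for
 * the array) to the allocator yields enough storage. */
int element_count_suffices(size_t n, unsigned nbits)
{
    return words_granted(n) >= words_needed(n, nbits);
}
```

       For 6, 7, 8, and 9 bit chars at least four elements fit in a
word, so the element count never asks for too little.  For 18 bits
only two fit, and element_count_suffices(10, 18) is false.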

Some examples:
       _KCCtype_char6 tmp[] = "tmp";   /* A 4-element array of SIXBIT chars */
       _KCCtype_char7 wd[5] = "word";  /* A 5-element array of 7-bit chars */
       _KCCtype_char8 packet[40];      /* A 40-element array of 8-bit chars */
       _KCCtype_char18 useless;        /* Same as "unsigned short useless;" */
       _KCCtype_char7 *arg = "text";   /* A pointer to an ASCIZ string */
       _KCCtype_char6 *pt6;            /* A pointer to a 6-bit char string */

       arg = "othertext";      /* Implicit conversion to ASCIZ */
       pt6 = "dskdmp";         /* Implicit conversion to SIXBIT */
       pkg_call((_KCCtype_char7 *)"argtext");  /* Explicit cast to ASCIZ */
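
       To make the SIXBIT storage of 6-bit string literals concrete,
here is a portable sketch that builds the 36-bit word image of a
short string, using a 64-bit integer to stand in for a PDP-10 word.
It is not KCC code.  Folding lower case to upper case, and storing 0
for characters outside the SIXBIT range, are assumptions of the
example (the text above leaves invalid characters undefined).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one ASCII character to its SIXBIT code: ASCII 040-0137
 * map to 00-77.  ASSUMPTION: lower case is folded to upper case and
 * anything else out of range becomes 0. */
unsigned sixbit_code(char ch)
{
    int c = toupper((unsigned char)ch);

    if (c < 040 || c > 0137)
        return 0;
    return (unsigned)(c - 040);
}

/* Pack up to six characters of s, left-justified, into the low 36
 * bits of the result.  Unused positions are 0, which is also the
 * SIXBIT code for space. */
uint64_t sixbit_word(const char *s)
{
    uint64_t word = 0;
    int i;

    assert(s != NULL);
    for (i = 0; i < 6; i++) {
        unsigned code = 0;
        if (*s != '\0')
            code = sixbit_code(*s++);
        word = (word << 6) | code;
    }
    return word;
}
```

       Under these assumptions sixbit_word("tmp") is 645560,,000000
in octal, and the terminating null is indistinguishable from a
trailing space.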

Portability issues:

       The long names for these types were deliberately chosen so as to
minimize the chances of possible conflict with identifiers in software
imported from elsewhere, and to discourage the indiscriminate (non-portable)
use of the types.  Note that users who must make heavy use of them (for
good reasons, we hope) can simply use typedefs or #defines at the start
of their code in order to equate them with simpler names; e.g.

               #define char7 _KCCtype_char7    /* Use shorter typename */

       This method also has the advantage of localizing non-portable
constructs in a way that gives others a fighting chance to port the
software elsewhere by changing the initial definitions.

Storage:

       There are a few aspects of the way N-bit char objects are stored
which may be surprising at first.  Char arrays are always packed starting
with the leftmost byte in a word; however, single-char objects (such as
"char c;") have their value stored in the rightmost ALIGNED BYTE.
       This is a necessary consequence of the fact that the '&'
operator applied to a char object must result in a valid char pointer,
and the very strong desire that all C code work with extended addressing.
There are only a few possible kinds of OWGBPs and they all require this
alignment.  For 6, 9, and 18 bits this causes no difficulty since bytes
of those sizes completely fill a word, and there are no unused low-order
bits; thus char values may be stored completely right-justified, and in
some cases full-word operations can be performed on them.
       However, for 7 and 8 bit bytes the rightmost byte will leave 1
and 4 unused low-order bits, respectively; KCC stores the values for
such objects in that rightmost byte, above the unused bits.  Someone
examining a program with IDDT may be surprised that
"_KCCtype_char8 foo = 1;" results in a word labelled FOO with its
value 020 instead of 1.
       This alignment restriction causes no real problems except for
the obscure case of function parameter declarations.  In the absence
of ANSI function prototypes, the default "function argument
promotions" are performed when a call is made; all integers shorter
than (int) are converted to (int) and passed as such.  But this means
that the integer value is right-justified; if the function parameter
was declared to match the promoted type (int) then all is well, but
attempts to declare it as a 7 or 8 bit char will just result in a
confused function (attempts to read the parameter value or take its
address will fail since the value is not properly aligned).  This
could be fixed by having KCC do an implicit conversion upon function
entry, but it is far simpler and much, much more efficient to simply declare
such parameters as (int) in the first place.
       If the code will never be run on a KL then, of course, this and
many other things could be simplified.
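
       The storage rules above can be modelled in portable C, using
the low 36 bits of a 64-bit integer as a stand-in for one PDP-10
word.  This is a sketch for checking the arithmetic, not KCC code,
and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 36

/* Word image of a single N-bit char object: the value sits in the
 * rightmost aligned byte, above any unused low-order bits. */
uint64_t single_char_word(unsigned nbits, unsigned value)
{
    uint64_t mask;

    assert(nbits > 0 && nbits <= WORD_BITS);
    mask = ((uint64_t)1 << nbits) - 1;
    return ((uint64_t)value & mask) << (WORD_BITS % nbits);
}

/* Word image of the first word of an N-bit char array: elements are
 * packed starting with the leftmost byte; missing ones are zero. */
uint64_t array_first_word(unsigned nbits, const unsigned *elems, size_t n)
{
    uint64_t mask, word = 0;
    size_t i, per;

    assert(nbits > 0 && nbits <= WORD_BITS);
    mask = ((uint64_t)1 << nbits) - 1;
    per = WORD_BITS / nbits;            /* chars per word */
    for (i = 0; i < per; i++) {
        uint64_t e = (i < n) ? ((uint64_t)elems[i] & mask) : 0;
        word = (word << nbits) | e;
    }
    return word << (WORD_BITS % nbits); /* skip unused low bits */
}
```

       single_char_word(8, 1) gives 020, matching the FOO example,
while an 8-bit array whose first element is 1 has that element in the
high-order byte of its first word.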
1 Making system calls
<KCC: Making system calls>

The jsys() function has been provided for ease in performing simple
TOPS-20 monitor calls without being forced to resort to asm().
The calling convention is:

       int jsys(num, acs)
       int num, acs[5];

The jsys number is given in num, and registers 1 through 4 are given
and returned in the acs array.  Offsets in acs correspond to machine
registers; thus acs[1] goes into AC1 before the call and then takes
the value of AC1 after the call.  acs[0] is not referenced.  The
function returns:
       1 if it skipped (known success)
       0 if ERJMP taken on TOPS-20 (or .ICILI on TENEX) (known failure)
       -1 if didn't skip on TENEX (possible failure)

       On TENEX, the PSI system is enabled in an attempt to trap
       .ICILI (illegal instruction) interrupts, and emulate ERJMP if
       possible.  If .ICILI is detected, 0 will be returned from jsys().
       If no .ICILI is detected but the jsys didn't skip, -1 will be returned;
       there is no general way to know for sure in this case whether
       the non-skip indicates success or failure.
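
       Since the return value has three meanings, callers may find it
convenient to classify it once.  The sketch below is plain C that
does not call jsys() itself; the enum and the helper names are
invented for the example, and only the three return values listed
above are taken from this document.

```c
#include <assert.h>
#include <stddef.h>

/* The three outcomes listed above for the value returned by jsys(). */
enum jsys_outcome {
    JSYS_SUCCESS,       /* returned 1: the call skipped */
    JSYS_FAILURE,       /* returned 0: ERJMP taken (or .ICILI) */
    JSYS_UNCERTAIN      /* returned -1: no skip on TENEX */
};

/* Classify a jsys() return value.  Anything other than 1 or 0 is
 * treated as uncertain. */
enum jsys_outcome jsys_outcome_of(int rc)
{
    if (rc == 1)
        return JSYS_SUCCESS;
    if (rc == 0)
        return JSYS_FAILURE;
    return JSYS_UNCERTAIN;
}

/* Fill an acs[] array in the layout jsys() expects: acs[1]..acs[4]
 * correspond to AC1..AC4.  acs[0] is not referenced by jsys(); it is
 * zeroed here only for tidiness. */
void set_acs(int acs[5], int ac1, int ac2, int ac3, int ac4)
{
    assert(acs != NULL);
    acs[0] = 0;
    acs[1] = ac1;
    acs[2] = ac2;
    acs[3] = ac3;
    acs[4] = ac4;
}
```

       On a TOPS-20 system the pattern would then be roughly:
set_acs(acs, ...); followed by jsys_outcome_of(jsys(num, acs)); with
any results read back from acs[1] through acs[4].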
1 Internals
2 Memory organization
<KCC Internals - Memory organization>

       A C program compiled by KCC has four distinct memory regions:
data, text (code), stack, and free.
       DATA - This contains all user-declared data variables, both
               initialized (set to user's specification) and
               un-initialized (set to zero).
               The first address following this region is stored in "_edata".
       TEXT - This is the UNIX word for program code.
               The first address following this region is stored in "_etext".
       STACK - The program stack.  This grows upwards in memory.
       FREE - The region of memory that malloc() can dynamically allocate.
               This starts at the address stored in "_end" and can allocate
               memory up to (but not including) the address stored in
               "_ealloc".

In addition, there may be small unused areas of memory.

The normal layout on TOPS-20 for a single-section program:

       Start   End             Region-Name
       LOW     _edata-1        Data
       _edata  <??>            Stack
       <??>    HIGH-1           - (unused)
       HIGH    _etext-1        Text
       _etext  _ealloc-1       Free
       _ealloc 777777           - (unused, reserved)

Normally LOW == 0 and HIGH == 400000.  These correspond to the normal
addresses for low and high segments.  Also, normally _ealloc is set to
770000, so that pages 770-777 can be reserved for mapping DDT (some people
seem to prefer that to IDDT).

The normal layout on TOPS-20 for a MULTI-section program:
               Start   End             Region-name
       Section 0                        - (unused)
       Section 1
               1,,LOW  _edata-1        Data
               _edata  1,,HIGH-1        - (unused)
               1,,HIGH _etext-1        Text
               _etext  1,,777777        - (unused)
       Section 2
               2,,0    <??>            Stack
               <??>    2,,777777        - (unused)
       Sections 3-37
               3,,0    _ealloc-1       Free (all sections up to 37)
               _ealloc 37,,777777       - (unused, reserved)

Normally _ealloc is set to 37,,700000 so that pages 700-777 of section 37
are reserved for mapping XDDT (again, for those people who don't know about
IDDT).
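
The page numbers quoted above follow from splitting an address into
section, page, and offset.  The sketch below does this in portable C,
relying on 18-bit in-section addresses and 512-word pages; the
function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 18        /* in-section addresses are 18 bits */
#define PAGE_SHIFT    9         /* a page is 512 (01000) words */

/* Build a global address "section,,offset". */
uint32_t global_addr(unsigned section, uint32_t offset)
{
    assert(offset <= 0777777);
    return ((uint32_t)section << SECTION_SHIFT) | offset;
}

/* Section number of an address (0 for single-section addresses). */
unsigned section_of(uint32_t addr)
{
    return (unsigned)(addr >> SECTION_SHIFT);
}

/* Page number of an address within its section (0-0777). */
unsigned page_of(uint32_t addr)
{
    return (unsigned)((addr >> PAGE_SHIFT) & 0777);
}

/* Number of pages from the one holding addr to the section's end. */
unsigned pages_to_section_end(uint32_t addr)
{
    return 01000 - page_of(addr);
}
```

With _ealloc at 770000 this gives page 770 and 8 reserved pages; with
_ealloc at 37,,700000 it gives section 37, page 700, and 64 reserved
pages.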
2 Stack structure
<KCC Internals - Stack structure>

The organization of the portion of the stack seen by a C routine is
shown in the following diagram (the top of the stack is drawn first,
with the stack pointer at the very top):

SP-->________________________________________________________________
    |    Spilled registers                                           |
    |    generated when we need more intermediate values than        |
    |    there are available PDP-10 registers                        |
    |________________________________________________________________|
    |             |                                                  |
    | (as many    |    Arguments being stacked for the next call     |
    | repetitions |    These are generated in the reverse of         |
    | of these    |    lexical order; thus the first argument        |
    | two areas   |    appears at the top of the stack.  This is     |
    | as levels   |    so that functions like printf which take a    |
    | of nesting  |    variable number of arguments can work.        |
    | in function |__________________________________________________|
    | calls)      |                                                  |
    |             |    Values to be saved over the call              |
    |             |    e.g. if we do foo()+bar() then one function   |
    |             |    has to be called first, and we save its       |
    |             |    value here so we can add it to the other      |
    |             |    result once the second call returns           |
    |_____________|__________________________________________________|
    |                                                                |
    |    Local variables                                             |
    |    stored in lexical order, i.e. the first declared            |
    |    variable is lowest on the stack                             |
    |________________________________________________________________|
    |                                                                |
    |    Return address for calling function                         |
    |________________________________________________________________|
    |    Pointer for return value                                    |
    |    this only exists if the function returns a struct           |
    |    that takes more than two words; otherwise the result        |
    |    is returned in registers 1 and (if two words) 2             |
    |________________________________________________________________|
    |                                                                |
    |    Arguments to this call                                      |
    |    in reverse lexical order as described above                 |
    |________________________________________________________________|

Of course, not all of these areas are likely to appear at once.
There is no frame pointer, only a stack pointer; generated code always
knows the location of the stack pointer in relation to changes in the
above structure (as arguments get pushed and popped, registers get
spilled and despilled, etc).  Thus code to access an argument or local variable
will use a different offset from the stack pointer depending on where
it is generated.
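
The way offsets shift as the stack changes can be sketched in
portable C.  This models the diagram only and is not taken from the
KCC sources.  It assumes that SP points at the most recently pushed
word and that every argument occupies one word; the structure and
function names are invented for the example.

```c
#include <assert.h>

/* What currently lies between the arguments and SP, in words. */
struct frame_state {
    int locals;         /* words of local variables */
    int temps;          /* words pushed above the locals (saved
                           values, outgoing arguments, spills) */
    int has_retptr;     /* 1 if a struct-return pointer was passed */
};

/* Offset from SP of the return address. */
int retaddr_offset(const struct frame_state *f)
{
    return -(f->locals + f->temps);
}

/* Offset from SP of word j of the locals (0 = first declared,
 * which is lowest on the stack). */
int local_offset(const struct frame_state *f, int j)
{
    assert(j >= 0 && j < f->locals);
    return retaddr_offset(f) + 1 + j;
}

/* Offset from SP of argument k (0 = first argument). */
int arg_offset(const struct frame_state *f, int k)
{
    assert(k >= 0);
    return retaddr_offset(f) - 1 - f->has_retptr - k;
}
```

Under these assumptions the first argument is at offset -1 on entry
to the function, but at -6 once three words of locals and two
temporaries are on the stack.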
2 Calling conventions and register use
<KCC Internals - Calling conventions and register use>

       Arguments to KCC C functions are passed on the stack, and
results are returned in the registers.  Functions are not expected to
save any registers upon entry, and in fact are assumed to clobber all
of ACs 1-16 inclusive.

Caller conventions - argument passing:

       Since all function calls are assumed to clobber the registers,
it is up to the caller to save on the stack any register values which
it wishes to preserve over the function call.
       As described in the section on stack structure, function
arguments are then pushed in reverse order onto the stack; the last
argument is pushed first, and the first argument is pushed last.
Passing a structure as argument consists of copying it whole onto the
stack.  If the function is expected to return a structure or union
longer than two words, a "zeroth arg" must also be pushed, which is
the address of a location that the function should copy the returned
structure into.  The function is then called with a PUSHJ 17,
instruction which adds the return address onto the stack.

Callee conventions - result returning:
       All accumulators (except AC17) are at the callee's disposal.
However, AC0 is never used by generated code, as some old programs
assume NULL always points to zero, and as the hardware imposes several
restrictions on its use.  AC15 and AC16 are also reserved for minor
KCC runtime functions.
       Single word function return values are left in AC1; double
word returns go in AC1 and AC2.  Return values larger than that are
copied into the location specified by the struct-return pointer, which
is provided by the caller as the "zeroth" argument.
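
       The push order described above can be simulated with a small
array-based stack in portable C.  This is only a model of the
sequence of pushes; it generates no PDP-10 code.  The names are
invented for the example, and one word per argument is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64

/* A toy upward-growing stack; sp indexes the last pushed word and
 * word[0] is an unused base slot. */
struct sim_stack {
    long word[STACK_WORDS];
    size_t sp;
};

static void push(struct sim_stack *s, long v)
{
    assert(s->sp + 1 < STACK_WORDS);
    s->word[++s->sp] = v;
}

/* Model what a caller pushes: the arguments in reverse order, then
 * the optional "zeroth arg" (struct-return pointer), and finally the
 * return address, as PUSHJ would do. */
void sim_call(struct sim_stack *s, const long *args, size_t nargs,
              int has_retptr, long retptr, long retaddr)
{
    size_t i;

    for (i = nargs; i > 0; i--)
        push(s, args[i - 1]);   /* last argument is pushed first */
    if (has_retptr)
        push(s, retptr);
    push(s, retaddr);
}
```

       After sim_call() the return address is on top, and the first
argument lies directly below it (or below the struct-return pointer
when one is present).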
2 Extended addressing
<KCC Internals - Extended addressing>

       A C program can be run in an extended section by specifying
this in either of two ways at load time, depending on whether you are
using KCC or the EXEC to do the loading.

       (a) KCC: Use the "-i" switch.
               e.g.    @cc -i prog.c
       (b) LOAD (or LINK): The first module should be C:LIBCKX.
               e.g.    @load c:libckx,prog

No special switches need be given to KCC for the generated code to be
suitable for extended addressing - the same code will always run
either extended or non-extended.

       In extended sections, code and permanently allocated data
(i.e. global variables) live in section N, the stack lives in section
N+1, and allocated memory begins in section N+2, expanding to fill all
higher sections.  Normally N==1; this can be changed if really
necessary.  All byte pointers not intended for immediate use (e.g.
literal arguments to a LDB or DPB instruction) are constructed as
OWGBPs (One-Word Global Byte Pointer).
1 Cross-compiling
<KCC Cross-compiling>

The -x, -L, -H, and -A switches allow some degree of cross-compilation.
The effects of the various -x specifications are listed below:

CPU: ka, ki, ks, kl0, klx
       KCC can compile code to run on any CPU type; this is done both
by means of different code generation sequences and by assembler
macros which KCC also generates as needed. "ka" specifies a KA-10
using software format floating point doubles (all other types use
hardware format).  "ki" specifies a KI-10, and "ks" both a KS-10 and a
KL-10A without extended addressing.  "kl0" specifies a KL-10B capable
of extended addressing, but restricts the code to section 0; "klx"
specifies a KL-10B non-zero section environment.

       It is possible to specify more than one CPU type; the intent
is to allow for producing code that will run on all specified
machines.  As distributed, KCC code is compiled for "ks+kl0+klx".
However, the results of other combinations are somewhat unpredictable
and should be avoided at the moment.

SYSTEM: tops20, tenex, tops10, waits, its

       Currently there are only two things affected by this setting:
character and string constant values, and ERJMP.
       [1] If compiling for WAITS (or for anything else if on WAITS),
       character values are mapped between WAITS ASCII and standard US
       ASCII.
       [2] If compiling for TOPS20 or TENEX, the proper value of
       ERJMP and an auxiliary definition called ERJMPA are generated.
There may be more distinctions in the future.


ASSEMBLER: fail, macro, midas

       The assembler selection is independent of the system or CPU.
Currently either FAIL or MACRO can be selected and both will work.
Selecting MIDAS does not yet work completely.


CHARSIZE: ch7

       It is possible to request that KCC generate code which assumes
that chars are 7 bits, and char pointers are 7-bit byte pointers.
Thus, arrays of chars will have 5 chars per word, instead of 4.  This
feature, invoked by the "-x=ch7" switch, is mainly of use to people
who must integrate C code with old software that cannot deal with
anything but 7-bit bytes.  It is not really guaranteed to work in all
conceivable cases.  In particular, you should be aware that many of
the normally-compiled library routines (such as malloc) will continue
to return 9-bit char pointers, although the str- and mem- functions
should work with either 9-bit or 7-bit strings.
       The values returned by "sizeof" will not change.  As explained
in the discussion of the sizeof operator, sizes are always in terms of
9-bit bytes, except that the size of a char array is always the number of
elements (chars) in the array.  sizeof(char) is always 1.

General comments:
       Ideally KCC (on any system) should be able to generate code
for any other PDP-10 system.  To actually do this requires some
understanding of how the various parts of a program come together.  It
is not enough just to specify some -x switches; you must take care of
the following:

       1. #include files.  You may need to use an alternate standard
       include-file directory to satisfy <>-type includes.  -H can be
       used to specify an alternate location.

       2. Switches.  You should use -D to predefine any parameters
       from <c-env.h> which are not properly defaulted.
       Alternatively you can put a different version of c-env.h in
       a non-standard location pointed to by -H (as above).

       3. Library.  The C runtime library loaded with the program must
       be the correct one (already cross-compiled for the target).  KCC
       always generates a default "-lc" request for the C runtime library;
       the location searched for this can be specified by the -L switch.

For details on porting the C library and KCC itself, see the file PORT.DOC
in the KCC source directory.
1 Char Pointer Hints
<Char Pointer Hints>

       The code generated for handling char pointers always uses
byte-pointer instructions, and so will work for any byte size (at
least on machines implementing the ADJBP instruction).  This can
sometimes be useful when dealing with PDP-10 based data structures.
However, such pointers have to be constructed "by hand" since all char
pointers that KCC generates are either 9-bit or 7-bit.  See also the
-x=ch7 option in "Cross-compiling".

       In general, when char pointers are involved, constructs like
*++ptr are faster than *ptr++. This is because *++ptr can usually be
folded by the optimizer into an ILDB (or IDPB) instruction.  There is
no equivalent on the PDP-10 to a *ptr++ construct; this must always
be done as at least two instructions.

       Whenever possible, try to avoid using two char pointers in
subtraction, as in (ptr1-ptr2).  Many instructions have to be executed
to find the difference between two char pointers, due to the strange
internal format.  For the same reason, try to avoid less-than (<, <=)
or greater-than (>, >=) comparison of char pointers.  Tests for
equality (== and !=) are fine, however.  Finally, on machines which do
not implement the ADJBP instruction (KA, KI), it is also helpful to
avoid addition or subtraction of integers to char pointers.

       None of this applies to other types of pointers, such as (int *),
which are simple addresses and can be manipulated very efficiently.
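
       As an illustration of these hints, the two routines below are
written in standard C so that every advance of a char pointer is a
*++ptr, and so that a length is kept in an integer counter rather
than obtained by subtracting two char pointers.  The routine names
are invented for the example; whether the optimizer actually emits
ILDB for them depends on the surrounding code.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of c in s.  The first char is fetched without
 * an increment; every later fetch is an increment-then-load. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    char ch;

    assert(s != NULL);
    ch = *s;
    while (ch != '\0') {
        if (ch == c)
            n++;
        ch = *++s;              /* candidate for a single ILDB */
    }
    return n;
}

/* Length of s, counted as we go instead of computed as the
 * difference (end - start) of two char pointers. */
size_t length_by_counter(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    if (*s == '\0')
        return 0;
    do {
        n++;
    } while (*++s != '\0');
    return n;
}
```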

1 Portable Math Library
* Menu:

* PML: (KCC-PML)                Portable Math Library

1 Local library additions
* Menu:

* LIBLCL: (KCC-LIBLCL)          Local library additions

* LIBT20: (KCC-LIBT20)		Frank Wancho's TOPS-20 library