PDP-10 Archive: c/kcc/intern.doc from SRI_NIC_PERM_FS_1

Trailing-Edge - PDP-10 Archives - SRI_NIC_PERM_FS_1_19910112 - c/kcc/intern.doc

There are 7 other files named intern.doc in the archive. Click here to see a list.

This file attempts to document the internal workings of KCC.  It is
still incomplete.

Current holes in documentation:
	Need to explain what pseudo-code looks like, what types there are,
and which of them use which parts of a pseudo-code entry.  CCCODE etc.
	Need to explain KCC's notion of "register" and how these registers
are allocated and handled.  CCGEN, CCREG etc.

This diagram (based on a msg from David Eppstein) lists each
KCC module with a summary of what its routines do.
  
    CCDATA -- allocation for global variables, tables.  Used in various places
    CC -- main control -- parses command line and calls parser for each file
	CCINP -- file input and preprocessor.  Provides input chars.
	CCLEX -- lexical analysis.  Provides input tokens.
	CCDECL, CCSTMT -- parsing (recursive descent) and semantic analysis
data		CCDECL handles declarations; CCSTMT handles statments & exprs.
flows	    CCSYM -- symbol table manipulation
this	    CCTYPE -- type checking and automatic coercion
way	    CCERR -- error messages
 |	CCEVAL, CCFOLD -- parse tree optimization and constant folding
 |	CCGEN, CCGEN1, CCGEN2 -- code generation.  Decls, Stmts, Exprs.
 V	    CCGSWI -- code generation for "switch" statements
	    CCLAB - label management
	    CCREG -- register allocation
	    CCCODE -- subroutines to send pseudo-codes to optimizer
	CCOPT, CCJSKP, CCCSE -- peephole optimization (some elsewhere too)
	    CCCREG -- retroactively change registers
	CCOUT -- emission to assembly file
	CCASMB -- invoke assembler/linker on output file
	CCDUMP -- various debugging output routines

--------------------------
Message from David Eppstein, 4-Dec-85:

As indicated, you can think of the data as moving through successive
stages of interpretation -- input files get read into chars and munged
by the preprocesser, turned into tokens by the lexer, turned into
parse trees by the parser, munged by the parse tree optimizer, turned
into pseudo-codes by the generator, munged by the peepholer and then
turned into chars again as assembly language by the emitter.

(this parag modified --KLH) The control structure is a little different
-- the file argument handler is the main control, and it calls the
parser which calls the lexer which calls the preprocessor to pull data
in from the file.  The parser returns a parse tree, which the main loop
then pushes out in the other direction by calling the code generator
which calls the peepholer and emitter.
	The parse tree optimizer is called (for ccopt)
by the parser and (for cceval, whose sole purpose is to see if a for
loop test can be moved to the end of the loop) by code generation.
The symbol table is used and maintained by many different phases.
Also, the preprocessor can call the parser recursively to handle #if.

General overview: data flow

[NOTE: for clarity the following discussion talks only about "the" input
file.  KCC can compile multiple files, but each one is compiled
separately and handled in the same way.]

KCC is a one-pass compiler; it reads its input only once, and does so
sequentially without ever looking back.  It outputs an intermediate
assembly-language file which is then processed by the system's
assembler, in effect adding a second pass.  Each top-level declaration
statement (a function is a single such declaration statement) is handled
separately, and each goes through the following complete cycle:
	(1) The statement is completely parsed into a single "parse tree",
	which is a linked list of "NODE" structures.  (CCDECL, CCSTMT)
	(2) Some optimization is done on the parse tree. (CCOPT, CCEVAL)
	(3) The code generator is called on this parse tree, to fill a
	"peephole" buffer with what are called "pseudo-code" instructions.
	The buffer is a linear array of "PSEUDO" structures.  The code
	generator is subdivided into two general phases:
		(3a) CCGEN - Determine from parse-tree what pseudo-code to use.
			This phase knows about BOTH parse trees and
			pseudo-code, but does not manipulate the buffer.
		(3b) CCCODE - Put pseudo-code in buffer, applying optimization
			and forcing out buffer when there is no more need to
			look back at buffered pseudo-code.  (or when full!)
			This phase only knows about pseudo-code.
	(4) Whenever the buffer is forced out, the output processor (CCOUT) is
	called on each pseudo-code instruction to generate the actual
	character strings needed for the assembler file.  The output
	processor never sees the parse tree, only the pseudo-code
	as it comes from the buffer.

Optimization occurs at various places.  Some optimization can be done
directly on the parse tree itself, but it looks as if most actual optimization
comes during (3b) -- as pseudo-code is being added to the peephole buffer,
various checks are made trying to combine the current op with previous
ops in more efficient ways.  A few optimizations are done in (4) when
CCOUT selects the best way to do some minor things.
This description can use some more details.

Since the entire parse tree for a declaration is kept in memory, it
may be possible that extremely large functions or data initializations
could cause problems for KCC.  However, such things are considered
bad practice and are rarely if ever seen; this is unlikely to ever be
a problem.  Note this only applies to single declaration statements, not
to the entire file.

More about input processing:
	All input is handled by the CCINP module, which provides the
"nextc" function to read the next character.  CCINP handles all of the
pre-processor '#' commands as they are encountered; the rest of KCC
never sees them.  CCINP also provides macro-expansion text just as if
it was part of the regular input stream.
	The only thing it cannot do is start macro expansions; this is
taken care of by the next level of input, CCLEX, which provides the
"nextoken" function.  This is THE main input function as far as the
rest of KCC is concerned; nothing else but CCLEX calls CCINP's
"nextc".  "nextoken" simply reads the next C-language token and
returns its code (codes defined in CC.H).  Nothing outside of CCLEX
should be concerned with input characters; only with tokens.
"nextoken" does set certain globals as a side effect, since values may
be associated with some types of tokens (notably constants).  CCLEX
does not know anything about parse trees.  The comments at the start
of CCLEX pretty much explain what variables nextoken() can set as
side effects.

CCDECL and CCSTMT perform the actual semantic analysis of the tokens
that "nextoken" returns, building the parse tree.  Nothing else calls
"nextoken", except CCERR in order to attempt recovery from some kinds
of errors.

How a program gets compiled (control flow):

main()	does simple switch inits, and scans all arguments once in order to
	process all switches furnished by the user.  Then scans the arguments
	once more, and calls cfile() to process each remaining file argument.
	After all arguments are processed, the linking loader
	(LINK) may or may not be invoked.

    cfile()	Hacks an argument filename.  Processing depends on current
		switch values.  Does necessary per-file initializations,
		then normally tries to compile the file by first
		calling entdef() (a special hack, can be ignored) and
		then (TA DA!) loops calling extdef() to process external
		definitions as long as input exists, feeding the results
		into the code generator via gencode().
		When input is done, postamble() takes care of final output
		to the assembler file.  An additional "prefix" assembler
		file may also be generated at this point.  Finally,
		asmb() is used to invoke the assembler on the file.

So, extdef() and gencode() actually do all the real work; everything
seen in the file is some kind of top-level declaration, either of data
or of a function.
	extdef() initializes a few things (see note below re ENTRY hacking)
	and then calls declarator() to parse as far as the first identifier
	(symbol name).
	If it was anything but a function, datadef() is called to
		generate a parse tree from not only this definition but
		all succeeding definitions until the end of the declaration
		statement is seen.
	If it was a function, extdef will parses the parameter list
		itself and then invoke funcdef().  This routine parses the
		parameter declarations and the function body.

Note on ENTRY and entdef():
	In order to remain compatible with older source files used with
	KCC which used the "entry" statement, a special routine called
	entdef() is invoked at the start of compilation.  This scans
	ENTRY statements, which must precede any other kind
	of declaration.  Thereafter ENTRY is ignored.
	The ENTRY statement is peculiar to KCC.  It was needed
	only because the PDP10 assembler insists that any library
	entry points be specified BEFORE any code is produced!
	This was fixed by having KCC generate an auxiliary "prefix"
	assembler file and feeding that to the assembler just ahead of
	the main file.  ENTRY should no longer be used, and KCC should
	eventually not support it.

In more code-like form:

CC main()			- Main program
CC	<Initialize compiler switch values>
CC	Scan all arguments for switches.  For each switch, call
CC		cswitch() - process a switch.
CC	Scan all arguments again for filenames.  For each file, call
CC		cfile() - process one file.
CC			<initialize all per-file stuff>
CCDECL			Invoke entdef() for "entry" stmt compatibility
CC			Until EOF on input, call
CCDECL				extdef() - process one external def.
CCGEN				gencode() - generate code for it.
CCOUT			postamble()	- output assembler-file postamble.
CCOUT			makprefile()	- Make assembler prefix file if needed.
CCASMB			asmb()		- Invoke assembler.
CC	If invoking linking loader, call
CCASMB		runlnk().
	

Where the real work happens:

CCDECL extdef()	- Read one external definition, and return parse tree if any.
CCDECL	Initialize for a declaration statement, then call
CCDECL		declarator()	- Parse one identifier
CCDECL	If got non-function, call
CCDECL		datadef()	- Parse rest of declaration, return it.
CCDECL			Until semicolon seen, call
CCSTMT				defnode()	- Make parse-tree node
CCDECL				declarator()	- parse another identifier
CCDECL	else got function.
CCDECL		<parse parameter list>, then call
CCDECL		funcdef()	- Parse rest of function, return it.
CCDECL			<Set up various things>
CCDECL			Parse parameter declarations, by calling
CCDECL				declarator()
CCDECL			<Set up parameter (local var) stuff>
CCDECL			Then parse the function body, by calling
CCSTMT				statement()	- Parse a statement
CCDECL			Then finish consing together the resulting parse-tree
CCSTMT				with defnode().

CCGEN	gencode() is fairly easy to follow.


This page describes some of the basic KCC data structures and constant types.

In general, structures are declared in the following way:
	#define NAME _name
	struct _name {...};
and references to the structure tag should use NAME.  Pointers should
be declared as NAME *.

Important structures:
	TOKEN	Token definition (entry in table)
	  T_	  Token identifiers (codes)
	  TKTY_	  Token types
	TYPE	Type Definition
	  TS_	  Type Specifier codes
	SYMBOL	Symbol Definition
	  SC_	  Symbol Classes
	NODE	Parse tree node
	  N_	  Node identifiers (opcodes)
	  NF_	  Node flags
	  CAST_	  Cast (coercion) types
	PCODE	Pseudo-Code entry (in peephole buffer)
	  P_	  Opcode values (almost == to PDP10 instructions)
	  POF_	  Opcode flags
	  POS_	  Opcode skip values
	  POSF_	  Opcode skip value bits
	  PTF_	  Type Flags
	  PTA_	  Type Addressing modes
	  PTV_	  Type Values (flag/addr combos)

Minor structures:
	SMEM	Structure Member
	RESWD	Reserved Word table entry
	TYNAME	Built-in Type table entry

NOTE: Currently, for historical reasons the token types, node op-codes, and
a few other things are all mixed together.  The following prefixes indicate
symbols which used to all be "token constants":
	N_	- Only used as a node op-code
	T_	- Only used as a token value
	Q_	- Used as BOTH a node op-code and token value!
		(Why Q?  Well, it's exactly halfway between N and T...
		 also, think about a Q-ship, which isn't what it seems.)

See CCSYM.H for details on the format of symbol table entries (SYMBOL)
and type specifiers (TYPE).

---------------------- PARSE TREE STRUCTURE ---------------------

	Only the parser creates parse trees, and the CCDECL and CCSTMT
modules contain all of the parser.  The modules which form the code
generator and which read the parse tree are CCGEN, CCGEN1, CCGEN2 and
CCGSWI.

	The parse tree generated by KCC's parser (and passed to the
code generator) is made of NODE structures.  The format and many
details are explained in CCNODE.H.  The node-ops, flags, and token
types are defined in CCTOKS.H.  The following pages describe specifics
of the way each node-op uses the NODE structure.

------- GENERAL ---------

<node-op symbol>	<brief description>		<syntax>
	T: <C type pointer>
	F: <flags>
	X: <extra variable, if used.  Normally isn't.>
	L: <Nleft pointer>
	R: <Nright pointer>

NOTE: NULL is not allowed as a value for any pointer unless explicitly
mentioned as an option!

NOTE: Only a few types of nodes can ever be seen as "lvalues".  H&S 7.1
describes the kinds of expressions that can be lvalues.  It is slightly
wrong; [*] marks corrective footnotes.

	0. Names of scalar/struct/union type variables, but not functions,
	   arrays, or enum constants.
		Q_IDENT (depending on type)
	1. Subscript expression e[k], always. [*]
		(Same thing as 5, N_PTR).
	2. Parenthesized expression, if contents are lvalue.
		(Irrelevant)
	3. Direct component selection e.name, if e is lvalue. [*]
		Q_DOT (sometimes).
	4. Indirect component selection e->name, always. [*]
		Q_MEMBER (always).
	5. Indirection *e, always. [*]
		N_PTR (always).

	[*] "Always" is untrue.  NONE of those are lvalues if the type
	of the expression is "array" or "function"!!!

Note on array types:
	To simplify matters, whenever an expression of array type is seen,
it is NOT immediately converted into a pointer.  It is the responsibility
of each operator to perform this type conversion if required.  "sizeof",
"&", and function calls don't.
	The lvalue expressions noted above are the only ones that can
yield an array.  Because they are so common, the nodes Q_IDENT and
N_PTR + Q_PLUS have some special type fudging to eliminate the
overhead of a "&" and "*".  Normally a Q_IDENT's Ntype is the same
as its symbol Stype; but if Ntype is array, conversion results in simply
setting Ntype to pointer (the original type is still available in Stype).
	By contrast, normally a Q_PLUS's Ntype is pointer (where
arrays are involved), but it is sometimes changed to an array type.
Array subscripting, e[x], is represented by *(e + x); if this result
still of array type then the almost inevitable conversion would result
in &(*(e + x)) which is the same as (e + x).  In order to avoid the
overhead of the &*, the * is flushed and Q_PLUS's Ntype is made "array"
as if the * was present.  Then when the conversion to pointer happens,
Q_PLUS's Ntype is clobbered from "array of T" to "pointer to T"
(dropping another level of arrayness) rather than creating a N_ADDR (&) node.
	For the other possible things that can be arrays (N_PTR,
Q_DOT, and Q_MEMBER), the conversion is accomplished by applying
N_ADDR to the node.  However, unlike the normal usage where, given an
operand of type T, the result type is "pointer to T", the
array-conversion usage gives a result type of "pointer to T's
subtype".  That is, it takes out a level of arrayness.
	Someday array handling may be made more consistent.

------- Primary Expressions (type TKTY_PRIMARY) ---------

Q_IDENT: <identifier>				TERMINAL NODE
	T: <type>
	F: <flags?>
	X: (actually "Nid") pointer to symbol
	L,R: null

	Note: the node type (Ntype) will be identical to the symbol's type
	(Nid->Stype) in all cases EXCEPT when the symbol is an array or
	function, in which case the Ntype will be "pointer to <Stype>".
	This avoids the overhead of a N_ADDR node for the frequent
	conversion of array/function names to array/function pointers.
	Routines that need the actual type (eg data defs) can use the Stype.

	Note: This node is an lvalue expression unless the symbol type is
	enum, array, or function.

N_ICONST: <integer constant>			TERMINAL NODE
	T: <type> - normally "int" but may be any integral type.
	F: ?
	X: (actually "Niconst") - contains value of constant
	L,R: null

N_FCONST: <floating-point constant>		TERMINAL NODE
	T: <type> - either "double" or "float"
	F: ?
	X,L: (actually "Nfconst") - value of constant.
	R: null

N_SCONST: <string constant>			TERMINAL NODE
	T: <type> - always "char *"
	F: ?
	X,L: String constant value in "Nsconst" and "Nsclen".
	R: null
		Also uses "Nsclab" in code generation.

N_PCONST: <pointer constant>			TERMINAL NODE
	T: <type> - pointer to X
	F: ?
	X: (actually "Niconst") - contains value of constant
	L,R: null

	This node is used for casts of constant values to some pointer type.

N_VCONST: <void "constant">			TERMINAL NODE
	T: <type> - always "void"
	F: ?
	L,R: null

	This node is used for casts of constant values to "void".

Q_ASM:	asm(string-literal)			TERMINAL NODE
	T: <type "void">
	F: 0
	L: null
	R: null or <exprlist>	arguments.  Only N_SCONST allowed.

	Note: This op is considered to have side effects!
	Note: this is deliberately similar to N_FNCALL.  Currently the
	string literal argument is passed directly to the assembler output
	file.

N_FNCALL:	<function-call>		<expr>(<expr1>, ... <lastexpr>)
	T: <type of function>
	F: <various flags>
	X: (actually Nretstruct) pointer to auto symbol for returning struct.
	L: <expr>		must evaluate to function address.
	R: null			if no args
		or, if one or more args, R points to an <exprlist>:
	R: N_EXPRLIST
		T: <type of lastexpr>
		F: 0
		L: N_EXPRLIST
		    ...
			N_EXPRLIST
				T: <type of expr1>
				F: 0
				L: null
				R: <expr1>
		    ...
		R: <lastexpr>

	Note: This op is considered to have side effects!
	Note: Nretstruct is only set if the function returns a structure 
	larger than 2 words.  It is a SYMBOL * pointer which identifies
	a temporary, unique internal auto struct declaration which the returned
	structure can be copied to, if the code generation wishes to do so.

Q_DOT:	<direct structure member selection>	<struct>.<member>
	T: <C type of member>
	F: <various flags>
	X: (actually Nxoff) (int) offset from start of struct (see note below)
	L: <expr>		Must evaluate to structure address
	R: null

	Note: This node is an lvalue expression if <struct> is an lvalue
	AND <member> is not an array type.

	Note: Nxoff is encoded just as for Ssmoff in structure member symbols.
	If positive, it is a word offset from start of struct.
	If negative, it is a bitfield.  After making it positive again,
		the low 12 bits are the P+S fields of a byte pointer, and
		the remaining high bits are the word offset.

Q_MEMBER: <indirect structure member selection>	<&struct> -> <member>
	See Q_DOT; same structure.
	Note: This node is always an lvalue expression unless <member>
	is an array type.

------- Unary Expressions (type TKTY_UNOP, TKTY_BOOLUN) ---------

T_SIZEOF:	<sizeof-expression>		sizeof <expr or typename>
	This is only seen as an input token, not a node.  The
	result value of the sizeof is evaluated immediately and
	converted into a N_ICONST (integer constant) node.

N_CAST:	<cast-expression>		(<type>) <expr>
	T: <desired type of result>
	F: <various flags>
	X: (actually Ncast) holds CAST_ type to apply.
	L: <expression>
	R: null

N_POSTINC:	<postincr-expr>		(lvalue-expr)++
N_POSTDEC:	<postdecr-expr>		(lvalue-expr)--
N_PREINC:	<preincr-expr>		++(lvalue-expr)
N_PREDEC:	<predecr-expr>		--(lvalue-expr)
Q_COMPL:	<bitwise-not-expr>	~(expr)
Q_NOT:		<logical-not-expr>	!(expr)     (the only TKTY_BOOLUN op)
N_NEG:		<unary-minus-expr>	-(expr)
N_ADDR:		<address-expr>		&(lvalue-expr)
N_PTR:		<indirect-expr>		*(expr)
	T: <type of expr>
	F: <flags>
	L: <expr>
	R: <null - unused>

	Note: The 4 inc/dec ops all require that their operand be an lvalue.
		They are also all considered side-effect ops!
	Note: N_ADDR requires that its operand be an lvalue.
	Note: N_PTR is always an lvalue expression unless the result type
	is an array.

------- Binary Expressions (type TKTY_BINOP, TKTY_BOOLOP) ---------

	All of these operators have the same basic structure.  All are
left-associative during parsing.  It does not matter which operand
(left or right) is evaluated first, except for || and && which are
required to evaluate the left operand first.  These are the only operators
for which the precedence table is significant.

Q_MPLY: Q_DIV: Q_MOD:	13 * / %  Multiply, Divide, Remainder
Q_PLUS: Q_MINUS:	12 + -	 Add, Subtract
Q_LSHFT: Q_RSHFT:	11 << >>  Left shift, Right shift
Q_LESS: Q_GREAT: Q_LEQ: Q_GEQ:	10 < > <= >=  Comparisons
Q_EQUAL: Q_NEQ:		9 == !=  Equal, Unequal
Q_ANDT:			8 &	 Bitwise AND
Q_XORT:			7 ^	 Bitwise XOR
Q_OR:			6 |	 Bitwise OR
Q_LAND:			5 &&	 Logical AND
Q_LOR:			4 ||	 Logical OR
	T: <C type of result>
	F: <flags>
	L: <left operand expr>
	R: <right operand expr>

	Note: Q_PLUS has a special usage for array subscripting.
	See discussion of arrays above.

------- Binary Assignment Expressions (type TKTY_ASOP) ---------

	All operators have the same structure as the basic Q_ASGN
assignment op, and all require that the left operand be an lvalue.
They are also all considered side-effect ops.

Q_ASPLUS: Q_ASMINUS: Q_ASMPLY: Q_ASDIV: Q_ASMOD:
Q_ASRSH:  Q_ASLSH:   Q_ASAND:  Q_ASXOR: Q_ASOR:
Q_ASGN:	<assignment operation>		<lvalue-expr> <asop> <expr>
	T: <resulting C type>
	F: <flags>
	X: (actually Nascast) Applicable assignment conversion (CAST_ value)
	L: <lvalue expression>
	R: <value expression>

Note: if a type conversion of the <lvalue expression> is required, L will
point to a N_CAST node which points to the <lvalue>.  However, that expression
is evaluated only once; the cast is applied only for the operation.  If another
conversion is needed before storing the result value back into the <lvalue>
then this is specified by the contents of Nascast; anything but CAST_NONE
indicates a conversion.

------- Ternary Expression (type TKTY_TERNARY) ---------

Q_QUERY:	<conditional-expression>	  <EC> ? <E1> : <E2>
	T: <type of E1>
	F: <flags>
	L: <expr>			EC (condition)
	R: <node>
		T: <type of E1>
		F: 0
		L: null or <expr>	E1 ("then")
		R: null or <aexpr>	E2 ("else")		

	Note: Normally the types of E1 and E2 are the same as the overall
type of the expression.  However, if the value of the ternary expression
is not used for anything (NF_DISCARD will be set) then it is possible for
the types to be mismatched or for either or both expressions to be NULL.

------- Comma Expression (type TKTY_SEQ) ---------

N_EXPRLIST:	<expression-list>  expr1, expr2, ... lastexpr
	T: <type of lastexpr>
	F: 0
	L: N_EXPRLIST
	  ...
		T: <type of expr2>
		F: 0
		L: N_EXPRLIST
			T: <type of expr1>
			F: 0
			L: null
			R: <expr1>
		R: <expr2>
	R: <lastexpr>

------- Compound statements (type TKTY_RWCOMP) ---------

Q_GOTO:	<goto-statement>			goto <label>;
	T: null, F: 0
	X: (actually Nxfsym) (SYMBOL *) set to plabel(sym,0)
	L, R: null

Q_RETURN:	<return-statement>		return <expr>;
	T: null or <type of current function> if a return-value exists.
	F: 0
	L: null
	R: null or <expr>	Return value

Q_BREAK:	<break-statement>		break;
	No T, F, L, R.

Q_CONTINUE:	<continue statement>		continue;
	No T, F, L, R.

Q_IF:	<conditional-statement>     if (<expr>) <thenstmt> [else <elsestmt>]
	T: null, F: 0
	L: <expr>		Used as condition
	R: <node>
		T: null, F: 0
		L: <then-statement>
		R: <else-statement>	NULL if no "else"

Q_FOR:	<for-statement>			for ( <e1> ; <e2> ; <e3> ) <stmt>
	T: null, F: 0
	L: <node>
		T: null, F: 0
		L: <node>
			T: null, F: 0
			L: null or <expr>	E1 init
			R: null or <expr>	E2 condition
		R: <node>
			T: null, F: 0
			L: null or <expr>	E3 increment
			R: null
	R: <statement>		Body

Q_DO:	<do-statement>			do <stmt> while ( <expr> );
	T: null, F: 0
	L: <expr>		Condition
	R: <statement>		Body

Q_WHILE: <while-statement>		while ( <expr> ) <stmt> ;
	T: null, F: 0
	L: <expr>		Condition
	R: <statement>		Body

Q_SWITCH:	<switch-statement>	switch ( <expr> ) <stmt> ;
	T: null, F: 0
	X: (actually Nxswlist)	Points to head of a list of case/default stmts.
	L: <expr>		Condition
	R: <statement>		Body (usually a compound stmt)

Q_CASE:	<case-statement>		case <expr> : <stmt> ;
	T: null, F: 0
	X: (actually Nxfint or Nxfsym) <case value>
	L: <statement>		Statement labelled by this case-label
	R: null, or ptr to next Q_CASE node within this switch body.

Q_DEFAULT:	<default-statement>	default: <stmt> ;
	T: null, F: 0
	L: <statement>	Statement this is a label for
	R: null, or ptr to next Q_CASE node within this switch body.

	This is the same as Q_CASE except that no case value is furnished.
	During the code generation phase Q_DEFAULT is turned into a Q_CASE.

------- Other statement node ops (type TKTY_NULL) ---------

N_FUNCTION: <function-definition>
	T: null, F: 0
	L: <node>
		T: null, F: 0
		L: Q_IDENT for header (contains name, type)
		R: <local-scope static decls> or null if none.
			Note the decl list is made of all static decls within
			the entire function, regardless of actual scope.
	R: N_STATEMENT
		L: <statement>	Function body (will be compound stmt)
		R: N_STATEMENT
			T: null, F: 0
			L: Q_RETURN
				T: null, F: 0, L: null, R: null
			R: null

N_STATEMENT: <compound-statement>	{ decls stmt1 stmt2 ... last-stmt }
	T: null, F: 0
	L: <decls>		; If no decls, is stmt1.
	R: N_STATEMENT
		T: null, F: 0
		L: <stmt1>
		R: N_STATEMENT
		     ...
			T: null, F: 0
			L: <last-stmt>
			R: null

N_LABEL:	<label-statement>	<label>: <statement>
	T: null, F: 0
	X: (actually Nxfsym) (SYMBOL *) to result of plabel(sym,1)
	L: <statement>
	R: null

------- Data definition node ops (type TKTY_NULL) ---------

N_DATA:	<data declaration or definition>	Any declaration not a funct def
	T: null, F: 0				e.g. int A,B=2,C;
	L: N_IZ for A
	R: N_DATA
		T: null, F: 0
		L: N_IZ for B
		R: N_DATA
			... until R: null

	Note that the structure for N_IZ is the same as that for Q_ASGN.
	This makes it easy to generate initialization code by replacing the
	N_IZ op with a Q_ASGN op when it is encountered.

N_IZ:	<storage alloc for variable>
	T: null, F: 0
	L: Q_IDENT
		T: <C type of data>
		F: 0
		X: (actually Nid) (SYMBOL *) for this identifier
		L,R: unused
	R: null or <expr> or N_IZLIST		Optional initializer

	Note: the Ntype of the Q_IDENT here is never examined when
	generating code for static-extent data (gendata()); the symbol's
	Stype is used instead.  See the note for Q_IDENT.

N_IZLIST:	<initializer-list>	= { e1, e2, ... elast }
	T: null, F: 0
	L: <expr> or N_IZLIST		E1
	R: N_IZLIST
		T: null, F: 0
		L: <expr> or N_IZLIST		E2
		R: N_IZLIST
			T: null, F: 0
			L: <lastexpr> or N_IZLIST	Elast
			R: null

------- Miscellaneous node ops (type TKTY_NULL) ---------

N_NODE:	<node>
	This node op is used for random nodes that are treated
	specially anyway by whatever points to them.  Everything that
	used to set a node op to 0 now sets it to N_NODE; the only
	purpose of this value is to ensure that all active nodes have
	non-zero op values.
---------------------------------------------------------------------

Note about null statements:
	<null-statement> has no representation other than by a null pointer.
	If CCSTMT's statement() parses a null statement, a null pointer is
	returned.

Definition of <statement>:
	The node pointer returned from CCSTMT's statement() will be one of the
following:
	<null> i.e. a null statement
	<statement> - a node op of type TKTY_RWCOMP or TKTY_NULL.
	<expr> - any kind of expression.

The above node descriptions make a few rudimentary distinctions between
different kinds of expressions, such as <expr> and <aexpr>.  These are more
in the nature of hints than accurate descriptions -- see the syntax in
H&S or ANSI for the real thing.
	To illustrate:
		<expr>			<aexpr>
	K&R	<expression-list>	<expression>
	H&S v1	<comma-expression>	<no-comma-expression>
	H&S v2	<comma-expression>	<assignment-expression>
	ANSI	<comma-expression>	<assignment-expression>

	The first can be a comma-expression, the second cannot.  The
difference is mainly syntactic, significant during parsing.  There is
very little difference between them in terms of parse-tree nodes since
a true <expr> (i.e. a list of expressions separated by commas) can
be converted into an <aexpr> just by enclosing it in parentheses,
which results in exactly the same parse-tree node structure except
that the top node has its NF_INPARENS flag set.
	ANY of the nodes described as "expressions" may be found as an
<expr> or <aexpr>.  The only restriction is that if an <aexpr> is
required, then N_EXPRLIST cannot be seen unless NF_INPARENS is set.

	Only a few nodes are guaranteed to be terminating nodes:
Q_IDENT and the various types of constants.  All other expression nodes
MUST point to further non-null expression nodes.  This is different
from the situation with <statement> nodes, which can always be null to
represent null statements.

	The NF_ flags as far as I know are only used with expression
nodes and are meaningless in any other type of node.  That is, they are
not used for TKTY_COMP or TKTY_NULL nodes.  The flags themselves
are for the most part only used within the expression parser itself
(CCSTMT).

NF_LVALUE	/* expr can be an lvalue */
	Mainly used in CCSTMT.  One reference in CCGEN2 to see if trying to
extract a structure component from a "virtual" non-addressable
structure, ie one returned on stack from a function.  A couple of refs
in CCTYPE when doing type conversion checking.

NF_RETEXPR	/* want result in RETVAL so can be returned */
	Only used in the code generator (CCGEN,1,2).  The parser never
uses this.

NF_GLOBAL	/* unable to cause a stackref */
	Mainly used in CCSTMT; one ref in CCTYPE.
This is used to help with NF_STKREF but not clear on this one yet.

NF_STKREF	/* counted already as a stackref */
	Mainly in CCSTMT.  Its usage is documented in CCSTMT.
One ref in CCTYPE.

NF_INPARENS	/* Expr was parenthesized */
	Only in CCSTMT.  This flag is set only to indicate
that an expression node was enclosed in parentheses, and thus may occur
for any type of expression node.  Its only use is for printing nice
warning (not error) messages about possible operator precedence
problems (ternary and binary).

NF_WASCOMP 	/* Operator is a comparison or logical &&, || */
	Only in CCSTMT's binary().  This flag is set only for
nodes of type TKTY_BOOLOP, and is only used in order to print a nice
warning message if it looks like the user has gotten the precedence of
the | and & operators mixed up with the comparison/logical operators.

NF_ARRFCONV	/* This N_ADDR is an implicit array/function conversion */
	OBSOLETE -- to be flushed soon!
	Only in CCSTMT.  This flag is set only for N_ADDR nodes which
were created as part of an "immediate conversion" of an array or function
name parsed as a primary expression.  The flag is for the benefit of
the "sizeof" operator and function call (.)() processing, so that the
conversion can be reversed if the flag is seen.  Perhaps someday this
can be flushed when the code generation knows how to handle array/function
objects properly.

NF_DISCARD	This expression's value will be discarded.
	Set by CCFOLD's evaldiscard(), used by CCSTMT and CCGEN2.  As
per H&S 7.12, there are three places where an expression may have this
flag set: (1) an expression statement, (2) the first operand of a
comma expression, and (3) the initialization and incrementation
expressions in a "for" statement.  As much as possible of the expression
is recursively discarded, and the remaining expression will have this
flag set for the topmost node.  CCGEN2 checks this in various places to
optimize code generation.

NF_USERCAST	This N_CAST operation was explicitly given by user.
	A N_CAST node can be generated either by an explicit user cast
in the code, or by an implied type coercion (the "usual unary/binary/etc
conversions").  This flag is set by CCSTMT and distinguishes the two;
it is only used by CCFOLD's evaldiscard() to avoid generating warning
messages when discarding implicit cast conversions.

NF_SIDEFF	This expression has side effects.
	Not yet implemented; may or may not be a good idea.  Idea is
that any operator that has a side effect will set this flag, which
should be propagated upwards (that is, a node will have the flag set
if any of its operands have it set).

For any particular expression node, at any time:
	T: <the C type of the expression this node represents>
	F: <NF_ flags for this expression result>

CODE GENERATOR

Virtual Registers:

	The code generator translates the parse-tree nodes into
pseudo-code operations on "virtual registers".  A virtual register
(vreg) is treated as if it was a machine register, but whether its
contents are actually in a machine register, or on the stack, is
generally unspecified; the register allocation routines in CCREG are
responsible for saving virtual registers on the stack as needed.
	A vreg may consist of one or two words, and this is sufficient
to hold an object of any type except structures/unions larger than 2 words.
Not all of the bits in a vreg are actually significant; this depends on
the type.  For example, types which are smaller than 1 word (such as
chars and bit-fields) can only treat the rightmost N bits as significant,
where N is the number of bits in the type.  Cast conversions can change
the contents of a vreg in two ways:
	(1) Change the number of significant bits (or # of words)
	(2) Change the representation (for those bits significant in the
		new type)

Note that, in general, cast conversion to a smaller type does not result
in any code being produced.  This is because all that changes is the
number of bits considered significant, and the representation of the
significant bits remains the same.  Conversion to a larger type, however,
will enlarge the number of significant bits, and these new bits have
to be set to some value, so some code is likely to be generated to
accomplish this.

For example, in the expression:
	int a, b;
	a = (char)b;
The value of b is first retrieved into a vreg, which amounts to a "MOVE vreg,b"
instruction, and the cast to (char) is then applied.  However, this does NOT
generate any code, because all that happens is that only the rightmost 9
bits of the vreg contents become significant.  It is only when this value
is implicitly converted back to (int) for the store into "a" that the 
cast from (char) to (int) takes effect and the non-significant bits are
zapped.