Trailing-Edge
-
PDP-10 Archives
-
SRI_NIC_PERM_FS_1_19910112
-
kcc-4/kcc/seqnce.doc
There are 5 other files named seqnce.doc in the archive. Click here to see a list.
KCC CODE GENERATION SEQUENCES
There are certain constructs in C that do not correspond to
any direct PDP-10 machine instruction equivalents, and require KCC to
generate a code sequence to implement the construct properly. Since
the most efficient sequence is often a quite subtle and non-obvious
collection of instructions, this file exists to document the sequences
and the reasons why they look like they do. There is no guarantee
that these are optimal, of course; some of them may well be
susceptible to improvement by very clever programmers.
UNSIGNED ARITHMETIC
It is most unfortunate that the PDP-10 does not include any
unsigned arithmetic instructions; they would be easy to do. Anyway,
here is the code generated for all the possible unsigned operations,
where those operations are DIFFERENT from those for signed arithmetic.
Note that all logical and bit-wise operators use exactly the same
code, and because the PDP-10 uses twos-complement arithmetic, the
addition and subtraction instructions also work for both signed and
unsigned ints.
!,^,~,|,& (logical and bit-wise ops) SAME.
+,- (addition & subtraction) SAME (ADD, SUB).
<< (left-shift) SAME (LSH).
==,!= (equality) SAME (CAME/CAIE, CAMN/CAIN).
/,% (divide,remainder) DIFFERENT: see next page (very hairy).
Casts (cast conversions) DIFFERENT: see int->float and int->double pages
>> (right-shift) DIFFERENT:
Uses LSH instead of (signed) ASH.
<,<=,>,>= (inequality ops) DIFFERENT:
Unsigned CAMx of A,B becomes:
MOVE R,A ; Can optimize if A or B are constants.
TLC R,400000 ; Toggle sign bit
MOVE S,B ; Same thing for other operand, sigh.
TLC S,400000
CAMx R,S ; Now can compare.
* (multiply) DIFFERENT:
Unsigned MUL of A,B becomes:
MOVE R,A ; R is a double-word register pair
MUL R,B ; Do long multiply
TRNE R,1 ; Check low bit in high-order word of result.
TLOA R+1,400000 ; Copy it into high bit of low word.
TLZ R+1,400000 ; Whether 1 or 0.
<result in R+1>
Alternative sequences, which take less space but are
slower (MUCH slower on non-KLs):
MUL R,B or MUL R,B
LSH R+1,1 LSH R+1,1
LSHC R,-1 LSHC R,-35.
result in R+1 result in R
UNSIGNED ARITHMETIC - INTEGER DIVISION:
This is the hairiest unsigned operation, particularly when
the divisor has its high bit set; for that case we currently do the
division by hand. The other cases are more amenable but still need
to be distinguished.
Dividend / Divisor
Case 1: + / +
Case 2: - / +
Case 3: + / -
Case 4: - / -
What KCC currently outputs for the pseudo-instruction P_UIDIV, which
leaves its quotient in RQ and the remainder in RQ+1:
UIDIV RQ,MEM: /* RR = RQ+1 */
SKIPGE 16,MEM ; Get mem ref first before clobbering RQ or RR!
JRST $1 ; Negative divisor (case 3 or 4)
JUMPGE RQ,$3 ; If both operands positive, just do IDIV!
; Case 2: Negative dividend, positive divisor
CAIG 16,1 ; Dividend negative. Is divisor 2 or greater?
JRST $2 ; No, is 1 or 0, must special-case this as
; result must still have high bit set!
MOVE RR,RQ ; Set up dividend
MOVEI RQ,1 ; High bit copied into high-order word.
DIV RQ,16
JRST $4 ; Done.
; Case 3&4: Negative (very large) divisor. Manual division.
$1: MOVE RR,RQ
MOVEI RQ,0
JUMPGE RR,$4 ; All done if case 3 (positive dividend)
CAMGE RR,16 ; Case 4: both neg, compare.
JRST $4 ; Divisor is greater, result is <0 ? dvdend>
SUB RR,16 ; Dividend is >=, divide once!
AOJA RQ,$4 ; Result is <1 ? dvdend-divisor>
$2: TDZA RR,RR ; Divisor is 1 or 0, just clear remainder
$3: IDIV MQ,16 ; Both operands positive, normal IDIV!
$4:
Note that when the divisor is a constant, KCC attempts to use this information
to output a much smaller code sequence (the appropriate subset of the above
full-fledged sequence). The only really odd variant is for divisors that
are a power of 2, in which case KCC outputs an LSHC and LSH instruction
to shift the dividend properly.
Other alternatives that came up:
(1) CAIL MQ,0
SKIPGE MQ+1,MEM
PUSHJ P,$UIDIV ; Some kind of grossness
IDIV MQ,MQ+1
(2) Convert to double float, then DFDV, then back to uint???
Gross, gross, gross!
(3) KL-10s could use DDIV, but requires 4 sequential ACs!
(4) Code from Peter Samson at Systems Concepts:
(not used directly as MEM ref can conflict with AC refs)
; Reg MQ/
; Reg AC=MQ+1/ dividend
; Addr MEM/ divisor
SKIPGE MQ,MEM
JRST $1
SOJN MQ,$2
EXCH AC,MQ
JRST $3
$1: MOVEI MQ,0
JUMPGE AC,$3
CAMGE AC,MEM
JRST $3
SUB AC,MEM
AOJA MQ,$3
$2: TLNN AC,400000
TDZA MQ,MQ
MOVEI MQ,1
DIV MQ,MEM
$3:
; MQ/ quotient
; AC/ remainder
"if the compiler knows in a given case that the divisor doesn't have
the sign bit on, and that the divisor isn't 1, it need only compile
the four instructions starting at $2. Since divisors are frequently
constants, this simplification should help in a lot of cases."
POINTER ARITHMETIC - COMPARISON:
For == and != the CAME and CAMN instructions can be used just
as for integer comparison. For relative inequalities a few more instructions
are necessary. These will work for any size byte pointer, subject to
the following constraints:
(1) the pointers must have the same bytesize.
(2) Neither pointer may be NULL (C leaves the result undefined).
(3) Local-fmt BPs cannot have a P (position) field of 40-44 inclusive.
General case (works for both 0-section and N-section)
This is what KCC outputs when the code must run in either context.
In fact, this is what is always output anyway.
SKIPL R,A ; Fetch A, test sign bit.
TLC R,770000 ; If OWGBP, invert P&S bits.
ROT R,6 ; Get byte position into low-order bits.
SKIPL S,B ; Repeat for B.
TLC S,770000
ROT S,6
CAMx R,S ; Now can compare.
If the code will never run multi-section (no OWGBPs) then this will work,
for all values of P. KCC does not currently test for or output this.
MOVE R,A
MOVE R+1,B ; Set up A and B close together
ROTC R,6 ; Rotate both at once, swapping P fields!
CAMx R,R+1 ; Then compare.
If the code is multi-section only (just OWGBPs) then this can be used.
KCC does not currently test for or output this.
MOVE R,A
ROT R,6 ; Must get and shift each operand separately.
MOVE S,B
ROT S,6
CAMx R,S
POINTER ARITHMETIC - ADDITION/SUBTRACTION:
Addition is done with ADJBP and so works with any size byte.
For KA-10s there is an ADJBP simulation routine which likewise works
for any bytesize. It is not efficient, but then it is unlikely to be
needed nowadays.
Subtraction is much trickier. There are four possible situations
depending on whether the format (OWGBP or local) and byte size are known
or unknown.
Known format/size Unknown format/size
SKIPL 16,A
LSH 16,6
LSH 16,-30. ; Get size or PS
SUB A,B SUB A,B
MULI A,bpw<<bits MUL A,$BPMUL(16) ; 64-wd table
LSH A+1,-bits LSH A+1,@$BPLSH(16) ; 64-wd table
ADD A,$BPADD(16) ; 64-wd table
ADD A+1,<table>(A) ADD A+1,(A)
Unknown format/Known size Known format/unknown size
LDB 16,[$$BPSZ,,A] ; get PS from A
SUB A,B SUB A,B
MULI A,$$BPMn MUL A,$BPMUL(16)
LSH A+1,$$BSHF LSH A+1,$$BSHF
ADD A,$BPTAB(16)
ADD A+1,$BPTBn(A) ADD A+1,(A)
Currently KCC implements the latter two. The unknown-size algorithm is
used except in such cases (such as subtracting a pointer constant) where
the byte size is definitely known. Byte sizes 6, 7, 8, 9, and 18 are
supported, but no others. The actual values of the symbols used will
vary depending on whether the program is loaded for 0-section or N-section
operation.
The program PARITH.C, found in the KCC source directory, is
used to test various algorithms for pointer subtraction, and to
compute the values in the various magic tables.
POINTER ARITHMETIC - CAST CONVERSIONS:
These are a bit tricky since in some cases the actual instruction
that results is not known until load time. We defer this by assembling
special symbol references which are defined one way by the 0-section
runtime, and a different way by the N-section runtime; resolving those
references at load time thus produces either 0-section or N-section code
sequences. Naturally both forms for an operation must occupy the same
number of words! (Hence the JFCLs in some cases.)
Conversion of a byte pointer (any kind) to a word pointer is done with:
TLZ R,$$BPPS ; 770000 if multi-section, -1 if 0-section.
Conversion of a word pointer to a byte pointer is done with:
TLO R,$$BPmn
where m = byte size, and n = byte position. These symbols are set
appropriately for the section being loaded into.
Conversion of an 18-bit to 9-bit pointer:
Local Extended
TLZE R,007700 TLZA R,050000
TLO R,111100 JFCL
Conversion of a 9-bit to an 18-bit pointer:
Local Extended
TLZE R,117700 TLZ R,010000
TLOA R,002200 TLON R,060000
JFCL TLC R,030000
Naturally, the JFCLs could be eliminated if it was known at compile time
which section the code would run in.
Casts from other byte pointers to other byte pointers are done by
first converting to a word pointer and then to a byte pointer; no attempt
is made to preserve alignment in those cases.
STRUCTURE COPYING
The code sequence constructed depends on the size of the structure
(number of words) and whether the section number is known at compile time;
that is, whether KCC knows the result will run only in section 0, or section
N, or (as is normal) must be capable of running in either.
** P_SMOVE reg,addr+offset(idx) [plus Pbsize set to # words]
** reg = register containing destination word address
** addr+offset(idx) = source address
** Pbsize = # words to copy
If there are 3 words or less, this sequence is used:
XMOVEI 16,-1(R) ; Get destination address
PUSH 16,address ; Copy source words there.
PUSH 16,address+1
...
If there are 4 or more, and we know we will be 0-section:
MOVEI 16,(R) ; Get dest address
HRLI 16,address ; Make <source>,,<dest>
BLT 16,size-1(16) ; Use BLT to copy words!
If there are 4-10 inclusive, and we know we will be N-section:
<simple PUSHes as before>
If there are more than 10, known N-section:
PUSH P,14
PUSH P,15
MOVEI 14,size ; XBLT arg 1: <# words>
XMOVEI 15,address ; XBLT arg 2: <source>
XMOVEI 16,(R) ; XBLT arg 3: <dest>
EXTEND 14,[XBLT]
POP P,15
POP P,14
A combination of these is normally put together for the general case
(either 0-section or N-section) using JUMPGE 17, tests to jump if
multi-section or not. The resulting pastiches are not beautiful, but
they work as efficiently as possible.
FLOAT <-> INT CAST CONVERSIONS:
Casts from (float) to and from (int) are normally done using
the FIX and FLTR instructions.
(float)->(int):
FIX R,R ; For both signed and unsigned int.
(int)->(float):
Signed int Unsigned int (note R is a register pair!)
FLTR R,R LSHC R,-1 ; Shift sign bit down
FLTR R,R ; Float the shifted value
FSC R,1 ; Float-shift back up
CAIGE R+1,0 ; If low bit was lost,
FADRI R,(1.0) ; add it back.
NOTE: For the KA-10, which does not have FIX or FLTR, simulations of
these are used.
FIX simulation (R is a register pair!):
MUL R,0400 ; Get exponent in R, rest in R+1 */
TSC R,R ; If negative, make positive exponent */
ASH R+1,-243(R) ; Shift "rest" by right amount */
<result in R+1>
FLTR simulation:
FSC R,233
FSC has the disadvantage of losing high-order bits if attempting to
float an integer which has more than 27 significant bits, but there
doesn't seem to be any good alternative.
DOUBLE <-> INT CAST CONVERSIONS:
There are no instructions however for casting integers to double
precision floating point, so a code sequence must be used:
(int)->(double):
Signed int Unsigned int
ASHC R,-8 LSHC R,-9 ; Get integer into low word
LSH R+1,-1
TLC R,243000 TLC R,244000 ; Set proper exponent
DFAD R,[0 ? 0] DFAD R,[0 ? 0] ; Normalize the result
double->int:
; The internal pseudo-instruction P_DFIX R,M expands into:
DMOVE R,M
HLRE 16,R ;This mattered when shifts were bit-at-a-time
ASH 16,-11 ;Get just exponent (9 bits)
JUMPGE 16,.+3 ;Positive?
DMOVN R,R ;No, negate, orig sign still in 1B0[A]
; For KA-10 format this is DFN R,R+1.
TRC 16,777777 ;Watch for diff between twos and ones comp
TLZ R,777000 ;Bash exponent and sign ... now positive
; For KA-10 format, LSH R+1,10 goes here.
ASHC R,-233(16) ;Make an integer (may overflow)
CAIGE 16, ;Original negative? Check its sign.
MOVN R,R ;Yup, negate result.