.TOC "BLT - Neatly Optimized"
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; ;
; This implementation of BLT is a complete rewrite of the ;
; instruction. BLT has been substantially optimized by splitting ;
; the copy loops into three separate loops: one for PXCT (which ;
; supports only in-section copying, and is clean only in section ;
; zero or one), one for spraying a single word through a block ;
; of memory, and one for all other cases (usually normal block ;
; copies). In all of these cases we attempt to keep the Mbox ;
; busy as much as possible by starting each load on the same ;
; microinstruction that completes the previous store, thus ;
; eliminating a fair amount of Mbox dead time. (Stores cannot ;
; be similarly overlapped, due to the need to wait for the ;
; parity check on each load.) We also avoid the overhead of ;
; needless state register switching in the non-PXCT cases. ;
; ;
; This code gives up on the backwards BLT idea entirely, since ;
; that broke many useful programs. ;
; ;
; --QQSV ;
; ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
.DCODE
251: EA, J/BLT ;BLT--adjoins EXCH
.UCODE
311: ;[440] Near EXCH
BLT: BR/AR,ARL_ARL,ARR_AC0,ARX/AD ;Generate initial dest address
=0 BRX/ARX,MQ_AR,AR_AR-BR,ARX/AD, ;MQ_current dest address. Get copy
SC_#,#/18.,CALL [REPLIC] ; count-1; set up for source grab
BR/AR,AR_SHIFT,ARX_ARX+1 (AD), ;Get source address half, and
GEN CRY18,SR_BLT(AC) ; bump both AC halves
AR_ARX-BR,GEN CRY18,ARX_AR ;Finish AC; keep initial source
AC0_AR,ARL_MQL,ARR_ARX ;Set AC; gen complete source addr
BR/AR,SKP P!S XCT ;BR_source addr; test PXCT
;
; We are now just about set up. BR will contain the current source
; address; MQ will contain the current destination address. ARX will
; hold -(count-1), where count is the number of words remaining to be
; copied; it is incremented each time a word is copied. Thus, the copy
; terminates one word AFTER ARX becomes non-negative (this makes sure
; that we always copy at least one word). BRX will contain a copy of
; ARX that is used only
; if the instruction must quit prematurely; it is updated each time
; a store completes.
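;
; A worked example of the count arithmetic (a sketch only; it assumes
; the usual BLT operand convention of AC left = source, AC right =
; destination, E = last destination address, and the addresses are
; purely illustrative). With AC = 100,,200 and E = 203:
;
;    count       = E - dest + 1 = 203 - 200 + 1 = 4 words
;    initial ARX = dest - E     = 200 - 203     = -3 = -(count-1)
;
; Each SIGNS DISP in the loops sees ARX as it stood before that
; cycle's ARX_ARX+1 takes effect, so the successive tests see -3, -2,
; -1, 0. The first three say "more to do"; the fourth (non-negative)
; ends the copy after exactly four words.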
;
; Now figure out which loop to use. If the destination address -
; the source address = 1, we are spraying a word through memory.
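;
; For instance (an illustrative case, not taken from this listing): a
; program that zeroes location 100 and then does a BLT with AC =
; 100,,101 and E = 177 has dest - source = 101 - 100 = 1. The word
; just stored at 101 is the source of the next transfer, so the one
; word at 100 is propagated through 101-177. Since every word stored
; is the same, the spray loop never needs to reload it.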
;
=00 GEN BR,LOAD VMA(EA),ARX_BRX, ;Not PXCT. Read the first word
CALL [XFERW]
AR_MQ-1,SR_BLT(PXCT SRC),J/BLTPX;PXCT. Back up dest vma, set up SR
GEN MQ-BR-1,SKP AD NZ ;Got word. Are we spraying memory?
=
=0 GEN MQ,STORE VMA(EA),J/SPRAY ;Yes. Start first store
GEN MQ,STORE VMA(EA),SKP AC REF ;No. Test for AC reference
=0 SKP AC REF,J/BLTLOD ;Load from memory. Test store
ARX_ARX+1,SIGNS DISP,TIME/3T, ;Load from ACs. Leave SR alone and
J/BLTLUP ; test for completion
;
=0
BLTLOD: SR_BLT,ARX_ARX+1,SIGNS DISP, ;All references are to memory. Allow
TIME/3T,J/BLTLUP ; shadow AC reference. Test if done
XBLTGO: ARX_ARX+1,SIGNS DISP,TIME/3T, ;Store to ACs. Leave SR alone. Test
J/BLTLUP ; completion
;
; REPLIC--Subroutine to swap the ARX and the BRX, and to replicate
; ARR in ARL. It just happened to be useful. Return 1.
;
REPLIC: BRX/ARX,ARX_BRX,ARL_ARR,ARR_ARR,;Swap and replicate
RETURN [1]
;
; The main copy loop. Cache activity is overlapped with the Ebox as
; much as possible. (Recall that we cannot immediately store data fresh
; from memory; the AR_MEM forces an extra cycle of delay for parity
; checking.) The SR has been used to set up the proper VMA context
; for shadow AC reference, so we can use the same loop even if ACs
; are involved.
;
=0
BLSTOR: MQ_MQ+1,VMA/AD,STORE,ARX_ARX+1, ;Increment dest VMA and count,
SIGNS DISP,TIME/3T ; start store. Are we done?
=0111
BLTLUP: FIN STORE,I FETCH,J/NOP ;Count expired. All done
FIN STORE,BRX/ARX,AR_BR+1, ;More to do. Increment source VMA,
VMA/AD,LOAD AR,SKP INTRPT ; start the fetch, and test int
=0 BR/AR,AR_MEM,J/BLSTOR ;No int. Wait for load and loop
AR_MEM,SR DISP,J/CLEAN ;Interrupted. Freeze BLT or XBLT
;
; If we are spraying memory, we can use VMA_VMA+1 which preserves
; globality. Thus it does not matter whether ACs are in use here.
; Indeed, once it gets started, PXCT can use this loop too.
;
SPRAY: ARX_ARX+1,SIGNS DISP,TIME/3T ;Copying only one word?
=0111
SPRAYL: FIN STORE,I FETCH,J/NOP ;Could be. Spray done
SKP INTRPT ;More to do. Test interrupt
=0 FIN STORE,BRX/ARX,VMA_VMA+1, ;No interrupt. Proceed to next
STORE,ARX_ARX+1,SIGNS DISP, ; store, increment count, and
TIME/3T,J/SPRAYL ; test completion
MEM_AR,BRX/ARX,SR DISP,J/CLEAN ;Interrupted. Freeze and clean up
;
; Finally, the PXCT case. We will optimize spraying memory (at
; this writing, TOPS-10 still uses BLT to do that in some cases).
; Note that this can be used only for copying within a section
; (usually zero). The state register must be swapped at each
; operation (unless we are spraying memory) to activate the proper
; PXCT bits. SR bit 0 is off in order to force AC context.
;
=0*
BLTPX: MQ_AR,VMA_BR,LOAD AR,ARX_BRX, ;Set up dest addr and count, do
SR_BLT(PXCT DST),CALL [XFERW];first load, and shuffle SR
GEN MQ-BR,SKP AD NZ ;Is this a single word spray?
=0 VMA_MQ+1,STORE,ARX_ARX+1, ;Yes. Store first word here
SIGNS DISP,TIME/3T,J/SPRAYL ; and use standard loop
PXPUT: MQ_MQ+1,VMA/AD,STORE,ARX_ARX+1, ;Bump dest VMA and count, start
SIGNS DISP,TIME/3T ; store, and test completion
=0111 FIN STORE,I FETCH,J/NOP ;All done. Blow out of here
SR_BLT(PXCT SRC) ;More to do. Do the SR shuffle
FIN STORE,BRX/ARX,AR_BR+1, ;Terminate store, freeze count, tick
INH CRY18,VMA/AD,LOAD AR, ; source VMA, start load,
SKP INTRPT ; and test for interrupt
=0 BR/AR,AR_MEM,SR_BLT(PXCT DST), ;No interrupt. Wait for load and
J/PXPUT ; swap state register
AR_MEM,J/BLTFIX ;Interrupt. Common fixup code
;
; If we get a page fault or an interrupt in the middle of this we
; end up here. The BRX keeps an accurate count of the transfers
; actually completed. We must subtract one from it, and add the
; result to both halves of AC0. ARX is set to -1 on the way in.
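;
; A worked example (a sketch under assumptions: AC0 already holds the
; final value set up at BLT, and PGFAC0 stores AR back into AC0; the
; numbers are illustrative). Take a 4 word copy, so the count starts
; at -3 and AC0 holds source+4,,dest+4. If we are stopped after one
; store has completed, BRX = -2:
;
;    AR  = ARX + BRX    = -1 + -2 = -3 (replicated in both halves)
;    AC0 = AC0 + -3,,-3 = source+1,,dest+1
;
; so a restart resumes with the second word. If the very first load
; faults, BRX is still -3, and AC0 backs up to source,,dest.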
;
=0
BLTPGF: AR_ARX+BRX,CALL [REPLIC] ;Set up both halves of AR
AR_AR+FM[AC0],INH CRY18,J/PGFAC0;Update AC0, then fault or int
.TOC "XBLT--Also Neatly Modernized"
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; ;
; This XBLT rewrite makes use of what we learned when we ;
; upgraded BLT--indeed, it shares code in a couple of cases. ;
; Once again, we split this into separate loops, depending ;
; upon the copy circumstances. If we are copying forward (the ;
; most common case), we distinguish the core clearing case, the ;
; normal copy, and PXCT that does not clear core, using the BLT ;
; code for the first two cases. If we are copying backward, we ;
; do not attempt to optimize clearing core, and there is no need ;
; for a separate PXCT loop. As a result, we have only one loop ;
; for that case. ;
; ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; The dispatch for this instruction has been rewritten so that
; EXTEND special cases op code 20. As a result, we no longer
; need special DRAM for this instruction. We get here with
;
; ARX_AC0,SKP AD NZ ;Get copy count. Anything to do?
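;
; A worked example of the setup that follows (a sketch, assuming the
; usual XBLT operands: AC0 = count, AC1 = source address, AC2 =
; destination address; the values are illustrative). For a forward
; copy with AC0 = 3, AC1 = 1000, AC2 = 2000:
;
;    T0  = 3                 (initial count, kept for a freeze)
;    AC1 = 1000 + 3 = 1003   (final source address)
;    AC2 = 2000 + 3 = 2003   (final destination address)
;    AC0 = 0                 (final count)
;    ARX = -3 + 1 = -2       (internal count, -(count-1))
;
; The ACs thus hold their final values from the start; the internal
; count then runs exactly as it does for BLT.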
;
=0
XBLT: I FETCH,J/NOP ;No. This is easy
BRX/ARX,AR_AC1 ;Yes. Fetch source address
BR/AR,AR_ARX,ARX_AC2,SR_XBLT(SRC);Save it, grab destination address
FM[T0]_AR,AR_AR+BR ;Save initial count; gen and save
AC1_AR,AR_ARX+BRX,MQ_ARX ;final source addr; MQ_dest address
AC2_AR,AR_0.M,ARX_-BRX ;Save final dest addr
AC0_AR,ARX_ARX+1,SIGNS DISP, ;Clear final count; set up internal
TIME/3T ; count. Going forward or backward?
=0111 ARX_BRX COMP,J/BLBACK ;Backward. Fix internal count first
BRX/ARX,VMA_BR,LOAD AR, ;Forward. Get first word
SR_XBLT(DST) ; and switch SR
GEN MQ-BR-1,SKP AD NZ ;Are we spraying memory?
=0 AR_MEM,J/XSPRAY ;Yes. Wait; then start spray
AR_MEM,SKP P!S XCT ;No. Is this PXCT?
=0 VMA_MQ,STORE,J/XBLTGO ;No. Store first word and loop
MQ_MQ-1 ;Yes. Back up to get correct start
;
; The PXCT case. As with BLT, this requires state register swapping.
;
XBLPX: MQ_MQ+1,VMA/AD,STORE,ARX_ARX+1, ;Increment dest pointer, store word
SIGNS DISP,TIME/3T ;Are we done yet?
=0111 FIN STORE,I FETCH,J/NOP ;Yes. Leave now
SR_XBLT(SRC) ;No. Switch state register
FIN STORE,BRX/ARX,AR_BR+1, ;Terminate store, tick source, and
VMA/AD,LOAD AR,SKP INTRPT ; start next load. Interrupted?
=0 BR/AR,AR_MEM,SR_XBLT(DST), ;No. Wait for load and swap SR
J/XBLPX
AR_MEM,J/XBLFIX ;Yes. Clean up, then interrupt
;
; Spray a word through memory. Get it started.
;
XSPRAY: VMA_MQ,STORE,J/SPRAY ;Start spray properly for XBLT
;
; Copy is going backwards. We must spend a cycle to generate a -1
; (since there is no way to generate something like AR_BR-1); the
; extra cycle gives us time to swap the state register. Thus there
; is no need for special PXCT code. We could make a backwards
; memory spray run faster, but it doesn't seem worth the bother.
; Note that both source and destination start out at one greater
; than the actual first address used, so we do not need special
; startup code.
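;
; A worked example (a sketch; the values are illustrative). With AC0 =
; -3, AC1 = 1003, AC2 = 2003, the internal count starts as the
; complement of -3, which is 2, and each pass through BACKLD adds -1:
;
;    load 1002, store 2002   (test sees  1: more to do)
;    load 1001, store 2001   (test sees  0: more to do)
;    load 1000, store 2000   (test sees -1: all done)
;
; Three words are copied, highest address first.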
;
BACLUP: MQ_MQ-1,VMA/AD,STORE, ;Decrement destination pointer and
SIGNS DISP,TIME/3T ; store word. Are we done?
=0111
BLBACK: MEM_AR,BRX/ARX,ARX_1S, ;More to do. Keep count and set to
SR_XBLT(SRC),J/BACKLD ; decrement count and source addr
FIN STORE, I FETCH,J/NOP ;All done. Get out
;
BACKLD: AR_ARX+BR,VMA/AD,LOAD AR, ;Decrement pointer, fetch next word
ARX_ARX+BRX,TIME/3T,SKP INTRPT; Decrement count. Interrupted?
=0 BR/AR,AR_MEM,SR_XBLT(DST),J/BACLUP;No. Wait; then loop on
AR_MEM,J/XBLFIX ;Yes. Clean up before taking it
;
; XBLT freeze comes here. If we have an interrupt, we must restore
; all three ACs to a usable state. T0 tells which direction we are
; going (the restore count must be generated differently for each
; direction).
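;
; Two worked examples (sketches, assuming PGFAC0 stores AR into AC0;
; the values are illustrative), each stopped after one word has been
; stored. Forward, started with AC0 = 3, AC1 = 1000, AC2 = 2000, so
; that AC1 = 1003, AC2 = 2003, and BRX = -1:
;
;    adjustment = BRX - 1 = -2
;    AC1 = 1003 - 2 = 1001, AC2 = 2003 - 2 = 2001, AC0 = 2
;
; Backward, started with AC0 = -3, AC1 = 1003, AC2 = 2003, so that
; AC1 = 1000, AC2 = 2000, and BRX = 1:
;
;    adjustment = BRX + 1 = 2
;    AC1 = 1000 + 2 = 1002, AC2 = 2000 + 2 = 2002, AC0 = -2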
;
; GEN FM[T0],SKP AD0 ;Which way are we copying?
=0
XBLFRZ: ARX_1S,J/FORSUB ;Forward. Must subtract one
AR_BRX+1 ;Backward. Add one to count
FORFRZ: BR/AR,ARX_AR,AR_AR+FM[AC1],SR_0 ;Keep count. Adjust source addr
ARX_ARX+FM[AC2] ;Adjust dest addr
AC1_AR,AR_ARX ;Restore source
AC2_AR,AR_-BR,J/PGFAC0 ;Restore dest and count. Done
;
FORSUB: AR_ARX+BRX,J/FORFRZ ;Subtract one and rejoin