Trailing-Edge - PDP-10 Archives - BB-H348C-RM_1982 - swskit-v21/certification/25pag.asc
  NWSM:  (AV) -- The average number of working sets in memory during the
       interval.   If  this  number is significantly larger than NRUN or
       NBAL, then working sets are not being forced out of  memory  when
       processes  go  into  a  wait state (like terminal input wait) and
       consequently response times should not  be  greatly  affected  by

  BSWT:  (AV) -- The average number of processes in the Balance Set that
       are  waiting  for  the  completion  of some event.  Normally this
       number reflects the number of processes waiting for a page to  be
       read  in  from  the  disk.  If NBAL - BSWT is less than one, then
       there are not enough runnable processes in memory to keep the CPU
       busy  100%  of  the  time.   In  this  case, SWPW or FILW will be

  DSKR:  (%) -- Percentage of the processes in Balance Set  Wait  (BSWT)
       that are waiting for a file page to be read into memory.

  DSKW:  (%) -- Percentage of the processes in Balance Set  Wait  (BSWT)
       that are waiting for file pages to be written back to the disk.

  SWPR:  (%) -- Percentage of the processes in Balance Set  wait  (BSWT)
       that  are  waiting  for a page to be swapped into memory from the
       swapping area of the disk.

  NLOD:  (PM) -- The average number of working sets loaded  into  memory
       per minute.

  CTXS:  (PS) -- The average number of context switches performed by the
       scheduler  per  second.   A  context  switch is made whenever the
       scheduler decides to stop running one process and  start  running
       another   process.    This   happens  when  the  running  process
       voluntarily blocks, or it faults on a page that is not in memory,
       or  when  a  higher  priority  process is ready to run.  Since it
       takes CPU time to perform a context switch, CTXS directly affects

  UPGS:  (AV) -- The average number of pages assigned to processes  with
       loaded  working  sets.   These processes may or may not be in the
       Balance Set, but they are allocated memory.

  FPGS:  (AV) -- The average number of physical memory  pages  that  are
       currently  available for swapping in user processes.  The monitor
       normally keeps between 20 and 100 free pages.  The  monitor  uses
       these  pages  (and  the  rest of memory not in use by balance set
       processes) as a page cache.  For example, if a  process  reenters
       the balance set after waking up from a blocked state and it still
WATCH                                                            Page 12

       has some of its pages in memory in the free page pool, then those
       pages  are  used directly without requiring any disk I/O.  It has
       been demonstrated that this cache  plays  an  important  part  in
       overall  system  performance.   Therefore, if FPGS is very small,
       the system performance has most likely been degraded.

  DMRD:  (PS) -- The number of reads per second made to the swapping
       area.

  DMWR:  (PS) -- The number of writes per second made to the swapping
       area.

  DKRD:  (PS) -- The number of reads per second made to the file system.

  DKWR:  (PS) -- The number of writes per second made to the file
       system.

  TTIN:  (PS) -- The number of terminal input  characters  received  per
       second  from all terminals on the system.  This includes real and
       pseudo-terminals as well as echoed characters.

  TTOU:  (PS) -- The number of terminal characters output per second  by
       all jobs on the system.  This includes real and pseudo terminals.

  WAKE:  (PS) -- The number of process wakeups per second.  Some of  the
       types of wakeups that fall into this category are:
                     Terminal Input    
                     Terminal Output   
                     Process Termination       

  TTCC:  (PS) -- The  number  of  terminal  interrupt  characters  (e.g.
       control-c) typed per second.

  TDIO:  (PS) -- The aggregate number of disk pages read or written  per
       second  to  both  the  file system area and to the swapping area.
       Normally 60 pages per second for a one channel and 100 pages  per
       second  for  a  two  channel  system are saturation levels.  This
       variable is the summation of DMRD, DMWR, DKRD and DKWR.
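
       The TDIO relationship can be illustrated with a short Python
       sketch.  It sums the four disk rates and compares the total with
       the saturation levels quoted above.  The function names and the
       sample rates are assumptions made for illustration; they are not
       part of WATCH.

```python
# Saturation levels quoted in the text, in pages per second,
# keyed by the number of disk channels.
SATURATION_PAGES_PER_SEC = {1: 60.0, 2: 100.0}


def total_disk_io(dmrd, dmwr, dkrd, dkwr):
    """TDIO: swapping reads and writes plus file system reads and writes."""
    return dmrd + dmwr + dkrd + dkwr


def fraction_of_saturation(tdio, channels):
    """Express TDIO as a fraction of the quoted saturation level."""
    return tdio / SATURATION_PAGES_PER_SEC[channels]


# Hypothetical rates (pages per second), chosen only for illustration.
tdio = total_disk_io(dmrd=12.0, dmwr=8.0, dkrd=20.0, dkwr=5.0)
print(tdio, fraction_of_saturation(tdio, channels=1))
```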

  RPQS:  (PS) -- The average number of pages per second which were
       retrieved from the replaceable queue in order to satisfy page
       faults.  Page faults satisfied this way do not require disk
       I/O.

  GCCW:  (PS) -- The average number of pages per second which were freed
       by  global  garbage  collections.   See  TCOR  and  NCOR for more

  XGCW:  (PS) -- The average number of pages per second which were freed
       by  local  garbage  collections  on  specific  processes.   These
       garbage collections remove pages from  a  process's  working  set
       which have not been used in a long time.

  KNOB:  (value) -- This is the setting of the "bias control" knob.  The
       settings  of this knob range from 1 to 20 and provide a mechanism
       for the  administrator  to  favor  interactive  or  computational
       users.  Not all 20 settings are implemented (only 5 are unique in
       release 4 of TOPS-20).  However, a setting of 1  will  cause  the
       scheduler to heavily favor interactive jobs, a setting of 20 will
       cause the scheduler to heavily favor computational  jobs,  and  a
       setting  of  11  (the  normal  default)  will  provide a balanced

       Each active process is placed by the scheduler in one of 6 queues
       depending  on  the  process's recent history.  The values in this
       display represent the portion of USED time allocated to processes
       which were in the respective scheduling queues.

       The first queue is only used by Job 0 and by jobs in the  special
       high  priority  category.   Normally  the  percentage of run time
       accumulated in this queue is small.

       The second and third queues are the interactive queues.   If  the
       sum of these two values is high, then there is a high interactive
       load on the system.

       The last three queues are the  computational  queues.   Processes
       only  move onto these queues if they have entered a compute bound
       phase.  If the sum of these three values is high, then the system
       load is primarily computational.

       When the class scheduler is  turned  on,  interactive  users  are
       scheduled  in  queue  order  while  processes  in the lower three
       (computational) queues are given priority on the basis  of  their
       class's distance from its target share.

4.3  The Load Averages Section

The term "Load Average"  refers  to  the  average  number  of  processes
simultaneously  demanding  service over some interval of time.  The load
average display looks like the following:

        LOAD AVERAGES:          5.29    4.06    3.39 
        HIGH QUEUE AVERAGES:    3.76    2.86    2.25 
        LOW QUEUE AVERAGES:     1.54    1.20    1.14 

        CLA   SHR    UTIL 

        0   80.00    0.00    0.00    0.00    0.00 
        1   15.00    0.00    0.00    0.00    0.00 
        2    5.00    0.00    0.00    0.00    0.00 


       The system keeps three exponential load averages.   These  values
       represent  the  average  load  over the last 1 minute, the last 5
       minutes, and the last 15 minutes.  These numbers can be  used  to
       estimate  the expected elongation of the elapsed time required to
       run a program.  If the system load average  equals  X,  then  the
       approximate elapsed time required to run an additional program on
       the system is at least  (1+X)*Y,  where  Y  is  the  stand  alone
       elapsed  time  required  to  run  this program.  If the system is
       swapping heavily, it may require a great deal more time.

       By comparing the  three  load  averages,  it  can  be  determined
       whether the system load is rising or falling.
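
       The two uses of the load averages described above can be
       sketched in Python.  The (1+X)*Y estimate comes from the text;
       the "rising", "falling", and "mixed" labels and the 10 second
       stand alone time are assumptions made for this illustration.

```python
def min_elapsed_time(load_average, stand_alone_seconds):
    """Lower bound on elapsed time, (1 + X) * Y, as given in the text."""
    return (1.0 + load_average) * stand_alone_seconds


def load_trend(one_min, five_min, fifteen_min):
    """Compare the three averages to judge the direction of the load.

    The three labels are an assumed, simplified classification.
    """
    if one_min > five_min > fifteen_min:
        return "rising"
    if one_min < five_min < fifteen_min:
        return "falling"
    return "mixed"


# Values from the LOAD AVERAGES line of the sample display.
print(load_trend(5.29, 4.06, 3.39))
# Hypothetical program needing 10 seconds on an otherwise idle system.
print(min_elapsed_time(5.29, 10.0))
```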


       The HIGH QUEUE AVERAGES are the components of each of the load
       average values that are attributable to interactive jobs.


       The LOW QUEUE AVERAGES are the components of each of the load
       average values that are attributable to computational jobs.  The
       sum of the high queue average and the low queue average equals
       the load average.


       When class scheduling is in use, this section is of interest.
       The following information is presented for each class defined
       on the system.

       1.  CLASS NUMBER -- Under the column CLA,  the  class  number  is

       2.  SHARE -- For each  class,  its  share  of  the  processor  is
           displayed  under  the  column SHR.  This share corresponds to
           the percentage of the CPU  which  the  monitor  will  try  to
           distribute among the jobs in this class.

       3.  UTILIZATION -- For each class, the  utilization  of  the  CPU
           actually  achieved  is  displayed under the UTIL column.  The
           value here should be less than the  share  unless  the  class
           received  "windfall"  because other classes did not use their
           entire share.

       4.  Load Averages -- The 1 minute, 5 minute, and 15  minute  load
           averages  are  presented for each of the classes.  These load
           averages may  appear  to  be  very  large  because  they  are
           computed as follows:
                                         NUMBER OF PROCESSES IN CLASS
                                         MAKING SIMULTANEOUS DEMANDS
                  CLASS LOAD AVERAGE =  ------------------------------
                                         MAXIMUM (SHARE, UTILIZATION)
           Thus if there are 5 processes making simultaneous demands  in
           a  class  with  a  share  of  20%  (which  has  achieved  15%
           utilization), then that class's load average is  25  (5/.20).
           Thus  a  user  should  expect a compute bound task to take at
           least 25 times as long as it would on a  stand  alone  system
           with 100% of the CPU.

       If the class scheduler is not running then  the  utilization  and
       load averages will all be zero.
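
       The class load average computation can be written as a small
       Python function.  This is a sketch that assumes share and
       utilization are passed as fractions of the CPU (0.20 for 20%);
       the function name is hypothetical.

```python
def class_load_average(demanding_processes, share, utilization):
    """Processes making simultaneous demands / MAXIMUM(share, utilization).

    share and utilization are fractions of the CPU (0.20 means 20%).
    """
    denominator = max(share, utilization)
    if denominator == 0:
        raise ValueError("share and utilization are both zero")
    return demanding_processes / denominator


# The example from the text: 5 processes, 20% share, 15% utilization.
print(class_load_average(5, 0.20, 0.15))
```

       With the values from the example in the text, the result is 25,
       since the share (20%) is larger than the achieved utilization.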

4.4  Directory Cache Statistics

The directory cache statistics are only available when an enabled user
answers "ALL" to the question "Print monitor statistics ?".  There are
three pieces of information in this display, which are illustrated as
follows:

        Directory Cache hits: 175
        Directory Cache Misses - Cache Full: 0
        Directory Cache Misses - New Entry Added: 321

The directory cache mechanism in the monitor holds retrieval information
on a number of directories.  Whenever a directory access is made, the
cache is first interrogated to determine if the directory is there.  If
it is not, the retrieval information is "cached".  If the accessed
directory is there, then the retrieval information from the cache is
used.  The statistics indicate the effectiveness of this cache.

  Directory Cache hits:  -- This is the  number  of  times  an  accessed
       directory was found in the cache.

  Directory Cache Misses - Cache Full:  -- This is the number  of  times
       an  accessed  directory  was  not  found in the cache and all the
       cache slots were filled with active directories.   In  this  case
       the  most  recently  accessed  directory  cannot  be put into the
       cache.

  Directory Cache Misses - New Entry Added:  -- The number of times the
       accessed  directory  is  not  found  in  the  cache, but room was
       available to add it (possibly in lieu of an inactive entry).

       The "hit ratio" is computed by HITS/(HITS+MISSES) and provides a
       good indication of the cache's effectiveness.  For instance, the
       hit ratio in the example is 175/(175+0+321) or 35% (which isn't
       all that good).
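
       A minimal Python sketch of the hit ratio follows.  It assumes
       that MISSES is the sum of the two miss counters in the display;
       the function name is hypothetical.

```python
def hit_ratio(hits, misses_cache_full, misses_new_entry):
    """HITS / (HITS + MISSES), with MISSES taken as both miss counters."""
    total = hits + misses_cache_full + misses_new_entry
    if total == 0:
        return 0.0  # no directory accesses were recorded
    return hits / total


# Counters from the sample display.
ratio = hit_ratio(175, 0, 321)
print(f"{ratio:.0%}")
```

       For the sample display this gives 175/496, or about 35%.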

4.5  Normal Per-job Information

     The normal per-job information is available to all users running
WATCH (not just enabled users) and consists of a line for each job
which had an active process during the interval.  The statistics
reported for each job include the following.


      0 DET OPERATOR    SYSJOB     0.01   0.2   0.00  80.00
      2 207 OPERATOR    BATCON     0.01   0.2   0.00  80.00
      3 210 OPERATOR    NETCON     1.62  44.9   0.00  80.00
      7 143 MCKIE       WATCH      0.63  17.5   0.00  80.00
     17  21 D.SCHEIFLER EXEC       0.10   2.9   0.00  80.00
     18 154 SEARS       EMACS      0.00   0.1   0.00  80.00
     20 121 ORAN        EMACS      0.09   2.5   0.00  80.00
     21  51 HARRELSON   EXEC       0.03   0.7   0.00  80.00
     22 DET OPERATOR    PERF       0.08   2.4   0.00  80.00
     23 170 HALLYBURTON VDIREC     0.30   8.4   0.00  80.00

  JOB -- This is the job number assigned by the system when the user
       logged in.

  TTY -- This is the number of the terminal that is being  used  by  the
       user  running this job.  "DET" means that the job is not attached
       to a controlling terminal (i.e.  DETACHED).

  USER -- This is the name of the directory that the user logged into.

  PROGRAM -- This is the name of the program being run or the name of
       the EXEC command being used.  Please note that the program name
       is obtained at the time the sample is taken.  It is not possible
       to tell if the program or command was running during the entire
       interval.

  DELTA RT -- This is the incremental amount of runtime (CPU time) which
       the job used during the interval.  This is in seconds.

  %RT -- The percentage of the interval represented  by  the  DELTA  RT.
       The sum of all %RT values is used to compute the SUSE.

  JU -- Job Utilization.  When the class scheduler is running this  will
       normally  be a non-zero value.  It represents the CPU utilization
       accumulated by this job and charged to  the  job's  class  share.
       Because  the  class  scheduler  tries  to  divide the class share
       equitably among all active users in the class, computational jobs
       within the same class should normally receive nearly the same job

  CSH -- Class Share.  When the class scheduler is  running,  this  will
       reflect the class's share divided by the number of active jobs in
       that class.  This then is the target share for the job.  Normally
       the job utilization (JU) will be less than a job's class share.

4.6  Expanded Per-job Information

     Most of the information presented in this section  is  obtained  by
setting  break  points  in  the monitor with the SNOOP JSYS.  Thus, this
information is only available to users who are running WATCH with either
WHEEL or OPERATOR privileges enabled.

The presented information occupies  a  full  132  character  line.   For
explanation purposes it will be broken up as follows:

     Job identification Information

          JOB   TTY   USER   PROGRAM....

     Job Utilization Information


     Memory, Response, and Disk Information


The "...." indicates  that  there  are  other  variables  preceding  or

4.6.1  Job Identification Information - 

        JOB TTY USER        PROGRAM ......
          0 DET OPERATOR    SYSJOB
          2  21 D.SCHEIFLER ICP     
          4 210 OPERATOR    NETCON  
          7  72 LIBMAN      AID     
          9 117 LEAPLINE    EXEC    
         10   2 OPERATOR    BASIC   
         12  50 GUNN        EMACS   
         15 DET OPERATOR    PERF    
         16 215 OPERATOR    IBMSPL  
         19  37 ACARLSON    PTYCON  
These variables have the  same  definition  as  in  the  normal  per-job

  JOB -- This is the job number assigned by the system when the user
       logged in.

  TTY -- This is the number of the terminal that is being  used  by  the
       user  running this job.  "DET" means that the job is not attached
       to a controlling terminal.

  USER -- This is the name of the user account of the logged in job.

  PROGRAM -- This is the name of the program being run or the name of
       the EXEC command being used.  Please note that the program name
       is obtained at the end of the interval, and it is not possible
       to tell if the program or command was running the entire time.

4.6.2  Job Utilization Information - 


   0       1.0  14.3  8.6      72.9  5.7  4.0  8.8       
   2       0.1   0.7 18.0      82.0                        
   4       0.6   3.6 19.3      34.8 38.4  5.8  1.8         
   7       0.0   0.3 15.5      19.3 65.2                   
   9       0.5   1.2 43.0      15.7      35.6  5.7         
  10       9.7  49.8 20.7      63.5  2.9 11.9  0.9       0.5
  12       0.1   0.4 29.4      70.6                        
  15       1.0   6.0 19.4      20.2  1.1 16.0 43.3         
  16      22.4  38.7 61.5      38.5                        
  19       1.1   5.7 27.7      72.3                        

  %RT (%) -- The percentage  of  the  interval  during  which  this  job
       actually received CPU time.

  DEMD (%) -- Summation of the percentage of the interval  during  which
       each process in the job was active.  A process is active if it is
       competing for usage of the CPU, the disk, or a resource for which
       the  wait  time is reasonably short (less than 100 milliseconds).
       If only one process in the job is  simultaneously  active  during
       the interval (the normal case) then the DEMD percentage will vary
       from 0% to 100%.  If more than  one  process  was  simultaneously
       active,  then  the percentage could exceed 100% (if all processes
       in the job were active  for  the  interval,  the  DEMD  would  be

       The rest of the variables in this section indicate what  the  job
       was  doing  during  its  "active"  period.  Only the USED portion
       indicates actual resource consumption.  The others represent  the
       amount  of  time  spent in various short term wait states.  These
       statistics are all expressed as percentages of DEMD and thus they
       sum to 100%.  When assessing the importance of the statistics for
       a specific job, you should multiply these percentages by DEMD  to
       get the percentage of the interval time.

       USED (%) -- The percentage of the DEMD time that the processes in
            this job spent using the CPU.

       GRDY (%) -- The percentage of DEMD that  processes  in  this  job
            were  runnable  but  could  not  fit  in the balance set.  A
            process must be in the balance set before it will be  chosen
            by  the  scheduler  to  run.   The  most  common  cause  for
            processes to be on this list is that  there  is  not  enough
            memory to hold all runnable jobs.

       BRDY (%) -- The percentage of DEMD that  processes  in  this  job
            were  in  the  balance  set but were not being run.  Usually
            processes in this state are waiting for their  turn  to  use
            the CPU.

       SWPR (%) -- The percentage of DEMD that  processes  in  this  job
            waited   on  page  faults  from  the  swapping  area  to  be

       DSKR (%) -- The percentage of DEMD that  processes  in  this  job
            waited for file pages to be read in from the disk.

       DSKW (%) -- The percentage of DEMD that  processes  in  this  job
            waited for file pages to be written to the disk.

       RPQW (%) -- The percentage of DEMD that  processes  in  this  job
            waited  for  a  physical memory page to become available for
            swapping into.  Usually  when  time  is  accumulating  here,
            there is a shortage of memory on the system.  

       OTHR (%) -- This last category is the  percentage  of  DEMD  that
            processes in this job spent in any of the other wait states.
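
       Because the wait state columns are percentages of DEMD, they
       must be scaled by DEMD before they can be compared across jobs.
       The Python sketch below shows that scaling.  The function name
       and the sample job are hypothetical and do not come from the
       display above.

```python
def as_percent_of_interval(demd_percent, breakdown):
    """Scale each percentage of DEMD to a percentage of the interval."""
    return {name: demd_percent * pct / 100.0
            for name, pct in breakdown.items()}


# Hypothetical job that was active for 50% of the interval.
breakdown = {"USED": 20.0, "GRDY": 60.0, "BRDY": 10.0, "OTHR": 10.0}
print(as_percent_of_interval(50.0, breakdown))
```

       In this illustration the job spent 60% of its active time in
       GRDY, which corresponds to 30% of the whole interval.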

4.6.3  Memory, Response, And Disk Information - 


    0    917.1   1  165  0.06  6 410.0   7.7   20   17  45  40
    2     99.9   0   12  0.07  6   8.0   7.0    0    0        
    4    192.5   3   28  0.11  6  37.0   7.2   29    4  57  25
    7     78.2   0    1  0.39  6  27.0  25.0    8    0  32   7
    9     99.9   0   12  0.12  2  56.0  54.8    0    9  57  69
   10    198.1   4   95  0.12  5 124.0  54.2   32  109  62  87
   12    199.8   0   17  0.03  3  18.0   7.6    0    0        
   15     99.9   0   12  0.61  5  27.0  17.2    3   24  45  51
   16     99.9   0  357  0.13  2  22.0  20.6    0    0        
   19     99.9   0  329  0.02  4  15.0  13.1    0    0        

  IMEM (%) -- The percentage of the time that the working sets for the
       processes which make up the job are in memory.  This number is
       the summation of the percentages for each process and thus may
       exceed 100%.

  NLD (CNT) -- The number of times a working set for a process in  this
       job  was  loaded  into  memory.  If this number is zero, then no
       working sets were loaded during the interval.  Because a  process
       had  to  be  active before it could be reported on, this can only
       occur if the working sets were in memory for the whole interval.

  NRSP (CNT) -- The number of responses that the job had during the
       interval.  A response is counted whenever a process wakes up
       for one of the reasons specified under WAKE:.

  RESP (AV) -- The average response time in seconds during the interval.
       The  time  for  each response is defined as the elapsed time from
       when an event for which a process is waiting  has  completed,  to
       the  time  that  the  process  goes  back into a wait state after
       having responded to the event.  Responses that require more  than
       2 seconds of CPU time to finish are not counted in this column.

  SR () -- The "stretch ratio" for each response (which was  represented
       in  RESP).  The stretch ratio is obtained by dividing the elapsed
       time of each response by the compute time required to satisfy it.
       (SR  =  elapsed  time/ CPU time).  The only responses counted are
       those which require less than 2 seconds of CPU time to  complete.
       Thus  the  stretch  ratio  is  the  elongation  perceived  by the
       interactive user and not the computational user.
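
       The RESP and SR definitions can be illustrated with the Python
       sketch below.  It assumes each response is recorded as a pair of
       elapsed time and CPU time in seconds, and that the average
       stretch ratio is the mean of the per-response ratios; the text
       does not say how WATCH itself forms the average.  All names are
       hypothetical.

```python
CPU_LIMIT_SECONDS = 2.0  # responses needing more CPU time are not counted


def summarize_responses(responses):
    """Return (count, average response time, average stretch ratio).

    responses is a list of (elapsed_seconds, cpu_seconds) pairs.
    Only responses using less than CPU_LIMIT_SECONDS are measured.
    """
    counted = [(e, c) for e, c in responses if 0 < c < CPU_LIMIT_SECONDS]
    if not counted:
        return 0, 0.0, 0.0
    avg_resp = sum(e for e, _ in counted) / len(counted)
    # Stretch ratio of one response = elapsed time / CPU time.
    avg_sr = sum(e / c for e, c in counted) / len(counted)
    return len(counted), avg_resp, avg_sr


# Hypothetical responses; the last one exceeds the CPU limit.
print(summarize_responses([(0.5, 0.125), (1.5, 0.25), (30.0, 5.0)]))
```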

  WSS (SUM) -- The sum of the maximum working set size demanded by  each
       active process in the job.

  UPGS (AV) -- The average number of pages actually  in  memory  when  a
       process from the job is in the balance set.

       This is obtained  by  integrating  the  instantaneous  number  of
       assigned  pages  (FKWSP) over all time that the process is in the
       balance set.  The average used pages is then the integral divided
       by the accumulated time, i.e.

       UPGS = SUM all processes (integral FKWSP dt / T in balset)
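
       The integral above can be approximated in Python when the page
       count is known for a series of time segments.  This is a sketch
       that assumes the assigned page count is constant within each
       segment; the function names and sample data are hypothetical.

```python
def average_pages_in_balance_set(segments):
    """Time-weighted average of assigned pages for one process.

    segments is a list of (duration_seconds, assigned_pages) pairs
    covering the time the process spent in the balance set.
    """
    total_time = sum(duration for duration, _ in segments)
    if total_time == 0:
        return 0.0  # the process was never in the balance set
    integral = sum(duration * pages for duration, pages in segments)
    return integral / total_time


def job_upgs(processes):
    """Sum the per-process averages, following the formula in the text."""
    return sum(average_pages_in_balance_set(s) for s in processes)


# Hypothetical job with two processes.
process_a = [(2.0, 10), (2.0, 20)]   # averages 15 pages
process_b = [(1.0, 8)]               # averages 8 pages
print(job_upgs([process_a, process_b]))
```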

  SWPR (CNT) -- The number of times a process in the job  waited  for  a
       faulted page to be read in from the swapping area.  This does not
       include pages which were preloaded by the working set manager.

  DSKR (CNT) -- The number of times a process waited for a disk
       input to complete.  Because many programs prefault pages, this
       count will be different from the actual number of pages read.

  TPF (AV) -- The average number of milliseconds that it took to satisfy
       each page fault for this job during the interval.

  IFA (AV) -- The "inter-fault average".  This value represents the
       average compute time in milliseconds between page faults for
       the job.  A large "IFA" means that the working sets of processes
       in this job are very stable.

4.7  Description Of The System Utilization Statistics

There are two parts to the system  utilization  statistics.   The  first
part  consists  of  summaries  for  expanded  per-job statistics.  These
include summaries for the job utilization information  and  the  memory,
response  and  disk information.  The second part consists of additional
system variables and several computations.

4.7.1  System Summary Of Per-job Variables - 

These statistics are listed under the per-job  statistics  on  the  line
which begins:

System Summary .......

The values might appear as follows:

     339.2 28.6      51.9  5.0 11.5  3.0  

      4102.9  37 1772  0.13  3 2168. 231.8  370  911  52  90

  DEMD (%) -- The summary value for the DEMD column is the sum  of  each
       item  in the column.  This represents the total demand put on the
       system over the interval.


       These values represent the average percentage of  the  DEMD  time
       that the jobs were in these states.

  IMEM (SUM) -- The summary is the summation of all the per-job  values.
       It  is  significant  as  an  indicator  of  how many working sets
       belonging to processes which were active during the interval were
       simultaneously   in  memory.   For  instance  the  value  4102.9%
       indicates that approximately 41 working sets belonging to  active
       processes  were  simultaneously  in  memory.   This number can be
       compared with NWSM:.
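
       The conversion from the IMEM summary to a count of working sets
       is a division by 100, as the Python sketch below shows.  The
       function name and the three per-job values are hypothetical; the
       4102.9% figure is the one quoted in the text.

```python
def working_sets_in_memory(imem_percentages):
    """Sum per-job IMEM percentages and express the total as a count."""
    return sum(imem_percentages) / 100.0


# Hypothetical per-job IMEM values (percent), for illustration only.
print(working_sets_in_memory([100.0, 250.0, 60.0]))

# The summary value quoted in the text, treated as a single total.
print(working_sets_in_memory([4102.9]))
```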

  NLD (SUM) -- This is the number of working sets loaded during the
       interval.  It should correspond to the rate given by the variable
       NLOD:.

  NRSP (SUM) -- The summation of the number  of  responses  counted  for
       each job during the interval.

  RESP (AV) -- Summary value for the RESP column is the average response
       time  for those responses measured (requiring less than 2 seconds
       of CPU time) during the interval.

  SR (AV) -- The summary value for the SR column is the average stretch
       ratio for interactions which required less than 2 seconds of CPU
       time.

  WSS -- This is the arithmetic sum of the WSS value for each job.  This
       represents  the  maximum  amount  of memory which would have been
       required during the interval if  all  active  processes  achieved
       their largest size at the same time and were all in memory.

       SYS WSS = SUM all jobs (JOB WSS)

  UPGS (AV) -- This summary value represents the average number of pages
       needed  by  the  active processes at any specified point in time.
       Specifically it is the working set page integrals  for  all  jobs
       divided by the interval time.

       SYS upgs = SUM all jobs (integral FKWSP dt) /T interval

  SWPR (SUM) -- This is the total number of swap reads done by  jobs  on
       the  system  in  response  to page faults.  This does not include
       pages preloaded by the working set manager.

  DSKR (SUM) -- This is the total number of disk pages read which caused
       processes to wait.

  TPF (AV) -- The average time required to wait for a page  fault  (swap
       or disk) to be resolved.

  IFA (AV) -- The average amount of compute time each job spends between
       page faults.

4.7.2  Additional System Variables And Computations - 

Other information in this section includes additional system  statistics
not  available  in  the  system  statistics  section and computations of
various other variables.  It looks like the following:

    TOTRC:  1992   LOKPGS:   104   SHR PGS:   245   AVAIL MEM:  1888
    NRUN MIN,MAX:           1   11
    SUMNR MIN,MAX:       1879 2073
    NRPLQ MIN,MAX:         28  170
    SYS MEM DMD =                                 255.2
    SWAP RATIO (SUM WSS / AV MEM) =                 1.15
    ACTIVE SWAP RATIO (DMD/AVMEM) =                 0.14
    AV WS SIZE =                                   28.84

TOTRC (CNT) -- The number of physical memory pages available.   This  is
       the  total  physical  memory  minus  the  number  required by the
       resident monitor.

LOKPGS (CNT) -- The current number of pages locked down by  the  monitor
       beyond  the  resident  monitor  pages.   Out of this set of pages
       come   the  terminal  buffers,  magtape  buffers,  line   printer
       buffers,  and  other pages locked down during certain file system

SHR PGS (CNT) -- This is the  number  of  physical  memory  pages  being
       shared  by  more than one process at the end of the interval.  It
       is included in the count "AVAIL MEM".

AVAIL MEM (CNT) -- This is the difference between "TOTRC" and  "LOKPGS".
       This  is  the  actual  number  of pages available for use by user

NRUN MIN, MAX -- These values are the  minimum  and  maximum  number  of
       simultaneously active processes during the interval.

SUMNR MIN, MAX -- These values are the minimum  and  maximum  number  of
       pages belonging to working sets in memory during the interval.

NRPLQ MIN,MAX:  -- The minimum  and  maximum  number  of  pages  on  the
       replaceable queue during the interval.

SYS MEM DMD = -- The system average memory demand derived  by  computing
       the  integrals of the memory forecast for each process during its
       active period, summing over all processes  and  dividing  by  the
       interval  time.  Whereas the "system summary UPGS" is the average
       amount of memory actually in use at any point in time, this value
       is the average amount forecast at any point in time.

       SYS MEM DMD = SUM all jobs (integral FKNR dt) / T interval

SWAP RATIO (SUM WSS / AV MEM) = -- The Swap  Ratio  is  the  system  WSS
       divided  by  the  amount  of  available  main memory.  If this is
       greater than one, it represents the amount by which  main  memory
       would have to be increased to avoid any swapping.

ACTIVE SWAP RATIO (DMD/AVMEM) = -- The active swap ratio is  the  system
       average  core  demand  divided  by  the  amount of available main
       memory.  If this number is greater than one,  it  represents  the
       amount  by  which  main memory would have to be increased to hold
       all jobs wanting to run simultaneously.

MEM UTILIZATION ((UPGS+SHRPGS)/AVMEM) = -- The memory utilization is the
       system used pages divided by the amount of available main memory.
       For active swap ratios greater than 1, this  indicates  how  well
       the monitor is doing in keeping memory used.
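
       The ratios defined above can be reproduced from the sample
       display with the Python sketch below.  It assumes that the
       system WSS is the value 2168 from the system summary line and
       that all sample values come from the same run.  The function
       names are hypothetical.

```python
def available_memory(totrc, lokpgs):
    """AVAIL MEM is the difference between TOTRC and LOKPGS."""
    return totrc - lokpgs


def swap_ratio(sum_wss, avail_mem):
    """System WSS divided by available main memory."""
    return sum_wss / avail_mem


def active_swap_ratio(sys_mem_dmd, avail_mem):
    """System average memory demand divided by available main memory."""
    return sys_mem_dmd / avail_mem


def memory_utilization(upgs, shr_pgs, avail_mem):
    """(UPGS + SHR PGS) divided by available main memory."""
    return (upgs + shr_pgs) / avail_mem


avail = available_memory(totrc=1992, lokpgs=104)
print(avail)
print(round(swap_ratio(2168.0, avail), 2))
print(round(active_swap_ratio(255.2, avail), 2))
```

       With these inputs the sketch gives 1888 available pages, a swap
       ratio of 1.15, and an active swap ratio of 0.14, which agree
       with the sample display.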

AV WS SIZE = -- The average working set size is computed by integrating
       the working set demand of each process over its active period,
       summing over all processes, and dividing by the sum of the
       active periods.

AV CPU TIME (MS) PER INTERACTION -- This is the average amount of CPU
       time which a job spends between each response.

THINK TIME (SEC) PER INTERACTION -- This is the average  time  spent  by
       the user between the time the system requests a response and when
       that response is received.

4.8  Disk I/O Statistics

                        DISK I/O

CHN,UNIT        SEEKS           READS           WRITES
0,6             380             485             300             PS #1
0,7                                                             REL4 #0
1,0             49              185             34              SNARK #0
1,1             2               6                               LANG #0
2,3             57              68              32              MISC #0
2,5             602             652             449             PS #0

These statistics display the following information:

  CHN,UNIT -- The channel number and the unit number on that channel  to
       which the disk is connected.

  SEEKS -- The number of times the disk heads had to be moved to get  to
       the  next  request during the interval.  If multiple requests can
       be answered on the same cylinder, then no seek will take place.

  READS -- The number of pages read on this unit during the interval.

  WRITES -- The  number  of  pages  written  on  this  unit  during  the
       interval.

  --no title-- The name of the structure and its  relative  unit  number
       within the structure.
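
A display like the one above can be reduced to per-unit totals with a
short script.  The following Python sketch assumes that each column is
16 characters wide and that a blank count means no activity; both are
assumptions read off the example, not documented properties of WATCH.

```python
FIELD_WIDTH = 16  # assumed column width of the DISK I/O display


def parse_disk_line(line):
    """Split one DISK I/O row into a dict; blank counts become 0."""
    fields = [line[i:i + FIELD_WIDTH].strip()
              for i in range(0, 4 * FIELD_WIDTH, FIELD_WIDTH)]
    chn, unit = fields[0].split(",")
    seeks, reads, writes = (int(f) if f else 0 for f in fields[1:4])
    structure = line[4 * FIELD_WIDTH:].strip()  # untitled last column
    return {"channel": int(chn), "unit": int(unit), "seeks": seeks,
            "reads": reads, "writes": writes, "structure": structure}


def row(*fields):
    """Build a sample row padded to the assumed column width."""
    return "".join(f.ljust(FIELD_WIDTH) for f in fields).rstrip()


sample = [row("0,6", "380", "485", "300", "PS #1"),
          row("1,1", "2", "6", "", "LANG #0"),
          row("0,7", "", "", "", "REL4 #0")]
parsed = [parse_disk_line(line) for line in sample]
transfers = {p["structure"]: p["reads"] + p["writes"] for p in parsed}
```

Summing reads and writes per unit in this way makes large differences
in I/O between disk units easy to spot.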

4.9  Tune Mode Statistics

     Tune Mode is designed to  display  some  of  the  more  interesting
statistics  on  one  line so that a system programmer can easily monitor
the changes in load during a test period.  This mode is  useful  when  a
very  short interval is desired (around 10 seconds).  The information is
abstracted  from  the  "system  statistics"  and   "system   utilization
statistics" sections and includes the following information:

    52.8   2.4   6.8  41.2  16.6  13.1   2.2  62.5  6.87  34.8

          41.7   2.3  19.5   1.8              2207.9   0   198  0.05   2

The  definition  of  the variables on the first row can be obtained from
the system statistics section and the definitions on the second row from
the system utilization statistics section.


     Thus far the  discussion  in  this  document  has  centered  around
explanation of the variables in the WATCH output.  This section presents
some heuristics with which systems can be evaluated.  The heuristics are
computations  which  normally  provide  some meaningful insight into the
effectiveness of a specific configuration in satisfying the  demands  of
the  workload.   They  also  help to determine which users are consuming
more of the system resources than they should.  These heuristics are not
guaranteed to be accurate for all systems.  They are useful, however, in
that they serve as checks or indicators by which the pattern  of  system
resource usage can be examined for potential improvements.

5.1  Some Good Numbers

     When variables have the values indicated in this list,  the  system
is  usually  in balance.  Since it is quite possible for one variable to
appear in balance, while others are not,  this  information  is  only  a
guideline.

     1.  NCOR = 30 PER MINUTE ....  expensive if higher

     2.  NRUN/NBAL = 1 ....  All processes wanting to  run  can  fit  in
         memory.

     3.  SWPR less than 20% ....  Swap  reads/writes  are  overhead  and
         thus should (ideally) be a small component of disk usage.

     4.  SWPW close to 0 ....  Since this variable represents  processes
         waiting  for  memory when no others can run, utilization of the
         system can normally be increased by adding memory until this is
         a small value.  However, if the load has a large I/O component,
         additional memory may merely shift the CPU idle time from  SWPW
         to FILW.

     5.  SKED = 15% ....  Scheduler overhead detracts from cycles  going
         to user programs.  Programs which do very little work each time
         they are scheduled generally drive  this  value  up.   If  this
         value  is high, then it is important to determine if there is a
         set of applications which could be reprogrammed in order to  do
         more  work  between interactions.  Programs which become active
         as  each  character  is  typed  (like  some  screen  formatting
         software) should be viewed with suspicion.

     6.  FILW close to 0 ....  This is CPU idle time caused by processes
         waiting on disk I/O to complete.  Often more memory will permit
         larger numbers of programs to be resident and thus absorb  some
         of  the  CPU.   Other  times,  reconfiguring  the  disk  access
         patterns to spread the disk I/O more evenly  across  the  disks
         and channels will lower this value.

     7.  BSWT/NBAL small ....  If a large  proportion  of  processes  in
         memory  are  waiting  on the disk the CPU will not be utilized.
         When this relationship is small, the CPU can be well utilized.

     8.  NREM = 0 ....   Since  this  counts  the  times  when  runnable
         processes are removed from the balance set, performance is best
         when it is zero and can degrade rapidly otherwise.

     9.  FPGS large...  If the number of free pages drops below 50  then
         the  system  will  probably begin spending resources to garbage
         collect more often.  This statistic along with NREM can be used
         to indicate a system overload.

    10.  DMRD+DMWR  less  than  20  per   second   ...    Because   drum
         reads/writes   utilize   a  percentage  of  the  disk  system's
         bandwidth, higher throughput is possible when swapping is  low.
         Normally  swapping  of  less  than 30 pages per second does not
         cause any visible effect.  If the normal load contains a  large
         amount  of user disk I/O, then swapping at rates higher than 20
         will decrease the system throughput.  If  the  normal  load  is
         mostly interactive or computational, then higher swapping rates
         can be sustained.
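
The numeric guidelines in this list lend themselves to a simple
automated check.  The following Python sketch is one possible way to
apply them; the function name and warning texts are invented for this
example, and the items without a numeric threshold (SWPW, FILW, and
BSWT/NBAL) are deliberately left out.  The first sample uses values
from the system statistics example in section 4.2; the second is a
hypothetical overloaded system.

```python
def check_balance(stats):
    """Compare WATCH system statistics against the rule-of-thumb values.

    stats maps variable names (e.g. "NRUN") to numbers; the result is a
    list of warnings, empty when nothing is flagged.
    """
    warnings = []
    if stats["NCOR"] > 30:
        warnings.append("NCOR above 30 per minute")
    if stats["NBAL"] > 0 and stats["NRUN"] / stats["NBAL"] > 1:
        warnings.append("NRUN/NBAL above 1")
    if stats["SWPR"] >= 20:
        warnings.append("SWPR at or above 20%")
    if stats["SKED"] > 15:
        warnings.append("SKED above 15%")
    if stats["NREM"] > 0:
        warnings.append("NREM is non-zero")
    if stats["FPGS"] < 50:
        warnings.append("FPGS below 50 pages")
    if stats["DMRD"] + stats["DMWR"] >= 20:
        warnings.append("DMRD+DMWR at or above 20 per second")
    return warnings


sample = {"NCOR": 2.02, "NRUN": 5.7, "NBAL": 5.7, "SWPR": 6.4,
          "SKED": 10.0, "NREM": 0, "FPGS": 147, "DMRD": 4.1, "DMWR": 4.7}
overloaded = dict(sample, NREM=3, FPGS=40)  # hypothetical values

ok_result = check_balance(sample)
bad_result = check_balance(overloaded)
```

As the text cautions, an empty result is only an indication that the
system is in balance, not a guarantee.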

5.2  Per Job Computations

     The following are helpful  in  understanding  the  characteristics  of
     various jobs on the system.
     ----LOOK AT EACH JOB----
       If the job HAS A LARGE (100 P) WORKING SET
        THEN .......
             If the job wakes up more than once every 3 - 5 seconds
                     o  It may cause thrashing
                     o  It should be smaller during interactions
                     o  It should run longer between interactions
             If the job is larger than 1/2 user memory
                     o  There may be a mismatch between jobs
                     o  You should check for other large jobs
                     o  Make the job smaller
                     o  Schedule it when the load is light
       If the job WAKES UP FREQUENTLY (more than once every 3 seconds)
        THEN ....
             o This job IS loading the system
             o Understand why it wakes up frequently
                     - high speed terminal doing output ?
                             (Increase terminal buffer size)
                     - Waking on every terminal input character ?
                             (weigh cost versus benefit)
                     - Doing Magtape I/O ?
                             (does not load system)
             o If possible, reduce this wakeup rate
       If there are LARGE DIFFERENCES IN I/O between disk units
             o Move I/O bound application to lightly loaded disk.
             o Move PS to 2 pack structure
       If OVERALL DISK RATE is more than
             o 60 pages per second for one channel
             o 100 pages per second for two channels
         then another channel normally increases system throughput.
       If "SWPW" is HIGH
        THEN .....
                     o The system may need more memory
                     o The system may need more swapping bandwidth
       If "FILW" is HIGH
        THEN .....

                     o The system needs more I/O bandwidth
                     o Organize the packs to get more out of channels
                     o Consider getting more channels or disk units
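
Two of the rules above are simple enough to express directly.  The
following Python sketch encodes the disk rate guideline and the wake-up
frequency test; the function names are invented for this example, and
the text gives no figure for more than two channels, so the sketch
returns no answer in that case.

```python
CHANNEL_LIMITS = {1: 60, 2: 100}  # pages per second, from the text above


def another_channel_suggested(pages_per_second, channels):
    """True if the overall disk rate exceeds the guideline for this
    number of channels; None if the text gives no guideline."""
    limit = CHANNEL_LIMITS.get(channels)
    if limit is None:
        return None
    return pages_per_second > limit


def wakes_frequently(wakeups, interval_seconds, threshold_seconds=3.0):
    """True if the job woke more than once every threshold_seconds."""
    if wakeups == 0:
        return False
    return interval_seconds / wakeups < threshold_seconds


one_channel = another_channel_suggested(75, 1)
two_channels = another_channel_suggested(75, 2)
busy_job = wakes_frequently(60, 120)   # one wake-up every 2 seconds
quiet_job = wakes_frequently(20, 120)  # one wake-up every 6 seconds
```

A rate of 75 pages per second is over the guideline for one channel but
under it for two.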

6.1  Running WATCH To Obtain Trend Information

It is recommended that all sites run  WATCH  as  part  of  their  normal
operation.  It can be started under PTYCON in the following manner.

        PTYCON>Define 10 as W
        PTYCON>Connect W
        @login ...........
        @delete WATCH.TXT
         Output to file: WATCH.TXT
         Print monitor statistics ? Y
         Print job summary ? Y
         Time period (MM:SS): 5:00

This will collect a  sample  every  5  minutes.   This  time  period  is
recommended  in  that  it  is  small enough to be able to see most usage
changes, yet long enough  that  an  excessive  amount  of  data  is  not
collected.   In  the  example,  watch data is appended to a history file
which can be rolled off on tape once a week and then  used  manually  or
via software to correlate changes in the workload.
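
The trade-off between interval length and data volume is easy to
quantify.  The short Python sketch below (the function name is an
assumption of this example) counts the samples written in one week for
a given time period.

```python
def samples_per_week(minutes, seconds=0):
    """Number of complete WATCH samples in seven days."""
    interval = minutes * 60 + seconds
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (7 * 24 * 3600) // interval


five_minute = samples_per_week(5)
two_minute = samples_per_week(2)
```

A 5 minute period yields 2016 samples per week, while a 2 minute period
yields 5040.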

6.2  Running WATCH To Analyze A Workload

Whereas the normal information is adequate for spotting  trends  in  the
workload,  it  generally  does not contain enough detail to be able to
isolate dominant components.

The process of running WATCH to  collect  the  expanded  information  is
sufficiently  expensive  that  it  is  normally  done only when specific
information is being requested in order to  determine  how  to  tune  an
application  or  the  configuration.  When WATCH is collecting "ALL" the
information, it should be run with about a 2 minute interval in order to
be  able  to  see  very  short  term  changes  in the workload.  Shorter
intervals are possible, but WATCH itself may become a more dominant part
of the workload than desired (especially on a 2020).

6.3  Accounting For The "TRAP:" Time

It was  stated  in  the  document  that  normally  the  TRAP:   variable
represents  the  percentage  of USED:  time which is spent handling page
faults.  This allows the installation to bill users  for  this  activity
and  has  the  effect  of  penalizing  users  who  cause  excessive page
faulting.  Such users will find their programs  more  expensive  to  run
when  there are other users on the system than when the system is fairly
idle.  Over the long run, this has the effect of pushing the larger page
faulting  programs into periods of time when the system would  otherwise
be dormant in order to capitalize on lower costs.

In some environments, this is not a desired effect.  The site may  elect
to  absorb  the  page  faulting  revenue (up to 10%) in order to provide
users with more reproducible run times.   This  is  often  a  desire  in
service  bureaus.   This can be done at the time the monitor is built by
setting the following flag in the PARAM0.MAC file.


The normal setting for this flag is 1.

[End of Document]



     WATCH is a TOPS-20 data collection tool that can be used to  gather
the  information  necessary  to analyze both system and job performance.
WATCH periodically samples many system variables, writing  them  in  a
format  which  is  usable  for  analysis.  It is advantageous to collect
WATCH statistics any time the system is running.  These statistics  are
often  useful  when  usage  trends  are  being analyzed in order to plan
system growth.  Any user can run WATCH and obtain  most  of  the  system
information  and  some  of  the  job  information.  These statistics are
normally sufficient for determining overall system performance  and  for
spotting  short  and  long  term  usage trends.  Expanded system and job
information is available  for  users  who  are  running  with  WHEEL  or
OPERATOR  privileges  enabled.   This expanded set of statistics provides
the much more detailed information which is often  required  to  observe
and tune the workload or an individual application.

The WATCH output consists of 9 different display sections:

     1.  Heading -- This section contains the date, time, number of jobs
         logged in, and the time interval over which the data sample was
         taken.

     2.  System  Statistics  --  This  section  contains   system   wide
         statistics  which  reflect the resource utilization of the CPU,
         disk, and memory.

     3.  Load Averages -- Load averages indicate the number of  runnable
         processes over specified intervals.  This section indicates the
         load  average  for  the  system,  for   the   interactive   and
         computational queues, and for each class (when class scheduling
         is in use).

     4.  Directory  Cache  --  A  cache  of  the  most   recently   used
         directories  is  kept  by  the  monitor.  This section displays
         statistics which indicate the usefulness of this cache.

     5.  Normal Per-Job Information -- Per job statistics are  displayed
         which relate the amount of CPU resource distributed to each job
         along with statistics concerning class  utilization  (when  the
         class scheduler is in use).

     6.  Expanded  Per-Job  Information  --  In  addition  to  the   CPU
         information,  this  display presents many statistics which show
         the states in which the job spent time, which  show  how  large
         the job is, and which provide disk and swapping information.

     7.  System Utilization Statistics -- This section includes a summary
         of the expanded per-job section, additional system statistics,
         and computations of several key variables.

     8.  Disk I/O -- Disk I/O statistics are displayed  on  a  per-drive
         basis.   Included  in these statistics are the number of seeks,
         reads, and writes performed by each drive.

     9.  Tune Mode Display -- This display is a single line and contains
         some  of  the  more  revealing  System  Statistics  and summary
         statistics from the system utilization section.  It is a useful
         "quick  and dirty" display for users who are monitoring changes
         in the system load.


     WATCH can be run by  all  users.   Users  with  OPERATOR  or  WHEEL
privileges  enabled can obtain more information than other users.  WATCH
is run by typing either "WATCH" or "R WATCH".   When  WATCH  starts,  it
will identify itself with a message like:

    WATCH 4(3), /H for help.

Information will then be requested in the following order:

    Output to file:
    Print monitor statistics?
    Print job summary ?
    Tune mode?
    Time period (MM:SS):

Some of these requests will not be made  if  previous  answers  indicate
that  the  information  is  unnecessary.   The  following  sections will
illustrate how to answer  these  prompts  to  get  combinations  of  the
displays outlined in the introduction.

2.1  Output File And Interval Time

     WATCH requests the name of the output file  and  the  time  between
samples with the following prompts.

    Output to file:

    Time period (MM:SS):

When  the  prompt  "Output  to  file:"  is displayed, the user may enter
either a filename (including TTY:  if output to the terminal is desired)
or  a  "/H".   The  /H  will  cause  WATCH  to  display help information
containing short descriptions of each of the  variables  in  the  System
Statistics  section,  the  Load Averages section, and the Normal per-job
section.  It will then request the output file name again.

     The response to the request "Time period (MM:SS):"  may  either  be
answered  with  an  interval  of time (in minutes and seconds) or with a
carriage return.  In the latter case, WATCH will take a sample and write
the output each time the user types an additional carriage return.

2.2  Responses And The Resulting Displays

     This section will show the legal responses to the prompts  and  the
resulting  displays.   It  will  also  indicate how the various displays
might be used.

   Print Monitor Statistics ? YES
   Print Job Summary ? YES
         All users may request this display which includes:
            1. Heading
            2. System Statistics
            3. Load Averages
            4. Normal Per-Job Statistics
   This is normally used by a site collecting data for analyzing the
   long term trend in the workload.
   Print Monitor Statistics ? ALL
         Only ENABLED users with WHEEL or OPERATOR privileges may
         request this display which includes:
            1. Heading
            2. System Statistics
            3. Load Averages
            4. Directory Cache Statistics
            5. Extended Per-Job Statistics
            6. System Utilization Statistics
            7. Disk I/O Statistics
        This display is used by a person who is analyzing short term
        changes in a workload or application.
   Print Monitor Statistics ? YES
   Print Job Summary ? NO
         All users can request this display which includes:
            1. Heading
            2. System Statistics
            3. Load Averages
         This display is often used when WATCH data is being written
         to the TTY and changes in system statistics are being monitored.
   Print Monitor Statistics ? NO
   Print Job Summary ? YES
         All users can request this display which includes:
            1. Heading
            2. Load Averages
            3. Normal Per-Job Statistics
        This display is normally used when a job is being monitored
        on a TTY.
   Print Monitor Statistics ? NO
   Print Job Summary ? NO
   Tune Mode ? NO
         All users can request this display which includes:
            1. Heading
            2. Load Averages
         This display is normally used on a TTY to monitor changes
         in the load average.
   Print Monitor Statistics ? NO
   Print Job Summary ? NO
   Tune Mode ? YES
         Only ENABLED users with WHEEL or OPERATOR privileges may
         request this display which includes:
            1. Tune Mode statistics lines with column headings every 
         This display is used by an analyst engaged in tuning either
         the system or an application.
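
The combinations above can be summarized as a lookup table.  The
following Python sketch is only a restatement of this section: the
names are invented, unanswered prompts are represented by None, and
raising an error for an unprivileged request is an assumption of the
sketch, not a description of how WATCH itself responds.

```python
# Key: answers to (monitor statistics, job summary, tune mode).
DISPLAYS = {
    ("YES", "YES", None): ["Heading", "System Statistics",
                           "Load Averages", "Normal Per-Job Statistics"],
    ("ALL", None, None): ["Heading", "System Statistics", "Load Averages",
                          "Directory Cache Statistics",
                          "Extended Per-Job Statistics",
                          "System Utilization Statistics",
                          "Disk I/O Statistics"],
    ("YES", "NO", None): ["Heading", "System Statistics", "Load Averages"],
    ("NO", "YES", None): ["Heading", "Load Averages",
                          "Normal Per-Job Statistics"],
    ("NO", "NO", "NO"): ["Heading", "Load Averages"],
    ("NO", "NO", "YES"): ["Tune Mode Statistics"],
}
# Combinations that need WHEEL or OPERATOR privileges enabled.
PRIVILEGED = {("ALL", None, None), ("NO", "NO", "YES")}


def displays_for(monitor, job=None, tune=None, privileged=False):
    """Return the list of display sections for a set of answers."""
    key = (monitor, job, tune)
    if key not in DISPLAYS:
        raise ValueError("combination not described in this section")
    if key in PRIVILEGED and not privileged:
        raise PermissionError("requires WHEEL or OPERATOR enabled")
    return DISPLAYS[key]


tty_display = displays_for("YES", "NO")
full_display = displays_for("ALL", privileged=True)
```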

2.3  WATCH Operation

     After the user enters the necessary information, WATCH will display
the message:

             WATCH IN OPERATION --

and will then  take  its  first  sample.   This  message  will  be  seen
immediately  after  entering  the  interval  time  unless  the  user  is
requesting displays which require privileges (extended and  tune  mode).
In that case, a 10-30 second pause will occur (depending on system load)
while the SNOOP break points necessary to collect  the  information  are
inserted into the TOPS-20 monitor.

     When the user desires to stop collecting statistics, the  following
procedure should be followed:


At this point,  the  output  file  can  be  printed.   Examples  of  the
information  included  in  the  output  file are used in the rest of the
document.


     The following terms are used throughout this document and should be
interpreted in light of the following definitions.

3.1  EXEC

The TOPS-20 command EXECutive is a program which prompts  the  user  for
and  interprets the  TOPS-20  command  language.   A copy of the EXEC is
started in response to the CONTROL-C  typed  by  the  user  to  get  the
system's  attention  before  LOGIN.  The EXEC is a process which remains
part of the user's job until after the LOGOUT command is processed.

3.2  Job

A job consists of the  sequence  of  commands  and  programs  which  are
initiated  by  a  user.   It  begins  at the time the user logs into the
system and terminates when that user logs out.  Associated with each job
on  the  system  is  a record of information which is accessed by a "job
number".  This  record  will  contain  the  accounting  statistics,  the
resource  ownership  information, and the control information associated
with the job.  Because the job number is the means by  which  a  job  is
identified  and  referenced, a term like "job 12" is often used in place
of the more accurate phrase "the job which can be  identified  with  the
number 12".

A user logging in is given an available job number which is used by  the
system  to  keep  track  of  that user.  The job number is from 1 to the
maximum built into the monitor and is recycled when the user  logs  out.
This  maximum  number  of  jobs which can be executing simultaneously is
fixed at the time the monitor is compiled and linked.  At this time  the
memory  required to keep track of the specified number of jobs and their
resources is computed and reserved.  Most often the number of jobs  that
a  specific  monitor  can  handle  has  been  chosen  so that the system
resources are not over committed under normal use.

3.3  Process

The term "process" is used in this document instead of the terms  "task"
and  "fork"  which are sometimes used in other TOPS-20 literature.  This
term refers to a program running in its own virtual  address  space.   A
job  may  consist of one or more processes.  A process is created when a
virtual address space is allocated for the  program  and  is  terminated
when that address space is deallocated.

When a job begins, it will consist of one  process,  the  TOPS-20  EXEC.
The EXEC will create additional processes whenever the user initiates  a
function which cannot be performed by the  EXEC.   Thus  when  the  user
types  "RUN  FOOBAR",  a  process  is created, and the program FOOBAR is
started.  When FOOBAR finishes and the user types RESET, the memory  for
the  additional  process will be deallocated and the process terminated.
Thus it is common for a job to consist of two processes, the EXEC and  a
user program.

One of the capabilities which a program may exercise is  to  create  and
control  an  "inferior"  process (or processes).  An "inferior" process
can be stopped (frozen) and deleted by its creating (superior)  process.
Each process created by the EXEC or by a  user  program  has  its  own
virtual address space, assigns its own devices, and  contains  its  own
program.  The maximum number of processes which may be  created  within
a single job is fixed at the time the monitor is built.

3.4  Active Process

An active process is one which is competing for usage of  the  CPU,  the
disk,  or  a  resource for which the wait time is reasonably short (less
than 100 milliseconds).  Active processes are often spoken of  as  being
"runnable".   Processes  which are not active are waiting on events like
terminal input, ENQ/DEQ locks, and  other  unpredictable  events.   Such
processes are sometimes said to be "blocked".

Normally, when the EXEC creates a process, loads  a  program  into  that
process's address space, and passes control to that  process,  it  (the
EXEC) waits until the program  completes.   Thus  even  though  the  job
consists  of  two  processes,  only  one  at  a  time  is  competing for
resources.  This is not always  the  case.   Typing  a  CONTROL-T  often
causes  the EXEC and the user program to compete for the CPU at the same
time.  In analyzing the impact of a complex job with several  processes,
it  is  important  to  determine  if the job consists of processes which
compete for resources simultaneously.   The  expanded  user  information
will  help you determine the impact of jobs with competing processes.  A
job which consists of several competing  processes  can  place  as  much
demand  on the system as an equivalent number of jobs with non-competing
processes.

3.5  Balance Set

All active processes  in  the  system  are  on  the  "go  list".   These
processes  need to be in memory in order to use the CPU, or to terminate
their short term wait state.  The subset of active  processes  that  are
scheduled  to  be  simultaneously  in memory form the BALANCE SET.  Once
a process is selected for the balance set, the working set manager (part
of the TOPS-20 scheduler) will try to find room for it  in  memory  by
swapping out processes that are either no longer in the balance  set  or
which are of lower priority than the incoming one.

3.6  Swapping

The term "swapping" is used to describe the disk I/O which is  initiated
by  the  monitor  as part of its memory management service.  Swapping is
contrasted with user initiated disk I/O, which is performed in order  to
move data to and from application programs.

3.7  Page Fault

When a user program or the monitor references a virtual address which is
not  currently  allocated  a place in physical memory, a page fault will
occur.  The monitor will resolve this page fault by finding the  desired
page  and  making it part of the process's address space.  The procedure
for finding the page may require that the monitor bring the page in from
the  disk.   However, if the page is already in memory, the monitor will
not perform a disk read, but will merely  make  the  page  part  of  the
process's  address  space.   This  will  occur  if  the  page  is on the
replaceable queue (available to be used for some other  purpose)  or  is
being used by another process.


4.1  The Heading Section

The heading section  is  mostly  a  time  stamp  for  other  information
displayed.   It  is  in all displays except for the "tune mode" display.
The following is an example:

        SUMMARY at 10-May-80 11:29:26
          for an interval of   2:01.5    with 64  active jobs.

The first line contains the date and time when  the  sample  was  taken.
The  second  line includes the time interval between this sample and the
previous one.  If the user had specified a time  interval  (rather  than
merely  entering  a  carriage  return), then the interval here should be
nearly equal to the time specified.  Under  a  heavy  load,  the  actual
interval may be longer.

     In this example, the interval was 2 minutes, 1.5 seconds and  there
were 64 users logged in.
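
When the output file is processed by software, the interval in the
heading usually has to be converted to seconds before rates can be
compared across samples.  A minimal Python sketch follows; the function
name is an assumption of this example, and only the MM:SS form shown
above is handled.

```python
def interval_seconds(text):
    """Convert an interval such as "2:01.5" (MM:SS) to seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)


observed = interval_seconds("2:01.5")
requested = interval_seconds("2:00")
overrun = observed - requested
```

For the heading in the example this gives 121.5 seconds, which is 1.5
seconds longer than a requested period of 2:00.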

4.2  System Statistics Section

     Example from the system statistics section.

 USED:  87.6   IDLE:   0.0   SWPW:   0.0   SKED:  10.0
 SUSE:  80.0   TCOR:   0.1   FILW:   0.8   BGND:   1.8
 NTRP:  22.3   NCOR:  2.02   AJBL: 58.44   NREM:     0
 TRAP:   3.2   NRUN:   5.7   NBAL:   5.7   NWSM:  61.0
 BSWT:   2.8   DSKR:  23.0   DSKW:   4.8   SWPR:   6.4
 NLOD: 16.12   CTXS:  64.1   UPGS: 1992.   FPGS:  147.
 DMRD:   4.1   DMWR:   4.7   DKRD:  10.7   DKWR:   4.3
 TTIN:  11.0   TTOU:  401.   WAKE:  18.3   TTCC:  1.51
 TDIO:  23.8   RPQS:   4.4   GCCW:   6.4   XGCW:     0
 KNOB:    11      
 QUEUE DISTRIBUTION PERCENTAGE: 0.19  20.48  21.28  11.04  2.69  31.74

     The statistics in this section are expressed either as  percentages
(%),  as  averages  (AV),  or as rates (Px).  The rates are in units per
minute (PM) or in units per second (PS).
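
Because every entry in this display has the form of a four-letter name,
a colon, and a number, the section is easy to read into a table for
later analysis.  The following Python sketch shows one way to do so;
the pattern is an assumption based on the example above and has not
been checked against every WATCH version.

```python
import re

# A four-letter upper-case name, a colon, then a number such as
# "87.6", "1992." or "0".  The \b keeps "PERCENTAGE:" from matching.
PAIR = re.compile(r"\b([A-Z]{4}):\s*(\d+\.?\d*)")


def parse_stats(text):
    """Return a dict mapping each statistic name to a float."""
    return {name: float(value) for name, value in PAIR.findall(text)}


sample = """
 USED:  87.6   IDLE:   0.0   SWPW:   0.0   SKED:  10.0
 NLOD: 16.12   CTXS:  64.1   UPGS: 1992.   FPGS:  147.
 QUEUE DISTRIBUTION PERCENTAGE: 0.19  20.48  21.28  11.04  2.69  31.74
"""
stats = parse_stats(sample)
```

The queue distribution line is not a name and value pair and is skipped
by this pattern; it would need separate handling.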

     The  display  includes  "CPU  usage  statistics"  from  which   the
distribution   of  the  CPU  resource  can  be  determined.   The  usage
statistics are all percentages based on  the  wall  clock  time  of  the
interval.  They should sum to 100% (+/- roundoff).

                     CPU Usage Statistics


The  description  of  each  of  the  variables  in  the example follows.
Following the example, they are presented line by line in left to  right
order.

  USED:  (%)  --  Percentage  of  interval  during  which  the  CPU  was
       executing  instructions  on  behalf  of some user.  This includes
       user, JSYS, page fault, and interrupt processing.

  IDLE:  (%) -- Percentage of the interval during which the CPU was idle
       because  there  were  no active processes on the system.  If this
       number  is  non-zero,  then  the  system  can  accommodate   some