Trailing-Edge
-
PDP-10 Archives
-
SRI_NIC_PERM_SRC_3_19910112
-
stanford/5-swskit/performance.memos
There are 5 other files named performance.memos in the archive. Click here to see a list.
Definitions of terms used in the Report from WATCH
System Statistics:
USED USED is the percentage of the interval during which CPU was
actually executing code on behalf of user programs. This
includes CPU time spent in instruction execution within the
user program and CPU time spent in EXEC mode executing JSYS
calls. This value is kept by the system and includes time
spent by jobs that logged out during the interval.
IDLE IDLE is the percentage of the interval during which the CPU
had nothing to do. This represents the amount of time during
which there were absolutely no jobs demanding to be run by the
CPU. If this number is non-zero, the system is being under
utilized.
SWPW SWPW is the percentage of the interval during which the CPU
was not able to run any jobs because the pages needed by the
runnable jobs were not in main memory. The term "SWPW" is
somewhat misleading since it does not distinguish between
waits caused by pages being read in from a disk file or by
pages being read in from the "swapping space". A rough break
down of the time spent waiting for "Disk" reads versus "Swap"
reads can be obtained by comparing the sum of DKRD + DKWR to
the sum of DMRD + DMWR.
SKED SKED is the percentage of the interval during which the
TOPS-20 scheduler was running.
SUSE SUSE is the sum of the run time percentages accumulated by
each job listed in the report. This value differs from "USED"
only by the skew that builds up during the time that FRIDAY
takes to collect all of the data about each job, and by the
loss of data from jobs that logged out during the interval.
NCOR NCOR is the average number of memory garbage collections
performed by the monitor each minute during the last interval.
AJBL AJBL is the average number of times per minute the system was
forced to adjust the balance set during the last interval.
NREM NREM is the average number of times per minute the monitor had
to remove a process from the balance set before the process
was ready to be removed. This number becomes non-zero
whenever there are more jobs wanting to be run by the system
than there is room in memory. Whenever this situation arises,
the monitor must force runnable jobs to be removed from the
balance set and subsequently swapped out to make room for
other runnable jobs to be swapped into memory. In general,
whenever this number goes non-zero, response time becomes
longer.
Page 2
TRAP TRAP is the percentage of the interval during which the CPU
was responding to page faults. This time is normally charged
to the user, and is therefore also part of "USED" time.
However, it is possible to make the system not include page
fault time as a part of the users run time. In this case,
"TRAP" time is not included in "USED" time.
NRUN NRUN is the average number of processes demanding to be run at
any point in time during the interval. This number represents
the CPU load on the system during the interval. When NRUN is
greater than 1.0, the user programs running during the
interval experience an average elongation of execution time
(as compared to execution time measured on a stand alone
system) of at least "NRUN" times longer.
NBAL NBAL is the average number of processes in the Balance Set
during the interval. If this number is less than NRUN by a
significant amount, it usually implies that there isn't enough
memory to hold all runnable processes.
BSWT BSWT is the average number of processes in the Balance Set
that are waiting for a page to be read into main memory. If
NBAL - BSWT is less than one, then there are not enough
runnable processes in memory to keep the CPU busy 100% of the
time. In this case, SWPW will increase.
DSKR DSKR is the percentage of the processes in Balance Set Wait
state, "BSWT", that are waiting for a file page to be read
into memory.
DSKW DSKW is the percentage of the processes in Balance Set Wait
state, "BSWT", that are waiting for file pages to be written
back to the disk.
SWPR SWPR is the percentage of the processes in the Balance Set
Wait state, "BSWT", that are waiting for a page to be swapped
into memory from the swapping area.
UPGS UPGS is the sum of the average number of working set pages for
each job in the balance set.
BGND BGND is the percentage of the interval during which the
monitor was doing background tasks. The primary background
task represented by this number is the task of moving terminal
input characters from a system wide buffer to the individual
terminal input buffers. This also includes the time it takes
to echo terminal input characters.
TCOR TCOR is the percentage of the interval that the monitor spent
garbage collecting memory. The garbage collection process
requires the monitor to look at the age of each page in memory
to see which ones haven't been referenced in a while. The
least recently used pages become the prime candidates for
being swapped out.
Page 3
CTXS CTXS is the average number of context switches performed by
the scheduler per second. A context switch is made whenever
the scheduler decides to stop running one process and start
running another process. This happens when the running
process voluntarily blocks, or it faults on a page that is not
in memory, or when a higher priority process is ready to run.
Since it takes CPU time to perform a context switch, CTXS
directly affects SKED.
FPGS FPGS is the average number of physical memory pages that are
currently available for swapping in user processes. The
monitor normally keeps between 20 and 100 pages on this queue.
The monitor uses these pages (and the rest of memory not in
use by Balance Set processes) as a page cache. For example,
if a process reenters the balance set after waking up from a
blocked state and it still has some of its pages in memory in
the free page pool, then those pages are used directly without
requiring any disk I/O. It has been demonstrated that this
cache plays an important part in overall system performance.
Therefore, if FPGS is very small, the system performance has
most likely been degraded.
DMRD DMRD is the number of reads per second that are made to the
swapping area.
DMWR DMWR is the number of writes per second made to the swapping
area.
DKRD DKRD is the number of reads per second made to the file
system.
DKWR DKWR is the number of writes per second made to the file
system.
TTIN TTIN is the number of terminal input characters received per
second from all terminals on the system.
TTOU TTOU is the average number of terminal characters output per
second by all jobs on the system.
WAKE WAKE is the number of process wakes per second. The types of
wakes that fall into this category are:
IPCF
ENQ
Terminal Input
Terminal Output
Process Termination
DISMS
TIMER
IIC
TTCC TTCC is the number of terminal interrupt characters (e.g.,
control-C) typed per second.
TDIO TDIO is the aggregate number of disk pages read or written per
Page 4
second to both the file system area and to the swapping area.
The maximum rate for a single channel system is about 60 pages
per second. The maximum rate for a two channel system is
about 100 pages per second.
Queue Distribution Percentage
The TOPS-20 Monitor has five scheduler queues. All runnable
jobs are divided among these queues. The first queue is only
used by Job 0 and by jobs in the special high priority
category. Normally the percentage of run time accumulated in
this queue is small.
The second and third queues are the interactive queues. If
the sum of these two values is high, then there is a lot of
interactive load on the system.
The last two queues are the computational queues. Processes
only move onto these queues if they have entered a compute
bound phase. If the sum of these two values is high, then the
system load is primarily computational.
Load Averages
The system keeps three exponential load averages. These
values represent the average load over the last 1 minute, the
last 5 minutes, and the last 15 minutes. These numbers can be
used to estimate the expected elongation of the elapsed time
required to run a program. If the system load average equals
X, then the approximate elapsed time required to run an
additional program on the system is (1+X)*Y, where Y is the
stand alone elapsed time required to run this program.
By comparing the three load averages, it can be determined
whether the system load is rising or falling.
High Queue Averages
These values are the components of each of the load average
values that are attributable to interactive jobs.
Low Queue Averages
These values are the components of each of the load average
values that are attributable to compute bound jobs. The sum
of the high queue average and the low queue average equals the
load average.
Job Statistics
JOB JOB is the job number.
TTY TTY is the number of the terminal that is being used by the
user running this job. "DET" means that the job is not
attached to a controlling terminal.
Page 5
USER USER is the name of the user who is running this job.
PROGRAM This is the name of the program being run by the user of this
job. Please note that the program name is obtained at the end
of the interval. Sometimes the data gathered during the
interval reflects the behavior of a program that was running
just prior to the start up of the current program.
%RT This is the percentage of the CPU that this job received
during the interval. The sum of the "%RT" values for all jobs
equals "SUSE".
DEMD This is the percentage of the interval that the job wanted to
run. If a job has a "DEMD" of 100, that job is a compute
bound job that is demanding to be run continuously. Since it
takes some time for FRIDAY to snap shot the data, the "DEMD"
values may be slightly inaccurate, but the average over
several intervals will be accurate.
USED USED is the percentage of DEMD time that this job spent
executing.
GRDY GRDY is the percentage of DEMD time that this job was runnable
but could not fit in the balance set. A job must be in the
balance set before it will be chosen by the scheduler to run.
The most common cause for jobs to be on this list is that
there is not enough memory to hold all runnable jobs.
BRDY BRDY is the percentage of DEMD time that this job was in the
balance set but was not being run. Usually jobs in this state
are waiting for their turn to use the CPU.
SWPR SWPR is the percentage of DEMD time that this job spent
waiting for pages to be swapped into memory from the swapping
area.
DSKR DSKR is the percentage of DEMD time that this job spent
waiting for file pages to be read into memory from the disk.
DSKW DSKW is the percentage of DEMD time that this job spent
waiting for file pages to be written to the disk.
SWPI SWPI is the percentage of DEMD time that this job spent
waiting for its Process Storage Blocks (PSB), page table, and
Job Storage Block (JSB) to be swapped in. The Process Storage
Blocks contain the definition of the job's working set and the
job's PC. These special pages must all be in memory before
the job can be run.
UDWT UDWT is the percentage of DEMD time that this job spent
waiting for updates to files, directories, Index Blocks, or
the BIT table to be completed. Time accumulates in this area
whenever a job requests that a specific file system action
finish to completion. For example, closing a file that was
open for writing requires that all pages of the file be
Page 6
written onto the disk and that the file name appears in the
directory before the close function is complete. While the
job is waiting for these writes to complete, the job is in the
"UDWT" state.
RPQW RPQW is the percentage of DEMD time that this job spent
waiting for a physical memory page to become available for
swapping into. Usually if time is accumulating here, there is
a shortage of memory on the system.
OTHR This last category is the percentage of DEMD time that this
job spent in any of the other wait states.
The sum of USED + GRDY + BRDY + SWPR + DSKR + DSKW + SWPI +
UDWT + RPQW + OTHR for each job is equal to 100% of the DEMD
time.
RESP RESP is the average response time in seconds over the
interval. The time for each response is defined as the
elapsed time from when an event for which the job is waiting
has completed, to the time that the job goes back into a wait
state after having responded to the event. Responses that
require more than 2 seconds of CPU time time finish are not
counted in this column.
NRSP NRSP is the number of responses that the job had during the
interval.
SR SR is the "Stretch Ratio" for each response. The stretch
ratio is obtained by dividing the elapsed time of each
response by the compute time required to satisfy each
response.
WSS WSS is the average number of pages in the job's working set.
WSS is the sum of the working set sizes of each process within
the job.
The job working set size is obtained by summing the
integration of the instantaneous process working set size
(FKNR) over all time that the process is not blocked for all
processes in the job. The average process size is the
integral divided by the accumulated time, i.e.
WSS = SUM all processes (integral FKNR dt / T not blocked)
UPGS UPGS is the average number of pages actually in memory when
the job is running. Since the system "demand" pages in each
page one at a time, this number is usually slightly less than
"WSS".
The job used pages is obtained by integrating the
instantaneous process assigned pages (FKWSP) over all time
that the fork is in the balance set. The average used pages
is then the integral divided by the accumulated time, i.e.
Page 7
UPGS = SUM all processes (integral FKWSP dt / T in balset)
SWPR SWPR is the number of times that the process had to wait for a
page to be read in from the swapping area. This may be more
than the actual number of pages swapped in.
DSKR DSKR is the number of times that the process had to wait for a
page to be read in from the file system. If the process is
doing prefaulting, this number may be less than the actual
number of pages read.
TPF TPF is the average number of milliseconds that it took to
satisfy each page fault for this job during the interval.
IFA IFA is the "Inter-Fault Average". This value represents the
average compute time in milliseconds between page faults for
this job. A large "IFA" means that the working set of this
job is very stable.
System Summary Values
DEMD The summary value for the DEMD column is the sum of each item
in the column. This represents the total demand put on the
system over the interval.
USED - OTHR
These values represent the average values over all jobs on the
system.
RESP The summary value for the RESP column is the average response
time of all responses in the interval.
NRSP The summary value for the NRSP column is the total number of
responses received during the interval.
SR The summary value for the SR column is the average stretch
ratio for the jobs on the system during the interval.
WSS The arithmetic sum of the working sets of all jobs active
during the interval is computed and reported in the system
summary WSS column. This represents the total number of pages
used during the interval.
SYS WSS = SUM all jobs (JOB WSS)
UPGS The summary value of the UPGS column is the sum of the working
set page integrals for all jobs divided by the interval time,
i.e.
SYS UPGS = SUM all jobs (integral FKWSP dt) / T interval
Page 8
SWPR The SWPR column summary is the total number of swap reads done
by the jobs on the system. This does not include the swap
reads required to swap in PSB pages.
DSKR The DSKR column summary is the total number of disk reads done
by the jobs on the system.
TPF The TPF column summary is the average time required to fault
in a page.
IFA The IFA column summary is the average time between page faults
for all jobs on the system.
TOTRC TOTRC represents the total number of physical memory pages
available after the resident monitor is locked down.
LOKPGS LOKPGS is the current number of pages locked down other than
the resident monitor pages. Out of this set of pages comes
the terminal buffers, mag tape buffers, line printer buffers,
and other pages locked down during certain file system
operations.
AVAIL MEM
This is the difference between "TOTRC" and "LOKPGS". This is
the actual number of pages available for use by user programs.
SHR PGS This is the average number of physical memory pages being
shared by more than one process.
NRUN MIN, MAX
These values are the minimum and maximum of "NRUN" over the
interval.
SUMNR MIN, MAX
These values are the minimum and maximum number of pages in
the balance set during the interval.
SYS MEM DMD
The system average memory demand is the sum of the NR
integrals for all jobs divided by the interval time, i.e.
SYS MEM DMD = SUM all jobs (integral FKNR dt) / T interval
SWAP RATIO
The Swap Ratio is the system WSS divided by the amount of
available main memory. This represents the amount by which
main memory would have to be increased to avoid any swapping
during the interval.
SWAP RATIO = SYS WSS / AVAIL MEM
ACTIVE SWAP RATIO
The active swap ratio is the system average core demand
divided by the amount of available main memory. This
represents the amount by which main memory would have to be
Page 9
increased to hold all jobs wanting to run simultaneously.
ACTIVE SWAP RATIO = SYS MEM DMD / AVAIL MEM
MEM UTILIZATION
The core utilization if the system used pages divided by the
amount of available main memory. For active swap ratios
greater than 1, this indicates how well the monitor is doing
in keeping core used.
MEM UTILIZATION = SYS UPGS / AVAIL MEM
Page 19
How to Use Friday to Analyze System Performance
1. Look at the overall disk I/O rate (TDIO)
For a 1 channel system the maximum rate is about 60
pages per second. For a 2 channel system the maximum
rate is about 100 pages per second. If "TDIO" is at or
near the maximum, the disk bandwidth may be a system
bottleneck.
2. Look at the number of reads and writes to each disk unit.
If most of the reads and writes are to the PS:
structure, consider going to a two pack PS:.
If PS: is already a multiple pack structure, check to
see if there are some I/O bound applications that can be
moved onto a lightly loaded structure.
3. Look at "WSS" for each job
If there is a job with a working set greater than 100
pages, this job may be causing thrashing. If the job is
compute bound ("NRSP"=0), then look to see if there are
other large jobs on the system with this job. Only one
job should be greater than 1/2 of available memory
(AVAIL MEM). If more than one job is greater than 1/2
of available memory, then schedule these jobs to run
separately.
4. Look for highly interactive jobs.
Scan the "NRSP" column looking for jobs that wake up
frequently (more often than once every 3 seconds). For
any job that meets this criteria, look to see if that
job is also large. A large interactive job puts a very
heavy load on all of the system resources.
Try to understand why a job is waking up frequently. If
the rate at which it is waking can be reduced by
modifying the program, this would improve system
throughput.
Page 11
SUMMARY at 10-Nov-78 15:11:11
for an interval of 2:00 with 67 active jobs.
USED: 80.5 IDLE: 0.0 SWPW: 1.4 SKED: 15.7
SUSE: 77.8 NCOR: 1.51 AJBL: 58.71 NREM: 0
TRAP: 17.9 NRUN: 8.6 NBAL: 8.7 BSWT: 1.1
DSKR: 31.6 DSKW: 11.5 SWPR: 41.8 UPGS: 576.
BGND: 2.6 TCOR: 0.7 CTXS: 49.1 FPGS: 362.
DMRD: 9.4 DMWR: 7.0 DKRD: 5.8 DKWR: 4.5
TTIN: 7.6 TTOU: 516. WAKE: 11.0 TTCC: 3.51
TDIO: 26.7
QUEUE DISTRIBUTION PERCENTAGE: 0.08 12.72 17.40 2.11 48.05
LOAD AVERAGES: 8.84 8.52 7.95
HIGH QUEUE AVERAGES: 3.05 2.97 3.17
LOW QUEUE AVERAGES: 5.79 5.55 4.78
JOB TTY USER PROGRAM %RT DEMD USED GRDY BRDY SWPR DSKR DSKW SWPI UDWT RPQW OTHR RESP NRSP SR WSS UPGS SWPR DSKR TPF IFA
0 DET OPERATOR SYSJOB 0.4 3.3 16.9 22.3 31.9 14.5 8.0 6.5 0.12 34 6 12.7 9.6 31 11 43 15
2 214 OPERATOR OPLEAS 0.1 0.6 14.1 4.5 66.6 14.8 0.13 5 7 10.2 7.9 11 0 40 8
3 207 OPERATOR ORION 0.0 0.5 4.8 2.2 50.8 26.2 16.0 0.30 2 21 19.8 5.0 5 2 65 4
4 217 OPERATOR QUASAR 0.1 1.6 11.7 4.3 58.4 18.9 6.6 0.24 8 9 28.8 15.9 25 3 52 8
7 6 R.ACE MD 3.6 32.4 14.0 16.5 30.0 18.9 6.3 2.0 6.2 6.1 0.28 198 7 29.6 27.3 223 118 55 15
11 115 ACARLSON PTYCON 0.1 1.0 19.0 32.9 19.5 22.9 5.7 0.04 29 5 13.0 12.2 6 5 47 21
13 220 OPERATOR FAL20 0.1 1.5 23.6 76.4 0.01 141 4 9.6 8.6 0 0
15 DET OPERATOR PERF 1.1 6.7 20.1 6.5 13.8 59.7 0.62 13 5 43.4 40.9 0 24 46 67
17 211 OPERATOR LPTSPL 0.0 0.3 7.7 54.3 38.0 0.20 2 13 12.6 4.3 8 0 27 3
18 212 OPERATOR LPTSPL 0.0 0.6 11.2 1.1 69.4 18.3 0.14 5 9 15.8 6.2 16 0 30 4
19 213 OPERATOR BATCON 0.1 1.6 14.5 10.0 61.4 7.0 7.1 0.19 10 7 25.0 20.9 36 3 33 7
21 37 PTAYLOR EXEC 0.1 0.1 51.5 48.5 0.01 12 2 16.0 15.8 0 0
22 7 R.ACE MOUNT 0.0 0.3 15.9 3.2 69.4 11.6 0.09 4 6 9.0 6.3 7 0 34 7
27 132 LCAMPBELL DMM 0.2 1.3 19.1 22.1 42.7 13.6 2.6 0.12 13 5 16.0 14.0 22 0 30 13
29 42 HURLEY FRIDAY 1.8 5.1 37.7 22.8 35.1 3.2 1.1 62.4 61.8 56 4 38 38
30 30 LEACHE CROCK 0.7 24.6 3.4 96.6 0.33 90 30 8.0 6.0 0 0
36 20 ELFSTROM EXEC 0.1 2.3 18.0 55.8 13.7 12.5 0.1 0.15 12 4 21.8 19.3 4 5 79 54
37 DET OSMAN EXEC 10.4 189.9 8.0 85.0 7.0 0.49 383114 4.0 2.2 0 0
39 121 BLOOD EXEC 2.2 9.0 27.4 32.5 14.0 19.3 4.1 1.7 0.5 0.5 0.20 19 8 31.4 29.9 33 43 47 38
42 225 BERKOWITZ MACRO 9.8 102.5 10.0 87.2 2.8 0.1 66.0 60.1 10 2 2901022
43 117 1EIBEN EXEC 0.3 2.1 19.3 22.4 51.9 1.7 1.5 3.1 0.09 29 5 8.4 6.2 21 1 60 21
44 73 GRANT EXEC 0.6 2.7 25.4 5.1 19.5 40.4 7.2 2.1 0.3 0.12 29 4 16.0 14.6 16 24 48 20
45 222 MURPHY MACRO 10.7 99.0 10.8 85.0 1.1 3.1 78.7 75.7 17 5 225 581
53 107 LYONS MM 0.2 0.4 23.7 76.3 0.02 24 4 18.0 18.0 0 0
56 231 ACARLSON CREF 10.8 87.5 13.9 63.4 2.8 16.2 3.5 0.0 0.2 0.0 0.15 11 4 62.1 53.5 40 171 94 68
58 110 BELANGER LIFE 7.1 39.1 18.9 81.1 0.51 93 5 18.4 15.4 0 0
60 67 DBELL FILDDT 0.7 8.5 14.6 7.8 74.7 2.1 0.8 0.44 23 7 47.7 46.6 162 3 47 8
63 150 CRUGNOLA EDIT 0.0 0.1 50.4 49.6 0.13 1 2 14.1 13.0 3 0 21 21
65 15 MILLER SYSTAT 1.6 9.3 20.9 8.5 62.9 5.3 1.2 1.3 0.44 24 5 44.4 41.5 140 10 50 15
67 DET OSMAN SDDT 10.2 254.4 5.6 94.4 0.0 19.14 14*** 4.0 0.0 0 0
71 46 OSMAN TV 4.1 21.8 21.9 53.6 9.5 7.8 6.3 0.3 0.5 0.1 0.15 190 5 38.5 34.5 61 47 41 53
74 103 HARAMUNDANISMM 0.0 0.4 7.2 0.6 48.2 23.8 20.1 0.24 2 14 8.5 5.8 7 2 38 3
76 70 TEEGARDEN EXEC 0.2 0.3 20.7 79.3 0.02 24 5 12.0 12.0 0 0
System summary 910.7 10.2 78.4 4.5 3.4 1.3 0.2 0.3 1.7 0.47 1444 18 826. 230.5 960 483 59 77
Page 12
TOTRC: 1483 LOKPGS: 94 SHR PGS: 217 AVAIL MEM: 1389
NRUN MIN,MAX: 5 17
SUMNR MIN,MAX: 479 820
SYS MEM DMD = 265.3
SWAP RATIO (SUM WSS / AV MEM) = 0.59
ACTIVE SWAP RATIO (DMD/AVMEM) = 0.19
MEM UTILIZATION ((UPGS+SHRPGS)/AVMEM) = 0.32
AV WS SIZE = 29.13
AV CPU TIME (MS) PER INTERACTION = 25.60
THINK TIME (SEC) PER INTERACTION = 2.02
DISK I/O
CHN,UNIT SEEKS READS WRITES
0,6 10 18 25 LANG #0
0,7 828 822 637 PS #1
1,0 737 854 595 PS #0
1,1 1 REL3 #0
1,2
2,3 17 21 8 MISC #0
2,4 91 111 115 SNARK #0
2,5 1