Trailing-Edge
-
PDP-10 Archives
-
decuslib10-10
-
43,50523/his0.do
There are no other files named his0.do in the archive.
RANDOM SAMPLING WITHOUT REPLACEMENT
J. Teuhola and O. Nevalainen
11.6.1981
Department of mathematical sciences
University of Turku
SF-20500 TURKU 50
Finland
0.1 The Use Of The Subroutine
CALL HTRY (M, N, IA)
M is the wanted sample size (input)
N is the population size (input)
IA is an arrray to contain the sample (output)
the subtoutine selects M different random integers out
of the set [1..N] and stores them in a random order in the
array IA. The calling program must contain a storage
definition statement for IA: DIMENSION IA(M).
0.2 Method
The so-called hashed try-again technique /1/ is used.
Here the sample is found iteratively by generating a pseudo
random integer from the interval from 1 to N and checking
whether it has not yet been selected. If this is true it is
moved to the final sample. The same is repeated until M
different numbers are found. To speed up the algorithm a
hash table of the selected integers is maintained in the
array IA. By using the hash technique one can in expected
0(1)-time test whether or not an integer has previously been
selected. For M > n/2 the processing time is still improved
by first selecting those N-M integers which are to be left
out from the final sample.
0.3 Efficiency Of The Sampling
Because except program code, only an array of the size
M is required, the sampling program is useful when N is so
large that an array of size N would cause a spillover of the
storage in use. The object code of HTRY and INSERT is in
total ca. 580 words.
The expected running time of HTRY is 0(M) whereas its
upper limit is unbounded. Because of the complementary
sampling for M > N/2 the curve of the observed running time
is decreasing for such M values. For n=1000 the sampling of
M=200, 400, 600, 800, and 1000 random numbers demands ca.
RANDOM SAMPLING WITHOUT REPLACEMENT Page 2
0.25, 0.60, 0.62, 0.50, and 0.45 seconds.
0.4 Organizatio Of The Program
the user calls the HTRY subroutine, which in turn calls
two subroutines:
INSERT a subroutine to search for and possibly insert a
key KSI in the hash table IA,
RAN a FORTRAN library function which returns a
pseudo-random number from the interval [0,1).
0.5 Restrictions And Modifications
HTRY does not check the validity of M (i.e.
1 <= M <= N).
If the random order of numbers is not
essential one can make the program shorter and
faster by ruling out the statements which finally
permute the IA-array.
If the case M > n/2 does not occur very often
a shorter version of HTRY is achieved by removing
the statements for the comlemetary sampling.
0.6 Reference
/1/ J. Teuhola and O. Nevalainen: Two
efficient algorithms for random sampling without
replacement, to appear in Int. J. of Comp.
Math., 1981.