 Web pdp-10.trailing-edge.com

Trailing-Edge - PDP-10 Archives - decuslib10-10 - 43,50523/his0.do
There are no other files named his0.do in the archive.

RANDOM SAMPLING WITHOUT REPLACEMENT

J. Teuhola and O. Nevalainen
11.6.1981
Department of mathematical sciences
University of Turku
SF-20500 TURKU 50
Finland

0.1  The Use Of The Subroutine

CALL HTRY (M, N, IA)
M is the wanted sample size (input)
N is the population size (input)
IA is an arrray to contain the sample (output)
the subtoutine selects M different random integers  out
of  the  set [1..N] and stores them in a random order in the
array IA.   The  calling  program  must  contain  a  storage
definition statement for IA:  DIMENSION IA(M).

0.2  Method

The so-called hashed try-again technique /1/  is  used.
Here  the sample is found iteratively by generating a pseudo
random integer from the interval from 1 to  N  and  checking
whether it has not yet been selected.  If this is true it is
moved to the final sample.  The same  is  repeated  until  M
different  numbers  are  found.  To speed up the algorithm a
hash table of the selected integers  is  maintained  in  the
array  IA.   By using the hash technique one can in expected
0(1)-time test whether or not an integer has previously been
selected.  For M > n/2 the processing time is still improved
by first selecting those N-M integers which are to  be  left
out from the final sample.

0.3  Efficiency Of The Sampling

Because except program code, only an array of the  size
M  is  required, the sampling program is useful when N is so
large that an array of size N would cause a spillover of the
storage  in  use.   The object code of HTRY and INSERT is in
total ca.  580 words.
The expected running time of HTRY is 0(M)  whereas  its
upper  limit  is  unbounded.   Because  of the complementary
sampling for M > N/2 the curve of the observed running  time
is decreasing for such M values.  For n=1000 the sampling of
M=200, 400, 600, 800, and 1000 random  numbers  demands  ca.
RANDOM SAMPLING WITHOUT REPLACEMENT                   Page 2

0.25, 0.60, 0.62, 0.50, and 0.45 seconds.

0.4  Organizatio Of The Program

the user calls the HTRY subroutine, which in turn calls
two subroutines:
INSERT a subroutine to search for and possibly insert a
key KSI in the hash table IA,
RAN     a FORTRAN  library  function  which  returns  a
pseudo-random number from the interval [0,1).

0.5  Restrictions And Modifications

HTRY does not check the validity of  M  (i.e.
1 <= M <= N).
If  the  random  order  of  numbers  is   not
essential  one  can  make  the program shorter and
faster by ruling out the statements which  finally
permute the IA-array.
If the case M > n/2 does not occur very often
a  shorter version of HTRY is achieved by removing
the statements for the comlemetary sampling.

0.6  Reference

/1/    J.   Teuhola  and  O.    Nevalainen:    Two
efficient  algorithms  for random sampling without
replacement, to  appear  in  Int.   J.   of  Comp.
Math., 1981.