PDP-10 Archive: 43,50523/his0.do from decuslib10-10

Trailing-Edge - PDP-10 Archives - decuslib10-10 - 43,50523/his0.do
There are no other files named his0.do in the archive.


	            RANDOM SAMPLING WITHOUT REPLACEMENT

	                J. Teuhola and O. Nevalainen
	                         11.6.1981
	            Department of mathematical sciences
	                     University of Turku
	                     SF-20500 TURKU 50
	                          Finland





	0.1  The Use Of The Subroutine

	     CALL HTRY (M, N, IA)
	          M is the wanted sample size (input)
	          N is the population size (input)
	          IA is an arrray to contain the sample (output)
	     the subtoutine selects M different random integers  out
	of  the  set [1..N] and stores them in a random order in the
	array IA.   The  calling  program  must  contain  a  storage
	definition statement for IA:  DIMENSION IA(M).



	0.2  Method

	     The so-called hashed try-again technique /1/  is  used.
	Here  the sample is found iteratively by generating a pseudo
	random integer from the interval from 1 to  N  and  checking
	whether it has not yet been selected.  If this is true it is
	moved to the final sample.  The same  is  repeated  until  M
	different  numbers  are  found.  To speed up the algorithm a
	hash table of the selected integers  is  maintained  in  the
	array  IA.   By using the hash technique one can in expected
	0(1)-time test whether or not an integer has previously been
	selected.  For M > n/2 the processing time is still improved
	by first selecting those N-M integers which are to  be  left
	out from the final sample.



	0.3  Efficiency Of The Sampling

	     Because except program code, only an array of the  size
	M  is  required, the sampling program is useful when N is so
	large that an array of size N would cause a spillover of the
	storage  in  use.   The object code of HTRY and INSERT is in
	total ca.  580 words.
	     The expected running time of HTRY is 0(M)  whereas  its
	upper  limit  is  unbounded.   Because  of the complementary
	sampling for M > N/2 the curve of the observed running  time
	is decreasing for such M values.  For n=1000 the sampling of
	M=200, 400, 600, 800, and 1000 random  numbers  demands  ca.
	RANDOM SAMPLING WITHOUT REPLACEMENT                   Page 2


	0.25, 0.60, 0.62, 0.50, and 0.45 seconds.



	0.4  Organizatio Of The Program

	     the user calls the HTRY subroutine, which in turn calls
	two subroutines:
	     INSERT a subroutine to search for and possibly insert a
	          key KSI in the hash table IA,
	     RAN     a FORTRAN  library  function  which  returns  a
	          pseudo-random number from the interval [0,1).



	          0.5  Restrictions And Modifications

	               HTRY does not check the validity of  M  (i.e.
	          1 <= M <= N).
	               If  the  random  order  of  numbers  is   not
	          essential  one  can  make  the program shorter and
	          faster by ruling out the statements which  finally
	          permute the IA-array.
	               If the case M > n/2 does not occur very often
	          a  shorter version of HTRY is achieved by removing
	          the statements for the comlemetary sampling.



	          0.6  Reference


	          /1/    J.   Teuhola  and  O.    Nevalainen:    Two
	          efficient  algorithms  for random sampling without
	          replacement, to  appear  in  Int.   J.   of  Comp.
	          Math., 1981.