WESTERN MICHIGAN UNIVERSITY
COMPUTER CENTER
LIBRARY PROGRAM #1.1.3
CALLING NAME: COLCHI
PREPARED BY: SAM ANEMA
PROGRAMMED BY: SAM ANEMA
APPROVED BY: JACK R. MEAGHER
DATE: AUGUST, 1972 (VERSION 3)
CHI-SQUARE, COLLAPSING CHI-SQUARE,
GAMMA, AND THETA STATISTICS
1.0 PURPOSE
THIS IS A MULTIPURPOSE NON-PARAMETRIC STATISTICS PROGRAM DESIGNED TO SATISFY
SEVERAL DISTINCT NEEDS OF USERS.
THE USER SUBMITS AN R X C TABLE OF DATA.
LET THERE BE C LEVELS OF A COLUMN VARIABLE, SAY Y(1), Y(2),...,Y(C), AND LET
THERE BE R LEVELS OF A ROW VARIABLE, SAY Z(1), Z(2),...,Z(R).
ASSUME THE DATA FORMAT:
                        COLUMN VARIABLE
  LEVEL     Y(1)     Y(2)    ..........    Y(C)
  Z(1)      X(11)    X(12)                 X(1C)
  Z(2)      X(21)    X(22)                 X(2C)
   .          .        .                     .
   .          .        .                     .
   .          .        .                     .
  Z(R)      X(R1)    X(R2)                 X(RC)
X(IJ) IS THE OBSERVED FREQUENCY OF THE I-TH LEVEL OF THE ROW VARIABLE (Z(I))
AND THE J-TH LEVEL OF THE COLUMN VARIABLE (Y(J)).
APPLICATION A: CHI-SQUARE STATISTICS
FOR EACH R X C TABLE SUBMITTED, THIS PROGRAM AUTOMATICALLY PRODUCES THE
FOLLOWING STANDARD CHI-SQUARE STATISTICS:
(I) THE CHI-SQUARE VALUE,
(II) THE CONTINGENCY COEFFICIENT,
(III) PHI-SQUARE,
(IV) PHI-PRIME,
(V) DEGREES OF FREEDOM = (R-1)(C-1), AND
(VI) CORRECTED CHI-SQUARE (2 X 2 TABLES ONLY).
THESE STATISTICS ARE EXACTLY THE SAME AS THOSE DESCRIBED IN OPTION 1 OF
"INTEGRATED NON-PARAMETRIC STATISTICS", LIBRARY PROGRAM #1.1.1, A WMU COMPUTER
CENTER DOCUMENT.
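THE FORMULAS BEHIND THESE STATISTICS ARE NOT LISTED HERE. AS A MINIMAL SKETCH,
ASSUMING THE USUAL TEXTBOOK DEFINITIONS (PEARSON CHI-SQUARE, CONTINGENCY
COEFFICIENT = SQRT(CHI2/(CHI2+N)), PHI-SQUARE = CHI2/N, PHI-PRIME =
SQRT(PHI-SQUARE/MIN(R-1,C-1)), AND THE YATES CORRECTION FOR THE 2 X 2 CASE),
THE PYTHON FUNCTION BELOW AGREES WITH THE VALUES PRINTED IN 4.0 SAMPLE TERMINAL
RUN TO THE DIGITS SHOWN. THE FUNCTION NAME AND DICTIONARY KEYS ARE HYPOTHETICAL
AND ARE NOT PART OF COLCHI. ALL ROW AND COLUMN TOTALS ARE ASSUMED TO BE NONZERO.

```python
import math


def chi_square_stats(table):
    """Chi-square statistics for an R x C table of integer counts (a sketch)."""
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(c)]
    n = sum(row_tot)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
    # where expected = row total * column total / N.
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    stats = {
        "chi_square": chi2,
        "contingency_coefficient": math.sqrt(chi2 / (chi2 + n)),
        "phi_square": chi2 / n,
        # Assumed definition: sqrt(phi-square / min(R-1, C-1)).
        "phi_prime": math.sqrt(chi2 / (n * min(r - 1, c - 1))),
        "degrees_of_freedom": (r - 1) * (c - 1),
    }

    if r == 2 and c == 2:
        # Yates continuity correction, assumed for the 2X2 corrected value.
        (a, b), (d, e) = table
        diff = max(abs(a * e - b * d) - n / 2.0, 0.0)
        stats["corrected_chi_square"] = n * diff ** 2 / (
            row_tot[0] * row_tot[1] * col_tot[0] * col_tot[1])
    return stats


if __name__ == "__main__":
    for name, value in chi_square_stats([[20, 14], [15, 8]]).items():
        print(name, value)
```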
APPLICATION B: GAMMA STATISTICS AND SOMERS D
THIS OPTION CAN BE USED TO DESCRIBE ASSOCIATION BETWEEN TWO ORDINAL SCALES OF
DATA. BOTH VARIABLES Y AND Z MUST BE SUBMITTED IN A NATURAL ORDERING. THAT IS,
Y(1)<=Y(2)<=...<=Y(C) OR Y(1)>=Y(2)>=...>=Y(C). SIMILARLY,
Z(1)<=Z(2)<=...<=Z(R) OR Z(1)>=Z(2)>=...>=Z(R).
NOTE: REVERSING THE ORDER OF ONE OF THE TWO VARIABLES CHANGES THE SIGN OF
GAMMA. THE GAMMA FOR THE ORDERINGS Y(1)<=Y(2)<=...<=Y(C) AND
Z(1)<=Z(2)<=...<=Z(R) IS THE NEGATIVE OF THE GAMMA FOR THE ORDERINGS
Y(1)<=Y(2)<=...<=Y(C) AND Z(1)>=Z(2)>=...>=Z(R), AND VICE VERSA.
INCLUDED IN THIS OPTION ARE:
(I) GAMMA = G = THE GOODMAN-KRUSKAL GAMMA STATISTIC,
(II) KENDALL TAU-A,
(III) KENDALL TAU-B,
(IV) KENDALL TAU-C, AND
(V) SOMERS DXY AND DYX, THE ASYMMETRIC MEASURES OF ASSOCIATION.
NOTES 1. THE GOODMAN-KRUSKAL GAMMA STATISTIC IS DESCRIBED ON PAGE 85 OF
"ELEMENTARY APPLIED STATISTICS" BY LINTON C. FREEMAN.
2. ALL FIVE OF THE ABOVE MEASURES OF ASSOCIATION ARE DESCRIBED IN THE
PAPER, "A NEW ASYMMETRIC MEASURE OF ASSOCIATION FOR ORDINAL VARIABLES"
IN THE AMERICAN SOCIOLOGICAL REVIEW, VOLUME 27, DECEMBER, 1962, BY
ROBERT H. SOMERS. THE USER IS ADVISED TO ACQUAINT HIMSELF WITH THIS
PAPER BEFORE ATTEMPTING TO INTERPRET THE MEANING OF THESE FIVE
STATISTICS.
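AS AN ILLUSTRATION ONLY, THE FIVE MEASURES CAN BE SKETCHED IN PYTHON FROM THE
COUNTS OF CONCORDANT PAIRS (P) AND DISCORDANT PAIRS (Q). THE DEFINITIONS USED
ARE THE COMMONLY PUBLISHED ONES AND ARE AN ASSUMPTION ABOUT WHAT COLCHI
COMPUTES; THEY AGREE WITH THE VALUES PRINTED IN 4.0 SAMPLE TERMINAL RUN. IN
THAT RUN THE VALUE PRINTED AS DYX EQUALS (P-Q) DIVIDED BY THE NUMBER OF PAIRS
NOT TIED ON THE COLUMN VARIABLE, AND DXY USES THE PAIRS NOT TIED ON THE ROW
VARIABLE. THIS MAPPING IS INFERRED FROM THE PRINTED VALUES. THE FUNCTION NAME
AND KEYS ARE HYPOTHETICAL.

```python
import math


def ordinal_association(table):
    """Gamma, Kendall tau-a/b/c and Somers d for an ordered R x C table."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)

    # P counts pairs ordered the same way on both variables,
    # Q counts pairs ordered in opposite ways.
    p = q = 0
    for i in range(r):
        for j in range(c):
            for i2 in range(i + 1, r):
                for j2 in range(c):
                    if j2 > j:
                        p += table[i][j] * table[i2][j2]
                    elif j2 < j:
                        q += table[i][j] * table[i2][j2]

    total_pairs = n * (n - 1) // 2
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(c)]
    # Pairs whose members differ on the row (column) variable.
    untied_rows = total_pairs - sum(t * (t - 1) // 2 for t in row_tot)
    untied_cols = total_pairs - sum(t * (t - 1) // 2 for t in col_tot)
    k = min(r, c)

    return {
        "gamma": (p - q) / (p + q),
        "tau_a": (p - q) / total_pairs,
        "tau_b": (p - q) / math.sqrt(untied_rows * untied_cols),
        "tau_c": 2 * k * (p - q) / (n * n * (k - 1)),
        "d_untied_cols": (p - q) / untied_cols,  # matches DYX in the sample run
        "d_untied_rows": (p - q) / untied_rows,  # matches DXY in the sample run
    }
```

REVERSING THE ROW ORDER OF THE INPUT TABLE SWAPS P AND Q, WHICH IS WHY THE SIGN
OF GAMMA CHANGES AS DESCRIBED IN THE NOTE ABOVE.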
APPLICATION C: THETA STATISTIC OR COEFFICIENT OF DIFFERENTIATION
THIS OPTION CAN BE USED TO DESCRIBE ASSOCIATION BETWEEN ONE NOMINAL AND ONE
ORDINAL OR ORDERED SCALE OF DATA.
NOTES 1. THE Y-SCALE OR THE COLUMN VARIABLE MUST BE THE ORDINAL SCALE. HENCE,
EITHER Y(1)<=Y(2)<=...<=Y(C) OR Y(1)>=Y(2)>=...>=Y(C). THE Z-SCALE
OR THE ROW VARIABLE IS THE NOMINAL SCALE.
2. USING THE NOTATION OF FREEMAN'S TEXT "ELEMENTARY APPLIED STATISTICS",
PAGE 112, WE HAVE:
THETA = D / T(2) = SUM(D(I)) / T(2)
D(I) = |F(B) - F(A)| = ABSOLUTE VALUE OF THE FREQUENCY BELOW MINUS THE
FREQUENCY ABOVE FOR EACH PAIR OF CLASSES IN THE NOMINAL SCALE. THE SUMMATION IS
OVER ALL POSSIBLE PAIRS OF NOMINAL CLASSES. T(2) IS CALCULATED BY MULTIPLYING
THE TOTAL FREQUENCY FOR EACH NOMINAL CLASS BY THE TOTAL FOR EACH OF THE OTHER
NOMINAL CLASSES, TWO AT A TIME, AND SUMMING OVER ALL PAIRS.
D = SUM(D(I)), T(2), AND THETA = D/T(2) ARE OUTPUTS FOR THIS OPTION. ONCE AGAIN
THE USER IS CAUTIONED TO READ CHAPTER 10 OF FREEMAN'S TEXT BEFORE USING AND
INTERPRETING THIS OPTION.
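THE CALCULATION DESCRIBED ABOVE CAN BE SKETCHED IN PYTHON AS FOLLOWS. ROWS ARE
TAKEN AS THE NOMINAL CLASSES AND COLUMNS AS THE ORDERED LEVELS, AS REQUIRED BY
NOTE 1. THE FUNCTION NAME IS HYPOTHETICAL AND THE CODE IS AN ILLUSTRATION, NOT
THE PROGRAM'S OWN SOURCE. FOR THE 3 X 4 TABLE OF 4.0 SAMPLE TERMINAL RUN IT
GIVES D = 317 AND T(2) = 1071, AS PRINTED THERE.

```python
def theta_statistic(table):
    """Return (D, T2, theta); rows are nominal classes, columns are ordered."""
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    d = 0
    t2 = 0
    for i in range(r):
        for k in range(i + 1, r):
            # Pairs (one member from class i, one from class k) in which the
            # class-i member lies in a lower column than the class-k member.
            f_below = sum(table[i][a] * table[k][b]
                          for a in range(c) for b in range(a + 1, c))
            # Pairs in which the class-i member lies in a higher column.
            f_above = sum(table[i][a] * table[k][b]
                          for a in range(c) for b in range(a))
            d += abs(f_below - f_above)
            t2 += row_tot[i] * row_tot[k]
    return d, t2, d / t2


if __name__ == "__main__":
    print(theta_statistic([[3, 5, 2, 7], [11, 1, 3, 2], [14, 1, 4, 4]]))
```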
APPLICATION D: COLLAPSING CHI-SQUARE
USING THIS OPTION THE USER MAY SPECIFY CERTAIN ROWS AND CERTAIN COLUMNS OF THE
ORIGINAL R X C CONTINGENCY TABLE TO BE DELETED OR POOLED. A COLLAPSED R' X C'
CONTINGENCY TABLE IS OBTAINED WHERE R'<=R AND C'<=C.
(I) THE CHI-SQUARE STATISTICS GIVEN IN APPLICATION A ARE AUTOMATICALLY
PRODUCED FOR THE COLLAPSED R' X C' TABLE.
(II) THE USER MAY CALL FOR APPLICATIONS B AND C (GAMMA AND THETA) TO BE
CALCULATED FOR THE COLLAPSED R' X C' TABLE.
(III) THE USER MAY COLLAPSE THE ORIGINAL R X C TABLE AS OFTEN AS HE
PLEASES, BUT THE USER CANNOT COLLAPSE A COLLAPSED TABLE.
EXAMPLE: SUPPOSE THAT THE ORIGINAL DATA IS THE 3 X 4 TABLE GIVEN BELOW:
                  Y
           1    2    3    4
      1    3    5    2    7
  Z   2   11    1    3    2
      3   14    1    4    4
AFTER MAKING THE DESIRED CALCULATIONS ON THE ORIGINAL DATA, SUPPOSE THE USER
WISHES TO HAVE ADDITIONAL CALCULATIONS MADE ON CERTAIN COLLAPSED TABLES. TWO
ILLUSTRATIONS ARE PROVIDED:
(I) R' = 2 AND C' = 2, WHERE
ROW 1 OF THE COLLAPSED TABLE CONSISTS OF ROWS 1 AND 2 OF THE
ORIGINAL.
ROW 2 OF THE COLLAPSED TABLE CONSISTS OF ROW 3 OF THE ORIGINAL.
COLUMN 1 OF THE COLLAPSED TABLE CONSISTS OF COLUMNS 1, 2, AND 3 OF THE
ORIGINAL.
COLUMN 2 OF THE COLLAPSED TABLE CONSISTS OF COLUMN 4 OF THE
ORIGINAL.
                  1'         2'
             1    2    3     4               1'   2'
        1    3    5    2     7          1'   25    9
   1'
        2   11    1    3     2          2'   19    4
   2'   3   14    1    4     4
(II) R' = 2 AND C' = 2, WHERE
ROW 1 OF THE COLLAPSED TABLE CONSISTS OF ROWS 1 AND 3 OF THE
ORIGINAL.
ROW 2 OF THE COLLAPSED TABLE CONSISTS OF ROW 2 OF THE ORIGINAL.
COLUMN 1 OF THE COLLAPSED TABLE CONSISTS OF COLUMN 1 OF THE
ORIGINAL.
COLUMN 2 OF THE COLLAPSED TABLE CONSISTS OF COLUMN 3 OF THE
ORIGINAL.
COLUMNS 2 AND 4 OF THE ORIGINAL ARE DELETED.
             1'   2'
             1    3    2    4               1'   2'
        1    3    2    5    7          1'   17    6
   1'
        3   14    4    1    4          2'   11    3
   2'   2   11    3    1    2
TO SEE HOW COLLAPSED TABLES SUCH AS THESE ARE REQUESTED FROM THE ORIGINAL
TABLE, THE USER IS DIRECTED TO 3.0 METHOD OF USE, LINES 11-19.
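THE POOLING AND DELETION IN THE TWO ILLUSTRATIONS ABOVE CAN BE SKETCHED IN
PYTHON AS FOLLOWS. THE FUNCTION NAME AND ARGUMENT LAYOUT ARE HYPOTHETICAL. EACH
GROUP LISTS THE ORIGINAL ROW OR COLUMN NUMBERS (COUNTING FROM 1) THAT ARE
POOLED INTO ONE NEW ROW OR COLUMN, AND ANY ROW OR COLUMN THAT APPEARS IN NO
GROUP IS DELETED.

```python
def collapse(table, row_groups, col_groups):
    """Pool the listed rows and columns (1-based); unlisted ones are dropped."""
    return [
        [sum(table[i - 1][j - 1] for i in rows for j in cols)
         for cols in col_groups]
        for rows in row_groups
    ]


original = [[3, 5, 2, 7], [11, 1, 3, 2], [14, 1, 4, 4]]

# Illustration (I): rows (1,2) and (3); columns (1,2,3) and (4).
print(collapse(original, [[1, 2], [3]], [[1, 2, 3], [4]]))  # [[25, 9], [19, 4]]

# Illustration (II): rows (1,3) and (2); columns (1) and (3).
print(collapse(original, [[1, 3], [2]], [[1], [3]]))        # [[17, 6], [11, 3]]
```

BECAUSE THE FUNCTION ALWAYS STARTS FROM THE TABLE IT IS GIVEN, PASSING THE
ORIGINAL TABLE EACH TIME MIRRORS THE RULE THAT COLLAPSING IS APPLIED TO THE
ORIGINAL TABLE AND NOT TO A PREVIOUSLY COLLAPSED ONE.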
2.0 LIMITATION
(1) NO MORE THAN 20 ROWS OR 20 COLUMNS
(2) ENTRIES MUST BE INTEGERS
3.0 METHOD OF USE
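THE FOLLOWING IS AN EXAMPLE OF PROGRAM OPERATION. THE RESPONSES THE USER MUST
SUPPLY ARE INDICATED BY <CR> AT THE END OF THE LINE, WHERE <CR> DENOTES A
CARRIAGE RETURN (RETURN KEY). THE LINE LABELS AT THE LEFT ARE NOT PART OF THE
DIALOGUE; THEY ARE REFERENCED IN THE EXPLANATION THAT FOLLOWS THE EXAMPLE.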
          .R COLCHI<CR>
LINE 1    HOW MANY ROWS?
LINE 2    3<CR>
LINE 3    HOW MANY COLUMNS?
LINE 4    4<CR>
LINE 5    ENTER IDENTIFICATION.
LINE 6    TRIAL RUN<CR>
LINE 7    ENTER FREQUENCIES
          3,5,2,7<CR>
LINE 8    11,1,3,2<CR>
          14,1,4,4<CR>
          TRIAL RUN
          CONTINGENCY TABLE
          VAR     1     2     3     4
            1     3     5     2     7    17
            2    11     1     3     2    17
LINE 9      3    14     1     4     4    23
                 28     7     9    13    57
          CHI-SQUARE = 14.69166 PROB = 0.02280
          CONTINGENCY COEFFICIENT = 0.45269
          PHI-SQUARE = 0.25775
          PHI-PRIME = 0.35899
          DEGREES OF FREEDOM = 6
          TYPE:
          1 TO TERMINATE
          2 TO ENTER MORE DATA
LINE 10   3 TO COLLAPSE
          4 FOR GAMMA STATISTICS
          5 FOR THETA
LINE 11   3<CR>
LINE 12   WHAT IS THE NEW NUMBER OF ROW CATEGORIES?
LINE 13   2<CR>
LINE 14   ENTER NEW ROW CATEGORIZATION
          1,2<CR>
LINE 15   3<CR>
LINE 16   WHAT IS THE NEW NUMBER OF COLUMN CATEGORIES?
LINE 17   2<CR>
LINE 18   ENTER NEW COLUMN CATEGORIZATION.
          1,2<CR>
LINE 19   3,4<CR>
          VAR     1     2
            1    20    14    34
            2    15     8    23
                 35    22    57
          CHI-SQUARE = 0.23666 PROB = 0.62663
LINE 20   2X2 CORRECTED CHI-SQUARE = 0.04376 PROB = 0.83431
          CONTINGENCY COEFFICIENT = 0.06430
          PHI-SQUARE = 0.00415
          PHI-PRIME = 0.06443
          DEGREES OF FREEDOM = 1
          TYPE:
          1 TO TERMINATE
          2 TO ENTER MORE DATA
LINE 21   3 TO COLLAPSE
          4 FOR GAMMA STATISTICS
          5 FOR THETA
LINE 22   1<CR>
          CPU TIME: 0.67 ELAPSED TIME: 3:58.90
          NO EXECUTION ERRORS DETECTED
          EXIT
THE FOLLOWING IS A LINE BY LINE EXPLANATION OF THE ABOVE EXAMPLE.
LINES 1-4. THE NUMBER OF ROWS AND COLUMNS IN THE CONTINGENCY TABLE MUST BE
ENTERED. IN EACH CASE THE NUMBER MUST NOT EXCEED 20.
LINES 5-6. A LINE MUST BE ENTERED AT THIS POINT WHICH WILL APPEAR ABOVE THE
CONTINGENCY TABLE IN THE OUTPUT. IT MAY CONTAIN ANY INFORMATION THE USER
DESIRES.
LINES 7-8. THE CONTINGENCY TABLE MUST BE ENTERED AT THIS POINT. EACH ROW
MUST OCCUPY A SEPARATE LINE. ENTRIES IN A LINE MUST BE SEPARATED BY COMMAS.
LINE 9. THIS IS THE INITIAL CONTINGENCY TABLE ANALYSIS.
LINES 10-11. THE USER IS GIVEN THE CHOICE OF TERMINATING THE ANALYSIS, ENTER-
ING A NEW TABLE, COLLAPSING THE PRESENT TABLE (SEE APPLICATION D UNDER 1.0
PURPOSE), OBTAINING GAMMA STATISTICS FOR THE PRESENT TABLE (SEE APPLICATION B
UNDER 1.0 PURPOSE), OR OBTAINING THETA STATISTICS (SEE APPLICATION C UNDER 1.0
PURPOSE).
LINES 12-13. IF THE USER CHOOSES TO COLLAPSE THE EXISTING CONTINGENCY TABLE,
HE MUST SPECIFY THE NUMBER OF ROWS CONTAINED IN THE COLLAPSED TABLE.
LINES 14-15. FOR EACH OF THE ROWS SPECIFIED IN LINE 13 THE USER MUST ENTER
A LINE WHICH CONTAINS THE ROW CATEGORIES OF THE ORIGINAL TABLE WHICH ARE TO BE
COMBINED. ENTRIES MUST BE SEPARATED BY COMMAS. IN THE EXAMPLE, ORIGINAL ROWS 1
AND 2 ARE POOLED INTO NEW ROW 1, AND ORIGINAL ROW 3 BECOMES NEW ROW 2.
LINES 16-17. THE USER MUST SPECIFY THE NUMBER OF COLUMNS CONTAINED IN THE
COLLAPSED TABLE.
LINES 18-19. FOR EACH OF THE COLUMNS SPECIFIED IN LINE 17 THE USER MUST ENTER
A LINE WHICH CONTAINS THE COLUMN CATEGORIES OF THE ORIGINAL TABLE WHICH ARE TO
BE COMBINED. ENTRIES MUST BE SEPARATED BY COMMAS. IN THE EXAMPLE, ORIGINAL
COLUMNS 1 AND 2 ARE POOLED INTO NEW COLUMN 1, AND ORIGINAL COLUMNS 3 AND 4
INTO NEW COLUMN 2.
LINE 20. THE CONTINGENCY ANALYSIS OF THE COLLAPSED TABLE IS PRINTED.
LINES 21-22. THE USER AGAIN HAS THE SAME CHOICES AS IN LINES 10-11. THETA AND
GAMMA ANALYSIS, IF SELECTED, WOULD BE PERFORMED ON THE COLLAPSED TABLE.
COLLAPSING, IF SELECTED, WOULD BE PERFORMED ON THE ORIGINAL TABLE.
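THE INPUT RULES ABOVE (ONE TABLE ROW PER LINE, ENTRIES SEPARATED BY COMMAS,
INTEGER ENTRIES, AND NO MORE THAN 20 ROWS OR COLUMNS AS STATED IN 2.0
LIMITATION) CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE FUNCTION NAMES, THE
CONSTANT MAX_DIM, AND THE ERROR MESSAGES ARE HYPOTHETICAL; HOW COLCHI ITSELF
REACTS TO INVALID INPUT IS NOT DESCRIBED IN THIS DOCUMENT.

```python
MAX_DIM = 20  # limit taken from 2.0 LIMITATION


def parse_rows(lines, n_rows, n_cols):
    """Parse comma-separated integer frequencies, one table row per line."""
    if not (1 <= n_rows <= MAX_DIM and 1 <= n_cols <= MAX_DIM):
        raise ValueError("row and column counts must be between 1 and 20")
    if len(lines) != n_rows:
        raise ValueError("expected one input line per row")
    table = []
    for line in lines:
        # int() raises ValueError for anything that is not an integer.
        entries = [int(field) for field in line.split(",")]
        if len(entries) != n_cols:
            raise ValueError("wrong number of entries in a row")
        table.append(entries)
    return table


def parse_groups(lines):
    """Parse categorization lines such as '1,2' into lists of 1-based indices."""
    return [[int(field) for field in line.split(",")] for line in lines]


if __name__ == "__main__":
    print(parse_rows(["3,5,2,7", "11,1,3,2", "14,1,4,4"], 3, 4))
    print(parse_groups(["1,2", "3"]))
```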
4.0 SAMPLE TERMINAL RUN
.R COLCHI
WMU - COLLAPSING CHI SQUARE
HOW MANY ROWS?
3
HOW MANY COLUMNS?
4
ENTER IDENTIFICATION.
TRIAL RUN
ENTER FREQUENCIES.
3,5,2,7
11,1,3,2
14,1,4,4
TRIAL RUN
CONTINGENCY TABLE
VAR     1     2     3     4
  1     3     5     2     7    17
  2    11     1     3     2    17
  3    14     1     4     4    23
       28     7     9    13    57
CHI-SQUARE = 14.69166 PROB = 0.02280
CONTINGENCY COEFFICIENT = 0.45269
PHI-SQUARE = 0.25775
PHI-PRIME = 0.35899
DEGREES OF FREEDOM = 6
TYPE:
1 TO TERMINATE
2 TO ENTER MORE DATA
3 TO COLLAPSE
4 FOR GAMMA STATISTICS
5 FOR THETA
4
GAMMA = -0.36159
TAU-A = -0.17105
TAU-B = -0.25349
TAU-C = -0.25208
DYX = -0.25208
DXY = -0.25490
TYPE:
1 TO TERMINATE
2 TO ENTER MORE DATA
3 TO COLLAPSE
4 FOR GAMMA STATISTICS
5 FOR THETA
5
D = 317
T2 = 1071
THETA = 0.29599
TYPE:
1 TO TERMINATE
2 TO ENTER MORE DATA
3 TO COLLAPSE
4 FOR GAMMA STATISTICS
5 FOR THETA
3
WHAT IS THE NEW NUMBER OF ROW CATEGORIES?
2
ENTER NEW ROW CATEGORIZATION
1,2
3
WHAT IS THE NEW NUMBER OF COLUMN CATEGORIES?
2
ENTER NEW COLUMN CATEGORIZATION.
1,2
3,4
CONTINGENCY TABLE
VAR     1     2
  1    20    14    34
  2    15     8    23
       35    22    57
CHI-SQUARE = 0.23666 PROB = 0.62663
2X2 CORRECTED CHI-SQUARE = 0.04376 PROB = 0.83431
CONTINGENCY COEFFICIENT = 0.06430
PHI-SQUARE = 0.00415
PHI-PRIME = 0.06443
DEGREES OF FREEDOM = 1
TYPE:
1 TO TERMINATE
2 TO ENTER MORE DATA
3 TO COLLAPSE
4 FOR GAMMA STATISTICS
5 FOR THETA
1
END OF EXECUTION
CPU TIME: 0.35 ELAPSED TIME: 6.05
EXIT
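THE PROB VALUES PRINTED IN THE RUN ABOVE ARE CONSISTENT WITH THE UPPER-TAIL
PROBABILITY OF THE CHI-SQUARE DISTRIBUTION FOR THE PRINTED DEGREES OF FREEDOM.
HOW COLCHI ITSELF COMPUTES PROB IS NOT STATED IN THIS DOCUMENT; THE PYTHON
SKETCH BELOW IS ONE WAY TO OBTAIN THE SAME FIGURES FOR INTEGER DEGREES OF
FREEDOM, USING ONLY THE STANDARD MATH MODULE. THE FUNCTION NAME IS HYPOTHETICAL.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    z = x / 2.0
    # Closed-form starting points: df = 2 gives exp(-z), df = 1 gives
    # erfc(sqrt(z)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Step up two degrees of freedom at a time with the incomplete-gamma
    # recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


if __name__ == "__main__":
    print(chi_square_upper_tail(14.69166, 6))
    print(chi_square_upper_tail(0.23666, 1))
```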