/** Calculate distance of numerical profiles based on KLL Sketches and L-Infinity Distance */
// Chi-square constants
// at least two distinct categories are required to run the chi-square test for a categorical variable
private val chisquareMinDimension: Int = 2

// for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
private val defaultAbsThresholdYates: Integer = 5
private val defaultPercThresholdYates: Double = 0.2

// for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
private val defaultAbsThresholdCochran: Integer = 10

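The two rules quoted in the comments above can be read as a single adequacy check on the expected counts. The following is a minimal sketch under stated assumptions, not the code from this change: the object `ExpectedCountRule` and the function `expectedCountsAreAdequate` are hypothetical names, and treating "exactly two categories" as the 2x2 case is an assumption about how the rule maps onto a categorical profile.

```scala
object ExpectedCountRule {
  // Hypothetical helper; the default values mirror the constants in the diff.
  def expectedCountsAreAdequate(
      expected: Seq[Double],
      absThresholdYates: Int = 5,
      percThresholdYates: Double = 0.2,
      absThresholdCochran: Int = 10): Boolean = {
    if (expected.size < 2) {
      // Fewer than two distinct categories: the chi-square test cannot be run.
      false
    } else if (expected.size == 2) {
      // Cochran (assumed to apply to exactly two categories):
      // every expected count must reach the threshold.
      expected.forall(_ >= absThresholdCochran)
    } else {
      // Yates, Moore & McCabe: at most percThresholdYates of the expected counts
      // may fall below absThresholdYates, and none may be below 1.
      val shareBelow = expected.count(_ < absThresholdYates).toDouble / expected.size
      shareBelow <= percThresholdYates && expected.forall(_ >= 1.0)
    }
  }
}
```

For instance, `ExpectedCountRule.expectedCountsAreAdequate(Seq(12.0, 8.0))` is rejected under the Cochran rule because one of the two expected counts is below 10.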
// Default c(alpha) value corresponding to an alpha value of 0.003, Eq. (15) in Section 3.3.1 of Knuth, D.E., The Art of Computer Programming, Volume 2 (Seminumerical Algorithms), 3rd Edition, Addison Wesley, Reading Mass, 1998.
/** Calculate distance of categorical profiles based on different distance methods
 *
 * Thresholds for chi-square method:
 *  - for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
 *  - for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
 *
 * @param sample1 the mapping between categories (keys) and counts (values) of the observed sample
 * @param sample2 the mapping between categories (keys) and counts (values) of the expected baseline
 * @param correctForLowNumberOfSamples if true, returns the chi-square statistic; otherwise the p-value
 * @param method method to use: LInfinity or Chisquare
 * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
 * @param percThresholdYates Yates percentage of categories that can be below the threshold for tables larger than 2x2
 * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
 * @return the distance, either an absolute distance or a p-value depending on the correctForLowNumberOfSamples argument
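As an illustration of the LInfinity option named in the doc comment, the sketch below computes the largest absolute difference between the relative frequencies of two categorical profiles. It is a minimal sketch, not the implementation in this change: `LInfinitySketch` and `lInfinityDistance` are hypothetical names, immutable maps are assumed for the profiles, and no low-sample correction is applied.

```scala
object LInfinitySketch {
  // Hypothetical helper: L-infinity distance between two categorical profiles,
  // each given as a map from category to observed count.
  def lInfinityDistance(sample1: Map[String, Long], sample2: Map[String, Long]): Double = {
    val n = sample1.values.sum.toDouble
    val m = sample2.values.sum.toDouble
    require(n > 0 && m > 0, "both samples must contain at least one observation")

    // Compare relative frequencies over the union of categories;
    // a category missing from one profile counts as zero there.
    val keys = sample1.keySet ++ sample2.keySet
    keys.map { key =>
      math.abs(sample1.getOrElse(key, 0L) / n - sample2.getOrElse(key, 0L) / m)
    }.max
  }
}
```

Because the counts are normalized first, the result lies between 0 and 1 and does not depend on the two samples having the same size.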
/** Calculate distance of categorical profiles based on Chisquare test or stats
 *
 * for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
 * for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
 *
 * @param sample the mapping between categories (keys) and counts (values) of the observed sample
 * @param expected the mapping between categories (keys) and counts (values) of the expected baseline
 * @param correctForLowNumberOfSamples if true, returns the chi-square statistic; otherwise the p-value
 * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
 * @param percThresholdYates Yates percentage of categories that can be below the threshold for tables larger than 2x2
 * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
 * @return the distance, either an absolute distance or a p-value depending on the correctForLowNumberOfSamples argument
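For orientation, the chi-square statistic referred to in the doc comment can be sketched as Pearson's sum of (observed - expected)^2 / expected. This is a minimal sketch, not the code from this change: `ChiSquareSketch` and `chiSquareStatistic` are hypothetical names, rescaling the baseline to the sample size is an assumption, and converting the statistic to a p-value (which needs a chi-square distribution function) is left out.

```scala
object ChiSquareSketch {
  // Hypothetical helper: Pearson chi-square statistic of an observed sample
  // against a baseline, both given as maps from category to count.
  def chiSquareStatistic(sample: Map[String, Long], expected: Map[String, Long]): Double = {
    val sampleTotal = sample.values.sum.toDouble
    val expectedTotal = expected.values.sum.toDouble
    require(expected.values.forall(_ > 0), "every baseline count must be positive")

    // Assumption: the baseline is rescaled so its total matches the sample total.
    // Categories that appear only in the sample are ignored in this sketch.
    expected.iterator.map { case (category, baselineCount) =>
      val e = baselineCount * sampleTotal / expectedTotal
      val o = sample.getOrElse(category, 0L).toDouble
      (o - e) * (o - e) / e
    }.sum
  }
}
```

With a sample of `Map("a" -> 30L, "b" -> 70L)` against a baseline of `Map("a" -> 50L, "b" -> 50L)`, each category contributes 400 / 50 = 8, so the statistic is 16.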
/** Regroup categories with elements below threshold, required for chi-square test
 *
 * for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
 * for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
 *
 * @param sample the mapping between categories (keys) and counts (values) of the observed sample
 * @param expected the mapping between categories (keys) and counts (values) of the expected baseline
 * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
 * @param percThresholdYates Yates percentage of categories that can be below the threshold for tables larger than 2x2
 * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
 * @return (sample, expected) returns the two regrouped mappings
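One way to picture the regrouping step is to merge every category whose baseline count falls below a threshold into a single bucket, applying the same merge to both mappings. The sketch below is a deliberate simplification and not the algorithm from this change: `RegroupSketch`, `regroup`, and the `otherKey` bucket name are hypothetical, and only a single absolute threshold is applied (the percentage rule and the separate 2x2 threshold are not modeled).

```scala
object RegroupSketch {
  // Hypothetical helper: merge categories with a small baseline count into one bucket.
  def regroup(
      sample: Map[String, Long],
      expected: Map[String, Long],
      absThreshold: Long = 5L,
      otherKey: String = "__other__"): (Map[String, Long], Map[String, Long]) = {
    // Categories whose baseline count reaches the threshold are kept as they are.
    val keptKeys = expected.collect { case (k, count) if count >= absThreshold => k }.toSet

    // Everything else (including categories absent from the baseline) is summed
    // into otherKey; the bucket is only added when there is something to merge.
    def merge(counts: Map[String, Long]): Map[String, Long] = {
      val (kept, rest) = counts.partition { case (k, _) => keptKeys.contains(k) }
      if (rest.isEmpty) kept else kept + (otherKey -> rest.values.sum)
    }

    (merge(sample), merge(expected))
  }
}
```

The keys of the two returned mappings line up whenever both inputs contain at least one merged category, which keeps a later category-by-category comparison straightforward.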