Goodness of fit Test (Chi squared Distribution)
Implementation of the goodness of fit test, which tells us how well an observed sample fits an assumed model. It is commonly used to check whether a sample comes from a known distribution.
The chi_statistic method calculates the chi statistic for an expected sample and an observed sample. If the expected sample is a single number, we assume that all expected observations are equally likely to occur, so the same expected value is used for every observed value.
It returns an array that contains the chi statistic and the degrees of freedom, in that order.
# Random 'expected' numbers generated from standard normal distribution
pry(main)> expected_normal = [1.8431602705473755, 1.5354750805480946, 2.1954111293990977, 0.8336726971380797, 0.08482558536707968, 0.5310143881809778]
# Random 'observed' numbers generated assuming a uniform distribution
pry(main)> observed_uniform = []
pry(main)> 6.times { observed_uniform << rand }
pry(main)> observed_uniform
=> [0.8133472663590252, 0.7810839183757206, 0.3012726333220074, 0.4924476634781214, 0.6520781068814389, 0.280418598131673]
# Chi statistic, where first value is the statistic and the second one is the degrees of freedom
pry(main)> StatisticalTest::ChiSquaredTest.chi_statistic(expected_normal, observed_uniform)
=> [6.631528378944574, 5] # [chi statistic, degrees of freedom]
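The calculation described above can be sketched in plain Ruby. This is a minimal illustration, not the library's own code: the name chi_statistic_sketch is hypothetical, and it assumes the statistic is the sum of (observed - expected)^2 / expected with degrees of freedom equal to the number of observations minus one, which reproduces the values in the example above.

```ruby
# Hypothetical helper, not part of the library.
# Returns [chi statistic, degrees of freedom], in that order.
def chi_statistic_sketch(expected, observed)
  # A single number is used as the expected value for every observation.
  expected_values = expected.is_a?(Numeric) ? Array.new(observed.size, expected) : expected

  # Sum of squared differences, each scaled by its expected value.
  statistic = expected_values.zip(observed).sum do |expected_value, observed_value|
    ((observed_value - expected_value)**2) / expected_value.to_f
  end

  [statistic, observed.size - 1]
end

# Numeric 'expected': every observation is compared against 10.
statistic, degrees_of_freedom = chi_statistic_sketch(10, [8, 12, 10])
# statistic is approximately 0.8 and degrees_of_freedom is 2
```

Expanding a numeric expected value into an array keeps a single code path for both input shapes.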
The goodness_of_fit method receives three parameters: the alpha value, the expected sample and the observed values. It returns a hash with the following keys:
- probability: the value of the chi-squared CDF evaluated at the chi statistic.
- p_value: the p-value, calculated as 1 - probability.
- alpha: the specified alpha value.
- null: either true or false. If true, the null hypothesis should not be rejected.
- alternative: either true or false. If true, the null hypothesis can be rejected.
- confidence_level: defined as 1 - alpha.

Keep in mind that the null and alternative keys cannot be true at the same time.
# Goodness of fit with alpha 0.05
pry(main)> StatisticalTest::ChiSquaredTest.goodness_of_fit(alpha = 0.05, expected_normal, observed_uniform)
=> {:probability=>0.9891796008014271, :p_value=>0.010820399198572916, :alpha=>0.05, :null=>false, :alternative=>true, :confidence_level=>0.95}
# Goodness of fit with alpha 0.01
pry(main)> StatisticalTest::ChiSquaredTest.goodness_of_fit(alpha = 0.01, expected_normal, observed_uniform)
=> {:probability=>0.9891796008014271, :p_value=>0.010820399198572916, :alpha=>0.01, :null=>true, :alternative=>false, :confidence_level=>0.99}
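The way the returned keys relate to each other can be sketched as follows. This is only an illustration under stated assumptions: decision_sketch is a hypothetical name, it takes the CDF probability as an input instead of computing it, and it assumes the null hypothesis is rejected when the p-value is less than or equal to alpha (the tie-breaking choice is a guess; the two examples above only show the strict cases).

```ruby
# Hypothetical helper, not part of the library.
# Builds the result hash from alpha and an already-computed CDF probability.
def decision_sketch(alpha, probability)
  p_value = 1 - probability
  # Assumption: a p-value equal to alpha counts as a rejection.
  reject_null = p_value <= alpha

  {
    probability: probability,
    p_value: p_value,
    alpha: alpha,
    null: !reject_null,        # true means the null hypothesis is not rejected
    alternative: reject_null,  # true means the null hypothesis can be rejected
    confidence_level: 1 - alpha
  }
end

# Same probability as in the examples above, with two different alpha values.
strict = decision_sketch(0.01, 0.9891796008014271)
loose  = decision_sketch(0.05, 0.9891796008014271)
```

Because null and alternative are derived from the same boolean, they can never be true at the same time.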