
Goodness of fit Test (Chi squared Distribution)

Esteban Zapata Rojas edited this page Jan 15, 2025 · 2 revisions

Goodness of fit test

Implementation of the goodness of fit test, which tells us how well an observed sample fits an assumed model. It is commonly used to check whether a sample comes from a known distribution.

Class methods

Chi statistic

This method calculates the chi statistic for two specified samples: the expected values and the observed values. If the expected sample is a single number, we assume that all expected observations have the same probability of occurring, so the same expected value is used for every observed value.

It returns an array that contains the chi statistic and the degrees of freedom, in that order.

# Random 'expected' numbers generated from standard normal distribution
pry(main)> expected_normal = [1.8431602705473755, 1.5354750805480946, 2.1954111293990977, 0.8336726971380797, 0.08482558536707968, 0.5310143881809778]
# Random 'observed' numbers generated assuming a uniform distribution
pry(main)> observed_uniform = []
pry(main)> 6.times { observed_uniform << rand }
pry(main)> observed_uniform
=> [0.8133472663590252, 0.7810839183757206, 0.3012726333220074, 0.4924476634781214, 0.6520781068814389, 0.280418598131673]
# Chi statistic, where first value is the statistic and the second one is the degrees of freedom
pry(main)> StatisticalTest::ChiSquaredTest.chi_statistic(expected_normal, observed_uniform)
=> [6.631528378944574, 5] # [chi statistic, degrees of freedom]
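
To make the calculation above concrete, here is a minimal sketch of how such a statistic can be computed by hand. It assumes the statistic is Pearson's sum of (observed - expected)² / expected and that the degrees of freedom are the number of observations minus one, which agrees with the output shown above. The helper name `chi_statistic_sketch` is hypothetical and is not part of the library.

```ruby
# Hypothetical helper (not the library's implementation): Pearson's chi statistic.
def chi_statistic_sketch(expected, observed)
  # A single number means every observation shares the same expected value.
  expected_values = expected.is_a?(Numeric) ? Array.new(observed.size, expected) : expected

  # Sum of (observed - expected)^2 / expected over all paired values.
  statistic = observed.zip(expected_values).sum do |observed_value, expected_value|
    ((observed_value - expected_value)**2) / expected_value.to_f
  end

  # Assumption: degrees of freedom = number of observations - 1.
  [statistic, observed.size - 1]
end

# Expected values given as an array, one per observation.
statistic, degrees_of_freedom = chi_statistic_sketch([10, 20, 30], [12, 18, 33])

# Expected value given as a single number, shared by every observation.
uniform_statistic, uniform_df = chi_statistic_sketch(5, [4, 6, 5])
```

Expanding a numeric expected value into an array keeps a single code path for both input shapes.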

Goodness of fit test

This method receives three parameters, in this order: the alpha value, the expected sample and the observed sample. It returns a hash with the following keys:

  • probability: the value of the Chi squared CDF evaluated at the chi statistic.
  • p_value: the p value, calculated as 1 - probability.
  • alpha: the specified alpha value.
  • null: either true or false. If true, the null hypothesis should not be rejected.
  • alternative: either true or false. If true, the null hypothesis can be rejected.
  • confidence_level: defined as 1 - alpha.

Keep in mind that the null and alternative keys cannot be true at the same time.

# Goodness of fit with alpha 0.05
pry(main)> StatisticalTest::ChiSquaredTest.goodness_of_fit(alpha = 0.05, expected_normal, observed_uniform)
=> {:probability=>0.9891796008014271, :p_value=>0.010820399198572916, :alpha=>0.05, :null=>false, :alternative=>true, :confidence_level=>0.95}
# Goodness of fit with alpha 0.01
pry(main)> StatisticalTest::ChiSquaredTest.goodness_of_fit(alpha = 0.01, expected_normal, observed_uniform)
=> {:probability=>0.9891796008014271, :p_value=>0.010820399198572916, :alpha=>0.01, :null=>true, :alternative=>false, :confidence_level=>0.99}
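
The relationship between the keys of the returned hash can be sketched as follows. This is only an illustration under a stated assumption: the null hypothesis is rejected when the p value is less than or equal to alpha, which is consistent with the two results shown above. The helper name `decision_sketch` is hypothetical, and it takes the CDF probability as an input instead of computing it.

```ruby
# Hypothetical helper (not the library's implementation): builds the result
# hash from an already computed CDF probability and an alpha value.
def decision_sketch(probability, alpha)
  p_value = 1 - probability

  {
    probability: probability,
    p_value: p_value,
    alpha: alpha,
    null: p_value > alpha,         # true: do not reject the null hypothesis
    alternative: p_value <= alpha, # assumption: reject when p_value <= alpha
    confidence_level: 1 - alpha
  }
end

at_five_percent = decision_sketch(0.9891796008014271, 0.05)
at_one_percent  = decision_sketch(0.9891796008014271, 0.01)

puts at_five_percent[:alternative] # true, the null hypothesis can be rejected
puts at_one_percent[:null]         # true, the null hypothesis is not rejected
```

Because null and alternative are defined by complementary comparisons, exactly one of them is true for any given p value and alpha.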