Class ChiSquareTestImpl

java.lang.Object
org.apache.commons.math.stat.inference.ChiSquareTestImpl
All Implemented Interfaces:
ChiSquareTest, UnknownDistributionChiSquareTest

public class ChiSquareTestImpl extends Object implements UnknownDistributionChiSquareTest
Implements Chi-Square test statistics defined in the UnknownDistributionChiSquareTest interface.
Version:
$Revision: 990655 $ $Date: 2010-08-29 23:49:40 +0200 (Sun, 29 Aug 2010) $
  • Constructor Details

    • ChiSquareTestImpl

      public ChiSquareTestImpl()
      Constructs a ChiSquareTestImpl.
    • ChiSquareTestImpl

      public ChiSquareTestImpl(ChiSquaredDistribution x)
      Create a test instance using the given distribution for computing inference statistics.
      Parameters:
      x - distribution used to compute inference statistics.
      Since:
      1.2
  • Method Details

    • chiSquare

      public double chiSquare(double[] expected, long[] observed) throws IllegalArgumentException
      Computes the Chi-Square statistic comparing observed and expected frequency counts.

      This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that the observed counts follow the expected distribution.

      Preconditions:

      • Expected counts must all be positive.
      • Observed counts must all be >= 0.
      • The observed and expected arrays must have the same length and their common length must be at least 2.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Note: This implementation rescales the expected array, if necessary, so that the sum of the expected counts equals the sum of the observed counts.

      Specified by:
      chiSquare in interface ChiSquareTest
      Parameters:
      expected - array of expected frequency counts
      observed - array of observed frequency counts
      Returns:
      chi-square test statistic
      Throws:
      IllegalArgumentException - if preconditions are not met or length is less than 2
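The statistic described above can be sketched in plain Java as follows. This is a minimal, self-contained illustration, not the library's source: the class name `GoodnessOfFitSketch` is hypothetical, and the rescaling step simply multiplies every expected count by sum(observed) / sum(expected), which is one straightforward reading of the rescaling note.

```java
/** Hypothetical helper illustrating the goodness-of-fit statistic; not part of Commons Math. */
final class GoodnessOfFitSketch {

    private GoodnessOfFitSketch() {}

    /**
     * Returns sum((observed[i] - e[i])^2 / e[i]), where e is the expected array
     * rescaled so that its total equals the total of the observed counts.
     */
    static double statistic(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected counts must be positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be >= 0");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Scale factor that makes the expected total match the observed total
        // (exactly 1.0 when the two totals already agree).
        double ratio = sumObserved / sumExpected;
        double sumSq = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double e = ratio * expected[i];
            double dev = observed[i] - e;
            sumSq += dev * dev / e;
        }
        return sumSq;
    }
}
```

For example, `GoodnessOfFitSketch.statistic(new double[] {1, 1, 1}, new long[] {12, 8, 10})` evaluates to 0.8 (up to rounding), the same value obtained with expected counts {10, 10, 10}, because the expected array is rescaled to the observed total of 30.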
    • chiSquareTest

      public double chiSquareTest(double[] expected, long[] observed) throws IllegalArgumentException, MathException
      Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.

      The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.

      Preconditions:

      • Expected counts must all be positive.
      • Observed counts must all be >= 0.
      • The observed and expected arrays must have the same length and their common length must be at least 2.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Note: This implementation rescales the expected array, if necessary, so that the sum of the expected counts equals the sum of the observed counts.

      Specified by:
      chiSquareTest in interface ChiSquareTest
      Parameters:
      expected - array of expected frequency counts
      observed - array of observed frequency counts
      Returns:
      p-value
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs computing the p-value
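The p-value is the upper-tail probability of a chi-squared distribution evaluated at the test statistic. The sketch below assumes the conventional choice of expected.length - 1 degrees of freedom for a goodness-of-fit test (this page states the degrees of freedom explicitly only for the two-sample methods). It computes the tail probability with a textbook regularized incomplete gamma function (power series or continued fraction, with a Lanczos log-gamma), not with the library's `ChiSquaredDistribution`. The class name `ChiSquaredTailSketch` is hypothetical.

```java
/** Hypothetical helper: upper-tail probability of a chi-squared distribution. */
final class ChiSquaredTailSketch {

    private ChiSquaredTailSketch() {}

    // Lanczos coefficients (g = 7, n = 9) for the log-gamma function.
    private static final double[] LANCZOS = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
    };

    /** log Gamma(x) for x >= 0.5, which suffices here because a = df / 2 >= 0.5. */
    static double logGamma(double x) {
        x -= 1.0;
        double a = LANCZOS[0];
        double t = x + 7.5;
        for (int i = 1; i < LANCZOS.length; i++) {
            a += LANCZOS[i] / (x + i);
        }
        return 0.5 * Math.log(2.0 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a);
    }

    /** Regularized upper incomplete gamma function Q(a, z). */
    static double regularizedGammaQ(double a, double z) {
        if (z <= 0.0) {
            return 1.0;
        }
        double prefactor = Math.exp(-z + a * Math.log(z) - logGamma(a));
        if (z < a + 1.0) {
            // Power series for the lower function P(a, z); then Q = 1 - P.
            double term = 1.0 / a;
            double sum = term;
            for (int n = 1; n < 1000; n++) {
                term *= z / (a + n);
                sum += term;
                if (Math.abs(term) < Math.abs(sum) * 1e-15) {
                    break;
                }
            }
            return 1.0 - prefactor * sum;
        }
        // Continued fraction for Q(a, z), evaluated with the modified Lentz method.
        final double tiny = 1e-300;
        double b = z + 1.0 - a;
        double c = 1.0 / tiny;
        double d = 1.0 / b;
        double h = d;
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) {
                d = tiny;
            }
            c = b + an / c;
            if (Math.abs(c) < tiny) {
                c = tiny;
            }
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-15) {
                break;
            }
        }
        return prefactor * h;
    }

    /** P(X >= statistic) for X following a chi-squared distribution with the given df. */
    static double pValue(double statistic, double degreesOfFreedom) {
        return regularizedGammaQ(degreesOfFreedom / 2.0, statistic / 2.0);
    }
}
```

For three bins (2 degrees of freedom) and a statistic of 0.8, `ChiSquaredTailSketch.pValue(0.8, 2)` is approximately e^(-0.4), or about 0.670, because with two degrees of freedom the tail probability reduces to exp(-x / 2).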
    • chiSquareTest

      public boolean chiSquareTest(double[] expected, long[] observed, double alpha) throws IllegalArgumentException, MathException
      Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

      Example:
      To test the hypothesis that observed follows expected at the 99% level, use

      chiSquareTest(expected, observed, 0.01)

      Preconditions:

      • Expected counts must all be positive.
      • Observed counts must all be >= 0.
      • The observed and expected arrays must have the same length and their common length must be at least 2.
      • 0 < alpha < 0.5

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Note: This implementation rescales the expected array, if necessary, so that the sum of the expected counts equals the sum of the observed counts.

      Specified by:
      chiSquareTest in interface ChiSquareTest
      Parameters:
      expected - array of expected frequency counts
      observed - array of observed frequency counts
      alpha - significance level of the test
      Returns:
      true iff null hypothesis can be rejected with confidence 1 - alpha
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs performing the test
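The boolean variant can be understood as comparing the p-value with alpha. The sketch below assumes the standard rule "reject iff p-value < alpha", which is consistent with the p-value being described above as the smallest significance level at which the null hypothesis can be rejected; it uses a strict comparison, and the boundary case p-value == alpha is a choice this page does not spell out. The class and method names are hypothetical.

```java
/** Hypothetical helper mapping a p-value and significance level to a decision. */
final class DecisionSketch {

    private DecisionSketch() {}

    /**
     * Returns true iff the null hypothesis is rejected at significance level alpha,
     * i.e. iff pValue is strictly below alpha. Enforces the documented
     * precondition 0 < alpha < 0.5.
     */
    static boolean rejectNull(double pValue, double alpha) {
        if (!(alpha > 0.0 && alpha < 0.5)) {
            throw new IllegalArgumentException("alpha must satisfy 0 < alpha < 0.5: " + alpha);
        }
        return pValue < alpha;
    }
}
```

With alpha = 0.01 (the 99% level in the example above), a p-value of 0.003 leads to rejection, while a p-value of 0.67 does not.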
    • chiSquare

      public double chiSquare(long[][] counts) throws IllegalArgumentException
      Description copied from interface: ChiSquareTest
      Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

      The rows of the 2-way table are counts[0], ..., counts[counts.length - 1].

      Preconditions:

      • All counts must be >= 0.
      • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
      • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquare in interface ChiSquareTest
      Parameters:
      counts - array representation of 2-way table
      Returns:
      chi-square test statistic
      Throws:
      IllegalArgumentException - if preconditions are not met
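The independence statistic can be sketched with the standard Pearson construction, which compares each cell with (row total × column total) / grand total. This page does not spell out the formula, so treat the sketch as the textbook definition rather than a transcription of the library code; the class name `IndependenceSketch` is hypothetical.

```java
/** Hypothetical helper: Pearson chi-square statistic for independence on a 2-way table. */
final class IndependenceSketch {

    private IndependenceSketch() {}

    static double statistic(long[][] counts) {
        int rows = counts.length;
        if (rows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("table must have at least 2 rows and 2 columns");
        }
        int cols = counts[0].length;
        double[] rowSums = new double[rows];
        double[] colSums = new double[cols];
        double total = 0.0;
        for (int r = 0; r < rows; r++) {
            if (counts[r].length != cols) {
                throw new IllegalArgumentException("counts array must be rectangular");
            }
            for (int c = 0; c < cols; c++) {
                if (counts[r][c] < 0) {
                    throw new IllegalArgumentException("counts must be >= 0");
                }
                rowSums[r] += counts[r][c];
                colSums[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        // Expected count under independence: (row total * column total) / grand total.
        // Note: an all-zero row or column gives an expected count of 0 and a NaN
        // result; this sketch does not guard against that case.
        double sumSq = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double expected = rowSums[r] * colSums[c] / total;
                double dev = counts[r][c] - expected;
                sumSq += dev * dev / expected;
            }
        }
        return sumSq;
    }
}
```

For the table {{10, 20}, {20, 10}} every expected count is 15, so the statistic is 4 × 25 / 15 = 20/3. For a table with r rows and c columns, the reference chi-squared distribution conventionally has (r - 1)(c - 1) degrees of freedom.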
    • chiSquareTest

      public double chiSquareTest(long[][] counts) throws IllegalArgumentException, MathException
      Description copied from interface: ChiSquareTest
      Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

      The rows of the 2-way table are counts[0], ..., counts[counts.length - 1].

      Preconditions:

      • All counts must be >= 0.
      • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
      • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquareTest in interface ChiSquareTest
      Parameters:
      counts - array representation of 2-way table
      Returns:
      p-value
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs computing the p-value
    • chiSquareTest

      public boolean chiSquareTest(long[][] counts, double alpha) throws IllegalArgumentException, MathException
      Description copied from interface: ChiSquareTest
      Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

      The rows of the 2-way table are counts[0], ..., counts[counts.length - 1].

      Example:
      To test the null hypothesis that the counts in counts[0], ..., counts[counts.length - 1] all correspond to the same underlying probability distribution at the 99% level, use

      chiSquareTest(counts, 0.01)

      Preconditions:

      • All counts must be >= 0.
      • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
      • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquareTest in interface ChiSquareTest
      Parameters:
      counts - array representation of 2-way table
      alpha - significance level of the test
      Returns:
      true iff null hypothesis can be rejected with confidence 1 - alpha
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs performing the test
    • chiSquareDataSetsComparison

      public double chiSquareDataSetsComparison(long[] observed1, long[] observed2) throws IllegalArgumentException
      Description copied from interface: UnknownDistributionChiSquareTest

      Computes a Chi-Square two sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is

      $$\chi^2 = \sum_i \frac{\left(K \cdot \mathrm{observed1}[i] - \mathrm{observed2}[i] / K\right)^2}{\mathrm{observed1}[i] + \mathrm{observed2}[i]}$$
      where
      $$K = \sqrt{\frac{\sum_i \mathrm{observed2}[i]}{\sum_i \mathrm{observed1}[i]}}$$

      This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that both observed counts follow the same distribution.

      Preconditions:

      • Observed counts must be non-negative.
      • Observed counts for a specific bin must not both be zero.
      • Observed counts for a specific sample must not all be 0.
      • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquareDataSetsComparison in interface UnknownDistributionChiSquareTest
      Parameters:
      observed1 - array of observed frequency counts of the first data set
      observed2 - array of observed frequency counts of the second data set
      Returns:
      chi-square test statistic
      Throws:
      IllegalArgumentException - if preconditions are not met
      Since:
      1.2
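The two-sample statistic can be transcribed directly into plain Java, using K = sqrt(sum(observed2) / sum(observed1)) to balance samples of different sizes. This is a minimal sketch of the formula stated above, not the library's source; the class name `DataSetsComparisonSketch` is hypothetical.

```java
/** Hypothetical helper: two-sample chi-square statistic for binned counts. */
final class DataSetsComparisonSketch {

    private DataSetsComparisonSketch() {}

    static double statistic(long[] observed1, long[] observed2) {
        if (observed1.length != observed2.length || observed1.length < 2) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sum1 = 0.0;
        double sum2 = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("bin " + i + " is zero in both samples");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0.0 || sum2 == 0.0) {
            throw new IllegalArgumentException("a sample must not be all zeros");
        }
        // K = sqrt(sum(observed2) / sum(observed1)) compensates for unequal sample totals.
        double k = Math.sqrt(sum2 / sum1);
        double sumSq = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double dev = k * observed1[i] - observed2[i] / k;
            sumSq += dev * dev / (observed1[i] + observed2[i]);
        }
        return sumSq;
    }
}
```

Because of the weighting, proportional samples such as {5, 10} and {10, 20} give a statistic of 0 even though their totals differ, while {10, 20} and {20, 10} (equal totals, so K = 1) give 2 × 100 / 30 = 20/3.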
    • chiSquareTestDataSetsComparison

      public double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2) throws IllegalArgumentException, MathException
      Description copied from interface: UnknownDistributionChiSquareTest

      Returns the observed significance level, or p-value, associated with a Chi-Square two sample test comparing bin frequency counts in observed1 and observed2.

      The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.

      See UnknownDistributionChiSquareTest.chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

      Preconditions:

      • Observed counts must be non-negative.
      • Observed counts for a specific bin must not both be zero.
      • Observed counts for a specific sample must not all be 0.
      • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquareTestDataSetsComparison in interface UnknownDistributionChiSquareTest
      Parameters:
      observed1 - array of observed frequency counts of the first data set
      observed2 - array of observed frequency counts of the second data set
      Returns:
      p-value
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs computing the p-value
      Since:
      1.2
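As a worked illustration of the degrees-of-freedom rule above, the p-value for n bins can be written with the regularized upper incomplete gamma function Q (a standard identity for the chi-squared upper tail, not something this page states). With n = 3 bins there are 2 degrees of freedom and the expression simplifies to an exponential:

```latex
\begin{aligned}
p &= \Pr\left(\chi^2_{n-1} \ge \chi^2_{\mathrm{obs}}\right)
   = Q\!\left(\frac{n-1}{2},\, \frac{\chi^2_{\mathrm{obs}}}{2}\right), \\
n = 3:\quad p &= Q\!\left(1,\, \frac{\chi^2_{\mathrm{obs}}}{2}\right) = e^{-\chi^2_{\mathrm{obs}}/2}, \\
\chi^2_{\mathrm{obs}} = 6:\quad p &= e^{-3} \approx 0.0498.
\end{aligned}
```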
    • chiSquareTestDataSetsComparison

      public boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha) throws IllegalArgumentException, MathException
      Description copied from interface: UnknownDistributionChiSquareTest

      Performs a Chi-Square two sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

      See UnknownDistributionChiSquareTest.chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the chi-square statistic used in the test. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

      Preconditions:

      • Observed counts must be non-negative.
      • Observed counts for a specific bin must not both be zero.
      • Observed counts for a specific sample must not all be 0.
      • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
      • 0 < alpha < 0.5

      If any of the preconditions are not met, an IllegalArgumentException is thrown.

      Specified by:
      chiSquareTestDataSetsComparison in interface UnknownDistributionChiSquareTest
      Parameters:
      observed1 - array of observed frequency counts of the first data set
      observed2 - array of observed frequency counts of the second data set
      alpha - significance level of the test
      Returns:
      true iff null hypothesis can be rejected with confidence 1 - alpha
      Throws:
      IllegalArgumentException - if preconditions are not met
      MathException - if an error occurs performing the test
      Since:
      1.2
    • setDistribution

      public void setDistribution(ChiSquaredDistribution value)
      Modify the distribution used to compute inference statistics.
      Parameters:
      value - the new distribution
      Since:
      1.2