DiscrimTwoSample

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)

Two Sample Discriminability test statistic and p-value.

The two sample test measures whether the discriminability of one dataset differs from that of another. More details can be found in [1].

Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,

\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]

Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\).
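
For intuition, sample discriminability can be read as the fraction of comparisons in which two measurements with the same sample id are closer to each other than to a measurement with a different id. The following is a minimal NumPy sketch under that reading, not hyppo's implementation. The function name `sample_discriminability`, the Euclidean metric, and the choice to count ties as one half are assumptions made for illustration.

```python
import numpy as np


def sample_discriminability(x, y):
    """Illustrative estimate of discriminability (hypothetical helper).

    x : (n, p) array of measurements
    y : (n,) array of sample ids
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # same[i, j] is True when measurements i and j share a sample id.
    same = y[:, None] == y[None, :]

    scores = []
    n = len(y)
    for i in range(n):
        between = dist[i, ~same[i]]  # distances to other ids
        if between.size == 0:
            continue
        for j in range(n):
            if i == j or not same[i, j]:
                continue
            # Fraction of between-id distances larger than this
            # within-id distance; ties count as one half (assumption).
            wins = (dist[i, j] < between).sum()
            ties = (dist[i, j] == between).sum()
            scores.append((wins + 0.5 * ties) / between.size)

    if not scores:
        raise ValueError("need at least one repeated id and two distinct ids")
    return float(np.mean(scores))
```

Under these assumptions, a dataset whose ids are perfectly separated scores 1.0, while a dataset in which every measurement is identical scores 0.5.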

Parameters

is_dist (bool, default: False) -- Whether x1 and x2 are distance matrices or not.
remove_isolates (bool, default: True) -- Whether to remove measurements whose sample id occurs only once.
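
As a rough illustration of what removing isolates amounts to, the sketch below drops every measurement whose id appears only once. The helper `drop_isolates` is hypothetical and is not part of hyppo; it only assumes that an isolate is an id with a single measurement.

```python
import numpy as np


def drop_isolates(x, y):
    """Keep only rows whose id in y occurs at least twice (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    ids, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, ids[counts > 1])  # True for ids measured more than once
    return x[keep], y[keep]


x = np.arange(10.0).reshape(5, 2)
y = np.array([0, 0, 1, 2, 2])  # id 1 is measured only once
x_kept, y_kept = drop_isolates(x, y)
print(y_kept)  # [0 0 2 2]
```

For distance matrices, the same mask would presumably be applied to both the rows and the columns.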

Methods Summary

`DiscrimTwoSample.statistic`(x, y) -- Helper function that calculates the discriminability test statistic.
`DiscrimTwoSample.test`(x1, x2, y[, reps, ...]) -- Calculates the test statistic and p-value for a two sample test for discriminability.

DiscrimTwoSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

Parameters: x, y (ndarray) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
Returns: stat (float) -- The computed two sample discriminability statistic.

DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1)

Calculates the test statistic and p-value for a two sample test for discriminability.

Parameters

x1, x2 (ndarray) -- Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the numbers of dimensions. Alternatively, x1 and x2 can be distance matrices, in which case both shapes must be (n, n) and is_dist must be set to True.
y (ndarray) -- A vector containing the sample ids for the n samples. It must be matched to the inputs such that y[i] is the label for x1[i, :] and x2[i, :].
reps (int, optional (default: 1000)) -- The number of replications used to estimate the null distribution in the permutation test from which the p-value is calculated.
alt ({"greater", "less", "neq"} (default: "neq")) -- The alternative hypothesis for the test: the first dataset is more discriminable than the second (alt = "greater"), less discriminable (alt = "less"), or the two discriminabilities are unequal (alt = "neq").
workers (int, optional (default: -1)) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
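
To make the role of `alt` and `reps` concrete, here is a generic sketch of how a permutation p-value can be computed from an observed difference in discriminability and a set of null replicates. It is not hyppo's code: the function `permutation_pvalue`, the add-one correction, and the use of absolute values for the two-sided case are assumptions, and the library may construct the null distribution and the two-sided p-value differently.

```python
import numpy as np


def permutation_pvalue(observed_diff, null_diffs, alt="neq"):
    """Generic permutation p-value (hypothetical helper, not hyppo's API).

    observed_diff : observed value of D_x1 - D_x2
    null_diffs    : the same difference recomputed on each null replicate
    """
    null_diffs = np.asarray(null_diffs, dtype=float)
    if alt == "greater":
        extreme = null_diffs >= observed_diff
    elif alt == "less":
        extreme = null_diffs <= observed_diff
    elif alt == "neq":
        extreme = np.abs(null_diffs) >= abs(observed_diff)
    else:
        raise ValueError("alt must be 'greater', 'less', or 'neq'")
    # Add-one correction (assumption): the p-value is never exactly zero.
    return (1 + extreme.sum()) / (1 + null_diffs.size)
```

With this convention the smallest attainable p-value is 1 / (reps + 1), so a larger `reps` gives a finer resolution at the cost of more computation.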

Returns

d1 (float) -- The computed discriminability score for x1.
d2 (float) -- The computed discriminability score for x2.
pvalue (float) -- The computed two sample test p-value.

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
