distributions.discrete_family
Module: distributions.discrete_family
Inheritance diagram for selectinf.distributions.discrete_family
This module contains a class for discrete 1-dimensional exponential families. The main uses for this class are exact (post-selection) hypothesis tests and confidence intervals.
discrete_family
-
class selectinf.distributions.discrete_family.discrete_family(sufficient_stat, weights, theta=0.0)
Bases: object
-
__init__(sufficient_stat, weights, theta=0.0)
A discrete 1-dimensional exponential family with reference measure \(\sum_j w_j \delta_{X_j}\) and sufficient statistic sufficient_stat. For any \(\theta\), the distribution is
\[P_{\theta} = \sum_{j} e^{\theta X_j - \Lambda(\theta)} w_j \delta_{X_j}\]
where
\[\Lambda(\theta) = \log \left(\sum_j w_j e^{\theta X_j} \right).\]
- Parameters
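sufficient_stat : np.float(n)
Support points \(X_j\), i.e. the values of the sufficient statistic.
weights : np.float(n)
Reference weights \(w_j\).
theta : float (optional)
Initial value of the natural parameter.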
Notes
The weights are normalized to sum to 1.
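As a rough illustration of the definition above, the following pure-Python sketch computes the probabilities \(P_{\theta}(X = X_j)\) and \(\Lambda(\theta)\) from lists of support points and weights. The helper `family_pmf` is hypothetical and not part of selectinf; it normalizes the weights as described in the Notes and subtracts the largest exponent (log-sum-exp) to avoid overflow for large \(|\theta X_j|\). Exponentiating the returned log_partition gives the partition function \(\sum_j e^{\theta X_j} w_j\).

```python
import math


def family_pmf(stat, weights, theta):
    """Hypothetical helper: return (probs, log_partition) for P_theta.

    stat: support points X_j (values of the sufficient statistic)
    weights: nonnegative reference weights w_j, normalized here to sum to 1
    theta: natural parameter
    """
    total = float(sum(weights))
    # log(w_j) + theta * X_j; zero weights contribute no mass
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    shift = max(logs)  # subtract the largest exponent to avoid overflow
    log_partition = shift + math.log(sum(math.exp(s - shift) for s in logs))
    probs = [math.exp(s - log_partition) for s in logs]
    return probs, log_partition


# At theta = 0 the probabilities are just the normalized weights.
probs, lam = family_pmf([0, 1, 2], [1, 1, 2], theta=0.0)
```

Because the weights are normalized, \(\Lambda(0) = 0\) and the partition function equals 1 at \(\theta = 0\).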
-
property theta
The natural parameter of the family.
-
property partition
Partition function at self.theta:
\[\sum_j e^{\theta X_j} w_j = e^{\Lambda(\theta)}\]
-
property sufficient_stat
Sufficient statistics of the exponential family.
-
property weights
Weights of the exponential family.
-
pdf(theta)
Density of \(P_{\theta}\) with respect to \(P_0\).
- Parameters
theta : float
Natural parameter.
- Returns
pdf : np.float
-
cdf(theta, x=None, gamma=1)
The cumulative distribution function of \(P_{\theta}\) with weight gamma at x:
\[P_{\theta}(X < x) + \gamma \, P_{\theta}(X = x)\]
- Parameters
theta : float
Natural parameter.
x : float (optional)
Where to evaluate CDF.
gamma : float (optional)
Weight given at x.
- Returns
cdf : np.float
-
ccdf(theta, x=None, gamma=0, return_unnorm=False)
The complementary cumulative distribution function (i.e. survival function) of \(P_{\theta}\) with weight gamma at x:
\[P_{\theta}(X > x) + \gamma \, P_{\theta}(X = x)\]
- Parameters
theta : float
Natural parameter.
x : float (optional)
Where to evaluate CCDF.
gamma : float (optional)
Weight given at x.
- Returns
ccdf : np.float
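A minimal sketch of the two tail functions, assuming plain Python lists for the support and weights; `_pmf`, `cdf`, and `ccdf` here are hypothetical stand-ins, not the selectinf methods. Each sums the probability strictly on one side of x and adds the fraction gamma of the atom at x, as in the two formulas above.

```python
import math


def _pmf(stat, weights, theta):
    # P_theta(X = X_j) with weights normalized to sum to 1 (log-sum-exp for stability)
    total = float(sum(weights))
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    m = max(logs)
    z = sum(math.exp(s - m) for s in logs)
    return [math.exp(s - m) / z for s in logs]


def cdf(stat, weights, theta, x, gamma=1.0):
    """P_theta(X < x) + gamma * P_theta(X = x)."""
    p = _pmf(stat, weights, theta)
    below = sum(pj for xj, pj in zip(stat, p) if xj < x)
    at = sum(pj for xj, pj in zip(stat, p) if xj == x)
    return below + gamma * at


def ccdf(stat, weights, theta, x, gamma=0.0):
    """P_theta(X > x) + gamma * P_theta(X = x)."""
    p = _pmf(stat, weights, theta)
    above = sum(pj for xj, pj in zip(stat, p) if xj > x)
    at = sum(pj for xj, pj in zip(stat, p) if xj == x)
    return above + gamma * at
```

With the default gamma values, this cdf returns \(P_{\theta}(X \le x)\) and this ccdf returns \(P_{\theta}(X > x)\); more generally, cdf with weight gamma and ccdf with weight 1 - gamma sum to one.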
-
E(theta, func)
Expectation of func under \(P_{\theta}\).
- Parameters
theta : float
Natural parameter.
func : callable
Assumed to be vectorized.
- Returns
E : np.float
-
Var(theta, func)
Variance of func under \(P_{\theta}\).
- Parameters
theta : float
Natural parameter.
func : callable
Assumed to be vectorized.
- Returns
var : np.float
-
Cov(theta, func1, func2)
Covariance of func1 and func2 under \(P_{\theta}\).
- Parameters
theta : float
Natural parameter.
func1, func2 : callable
Assumed to be vectorized.
- Returns
cov : np.float
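The three moment methods can be sketched as weighted sums over the support. In this hypothetical pure-Python version, func is called on one support point at a time rather than being vectorized, and `log_partition` is an added helper used to illustrate the standard exponential-family identities \(\Lambda'(\theta) = E_{\theta}[X]\) and \(\Lambda''(\theta) = \mathrm{Var}_{\theta}[X]\). None of these functions is the library's code.

```python
import math


def _pmf(stat, weights, theta):
    # P_theta(X = X_j) with weights normalized to sum to 1 (log-sum-exp for stability)
    total = float(sum(weights))
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    m = max(logs)
    z = sum(math.exp(s - m) for s in logs)
    return [math.exp(s - m) / z for s in logs]


def log_partition(stat, weights, theta):
    # Lambda(theta) = log(sum_j w_j exp(theta X_j)) with normalized weights
    total = float(sum(weights))
    return math.log(sum(w / total * math.exp(theta * x) for x, w in zip(stat, weights)))


def E(stat, weights, theta, func):
    # expectation of func(X) under P_theta
    p = _pmf(stat, weights, theta)
    return sum(pj * func(x) for x, pj in zip(stat, p))


def Cov(stat, weights, theta, func1, func2):
    # covariance of func1(X) and func2(X) under P_theta
    m1 = E(stat, weights, theta, func1)
    m2 = E(stat, weights, theta, func2)
    return E(stat, weights, theta, lambda x: (func1(x) - m1) * (func2(x) - m2))


def Var(stat, weights, theta, func):
    # the variance is the covariance of func with itself
    return Cov(stat, weights, theta, func, func)
```

Comparing E and Var of the identity function against finite differences of log_partition is a quick sanity check of these identities.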
-
two_sided_acceptance(theta, alpha=0.05, tol=1e-06)
Compute the cutoffs of the UMPU two-sided test.
- Parameters
theta : float
Natural parameter.
alpha : float (optional)
Size of two-sided test.
tol : float (optional)
Tolerance for root-finding.
- Returns
left_cut : (float, float)
Boundary and randomization weight for left endpoint.
right_cut : (float, float)
Boundary and randomization weight for right endpoint.
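For a one-parameter exponential family, the UMPU two-sided test \(\phi\) at \(\theta_0\) is characterized by two conditions: \(E_{\theta_0}[\phi(X)] = \alpha\) and \(E_{\theta_0}[X\phi(X)] = \alpha E_{\theta_0}[X]\). The sketch below does not reproduce the library's root-finding; it only evaluates both conditions for candidate cutoffs in the (boundary, weight) format returned here. The helpers `_pmf`, `crit`, and `umpu_conditions` are hypothetical.

```python
import math


def _pmf(stat, weights, theta):
    # P_theta(X = X_j) with weights normalized to sum to 1 (log-sum-exp for stability)
    total = float(sum(weights))
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    m = max(logs)
    z = sum(math.exp(s - m) for s in logs)
    return [math.exp(s - m) / z for s in logs]


def crit(t, left_cut, right_cut):
    # critical function phi(t): 1 outside [CL, CR], the given weights at the endpoints
    (cl, gl), (cr, gr) = left_cut, right_cut
    return float(t < cl) + float(t > cr) + gl * (t == cl) + gr * (t == cr)


def umpu_conditions(stat, weights, theta0, left_cut, right_cut):
    """Return (size, moment_gap) under P_theta0.

    For UMPU two-sided cutoffs, size should equal alpha and
    moment_gap = E[X phi(X)] - size * E[X] should be 0.
    """
    p = _pmf(stat, weights, theta0)
    phi = [crit(x, left_cut, right_cut) for x in stat]
    size = sum(pj * fj for pj, fj in zip(p, phi))
    mean = sum(pj * x for pj, x in zip(p, stat))
    moment_gap = sum(pj * x * fj for pj, x, fj in zip(p, stat, phi)) - size * mean
    return size, moment_gap


size, gap = umpu_conditions(list(range(-3, 4)), [1] * 7, 0.0, (-2, 0.4), (2, 0.4))
```

For a symmetric family at \(\theta_0 = 0\), such as equal weights on \(-3, \dots, 3\), the symmetric cutoffs (-2, 0.4) and (2, 0.4) give size 0.4 and satisfy the second condition automatically; an asymmetric pair with the same size, such as (-2, 0.0) and (2, 0.8), does not.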
-
two_sided_test(theta0, observed, alpha=0.05, randomize=True, auxVar=None)
Perform UMPU two-sided test.
- Parameters
theta0 : float
Natural parameter under null hypothesis.
observed : float
Observed sufficient statistic.
alpha : float (optional)
Size of two-sided test.
randomize : bool
Perform the randomized test (or conservative test).
auxVar : [None, float]
If randomizing and not None, use this as the random uniform variate.
- Returns
decision : np.bool
Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?
Notes
We need an auxiliary uniform variable to carry out the randomized test. Larger auxVar corresponds to x being slightly “larger.” It can be passed in, or chosen at random. If randomize=False, we get a conservative test.
-
one_sided_test(theta0, observed, alternative='greater', alpha=0.05, randomize=True, auxVar=None)
Perform UMPU one-sided test.
- Parameters
theta0 : float
Natural parameter under null hypothesis.
observed : float
Observed sufficient statistic.
alternative : str
One of [‘greater’, ‘less’]
alpha : float (optional)
Size of one-sided test.
randomize : bool
Perform the randomized test (or conservative test).
auxVar : [None, float]
If randomizing and not None, use this as the random uniform variate.
- Returns
decision : np.bool
Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?
Notes
We need an auxiliary uniform variable to carry out the randomized test. Larger auxVar corresponds to x being slightly “larger.” It can be passed in, or chosen at random. If randomize=False, we get a conservative test.
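The randomization described in the Notes can be illustrated with a randomized p-value. In this hypothetical sketch (not selectinf's implementation), aux plays the role of auxVar: for alternative='greater' the p-value is \(P_{\theta_0}(X > x) + (1 - U) P_{\theta_0}(X = x)\), so a larger U behaves as if x were slightly larger, and for 'less' it is \(P_{\theta_0}(X < x) + U P_{\theta_0}(X = x)\). Setting U to its extreme value puts the whole atom into the p-value, which gives the conservative test.

```python
import math
import random


def _pmf(stat, weights, theta):
    # P_theta(X = X_j) with weights normalized to sum to 1 (log-sum-exp for stability)
    total = float(sum(weights))
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    m = max(logs)
    z = sum(math.exp(s - m) for s in logs)
    return [math.exp(s - m) / z for s in logs]


def one_sided_pvalue(stat, weights, theta0, observed, alternative="greater", aux=None):
    """Randomized one-sided p-value; aux is a Uniform(0, 1) variate."""
    if aux is None:
        aux = random.random()  # draw the auxiliary uniform if none is supplied
    p = _pmf(stat, weights, theta0)
    below = sum(pj for x, pj in zip(stat, p) if x < observed)
    at = sum(pj for x, pj in zip(stat, p) if x == observed)
    above = sum(pj for x, pj in zip(stat, p) if x > observed)
    if alternative == "greater":
        return above + (1 - aux) * at  # P(X > x) + (1 - U) P(X = x)
    if alternative == "less":
        return below + aux * at  # P(X < x) + U P(X = x)
    raise ValueError("alternative must be 'greater' or 'less'")


def one_sided_test(stat, weights, theta0, observed, alternative="greater",
                   alpha=0.05, randomize=True, aux=None):
    # conservative (non-randomized) version: count the whole atom at x in the p-value
    if not randomize:
        aux = 0.0 if alternative == "greater" else 1.0
    return one_sided_pvalue(stat, weights, theta0, observed, alternative, aux) < alpha
```

Under the null, the randomized version rejects with probability exactly alpha when aux is uniform, while the conservative version rejects with probability at most alpha.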
-
interval(observed, alpha=0.05, randomize=True, auxVar=None, tol=1e-06)
Form UMAU confidence interval.
- Parameters
observed : float
Observed sufficient statistic.
alpha : float (optional)
Size of two-sided test.
randomize : bool
Perform the randomized test (or conservative test).
auxVar : [None, float]
If randomizing and not None, use this as the random uniform variate.
- Returns
lower, upper : float
Limits of confidence interval.
-
equal_tailed_interval(observed, alpha=0.05, randomize=True, auxVar=None, tol=1e-06)
Form an interval by inverting the equal-tailed test with \(\alpha/2\) in each tail.
- Parameters
observed : float
Observed sufficient statistic.
alpha : float (optional)
Size of two-sided test.
randomize : bool
Perform the randomized test (or conservative test).
auxVar : [None, float]
If randomizing and not None, use this as the random uniform variate.
- Returns
lower, upper : float
Limits of confidence interval.
-
equal_tailed_test(theta0, observed, alpha=0.05)
Perform the equal-tailed two-sided test, with \(\alpha/2\) in each tail.
- Parameters
theta0 : float
Natural parameter under null hypothesis.
observed : float
Observed sufficient statistic.
alpha : float (optional)
Size of two-sided test.
- Returns
decision : np.bool
Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?
-
one_sided_acceptance(theta, alpha=0.05, alternative='greater', tol=1e-06)
Compute the acceptance region cutoffs of the UMPU one-sided test.
TODO: Include randomization?
- Parameters
theta : float
Natural parameter.
alpha : float (optional)
Size of one-sided test.
alternative : str
One of [‘greater’, ‘less’].
tol : float (optional)
Tolerance for root-finding.
- Returns
left_cut : (float, float)
Boundary and randomization weight for left endpoint.
right_cut : (float, float)
Boundary and randomization weight for right endpoint.
-
equal_tailed_acceptance(theta0, alpha=0.05)
Compute the acceptance region cutoffs of the equal-tailed test (without randomization), so its size may not be exactly \(\alpha\).
- Parameters
theta0 : float
Natural parameter under null hypothesis.
alpha : float (optional)
Size of two-sided test.
- Returns
left_cut : (float, float)
Boundary and randomization weight for left endpoint.
right_cut : (float, float)
Boundary and randomization weight for right endpoint.
-
MLE(observed, initial=0, max_iter=20, tol=0.0001)
Compute the maximum likelihood estimator based on observed sufficient statistic observed.
- Parameters
observed : float
Observed value of the sufficient statistic.
initial : float (optional)
Starting point for Newton-Raphson.
max_iter : int (optional)
Maximum number of Newton-Raphson iterations.
tol : float (optional)
Tolerance parameter for stopping, based on relative change in parameter estimate. Iteration stops when the change is smaller than tol * max(1, np.fabs(cur_estimate)).
- Returns
theta_hat : float
Maximum likelihood estimator.
std_err : float
Estimated variance of theta_hat: the inverse of the variance of the sufficient statistic at theta_hat, i.e. the inverse of the observed Fisher information.
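The Newton-Raphson iteration follows from the score equation \(E_{\theta}[X] = x_{\mathrm{obs}}\): each step adds \((x_{\mathrm{obs}} - E_{\theta}[X]) / \mathrm{Var}_{\theta}[X]\), since the variance of the sufficient statistic is the Fisher information. This is a hypothetical pure-Python sketch using the stopping rule described above, not the library's source; it returns the inverse information as the variance estimate and assumes the observed value lies strictly inside the range of the support (otherwise the MLE is infinite).

```python
import math


def _moments(stat, weights, theta):
    # mean and variance of X under P_theta (log-sum-exp for stability)
    total = float(sum(weights))
    logs = [math.log(w / total) + theta * x if w > 0 else -math.inf
            for x, w in zip(stat, weights)]
    m = max(logs)
    e = [math.exp(s - m) for s in logs]
    z = sum(e)
    p = [ej / z for ej in e]
    mean = sum(pj * x for pj, x in zip(p, stat))
    var = sum(pj * (x - mean) ** 2 for pj, x in zip(p, stat))
    return mean, var


def mle(stat, weights, observed, initial=0.0, max_iter=20, tol=1e-4):
    """Newton-Raphson for the score equation E_theta[X] = observed."""
    theta = initial
    for _ in range(max_iter):
        mean, var = _moments(stat, weights, theta)
        step = (observed - mean) / var  # score divided by Fisher information
        theta += step
        # stop when the change is small relative to max(1, |theta|)
        if abs(step) < tol * max(1.0, abs(theta)):
            break
    _, var = _moments(stat, weights, theta)
    return theta, 1.0 / var  # estimate and its estimated variance
```

For binomial weights \(\binom{10}{k}\) and observation 3, the iteration converges to \(\log(3/7)\), the logit of 0.3, and the variance estimate is \(1 / (10 \cdot 0.3 \cdot 0.7)\).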
-
-
selectinf.distributions.discrete_family.crit_func(test_statistic, left_cut, right_cut)
A generic critical function for an interval, with weights at the endpoints:
((test_statistic < CL) + (test_statistic > CR) + gammaL * (test_statistic == CL) + gammaR * (test_statistic == CR))
where (CL, gammaL) = left_cut and (CR, gammaR) = right_cut.
- Parameters
test_statistic : np.float
Observed value of test statistic.
left_cut : (float, float)
(CL, gammaL): left endpoint and value at exactly the left endpoint (should be in [0,1]).
right_cut : (float, float)
(CR, gammaR): right endpoint and value at exactly the right endpoint (should be in [0,1]).
- Returns
decision : np.float
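A direct pure-Python transcription of the formula above for scalar inputs; it mirrors the documented behavior but is a sketch, not the library's source.

```python
def crit_func(test_statistic, left_cut, right_cut):
    """Scalar sketch of the critical function: 1 outside [CL, CR], weights at CL and CR."""
    CL, gammaL = left_cut
    CR, gammaR = right_cut
    # booleans act as 0/1, so each term switches on only in its own region
    return ((test_statistic < CL) + (test_statistic > CR)
            + gammaL * (test_statistic == CL) + gammaR * (test_statistic == CR))
```

Values strictly outside [CL, CR] give 1, values strictly inside give 0, and the endpoints give gammaL and gammaR respectively.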