distributions.discrete_family

Module: distributions.discrete_family


This module contains a class for discrete 1-dimensional exponential families. The main uses for this class are exact (post-selection) hypothesis tests and confidence intervals.

discrete_family

class selectinf.distributions.discrete_family.discrete_family(sufficient_stat, weights, theta=0.0)

Bases: object

__init__(sufficient_stat, weights, theta=0.0)

A discrete 1-dimensional exponential family with reference measure \(\sum_j w_j \delta_{X_j}\) and sufficient statistic sufficient_stat. For any \(\theta\), the distribution is

\[P_{\theta} = \sum_{j} e^{\theta X_j - \Lambda(\theta)} w_j \delta_{X_j}\]

where

\[\Lambda(\theta) = \log \left(\sum_j w_j e^{\theta X_j} \right).\]

Parameters

sufficient_stat : np.float(n)

weights : np.float(n)

Notes

The weights are normalized to sum to 1.
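As a minimal sketch of the definition above, the probabilities under \(P_{\theta}\) can be obtained by exponentially tilting the weights and renormalizing. This is plain Python written for illustration, not the library's implementation, and the helper name tilted_pmf is hypothetical.

```python
import math

def tilted_pmf(sufficient_stat, weights, theta):
    """Return the probabilities P_theta(X = X_j), one per atom X_j."""
    exponents = [theta * x for x in sufficient_stat]
    # Subtract the largest exponent before exponentiating, for numerical stability.
    shift = max(exponents)
    unnorm = [w * math.exp(e - shift) for w, e in zip(weights, exponents)]
    total = sum(unnorm)  # proportional to the partition function
    return [u / total for u in unnorm]

# Reference measure proportional to Binomial(2, 1/2).
# Tilting by theta = log(2) gives Binomial(2, 2/3): [1/9, 4/9, 4/9].
probs = tilted_pmf([0, 1, 2], [1, 2, 1], math.log(2))
```

Because the result is renormalized, the weights only need to be known up to a constant factor, which matches the note that weights are normalized to sum to 1.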

property theta

The natural parameter of the family.

property partition

Partition function at self.theta:

\[\sum_j e^{\theta X_j} w_j\]

property sufficient_stat

Sufficient statistics of the exponential family.

property weights

Weights of the exponential family.

pdf(theta)

Density of \(P_{\theta}\) with respect to \(P_0\).

Parameters

theta : float

Natural parameter.

Returns

pdf : np.float

cdf(theta, x=None, gamma=1)

The cumulative distribution function of \(P_{\theta}\) with weight gamma at x

\[P_{\theta}(X < x) + \gamma * P_{\theta}(X = x)\]

Parameters

theta : float

Natural parameter.

x : float (optional)

Where to evaluate CDF.

gamma : float (optional)

Weight given at x.

Returns

cdf : np.float

ccdf(theta, x=None, gamma=0, return_unnorm=False)

The complementary cumulative distribution function (i.e. survival function) of \(P_{\theta}\) with weight gamma at x

\[P_{\theta}(X > x) + \gamma * P_{\theta}(X = x)\]

Parameters

theta : float

Natural parameter.

x : float (optional)

Where to evaluate CCDF.

gamma : float (optional)

Weight given at x.

Returns

ccdf : np.float
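To illustrate how the weight gamma splits the mass at x between the lower and upper tails, here is a minimal plain-Python sketch. The helper tail_mass is hypothetical and is not the library's implementation. Since the two tails and the atom at x partition the total mass, a lower tail with weight gamma and an upper tail with weight 1 - gamma sum to 1.

```python
import math

def tail_mass(sufficient_stat, weights, theta, x, gamma, upper=False):
    """P_theta(X < x) + gamma * P_theta(X = x), or the upper-tail analogue."""
    unnorm = [w * math.exp(theta * s) for s, w in zip(sufficient_stat, weights)]
    total = sum(unnorm)
    if upper:
        tail = sum(u for s, u in zip(sufficient_stat, unnorm) if s > x)
    else:
        tail = sum(u for s, u in zip(sufficient_stat, unnorm) if s < x)
    at_x = sum(u for s, u in zip(sufficient_stat, unnorm) if s == x)
    return (tail + gamma * at_x) / total

stat, w = [0, 1, 2], [0.25, 0.5, 0.25]                     # Binomial(2, 1/2)
lower = tail_mass(stat, w, 0.0, 1, gamma=1.0)              # P(X <= 1) = 0.75
upper = tail_mass(stat, w, 0.0, 1, gamma=0.0, upper=True)  # P(X > 1) = 0.25
```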

E(theta, func)

Expectation of func under \(P_{\theta}\)

Parameters

theta : float

Natural parameter.

func : callable

Assumed to be vectorized.

Returns

E : np.float

Var(theta, func)

Variance of func under \(P_{\theta}\)

Parameters

theta : float

Natural parameter.

func : callable

Assumed to be vectorized.

Returns

var : np.float

Cov(theta, func1, func2)

Covariance of func1 and func2 under \(P_{\theta}\)

Parameters

theta : float

Natural parameter.

func1, func2 : callable

Assumed to be vectorized.

Returns

cov : np.float
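The three moment methods E, Var and Cov can all be expressed through a single weighted sum over the atoms. The sketch below is an illustration in plain Python under that assumption, not the library's code. The function names are hypothetical, and func is applied to one atom at a time rather than to a vectorized array.

```python
import math

def expectation(stat, weights, theta, func):
    """E_theta[func(X)]; func is called once per atom in this sketch."""
    unnorm = [w * math.exp(theta * s) for s, w in zip(stat, weights)]
    total = sum(unnorm)
    return sum(u * func(s) for s, u in zip(stat, unnorm)) / total

def covariance(stat, weights, theta, func1, func2):
    """Cov_theta(func1(X), func2(X)) computed from centered products."""
    m1 = expectation(stat, weights, theta, func1)
    m2 = expectation(stat, weights, theta, func2)
    return expectation(stat, weights, theta,
                       lambda s: (func1(s) - m1) * (func2(s) - m2))

def variance(stat, weights, theta, func):
    """Var_theta(func(X)) as the covariance of func with itself."""
    return covariance(stat, weights, theta, func, func)

stat, w = [0, 1, 2], [0.25, 0.5, 0.25]                         # Binomial(2, 1/2)
mean = expectation(stat, w, 0.0, lambda s: s)                  # 1.0
var = variance(stat, w, 0.0, lambda s: s)                      # 0.5
cov = covariance(stat, w, 0.0, lambda s: s, lambda s: s ** 2)  # 1.0
```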

two_sided_acceptance(theta, alpha=0.05, tol=1e-06)

Compute cutoffs of UMPU two-sided test.

Parameters

theta : float

Natural parameter.

alpha : float (optional)

Size of two-sided test.

tol : float

Tolerance for root-finding.

Returns

left_cut : (float, float)

Boundary and randomization weight for left endpoint.

right_cut : (float, float)

Boundary and randomization weight for right endpoint.

two_sided_test(theta0, observed, alpha=0.05, randomize=True, auxVar=None)

Perform UMPU two-sided test.

Parameters

theta0 : float

Natural parameter under null hypothesis.

observed : float

Observed sufficient statistic.

alpha : float (optional)

Size of two-sided test.

randomize : bool

Perform the randomized test (or conservative test).

auxVar : [None, float]

If randomizing and not None, use this as the random uniform variate.

Returns

decision : np.bool

Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?

Notes

We need an auxiliary uniform variable to carry out the randomized test. Larger auxVar corresponds to x being slightly “larger.” It can be passed in, or chosen at random. If randomize=False, we get a conservative test.

one_sided_test(theta0, observed, alternative='greater', alpha=0.05, randomize=True, auxVar=None)

Perform UMPU one-sided test.

Parameters

theta0 : float

Natural parameter under null hypothesis.

observed : float

Observed sufficient statistic.

alternative : str

One of [‘greater’, ‘less’]

alpha : float (optional)

Size of one-sided test.

randomize : bool

Perform the randomized test (or conservative test).

auxVar : [None, float]

If randomizing and not None, use this as the random uniform variate.

Returns

decision : np.bool

Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?

Notes

We need an auxiliary uniform variable to carry out the randomized test. Larger auxVar corresponds to x being slightly “larger.” It can be passed in, or chosen at random. If randomize=False, we get a conservative test.
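One common way to use an auxiliary uniform variable in a one-sided test is a randomized p-value that counts only a fraction of the mass at the observed value. The sketch below shows this for the 'greater' alternative in plain Python. The helper name and the direction in which aux acts are assumptions of this example and may differ from the library's convention.

```python
import math
import random

def one_sided_pvalue(sufficient_stat, weights, theta0, observed, aux=None):
    """Randomized p-value for H0: theta = theta0 against theta > theta0."""
    if aux is None:
        aux = random.random()  # draw the auxiliary uniform variate
    unnorm = [w * math.exp(theta0 * s) for s, w in zip(sufficient_stat, weights)]
    total = sum(unnorm)
    above = sum(u for s, u in zip(sufficient_stat, unnorm) if s > observed)
    at_obs = sum(u for s, u in zip(sufficient_stat, unnorm) if s == observed)
    # P(X > observed) + aux * P(X = observed); aux = 1 gives P(X >= observed).
    return (above + aux * at_obs) / total

stat, w = [0, 1, 2, 3], [1, 3, 3, 1]                  # proportional to Binomial(3, 1/2)
pval = one_sided_pvalue(stat, w, 0.0, 3, aux=0.2)     # 0.2 * (1/8) = 0.025
conservative = one_sided_pvalue(stat, w, 0.0, 3, aux=1.0)  # 1/8 = 0.125
reject = pval < 0.05
```

In this example the conservative p-value 0.125 can never fall below 0.05, whereas the randomized version rejects whenever the uniform variate is below 0.4, which is how randomization recovers the exact size.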

interval(observed, alpha=0.05, randomize=True, auxVar=None, tol=1e-06)

Form UMAU confidence interval.

Parameters

observed : float

Observed sufficient statistic.

alpha : float (optional)

Size of the two-sided test being inverted; the interval has nominal coverage 1 - alpha.

randomize : bool

Perform the randomized test (or conservative test).

auxVar : [None, float]

If randomizing and not None, use this as the random uniform variate.

Returns

lower, upper : float

Limits of confidence interval.

equal_tailed_interval(observed, alpha=0.05, randomize=True, auxVar=None, tol=1e-06)

Form interval by inverting equal-tailed test with \(\alpha/2\) in each tail.

Parameters

observed : float

Observed sufficient statistic.

alpha : float (optional)

Size of two-sided test.

randomize : bool

Perform the randomized test (or conservative test).

auxVar : [None, float]

If randomizing and not None, use this as the random uniform variate.

Returns

lower, upper : float

Limits of confidence interval.
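Inverting an equal-tailed test amounts to finding the values of \(\theta\) at which each tail probability of the observed value equals \(\alpha/2\). The sketch below does this by bisection for the non-randomized (conservative) case. It is an illustration in plain Python, not the library's algorithm. The function names and the search bracket of [-20, 20] are assumptions, and the observed value is assumed to lie strictly inside the support so that both limits are finite.

```python
import math

def tail_probs(stat, weights, theta, observed):
    """Return (P_theta(X <= observed), P_theta(X >= observed))."""
    exponents = [theta * s for s in stat]
    shift = max(exponents)  # guards against overflow for large |theta|
    unnorm = [w * math.exp(e - shift) for w, e in zip(weights, exponents)]
    total = sum(unnorm)
    lower = sum(u for s, u in zip(stat, unnorm) if s <= observed) / total
    upper = sum(u for s, u in zip(stat, unnorm) if s >= observed) / total
    return lower, upper

def bisect(f, lo, hi, tol=1e-6):
    """Locate a sign change of f on [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equal_tailed_limits(stat, weights, observed, alpha=0.05, bracket=20.0):
    # Lower limit: theta at which the upper tail P(X >= observed) equals alpha / 2.
    lower = bisect(lambda t: tail_probs(stat, weights, t, observed)[1] - alpha / 2,
                   -bracket, bracket)
    # Upper limit: theta at which the lower tail P(X <= observed) equals alpha / 2.
    upper = bisect(lambda t: tail_probs(stat, weights, t, observed)[0] - alpha / 2,
                   -bracket, bracket)
    return lower, upper

stat = list(range(11))
weights = [math.comb(10, k) for k in stat]  # proportional to Binomial(10, 1/2)
lower_limit, upper_limit = equal_tailed_limits(stat, weights, observed=7)
```

The limits are on the natural-parameter scale. For this binomial reference measure they correspond to log-odds, so the interval should contain log(0.7 / 0.3), the log-odds of the observed proportion.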

equal_tailed_test(theta0, observed, alpha=0.05)

Perform equal-tailed two-sided test.

Parameters

theta0 : float

Natural parameter under null hypothesis.

observed : float

Observed sufficient statistic.

alpha : float (optional)

Size of two-sided test.

Returns

decision : np.bool

Is the null hypothesis \(H_0:\theta=\theta_0\) rejected?

one_sided_acceptance(theta, alpha=0.05, alternative='greater', tol=1e-06)

Compute the acceptance region cutoffs of UMPU one-sided test.

TODO: Include randomization?

Parameters

theta : float

Natural parameter.

alpha : float (optional)

Size of one-sided test.

alternative : str

One of [‘greater’, ‘less’].

tol : float

Tolerance for root-finding.

Returns

left_cut : (float, float)

Boundary and randomization weight for left endpoint.

right_cut : (float, float)

Boundary and randomization weight for right endpoint.

equal_tailed_acceptance(theta0, alpha=0.05)

Compute the acceptance region cutoffs of the equal-tailed test, without randomization. Because the test is not randomized, its size may not be exactly \(\alpha\).

Parameters

theta0 : float

Natural parameter under null hypothesis.

alpha : float (optional)

Size of two-sided test.

Returns

left_cut : (float, float)

Boundary and randomization weight for left endpoint.

right_cut : (float, float)

Boundary and randomization weight for right endpoint.

MLE(observed, initial=0, max_iter=20, tol=0.0001)

Compute the maximum likelihood estimator based on observed sufficient statistic observed.

Parameters

observed : float

Observed value of sufficient statistic

initial : float

Starting point for Newton-Raphson

max_iter : int (optional)

Maximum number of Newton-Raphson iterations

tol : float (optional)

Tolerance parameter for stopping, based on relative change in parameter estimate. Iteration stops when the change is smaller than tol * max(1, np.fabs(cur_estimate)).

Returns

theta_hat : float

Maximum likelihood estimator.

std_err : float

Estimated variance of theta_hat: the inverse of the variance of the sufficient statistic at theta_hat. That variance is the observed Fisher information.
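The Newton-Raphson iteration described above solves the likelihood equation \(E_{\theta}[X] = \) observed, using the variance of the sufficient statistic as the derivative. The following is a minimal sketch of that idea in plain Python. The function names are hypothetical, and details such as convergence warnings are left out, so it should not be read as the library's implementation.

```python
import math

def mean_and_var(stat, weights, theta):
    """Mean and variance of the sufficient statistic under P_theta."""
    unnorm = [w * math.exp(theta * s) for s, w in zip(stat, weights)]
    total = sum(unnorm)
    mean = sum(u * s for s, u in zip(stat, unnorm)) / total
    second = sum(u * s * s for s, u in zip(stat, unnorm)) / total
    return mean, second - mean ** 2

def newton_mle(stat, weights, observed, initial=0.0, max_iter=20, tol=1e-4):
    theta = initial
    for _ in range(max_iter):
        mean, var = mean_and_var(stat, weights, theta)
        # Newton step for the equation mean(theta) - observed = 0.
        new_theta = theta - (mean - observed) / var
        converged = abs(new_theta - theta) < tol * max(1.0, abs(theta))
        theta = new_theta
        if converged:
            break
    _, var = mean_and_var(stat, weights, theta)
    return theta, 1.0 / var  # estimate and inverse Fisher information

# Reference measure proportional to Binomial(2, 1/2), observed value 1.5.
# The likelihood equation gives success probability 0.75, i.e. theta = log(3).
theta_hat, var_hat = newton_mle([0, 1, 2], [1, 2, 1], observed=1.5)
```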

selectinf.distributions.discrete_family.crit_func(test_statistic, left_cut, right_cut)

A generic critical function for an interval, with weights at the endpoints.

(test_statistic < CL) + (test_statistic > CR) + gammaL * (test_statistic == CL) + gammaR * (test_statistic == CR)

where (CL, gammaL) = left_cut, (CR, gammaR) = right_cut.

Parameters

test_statistic : np.float

Observed value of test statistic.

left_cut : (float, float)

(CL, gammaL): left endpoint and value at exactly the left endpoint (should be in [0,1]).

right_cut : (float, float)

(CR, gammaR): right endpoint and value at exactly the right endpoint (should be in [0,1]).

Returns

decision : np.float
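For a single scalar test statistic, the critical function above can be written directly in plain Python. This is a sketch for illustration; the name crit_func_scalar is hypothetical, and the cutoffs used in the example are made up rather than computed from a real test.

```python
def crit_func_scalar(test_statistic, left_cut, right_cut):
    """Rejection probability for an acceptance interval with weighted endpoints."""
    CL, gammaL = left_cut
    CR, gammaR = right_cut
    # Comparisons evaluate to True/False, which behave as 1/0 in arithmetic.
    return float((test_statistic < CL) + (test_statistic > CR)
                 + gammaL * (test_statistic == CL)
                 + gammaR * (test_statistic == CR))

left_cut, right_cut = (1, 0.3), (4, 0.7)               # illustrative cutoffs
outside = crit_func_scalar(0, left_cut, right_cut)     # 1.0: always reject
inside = crit_func_scalar(2, left_cut, right_cut)      # 0.0: never reject
at_left = crit_func_scalar(1, left_cut, right_cut)     # 0.3: reject with prob. 0.3
at_right = crit_func_scalar(4, left_cut, right_cut)    # 0.7: reject with prob. 0.7
```

The returned value is a probability of rejection, so a randomized decision can be made by comparing it with a uniform variate.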