randomized.query

Module: randomized.query

Inheritance diagram for selectinf.randomized.query:

  • randomized.query.query → randomized.query.gaussian_query

  • randomized.query.optimization_sampler → randomized.query.affine_gaussian_sampler

  • randomized.query.multiple_queries

  • randomized.query.optimization_intervals

Classes

affine_gaussian_sampler

class selectinf.randomized.query.affine_gaussian_sampler(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]

Bases: selectinf.randomized.query.optimization_sampler

Sample from an affine truncated Gaussian

__init__(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]
Parameters

affine_con : selection.constraints.affine.constraints

Affine constraints

initial_point : ndarray

Feasible point for affine constraints.

observed_score_state : ndarray

Observed score of convex loss (slightly modified). Essentially (asymptotically) equivalent to \(\nabla \ell(\beta^*) + Q(\beta^*)\beta^*\) where \(\beta^*\) is the population minimizer. For linear regression, it is always \(-X^Ty\).

log_cond_density : callable

Density of optimization variables given score

logdens_transform : tuple

Description of how conditional mean of optimization variables depends on score.

selection_info : optional

Function of optimization variables that will be conditioned on.

useC : bool, optional

Whether to use the Python or the C solver.

log_cond_density(opt_sample, target_sample, transform=None)[source]

Density of opt_sample | target_sample

sample(ndraw, burnin)[source]

Sample target from selective density using projected Langevin sampler with gradient map self.gradient and projection map self.projection.

Parameters

ndraw : int

Length of the chain to return.

burnin : int

Number of initial samples to discard.
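The projected Langevin dynamics named above alternate a gradient step with Gaussian noise, then project back onto the feasible set. The following is a generic, self-contained sketch of that scheme (not the package's internal implementation); the example samples a standard Gaussian truncated to the nonnegative orthant, where projection is coordinatewise clipping:

```python
import numpy as np

def projected_langevin(grad_log_density, projection, initial, ndraw, burnin,
                       stepsize=0.01, rng=None):
    """Generic projected Langevin sampler: gradient step + noise, then project.

    Schematic illustration of the algorithm named in the docstring; this is
    not selectinf's internal code."""
    rng = np.random.default_rng(rng)
    state = np.asarray(initial, dtype=float)
    draws = []
    for i in range(ndraw + burnin):
        noise = rng.standard_normal(state.shape)
        state = projection(state
                           + stepsize * grad_log_density(state)
                           + np.sqrt(2 * stepsize) * noise)
        if i >= burnin:                  # discard the first `burnin` iterates
            draws.append(state.copy())
    return np.array(draws)

# Example: N(0, I) truncated to {x : x >= 0}; projection is clipping at 0.
sample = projected_langevin(grad_log_density=lambda x: -x,   # grad of log N(0, I)
                            projection=lambda x: np.clip(x, 0, None),
                            initial=np.ones(2), ndraw=2000, burnin=500,
                            rng=0)
```

The `gradient` and `projection` maps mentioned in the docstring play the roles of `grad_log_density` and `projection` here.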

selective_MLE(observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, level=0.9)[source]

Selective MLE based on approximation of CGF.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

init_soln : ndarray

Feasible point for optimization problem.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

reparam_map(parameter_target, observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, useC=True)[source]
coefficient_pvalues(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)

Construct selective p-values for each parameter of the target.

Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

parameter : np.float (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

alternatives : list of [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalues : np.float

confidence_intervals(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)
Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

level : float (optional)

Specify the confidence level.

initial_guess : np.float

Initial guesses at upper and lower limits, optional.

Returns

intervals : [(float, float)]

List of confidence intervals.

Notes

Construct selective confidence intervals for each parameter of the target.

hypothesis_test(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')

Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.

Parameters

test_stat : callable

Test statistic to evaluate on sample from selective distribution.

observed_value : float

Observed value of test statistic. Used in p-value calculation.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.

parameter : np.float (optional)

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : float
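Conceptually, a sampling-based test of this kind evaluates the test statistic on each draw from the selective density and compares the observed value to that empirical distribution. A minimal sketch of the generic recipe (argument names mirror hypothesis_test, but this is illustrative, not the library's implementation):

```python
import numpy as np

def sampling_pvalue(test_stat, observed_value, sample, alternative='twosided'):
    """P-value of `observed_value` against `test_stat` evaluated on `sample`.

    Illustrative sketch of a Monte Carlo p-value; not selectinf's code."""
    stats = np.array([test_stat(s) for s in sample])
    greater = np.mean(stats >= observed_value)   # P(T >= observed)
    less = np.mean(stats <= observed_value)      # P(T <= observed)
    if alternative == 'greater':
        return greater
    if alternative == 'less':
        return less
    return 2 * min(greater, less)                # two-sided

# Toy usage: a median observed value against N(0, 1) draws gives p near 1.
rng = np.random.default_rng(0)
p = sampling_pvalue(lambda s: s[0], 0.0, rng.standard_normal((5000, 1)))
```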

gaussian_query

class selectinf.randomized.query.gaussian_query(randomization, perturb=None)[source]

Bases: selectinf.randomized.query.query

__init__(randomization, perturb=None)
Parameters

randomization : selection.randomized.randomization.randomization

Instance of a randomization scheme. Describes the law of \(\omega\).

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

useC = True

A class with Gaussian perturbation to the objective; the CLT applies readily to such objectives.

fit(perturb=None)[source]
get_sampler()
randomize(perturb=None)

The actual randomization step.

Parameters

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

property sampler

Sampler of optimization (augmented) variables.

selective_MLE(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})
Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

set_sampler(sampler)
setup_sampler()

Setup query to prepare for sampling. Should set a few key attributes:

  • observed_score_state

  • observed_opt_state

  • opt_transform

solve()
summary(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str]

Sequence of strings describing the alternatives; should be values of [‘twosided’, ‘less’, ‘greater’].

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 10000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

dispersion : float (optional)

Known value for the dispersion; if not supplied, Pearson’s X^2 estimate is used.

multiple_queries

class selectinf.randomized.query.multiple_queries(objectives)[source]

Bases: object

Combine several queries of a given data through randomized algorithms.

__init__(objectives)[source]
Parameters

objectives : sequence

A sequence of randomized objective functions.

Returns

None

Notes

Each element of objectives must have a setup_sampler method that returns a description of the distribution of the data implicated in the objective function, typically through the score or gradient of the objective function. These descriptions are passed to a function form_covariances to linearly decompose each score in terms of a target and an asymptotically independent piece.
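The linear decomposition described above can be sketched numerically. In this illustration (not the package's form_covariances implementation), the linear coefficient is the regression of score on target, A = Cov(score, target) Cov(target)^{-1}, so the remainder is empirically uncorrelated with the target:

```python
import numpy as np

def decompose_score(score_target_cov, target_cov, target, score):
    """Split a score into a target-dependent piece plus a remainder.

    Illustrative sketch of the decomposition described in the Notes:
    A = Cov(score, target) Cov(target)^{-1}, residual = score - A target."""
    A = score_target_cov @ np.linalg.inv(target_cov)
    residual = score - target @ A.T
    return A, residual

# Toy check with jointly Gaussian draws.
rng = np.random.default_rng(1)
target = rng.standard_normal((10000, 2))
score = target @ np.array([[1.0, 0.5], [0.2, 1.0]]).T \
        + rng.standard_normal((10000, 2))
C = np.cov(np.hstack([target, score]).T)
target_cov, score_target_cov = C[:2, :2], C[2:, :2]
A, residual = decompose_score(score_target_cov, target_cov, target, score)
# The empirical cross-covariance of residual and target vanishes by construction.
cross = np.cov(np.hstack([residual, target]).T)[:2, 2:]
```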

fit()[source]
summary(observed_target, opt_sampling_info, alternatives=None, parameter=None, level=0.9, ndraw=5000, burnin=2000, compute_intervals=False)[source]

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str], optional

Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 5000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

coefficient_pvalues(observed_target, parameter=None, sample_args=(), alternatives=None)[source]

Construct selective p-values for each parameter of the target.

Parameters

observed_target : ndarray

Observed estimate of target.

parameter : ndarray (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is not found for a given objective.

alternatives : [str], optional

Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]

Returns

pvalues : ndarray

confidence_intervals(observed_target, sample_args=(), level=0.9)[source]

Construct selective confidence intervals for each parameter of the target.

Parameters

observed_target : ndarray

Observed estimate of target.

sample_args : sequence

Arguments to self.sample if sample is not found for a given objective.

level : float

Confidence level.

Returns

limits : ndarray

Confidence intervals for each target.

optimization_intervals

class selectinf.randomized.query.optimization_intervals(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]

Bases: object

__init__(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]

Initialize self. See help(type(self)) for accurate signature.

pivot(linear_func, candidate, alternative='twosided')[source]

Parameters

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : np.float

confidence_interval(linear_func, level=0.9, how_many_sd=20, guess=None)[source]

optimization_sampler

class selectinf.randomized.query.optimization_sampler[source]

Bases: object

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

sample()[source]
log_cond_density(opt_sample, target_sample, transform=None)[source]

Density of opt_sample | target_sample

hypothesis_test(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')[source]

Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.

Parameters

test_stat : callable

Test statistic to evaluate on sample from selective distribution.

observed_value : float

Observed value of test statistic. Used in p-value calculation.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.

parameter : np.float (optional)

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : float

confidence_intervals(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)[source]
Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

level : float (optional)

Specify the confidence level.

initial_guess : np.float

Initial guesses at upper and lower limits, optional.

Returns

intervals : [(float, float)]

List of confidence intervals.

Notes

Construct selective confidence intervals for each parameter of the target.

coefficient_pvalues(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)[source]

Construct selective p-values for each parameter of the target.

Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

parameter : np.float (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

alternatives : list of [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalues : np.float

query

class selectinf.randomized.query.query(randomization, perturb=None)[source]

Bases: object

This class is the base of randomized selective inference based on convex programs.

The main mechanism is to take an initial penalized program

\[\text{minimize}_B \ell(B) + {\cal P}(B)\]

and add a randomization and small ridge term yielding

\[\text{minimize}_B \ell(B) + {\cal P}(B) - \langle \omega, B \rangle + \frac{\epsilon}{2} \|B\|^2_2\]
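The randomized program above can be written out directly. A toy numpy sketch with squared-error loss and an ℓ1 penalty (illustrative only; this evaluates the objective at a point and is not the solver the package uses):

```python
import numpy as np

def randomized_objective(beta, X, y, lam, omega, epsilon):
    """Value of the randomized program
        l(B) + P(B) - <omega, B> + (epsilon/2) ||B||^2_2
    with squared-error loss l and an l1 penalty P (toy illustration)."""
    loss = 0.5 * np.sum((y - X @ beta) ** 2)   # l(B)
    penalty = lam * np.sum(np.abs(beta))       # P(B)
    return (loss + penalty
            - omega @ beta                     # randomization term
            + 0.5 * epsilon * np.sum(beta ** 2))  # small ridge term

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, 0, 0, 0, 0]) + rng.standard_normal(50)
omega = rng.normal(scale=0.5, size=5)          # an instance of omega
val = randomized_objective(np.zeros(5), X, y, lam=1.0, omega=omega, epsilon=0.1)
```

At beta = 0 every term except the loss vanishes, so the value reduces to 0.5 * ||y||^2.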
__init__(randomization, perturb=None)[source]
Parameters

randomization : selection.randomized.randomization.randomization

Instance of a randomization scheme. Describes the law of \(\omega\).

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

randomize(perturb=None)[source]

The actual randomization step.

Parameters

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

get_sampler()[source]
set_sampler(sampler)[source]
property sampler

Sampler of optimization (augmented) variables.

solve()[source]
setup_sampler()[source]

Setup query to prepare for sampling. Should set a few key attributes:

  • observed_score_state

  • observed_opt_state

  • opt_transform

summary(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)[source]

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str]

Sequence of strings describing the alternatives; should be values of [‘twosided’, ‘less’, ‘greater’].

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 10000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

dispersion : float (optional)

Known value for the dispersion; if not supplied, Pearson’s X^2 estimate is used.

selective_MLE(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})[source]
Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

Functions

selectinf.randomized.query.naive_confidence_intervals(diag_cov, observed, level=0.9)[source]

Compute naive Gaussian-based confidence intervals for target.

Parameters

diag_cov : diagonal of a covariance matrix

observed : np.float

A vector of observed data of shape target.shape

level : float (optional)

Confidence level.

Returns

intervals : np.float

Gaussian based confidence intervals.
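The naive intervals are the usual z-intervals, observed ± z · sqrt(diag_cov). A minimal sketch using only the standard library's NormalDist (argument names mirror the function above, but this is not the package's code):

```python
import numpy as np
from statistics import NormalDist

def naive_intervals(diag_cov, observed, level=0.9):
    """Naive (non-selective) Gaussian confidence intervals:
    observed +/- z_{1-(1-level)/2} * sqrt(diag_cov). Illustrative sketch."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-sided critical value
    sd = np.sqrt(np.asarray(diag_cov, dtype=float))
    return np.column_stack([observed - z * sd, observed + z * sd])

# Variances 1 and 4, observed estimates 0 and 2, 95% level.
ints = naive_intervals(np.array([1.0, 4.0]), np.array([0.0, 2.0]), level=0.95)
```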

selectinf.randomized.query.naive_pvalues(diag_cov, observed, parameter)[source]
selectinf.randomized.query.normalizing_constant(target_parameter, observed_target, target_cov, target_score_cov, feasible_point, cond_mean, cond_cov, logdens_linear, linear_part, offset, useC=False)[source]

Approximation of normalizing constant in affine constrained Gaussian.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

feasible_point : ndarray

Feasible point for optimization problem.

cond_mean : ndarray

Conditional mean of optimization variables given target.

cond_cov : ndarray

Conditional covariance of optimization variables given target.

logdens_linear : ndarray

Describes how conditional mean of optimization variables varies with target.

linear_part : ndarray

Linear part of affine constraints: \(\{o:Ao \leq b\}\)

offset : ndarray

Offset part of affine constraints: \(\{o:Ao \leq b\}\)

solve_args : dict, optional

Arguments passed to solver.

level : float, optional

Confidence level.

useC : bool, optional

Whether to use the Python or the C solver.

selectinf.randomized.query.selective_MLE(observed_target, target_cov, target_score_cov, init_soln, cond_mean, cond_cov, logdens_linear, linear_part, offset, solve_args={'tol': 1e-12}, level=0.9, useC=False)[source]

Selective MLE based on approximation of CGF.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

init_soln : ndarray

Feasible point for optimization problem.

cond_mean : ndarray

Conditional mean of optimization variables given target.

cond_cov : ndarray

Conditional covariance of optimization variables given target.

logdens_linear : ndarray

Describes how conditional mean of optimization variables varies with target.

linear_part : ndarray

Linear part of affine constraints: \(\{o:Ao \leq b\}\)

offset : ndarray

Offset part of affine constraints: \(\{o:Ao \leq b\}\)

solve_args : dict, optional

Arguments passed to solver.

level : float, optional

Confidence level.

useC : bool, optional

Whether to use the Python or the C solver.