randomized.query

Module: randomized.query

Inheritance diagram for selectinf.randomized.query:

  • randomized.query.query → randomized.query.gaussian_query

  • randomized.query.optimization_sampler → randomized.query.affine_gaussian_sampler

  • randomized.query.multiple_queries

  • randomized.query.optimization_intervals

Classes

affine_gaussian_sampler

class selectinf.randomized.query.affine_gaussian_sampler(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]

Bases: selectinf.randomized.query.optimization_sampler

Sample from an affine truncated Gaussian

__init__(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]
Parameters

affine_con : selection.constraints.affine.constraints

Affine constraints

initial_point : ndarray

Feasible point for affine constraints.

observed_score_state : ndarray

Observed score of convex loss (slightly modified). Essentially (asymptotically) equivalent to \(\nabla \ell(\beta^*) + Q(\beta^*)\beta^*\) where \(\beta^*\) is the population minimizer. For linear regression, it is always \(-X^Ty\).

log_cond_density : callable

Density of optimization variables given score

logdens_transform : tuple

Description of how conditional mean of optimization variables depends on score.

selection_info : optional

Function of optimization variables that will be conditioned on.

useC : bool, optional

Whether to use the Python or the C solver.

log_cond_density(opt_sample, target_sample, transform=None)[source]

Density of opt_sample | target_sample

sample(ndraw, burnin)[source]

Sample target from selective density using projected Langevin sampler with gradient map self.gradient and projection map self.projection.

Parameters

ndraw : int

Length of the chain to return.

burnin : int

Number of initial samples to discard.
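The projected Langevin dynamics named above alternate a gradient step with Gaussian noise, then project back onto the feasible set. The following is a generic, self-contained sketch of that scheme (not the package's internal implementation); the example samples a standard Gaussian truncated to the nonnegative orthant, where projection is coordinatewise clipping:

```python
import numpy as np

def projected_langevin(grad_log_density, projection, initial, ndraw, burnin,
                       stepsize=0.01, rng=None):
    """Generic projected Langevin sampler: gradient step + noise, then project.

    Schematic illustration of the algorithm named in the docstring; this is
    not selectinf's internal code."""
    rng = np.random.default_rng(rng)
    state = np.asarray(initial, dtype=float)
    draws = []
    for i in range(ndraw + burnin):
        noise = rng.standard_normal(state.shape)
        state = projection(state
                           + stepsize * grad_log_density(state)
                           + np.sqrt(2 * stepsize) * noise)
        if i >= burnin:                  # discard the first `burnin` iterates
            draws.append(state.copy())
    return np.array(draws)

# Example: N(0, I) truncated to {x : x >= 0}; projection is clipping at 0.
sample = projected_langevin(grad_log_density=lambda x: -x,   # grad of log N(0, I)
                            projection=lambda x: np.clip(x, 0, None),
                            initial=np.ones(2), ndraw=2000, burnin=500,
                            rng=0)
```

The `gradient` and `projection` maps mentioned in the docstring play the roles of `grad_log_density` and `projection` here.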

selective_MLE(observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, level=0.9)[source]

Selective MLE based on approximation of CGF.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

init_soln : ndarray

Feasible point for optimization problem.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

reparam_map(parameter_target, observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, useC=True)[source]
coefficient_pvalues(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)

Construct selective p-values for each parameter of the target.

Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

parameter : np.float (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

alternatives : list of [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalues : np.float

confidence_intervals(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)
Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

level : float (optional)

Specify the confidence level.

initial_guess : np.float

Initial guesses at upper and lower limits, optional.

Returns

intervals : [(float, float)]

List of confidence intervals.

Notes

Construct selective confidence intervals for each parameter of the target.

hypothesis_test(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')

Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.

Parameters

test_stat : callable

Test statistic to evaluate on sample from selective distribution.

observed_value : float

Observed value of test statistic. Used in p-value calculation.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.

parameter : np.float (optional)

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : float
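Conceptually, a sampling-based test of this kind evaluates the test statistic on each draw from the selective density and compares the observed value to that empirical distribution. A minimal sketch of the generic recipe (argument names mirror hypothesis_test, but this is illustrative, not the library's implementation):

```python
import numpy as np

def sampling_pvalue(test_stat, observed_value, sample, alternative='twosided'):
    """P-value of `observed_value` against `test_stat` evaluated on `sample`.

    Illustrative sketch of a Monte Carlo p-value; not selectinf's code."""
    stats = np.array([test_stat(s) for s in sample])
    greater = np.mean(stats >= observed_value)   # P(T >= observed)
    less = np.mean(stats <= observed_value)      # P(T <= observed)
    if alternative == 'greater':
        return greater
    if alternative == 'less':
        return less
    return 2 * min(greater, less)                # two-sided

# Toy usage: a median observed value against N(0, 1) draws gives p near 1.
rng = np.random.default_rng(0)
p = sampling_pvalue(lambda s: s[0], 0.0, rng.standard_normal((5000, 1)))
```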

gaussian_query

class selectinf.randomized.query.gaussian_query(randomization, perturb=None)[source]

Bases: selectinf.randomized.query.query

__init__(randomization, perturb=None)
Parameters

randomization : selection.randomized.randomization.randomization

Instance of a randomization scheme. Describes the law of \(\omega\).

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

useC = True

A class with Gaussian perturbation to the objective; the CLT applies readily to such objectives.

fit(perturb=None)[source]
get_sampler()
randomize(perturb=None)

The actual randomization step.

Parameters

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

property sampler

Sampler of optimization (augmented) variables.

selective_MLE(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})
Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

set_sampler(sampler)
setup_sampler()

Setup query to prepare for sampling. Should set a few key attributes:

  • observed_score_state

  • observed_opt_state

  • opt_transform

solve()
summary(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str]

Sequence of strings describing the alternatives; should be values of [‘twosided’, ‘less’, ‘greater’].

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 10000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

dispersion : float (optional)

Known value for the dispersion; if not supplied, Pearson’s X^2 estimate is used.

multiple_queries

class selectinf.randomized.query.multiple_queries(objectives)[source]

Bases: object

Combine several queries of a given data through randomized algorithms.

__init__(objectives)[source]
Parameters

objectives : sequence

A sequence of randomized objective functions.

Returns

None

Notes

Each element of objectives must have a setup_sampler method that returns a description of the distribution of the data implicated in the objective function, typically through the score or gradient of the objective function. These descriptions are passed to a function form_covariances to linearly decompose each score in terms of a target and an asymptotically independent piece.
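The linear decomposition described above can be sketched numerically. In this illustration (not the package's form_covariances implementation), the linear coefficient is the regression of score on target, A = Cov(score, target) Cov(target)^{-1}, so the remainder is empirically uncorrelated with the target:

```python
import numpy as np

def decompose_score(score_target_cov, target_cov, target, score):
    """Split a score into a target-dependent piece plus a remainder.

    Illustrative sketch of the decomposition described in the Notes:
    A = Cov(score, target) Cov(target)^{-1}, residual = score - A target."""
    A = score_target_cov @ np.linalg.inv(target_cov)
    residual = score - target @ A.T
    return A, residual

# Toy check with jointly Gaussian draws.
rng = np.random.default_rng(1)
target = rng.standard_normal((10000, 2))
score = target @ np.array([[1.0, 0.5], [0.2, 1.0]]).T \
        + rng.standard_normal((10000, 2))
C = np.cov(np.hstack([target, score]).T)
target_cov, score_target_cov = C[:2, :2], C[2:, :2]
A, residual = decompose_score(score_target_cov, target_cov, target, score)
# The empirical cross-covariance of residual and target vanishes by construction.
cross = np.cov(np.hstack([residual, target]).T)[:2, 2:]
```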

fit()[source]
summary(observed_target, opt_sampling_info, alternatives=None, parameter=None, level=0.9, ndraw=5000, burnin=2000, compute_intervals=False)[source]

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str], optional

Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 5000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

coefficient_pvalues(observed_target, parameter=None, sample_args=(), alternatives=None)[source]

Construct selective p-values for each parameter of the target.

Parameters

observed_target : ndarray

Observed estimate of target.

parameter : ndarray (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is not found for a given objective.

alternatives : [str], optional

Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]

Returns

pvalues : ndarray

confidence_intervals(observed_target, sample_args=(), level=0.9)[source]

Construct selective confidence intervals for each parameter of the target.

Parameters

observed_target : ndarray

Observed estimate of target.

sample_args : sequence

Arguments to self.sample if sample is not found for a given objective.

level : float

Confidence level.

Returns

limits : ndarray

Confidence intervals for each target.

optimization_intervals

class selectinf.randomized.query.optimization_intervals(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]

Bases: object

__init__(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]

Initialize self. See help(type(self)) for accurate signature.

pivot(linear_func, candidate, alternative='twosided')[source]

Parameters

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : np.float

confidence_interval(linear_func, level=0.9, how_many_sd=20, guess=None)[source]

optimization_sampler

class selectinf.randomized.query.optimization_sampler[source]

Bases: object

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

sample()[source]
log_cond_density(opt_sample, target_sample, transform=None)[source]

Density of opt_sample | target_sample

hypothesis_test(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')[source]

Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.

Parameters

test_stat : callable

Test statistic to evaluate on sample from selective distribution.

observed_value : float

Observed value of test statistic. Used in p-value calculation.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.

parameter : np.float (optional)

alternative : [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalue : float

confidence_intervals(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)[source]
Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

level : float (optional)

Specify the confidence level.

initial_guess : np.float

Initial guesses at upper and lower limits, optional.

Returns

intervals : [(float, float)]

List of confidence intervals.

Notes

Construct selective confidence intervals for each parameter of the target.

coefficient_pvalues(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)[source]

Construct selective p-values for each parameter of the target.

Parameters

observed_target : np.float

A vector of parameters with shape self.shape, representing coordinates of the target.

parameter : np.float (optional)

A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).

sample_args : sequence

Arguments to self.sample if sample is None.

sample : np.array (optional)

If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.

alternatives : list of [‘greater’, ‘less’, ‘twosided’]

What alternative to use.

Returns

pvalues : np.float

query

class selectinf.randomized.query.query(randomization, perturb=None)[source]

Bases: object

This class is the base of randomized selective inference based on convex programs.

The main mechanism is to take an initial penalized program

\[\text{minimize}_B \ell(B) + {\cal P}(B)\]

and add a randomization and small ridge term yielding

\[\text{minimize}_B \ell(B) + {\cal P}(B) - \langle \omega, B \rangle + \frac{\epsilon}{2} \|B\|^2_2\]
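The randomized program above can be written out directly. A toy numpy sketch with squared-error loss and an ℓ1 penalty (illustrative only; this evaluates the objective at a point and is not the solver the package uses):

```python
import numpy as np

def randomized_objective(beta, X, y, lam, omega, epsilon):
    """Value of the randomized program
        l(B) + P(B) - <omega, B> + (epsilon/2) ||B||^2_2
    with squared-error loss l and an l1 penalty P (toy illustration)."""
    loss = 0.5 * np.sum((y - X @ beta) ** 2)   # l(B)
    penalty = lam * np.sum(np.abs(beta))       # P(B)
    return (loss + penalty
            - omega @ beta                     # randomization term
            + 0.5 * epsilon * np.sum(beta ** 2))  # small ridge term

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, 0, 0, 0, 0]) + rng.standard_normal(50)
omega = rng.normal(scale=0.5, size=5)          # an instance of omega
val = randomized_objective(np.zeros(5), X, y, lam=1.0, omega=omega, epsilon=0.1)
```

At beta = 0 every term except the loss vanishes, so the value reduces to 0.5 * ||y||^2.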
__init__(randomization, perturb=None)[source]
Parameters

randomization : selection.randomized.randomization.randomization

Instance of a randomization scheme. Describes the law of \(\omega\).

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

randomize(perturb=None)[source]

The actual randomization step.

Parameters

perturb : ndarray, optional

Value of randomization vector, an instance of \(\omega\).

get_sampler()[source]
set_sampler(sampler)[source]
property sampler

Sampler of optimization (augmented) variables.

solve()[source]
setup_sampler()[source]

Setup query to prepare for sampling. Should set a few key attributes:

  • observed_score_state

  • observed_opt_state

  • opt_transform

summary(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)[source]

Produce p-values and confidence intervals for targets of the model including selected features.

Parameters

observed_target : ndarray

Observed estimate of target.

alternatives : [str]

Sequence of strings describing the alternatives; should be values of [‘twosided’, ‘less’, ‘greater’].

parameter : np.array

Hypothesized value for parameter – defaults to 0.

level : float

Confidence level.

ndraw : int (optional)

Defaults to 10000.

burnin : int (optional)

Defaults to 2000.

compute_intervals : bool

Compute confidence intervals?

dispersion : float (optional)

Known value for the dispersion; if not supplied, Pearson’s X^2 estimate is used.

selective_MLE(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})[source]
Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

level : float, optional

Confidence level.

solve_args : dict, optional

Arguments passed to solver.

Functions

selectinf.randomized.query.naive_confidence_intervals(diag_cov, observed, level=0.9)[source]

Compute naive Gaussian-based confidence intervals for target.

Parameters

diag_cov : diagonal of a covariance matrix

observed : np.float

A vector of observed data of shape target.shape

level : float (optional)

Confidence level.

Returns

intervals : np.float

Gaussian based confidence intervals.
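The naive intervals are the usual z-intervals, observed ± z · sqrt(diag_cov). A minimal sketch using only the standard library's NormalDist (argument names mirror the function above, but this is not the package's code):

```python
import numpy as np
from statistics import NormalDist

def naive_intervals(diag_cov, observed, level=0.9):
    """Naive (non-selective) Gaussian confidence intervals:
    observed +/- z_{1-(1-level)/2} * sqrt(diag_cov). Illustrative sketch."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-sided critical value
    sd = np.sqrt(np.asarray(diag_cov, dtype=float))
    return np.column_stack([observed - z * sd, observed + z * sd])

# Variances 1 and 4, observed estimates 0 and 2, 95% level.
ints = naive_intervals(np.array([1.0, 4.0]), np.array([0.0, 2.0]), level=0.95)
```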

selectinf.randomized.query.naive_pvalues(diag_cov, observed, parameter)[source]
selectinf.randomized.query.normalizing_constant(target_parameter, observed_target, target_cov, target_score_cov, feasible_point, cond_mean, cond_cov, logdens_linear, linear_part, offset, useC=False)[source]

Approximation of normalizing constant in affine constrained Gaussian.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

feasible_point : ndarray

Feasible point for optimization problem.

cond_mean : ndarray

Conditional mean of optimization variables given target.

cond_cov : ndarray

Conditional covariance of optimization variables given target.

logdens_linear : ndarray

Describes how conditional mean of optimization variables varies with target.

linear_part : ndarray

Linear part of affine constraints: \(\{o:Ao \leq b\}\)

offset : ndarray

Offset part of affine constraints: \(\{o:Ao \leq b\}\)

solve_args : dict, optional

Arguments passed to solver.

level : float, optional

Confidence level.

useC : bool, optional

Whether to use the Python or the C solver.

selectinf.randomized.query.selective_MLE(observed_target, target_cov, target_score_cov, init_soln, cond_mean, cond_cov, logdens_linear, linear_part, offset, solve_args={'tol': 1e-12}, level=0.9, useC=False)[source]

Selective MLE based on approximation of CGF.

Parameters

observed_target : ndarray

Observed estimate of target.

target_cov : ndarray

Estimated covariance of target.

target_score_cov : ndarray

Estimated covariance of target and score of randomized query.

init_soln : ndarray

Feasible point for optimization problem.

cond_mean : ndarray

Conditional mean of optimization variables given target.

cond_cov : ndarray

Conditional covariance of optimization variables given target.

logdens_linear : ndarray

Describes how conditional mean of optimization variables varies with target.

linear_part : ndarray

Linear part of affine constraints: \(\{o:Ao \leq b\}\)

offset : ndarray

Offset part of affine constraints: \(\{o:Ao \leq b\}\)

solve_args : dict, optional

Arguments passed to solver.

level : float, optional

Confidence level.

useC : bool, optional

Whether to use the Python or the C solver.