randomized.query¶
Module: randomized.query
¶
Classes¶
affine_gaussian_sampler
¶
-
class
selectinf.randomized.query.
affine_gaussian_sampler
(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]¶ Bases:
selectinf.randomized.query.optimization_sampler
Sample from an affine truncated Gaussian
-
__init__
(affine_con, initial_point, observed_score_state, log_cond_density, logdens_transform, selection_info=None, useC=False)[source]¶ - Parameters
affine_con : selection.constraints.affine.constraints
Affine constraints
initial_point : ndarray
Feasible point for affine constraints.
observed_score_state : ndarray
Observed score of convex loss (slightly modified). Essentially (asymptotically) equivalent to \(\nabla \ell(\beta^*) + Q(\beta^*)\beta^*\) where \(\beta^*\) is the population minimizer. For linear regression, it is always \(-X^Ty\).
log_cond_density : callable
Density of optimization variables given score.
logdens_transform : tuple
Description of how conditional mean of optimization variables depends on score.
selection_info : optional
Function of optimization variables that will be conditioned on.
useC : bool, optional
Use python or C solver.
-
log_cond_density
(opt_sample, target_sample, transform=None)[source]¶ Density of opt_sample | target_sample
-
sample
(ndraw, burnin)[source]¶ Sample target from selective density using projected Langevin sampler with gradient map self.gradient and projection map self.projection.
- Parameters
ndraw : int
How long a chain to return?
burnin : int
How many samples to discard?
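A projected Langevin update of the kind described above can be sketched as follows. This is a minimal illustration, not the library's sampler: the helper name `projected_langevin`, the `stepsize` argument, and the toy target (a standard Gaussian truncated to the nonnegative orthant) are assumptions made for the example.

```python
import numpy as np

def projected_langevin(grad_log_density, projection, initial_point,
                       ndraw, burnin, stepsize, rng):
    """Hypothetical helper: run burnin + ndraw projected Langevin steps
    and return the last ndraw states."""
    state = np.array(initial_point, dtype=float)
    draws = []
    for i in range(ndraw + burnin):
        noise = rng.standard_normal(state.shape)
        # Langevin proposal: gradient drift plus Gaussian noise
        proposal = (state
                    + 0.5 * stepsize * grad_log_density(state)
                    + np.sqrt(stepsize) * noise)
        # Map the proposal back into the constraint set
        state = projection(proposal)
        if i >= burnin:
            draws.append(state.copy())
    return np.array(draws)

# Toy target (assumption): N(0, I) restricted to {z : z >= 0}
rng = np.random.default_rng(0)
chain = projected_langevin(
    grad_log_density=lambda z: -z,            # gradient of log N(0, I)
    projection=lambda z: np.maximum(z, 0.0),  # projection onto the orthant
    initial_point=np.ones(2),
    ndraw=500, burnin=100, stepsize=0.01, rng=rng)
```

Here `ndraw` is the length of the returned chain and `burnin` is the number of initial states discarded, matching the two parameters documented above.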
-
selective_MLE
(observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, level=0.9)[source]¶ Selective MLE based on approximation of CGF.
- Parameters
observed_target : ndarray
Observed estimate of target.
target_cov : ndarray
Estimated covariance of target.
target_score_cov : ndarray
Estimated covariance of target and score of randomized query.
init_soln : ndarray
Feasible point for optimization problem.
level : float, optional
Confidence level.
solve_args : dict, optional
Arguments passed to solver.
-
reparam_map
(parameter_target, observed_target, target_cov, target_score_cov, init_soln, solve_args={'tol': 1e-12}, useC=True)[source]¶
-
coefficient_pvalues
(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)¶ Construct selective p-values for each parameter of the target.
- Parameters
observed_target : np.float
A vector of parameters with shape self.shape, representing coordinates of the target.
parameter : np.float (optional)
A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.
alternatives : list of [‘greater’, ‘less’, ‘twosided’]
What alternative to use.
- Returns
pvalues : np.float
-
confidence_intervals
(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)¶ - Parameters
observed_target : np.float
A vector of parameters with shape self.shape, representing coordinates of the target.
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.
level : float (optional)
Specify the confidence level.
initial_guess : np.float
Initial guesses at upper and lower limits, optional.
- Returns
intervals : [(float, float)]
List of confidence intervals.
Notes
Construct selective confidence intervals for each parameter of the target.
-
hypothesis_test
(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')¶ Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.
- Parameters
test_stat : callable
Test statistic to evaluate on sample from selective distribution.
observed_value : float
Observed value of test statistic. Used in p-value calculation.
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.
parameter : np.float (optional)
alternative : [‘greater’, ‘less’, ‘twosided’]
What alternative to use.
- Returns
pvalue : float
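As a rough illustration of how a p-value can be read off a sample of the test statistic for each of the three alternatives, consider the sketch below. The function `empirical_pvalue` is hypothetical and the two-sided convention used here (twice the smaller tail, capped at 1) is an assumption; the library's own calculation may weight or reference the sample differently.

```python
import numpy as np

def empirical_pvalue(sample_stats, observed_value, alternative='twosided'):
    """Hypothetical helper: tail proportions of a sampled test statistic."""
    sample_stats = np.asarray(sample_stats, dtype=float)
    upper = float(np.mean(sample_stats >= observed_value))
    lower = float(np.mean(sample_stats <= observed_value))
    if alternative == 'greater':
        return upper
    if alternative == 'less':
        return lower
    if alternative == 'twosided':
        # Assumed convention: double the smaller tail, capped at 1
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less' or 'twosided'")

# Statistic values evaluated on a (toy) sample, and the observed value
stats = np.arange(10)
p_greater = empirical_pvalue(stats, 8, 'greater')
p_less = empirical_pvalue(stats, 8, 'less')
p_two = empirical_pvalue(stats, 8, 'twosided')
```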
-
gaussian_query
¶
-
class
selectinf.randomized.query.
gaussian_query
(randomization, perturb=None)[source]¶ Bases:
selectinf.randomized.query.query
-
__init__
(randomization, perturb=None)¶ - Parameters
randomization : selection.randomized.randomization.randomization
Instance of a randomization scheme. Describes the law of \(\omega\).
perturb : ndarray, optional
Value of randomization vector, an instance of \(\omega\).
-
useC
= True¶ A class with Gaussian perturbation to the objective – easy to apply CLT to such things
-
get_sampler
()¶
-
randomize
(perturb=None)¶ The actual randomization step.
- Parameters
perturb : ndarray, optional
Value of randomization vector, an instance of \(\omega\).
-
property
sampler
¶ Sampler of optimization (augmented) variables.
-
selective_MLE
(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})¶ - Parameters
observed_target : ndarray
Observed estimate of target.
target_cov : ndarray
Estimated covariance of target.
target_score_cov : ndarray
Estimated covariance of target and score of randomized query.
level : float, optional
Confidence level.
solve_args : dict, optional
Arguments passed to solver.
-
set_sampler
(sampler)¶
-
setup_sampler
()¶ Setup query to prepare for sampling. Should set a few key attributes:
observed_score_state
observed_opt_state
opt_transform
-
solve
()¶
-
summary
(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)¶ Produce p-values and confidence intervals for targets of model including selected features
- Parameters
target : one of [‘selected’, ‘full’]
features : np.bool
Binary encoding of which features to use in final model and targets.
parameter : np.array
Hypothesized value for parameter – defaults to 0.
level : float
Confidence level.
ndraw : int (optional)
Defaults to 10000.
burnin : int (optional)
Defaults to 2000.
compute_intervals : bool
Compute confidence intervals?
dispersion : float (optional)
Use a known value for dispersion, or Pearson’s X^2?
-
multiple_queries
¶
-
class
selectinf.randomized.query.
multiple_queries
(objectives)[source]¶ Bases:
object
Combine several queries of a given data set through randomized algorithms.
-
__init__
(objectives)[source]¶ - Parameters
objectives : sequence
A sequence of randomized objective functions.
- Returns
None
Notes
Each element of objectives must have a setup_sampler method that returns a description of the distribution of the data implicated in the objective function, typically through the score or gradient of the objective function. These descriptions are passed to a function form_covariances to linearly decompose each score in terms of a target and an asymptotically independent piece.
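The linear decomposition mentioned in the note can be illustrated with plain covariance algebra. The sketch below is not the library's form_covariances; it assumes a known joint covariance of a target and a score, and all variable names are hypothetical. The score is written as `Gamma @ target + residual`, where the residual is uncorrelated with the target (and hence independent of it under joint Gaussianity).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint covariance of (target, score): target has 2 coordinates,
# score has 3. Built to be positive definite.
M = rng.standard_normal((5, 5))
joint_cov = M @ M.T + np.eye(5)
target_cov = joint_cov[:2, :2]
score_target_cov = joint_cov[2:, :2]   # Cov(score, target)

# Regression coefficient of score on target:
# Gamma = Cov(score, target) @ inv(Cov(target))
Gamma = np.linalg.solve(target_cov, score_target_cov.T).T

# Cov(score - Gamma @ target, target) should vanish
resid_target_cov = score_target_cov - Gamma @ target_cov
```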
-
summary
(observed_target, opt_sampling_info, alternatives=None, parameter=None, level=0.9, ndraw=5000, burnin=2000, compute_intervals=False)[source]¶ Produce p-values and confidence intervals for targets of model including selected features
- Parameters
observed_target : ndarray
Observed estimate of target.
alternatives : [str], optional
Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]
parameter : np.array
Hypothesized value for parameter – defaults to 0.
level : float
Confidence level.
ndraw : int (optional)
Defaults to 5000.
burnin : int (optional)
Defaults to 2000.
compute_intervals : bool
Compute confidence intervals?
-
coefficient_pvalues
(observed_target, parameter=None, sample_args=(), alternatives=None)[source]¶ Construct selective p-values for each parameter of the target.
- Parameters
observed_target : ndarray
Observed estimate of target.
parameter : ndarray (optional)
A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).
sample_args : sequence
Arguments to self.sample if sample is not found for a given objective.
alternatives : [str], optional
Sequence of strings describing the alternatives, should be values of [‘twosided’, ‘less’, ‘greater’]
- Returns
pvalues : ndarray
-
confidence_intervals
(observed_target, sample_args=(), level=0.9)[source]¶ Construct selective confidence intervals for each parameter of the target.
- Parameters
observed_target : ndarray
Observed estimate of target.
sample_args : sequence
Arguments to self.sample if sample is not found for a given objective.
level : float
Confidence level.
- Returns
limits : ndarray
Confidence intervals for each target.
-
optimization_intervals
¶
-
class
selectinf.randomized.query.
optimization_intervals
(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]¶ Bases:
object
-
__init__
(opt_sampling_info, observed, nsample, target_cov=None, normal_sample=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
optimization_sampler
¶
-
class
selectinf.randomized.query.
optimization_sampler
[source]¶ Bases:
object
-
log_cond_density
(opt_sample, target_sample, transform=None)[source]¶ Density of opt_sample | target_sample
-
hypothesis_test
(test_stat, observed_value, target_cov, score_cov, sample_args=(), sample=None, parameter=0, alternative='twosided')[source]¶ Sample target from selective density using sampler with gradient map self.gradient and projection map self.projection.
- Parameters
test_stat : callable
Test statistic to evaluate on sample from selective distribution.
observed_value : float
Observed value of test statistic. Used in p-value calculation.
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc. If not None, ndraw, burnin, stepsize are ignored.
parameter : np.float (optional)
alternative : [‘greater’, ‘less’, ‘twosided’]
What alternative to use.
- Returns
pvalue : float
-
confidence_intervals
(observed_target, target_cov, score_cov, sample_args=(), sample=None, normal_sample=None, level=0.9, initial_guess=None)[source]¶ - Parameters
observed_target : np.float
A vector of parameters with shape self.shape, representing coordinates of the target.
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.
level : float (optional)
Specify the confidence level.
initial_guess : np.float
Initial guesses at upper and lower limits, optional.
- Returns
intervals : [(float, float)]
List of confidence intervals.
Notes
Construct selective confidence intervals for each parameter of the target.
-
coefficient_pvalues
(observed_target, target_cov, score_cov, parameter=None, sample_args=(), sample=None, normal_sample=None, alternatives=None)[source]¶ Construct selective p-values for each parameter of the target.
- Parameters
observed_target : np.float
A vector of parameters with shape self.shape, representing coordinates of the target.
parameter : np.float (optional)
A vector of parameters with shape self.shape at which to evaluate p-values. Defaults to np.zeros(self.shape).
sample_args : sequence
Arguments to self.sample if sample is None.
sample : np.array (optional)
If not None, assumed to be a sample of shape (-1,) + self.shape representing a sample of the target from parameters self.reference. Allows reuse of the same sample for construction of confidence intervals, hypothesis tests, etc.
alternatives : list of [‘greater’, ‘less’, ‘twosided’]
What alternative to use.
- Returns
pvalues : np.float
-
query
¶
-
class
selectinf.randomized.query.
query
(randomization, perturb=None)[source]¶ Bases:
object
This class is the base of randomized selective inference based on convex programs.
The main mechanism is to take an initial penalized program
\[\text{minimize}_B \ell(B) + {\cal P}(B)\]
and add a randomization and small ridge term yielding
\[\text{minimize}_B \ell(B) + {\cal P}(B) - \langle \omega, B \rangle + \frac{\epsilon}{2} \|B\|^2_2\]
-
__init__
(randomization, perturb=None)[source]¶ - Parameters
randomization : selection.randomized.randomization.randomization
Instance of a randomization scheme. Describes the law of \(\omega\).
perturb : ndarray, optional
Value of randomization vector, an instance of \(\omega\).
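To make the randomized program in the class description concrete, here is a small worked case under simplifying assumptions: squared-error loss \(\ell(B) = \frac{1}{2}\|y - XB\|^2_2\) and no penalty (\({\cal P} = 0\)), so the minimizer has a closed form. The data, the value of `epsilon`, and the variable names are made up for illustration; this does not use the query class itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

epsilon = 0.1                      # small ridge term (assumed value)
omega = rng.standard_normal(p)     # one draw of the randomization

# minimize_B 0.5*||y - X B||^2 - <omega, B> + (epsilon/2)*||B||^2
# Stationarity: (X^T X + epsilon I) B = X^T y + omega
B_hat = np.linalg.solve(X.T @ X + epsilon * np.eye(p), X.T @ y + omega)

# Gradient of the randomized objective at B_hat (should be ~0)
gradient = -X.T @ (y - X @ B_hat) - omega + epsilon * B_hat
```

With a nonsmooth penalty such as the lasso, the stationarity condition involves a subgradient of \({\cal P}\) and generally needs an iterative solver instead of a single linear solve.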
-
randomize
(perturb=None)[source]¶ The actual randomization step.
- Parameters
perturb : ndarray, optional
Value of randomization vector, an instance of \(\omega\).
-
property
sampler
¶ Sampler of optimization (augmented) variables.
-
setup_sampler
()[source]¶ Setup query to prepare for sampling. Should set a few key attributes:
observed_score_state
observed_opt_state
opt_transform
-
summary
(observed_target, target_cov, target_score_cov, alternatives, opt_sample=None, target_sample=None, parameter=None, level=0.9, ndraw=10000, burnin=2000, compute_intervals=False)[source]¶ Produce p-values and confidence intervals for targets of model including selected features
- Parameters
target : one of [‘selected’, ‘full’]
features : np.bool
Binary encoding of which features to use in final model and targets.
parameter : np.array
Hypothesized value for parameter – defaults to 0.
level : float
Confidence level.
ndraw : int (optional)
Defaults to 10000.
burnin : int (optional)
Defaults to 2000.
compute_intervals : bool
Compute confidence intervals?
dispersion : float (optional)
Use a known value for dispersion, or Pearson’s X^2?
-
selective_MLE
(observed_target, target_cov, target_score_cov, level=0.9, solve_args={'tol': 1e-12})[source]¶ - Parameters
observed_target : ndarray
Observed estimate of target.
target_cov : ndarray
Estimated covariance of target.
target_score_cov : ndarray
Estimated covariance of target and score of randomized query.
level : float, optional
Confidence level.
solve_args : dict, optional
Arguments passed to solver.
-
Functions¶
-
selectinf.randomized.query.
naive_confidence_intervals
(diag_cov, observed, level=0.9)[source]¶ Compute naive Gaussian based confidence intervals for target.
- Parameters
diag_cov : diagonal of a covariance matrix
observed : np.float
A vector of observed data of shape target.shape
level : float (optional)
Confidence level.
- Returns
intervals : np.float
Gaussian based confidence intervals.
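The standard Gaussian interval calculation behind such naive intervals can be sketched as below. This is an independent illustration rather than the library function: `gaussian_intervals` is a hypothetical name, and the layout of the result (one row per coordinate, columns for lower and upper limits) is an assumption.

```python
import numpy as np
from statistics import NormalDist

def gaussian_intervals(diag_cov, observed, level=0.9):
    """Hypothetical helper: observed +/- z * sqrt(variance), coordinate-wise."""
    diag_cov = np.asarray(diag_cov, dtype=float)
    observed = np.asarray(observed, dtype=float)
    alpha = 1 - level
    # Upper alpha/2 quantile of the standard normal
    quantile = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = quantile * np.sqrt(diag_cov)
    return np.column_stack([observed - half_width, observed + half_width])

# Two coordinates with variances 1 and 4
intervals = gaussian_intervals([1.0, 4.0], [0.0, 2.0], level=0.9)
```

These intervals ignore selection entirely, which is why they serve only as a baseline for comparison with the selective intervals.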
-
selectinf.randomized.query.
normalizing_constant
(target_parameter, observed_target, target_cov, target_score_cov, feasible_point, cond_mean, cond_cov, logdens_linear, linear_part, offset, useC=False)[source]¶ Approximation of normalizing constant in affine constrained Gaussian.
- Parameters
observed_target : ndarray
Observed estimate of target.
target_cov : ndarray
Estimated covariance of target.
target_score_cov : ndarray
Estimated covariance of target and score of randomized query.
feasible_point : ndarray
Feasible point for optimization problem.
cond_mean : ndarray
Conditional mean of optimization variables given target.
cond_cov : ndarray
Conditional covariance of optimization variables given target.
logdens_linear : ndarray
Describes how conditional mean of optimization variables varies with target.
linear_part : ndarray
Linear part of affine constraints: \(\{o:Ao \leq b\}\)
offset : ndarray
Offset part of affine constraints: \(\{o:Ao \leq b\}\)
solve_args : dict, optional
Arguments passed to solver.
level : float, optional
Confidence level.
useC : bool, optional
Use python or C solver.
-
selectinf.randomized.query.
selective_MLE
(observed_target, target_cov, target_score_cov, init_soln, cond_mean, cond_cov, logdens_linear, linear_part, offset, solve_args={'tol': 1e-12}, level=0.9, useC=False)[source]¶ Selective MLE based on approximation of CGF.
- Parameters
observed_target : ndarray
Observed estimate of target.
target_cov : ndarray
Estimated covariance of target.
target_score_cov : ndarray
Estimated covariance of target and score of randomized query.
init_soln : ndarray
Feasible point for optimization problem.
cond_mean : ndarray
Conditional mean of optimization variables given target.
cond_cov : ndarray
Conditional covariance of optimization variables given target.
logdens_linear : ndarray
Describes how conditional mean of optimization variables varies with target.
linear_part : ndarray
Linear part of affine constraints: \(\{o:Ao \leq b\}\)
offset : ndarray
Offset part of affine constraints: \(\{o:Ao \leq b\}\)
solve_args : dict, optional
Arguments passed to solver.
level : float, optional
Confidence level.
useC : bool, optional
Use python or C solver.
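Since init_soln must be feasible for the affine constraints \(\{o:Ao \leq b\}\) described by linear_part and offset, a quick check before calling the solver may be useful. The helper `is_feasible` and its tolerance are hypothetical and not part of the module.

```python
import numpy as np

def is_feasible(linear_part, offset, point, tol=1e-10):
    """Hypothetical helper: True if linear_part @ point <= offset (within tol)."""
    linear_part = np.asarray(linear_part, dtype=float)
    offset = np.asarray(offset, dtype=float)
    point = np.asarray(point, dtype=float)
    return bool(np.all(linear_part @ point <= offset + tol))

# Nonnegative orthant written as -I o <= 0
A = -np.eye(2)
b = np.zeros(2)
inside = is_feasible(A, b, [1.0, 0.5])
outside = is_feasible(A, b, [-1.0, 1.0])
```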