algorithms.forward_step
Module: algorithms.forward_step
Inheritance diagram for selectinf.algorithms.forward_step (diagram not reproduced here).
In this module, we implement forward stepwise model selection for \(K\) steps.
The main goal is to produce a set of linear inequality constraints satisfied by \(y\) after \(K\) steps.
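Concretely, conditioning on the sequence of variables (and signs) chosen in the first \(K\) steps gives, in the usual polyhedral description of sequential selection, an event of the form \(\{y : Ay \le b\}\): at each step the (signed) score of the entering variable must dominate the scores of the variables not yet selected, and each such comparison is affine in \(y\).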
Class
forward_step
class selectinf.algorithms.forward_step.forward_step(X, Y, subset=None, fixed_regressors=None, intercept=True, covariance=None)
Bases: object
Forward stepwise model selection.
- __init__(X, Y, subset=None, fixed_regressors=None, intercept=True, covariance=None)
- Parameters
X : ndarray
Shape (n,p) – the design matrix.
Y : ndarray
Shape (n,) – the response.
subset : ndarray (optional)
Shape (n,) – boolean indicator of which cases to use. Defaults to np.ones(n, np.bool)
fixed_regressors : ndarray (optional)
Shape (n, *) – fixed regressors to regress out before computing score.
intercept : bool
Remove the intercept – this effectively adds np.ones(n) to fixed_regressors, so the intercept is regressed out before scores are computed.
covariance : ndarray (optional)
Covariance matrix of errors. Defaults to np.identity(n).
- Returns
FS : selectinf.algorithms.forward_step.forward_step
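A minimal construction sketch; the data are synthetic and purely illustrative, and only the constructor arguments documented above are used.

    import numpy as np
    from selectinf.algorithms.forward_step import forward_step

    # Synthetic data for illustration only.
    rng = np.random.default_rng(0)
    n, p = 100, 10
    X = rng.standard_normal((n, p))              # design matrix, shape (n, p)
    Y = 2.0 * X[:, 0] + rng.standard_normal(n)   # response, shape (n,)

    # covariance defaults to np.identity(n); it is passed explicitly here for illustration.
    FS = forward_step(X, Y, intercept=True, covariance=np.identity(n))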
- step(compute_maxZ_pval=False, use_identity=False, ndraw=8000, burnin=2000, sigma_known=True, accept_reject_params=(100, 15, 2000))
- Parameters
compute_maxZ_pval : bool
Compute a p-value for this step? Requires MCMC sampling.
use_identity : bool
If computing a p-value, condition on the identity of the variable?
ndraw : int (optional)
Defaults to 8000.
burnin : int (optional)
Defaults to 2000.
sigma_known : bool
Is \(\sigma\) assumed known?
accept_reject_params : tuple
If not (), this should be a tuple (num_trial, min_accept, num_draw). In this case, we first try num_trial accept-reject samples; if at least min_accept of them succeed, we draw num_draw accept-reject samples.
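A sketch of taking \(K\) greedy steps, continuing the construction sketch above; the keyword shown is just the documented default.

    # Continuing the construction sketch above: take K greedy steps.
    K = 3
    for _ in range(K):
        # compute_maxZ_pval=True would additionally run the MCMC-based maxZ
        # p-value computation described above (ndraw/burnin control that sampler).
        FS.step(compute_maxZ_pval=False)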
- model_pivots(which_step, alternative='onesided', saturated=True, ndraw=5000, burnin=2000, which_var=[], compute_intervals=False, nominal=False, coverage=0.95)
Compute p-values for each coefficient at a given step of forward stepwise.
- Parameters
which_step : int
Which step of forward stepwise.
alternative : [‘onesided’, ‘twosided’]
What alternative to use.
saturated : bool
Use saturated model or selected model?
ndraw : int (optional)
Defaults to 5000.
burnin : int (optional)
Defaults to 2000.
which_var : list (optional)
Compute pivots for which variables? If empty, return a pivot for every selected variable at stage which_step.
compute_intervals : bool
Should we compute intervals?
coverage : float
Coverage for intervals, if computed.
- Returns
pivots : list
List of (variable, pvalue) for selected model.
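Continuing the sketch above, pivots for the variables selected up to a given step can be requested as below; the unpacking follows the (variable, pvalue) return format stated above.

    # Pivots for all variables selected at step 3 of the sketch above.
    pivots = FS.model_pivots(3, alternative='twosided', saturated=True)
    for variable, pvalue in pivots:
        print(variable, pvalue)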
Functions
- selectinf.algorithms.forward_step.data_carving_IC(y, X, sigma, cost=2.0, stage_one=None, split_frac=0.9, coverage=0.95, ndraw=8000, burnin=2000, saturated=False, splitting=False, compute_intervals=True)
Fit a LASSO with a default choice of Lagrange parameter equal to lam_frac times \(\sigma \cdot E(|X^T\epsilon|)\) with \(\epsilon\) IID N(0,1) on a proportion (split_frac) of the data.
- Parameters
y : np.float
Response vector
X : np.float
Design matrix
sigma : np.float
Noise variance
stage_one : [np.array(np.int), None] (optional)
Index of data points to be used in first stage. If None, a randomly chosen set of entries is used based on split_frac.
split_frac : float (optional)
What proportion of the data to use in the first stage? Defaults to 0.9.
coverage : float
Coverage for selective intervals. Defaults to 0.95.
ndraw : int (optional)
How many draws to keep from Gibbs hit-and-run sampler. Defaults to 8000.
burnin : int (optional)
Defaults to 2000.
splitting : bool (optional)
If True, also return splitting pvalues and intervals.
- Returns
results : [(variable, pvalue, interval), ...]
Indices of active variables, selected (two-sided) p-value and selective interval. If splitting, then each entry also includes a (split_pvalue, split_interval) using stage_two for inference.
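An illustrative call on synthetic data with \(\sigma\) treated as known; the unpacking assumes the defaults compute_intervals=True and splitting=False, so each entry is a (variable, pvalue, interval) triple as described above.

    import numpy as np
    from selectinf.algorithms.forward_step import data_carving_IC

    # Synthetic data; sigma is treated as known and matches the simulated noise.
    rng = np.random.default_rng(0)
    n, p = 100, 10
    X = rng.standard_normal((n, p))
    y = 2.0 * X[:, 0] + rng.standard_normal(n)

    results = data_carving_IC(y, X, 1.0, split_frac=0.9)
    for variable, pvalue, interval in results:
        print(variable, pvalue, interval)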
- selectinf.algorithms.forward_step.info_crit_stop(Y, X, sigma, cost=2, subset=None)
Fit the model using forward stepwise, stopping with a rule like AIC or BIC.
Since the error variance must be supplied, AIC is essentially Mallow’s C_p.
- Parameters
Y : np.float
Response vector
X : np.float
Design matrix
sigma : float
Error variance.
cost : float
Cost per parameter. For BIC use cost=log(X.shape[0])
subset : ndarray (optional)
Shape (n,) – boolean indicator of which cases to use. Defaults to np.ones(n, np.bool)
- Returns
FS : forward_step
Instance of forward stepwise stopped at the corresponding step. Constraints of FS will reflect the minimum Z score requirement.
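An illustrative call on synthetic data; as noted above, cost=log(n) gives a BIC-like rule, while the default cost=2 is AIC-like.

    import numpy as np
    from selectinf.algorithms.forward_step import info_crit_stop

    # Synthetic data; sigma is treated as known.
    rng = np.random.default_rng(0)
    n, p = 100, 10
    X = rng.standard_normal((n, p))
    Y = 2.0 * X[:, 0] + rng.standard_normal(n)

    # cost=np.log(n) gives a BIC-like penalty; the default cost=2 is AIC-like.
    FS = info_crit_stop(Y, X, 1.0, cost=np.log(n))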