algorithms.forward_step

Module: algorithms.forward_step

Inheritance diagram for selectinf.algorithms.forward_step: a single class, forward_step ("Forward stepwise model selection"), with no parents other than object.

In this module, we implement forward stepwise model selection for \(K\) steps.

The main goal is to produce a set of linear inequality constraints satisfied by \(y\) after \(K\) steps.
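Concretely, writing the selection event in the usual affine form (this representation is standard for post-selection inference and is not quoted from the docstrings below), the constraints collected after \(K\) steps can be summarized as

\[ \{\, y \in \mathbb{R}^n : A_K y \le b_K \,\}, \]

and selective inference then proceeds for Gaussian \(y\) restricted to this polyhedron.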

Class

forward_step

class selectinf.algorithms.forward_step.forward_step(X, Y, subset=None, fixed_regressors=None, intercept=True, covariance=None)[source]

Bases: object

Forward stepwise model selection.

__init__(X, Y, subset=None, fixed_regressors=None, intercept=True, covariance=None)[source]
Parameters

X : ndarray

Shape (n,p) – the design matrix.

Y : ndarray

Shape (n,) – the response.

subset : ndarray (optional)

Shape (n,) – boolean indicator of which cases to use. Defaults to np.ones(n, np.bool)

fixed_regressors : ndarray (optional)

Shape (n, *) – fixed regressors to regress out before computing score.

intercept : bool

Remove (adjust for) the intercept; this effectively appends np.ones(n) to fixed_regressors, which is regressed out before scores are computed.

covariance : ndarray (optional)

Covariance matrix of errors. Defaults to np.identity(n).

Returns

FS : selectinf.algorithms.forward_step.forward_step
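A minimal construction sketch; the synthetic data below is purely illustrative, and only arguments documented above are used.

    import numpy as np
    from selectinf.algorithms.forward_step import forward_step

    rng = np.random.default_rng(0)
    n, p = 100, 10
    X = rng.standard_normal((n, p))             # design matrix, shape (n, p)
    Y = 2.0 * X[:, 0] + rng.standard_normal(n)  # response, shape (n,)

    # covariance defaults to the identity; intercept=True effectively adds a
    # column of ones to fixed_regressors, which is regressed out of the scores.
    FS = forward_step(X, Y, intercept=True)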

step(compute_maxZ_pval=False, use_identity=False, ndraw=8000, burnin=2000, sigma_known=True, accept_reject_params=(100, 15, 2000))[source]
Parameters

compute_maxZ_pval : bool

Compute a p-value for this step? Requires MCMC sampling.

use_identity : bool

If computing a p-value, should we condition on the identity of the selected variable?

ndraw : int (optional)

Defaults to 8000.

burnin : int (optional)

Defaults to 2000.

sigma_known : bool

Is \(\sigma\) assumed known?

accept_reject_params : tuple

If not (), should be a tuple (num_trial, min_accept, num_draw): we first try num_trial accept-reject samples, and if at least min_accept of them succeed, we simply draw num_draw accept-reject samples.
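A sketch of running \(K\) greedy steps with the object constructed above. Treating the return value of step(compute_maxZ_pval=True) as that step's maxZ p-value is an assumption made for illustration; the return value is not documented here.

    K = 5
    for k in range(K):
        # Assumption (for illustration): with compute_maxZ_pval=True the call
        # returns the maxZ p-value for the variable added at this step.
        pval = FS.step(compute_maxZ_pval=True,
                       ndraw=8000, burnin=2000,
                       sigma_known=True)
        print(f"step {k + 1}: maxZ p-value = {pval}")

    # Affine constraints on Y implied by the steps taken so far.
    con = FS.constraints()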

constraints(step=inf, identify_last_variable=True)[source]
model_pivots(which_step, alternative='onesided', saturated=True, ndraw=5000, burnin=2000, which_var=[], compute_intervals=False, nominal=False, coverage=0.95)[source]

Compute p-values (one- or two-sided, according to alternative) for each coefficient in a given step of forward stepwise.

Parameters

which_step : int

Which step of forward stepwise.

alternative : [‘onesided’, ‘twosided’]

What alternative to use.

saturated : bool

Use saturated model or selected model?

ndraw : int (optional)

Defaults to 5000.

burnin : int (optional)

Defaults to 2000.

which_var : list (optional)

Compute pivots for which variables? If empty, return a pivot for every selected variable at stage which_step.

compute_intervals : bool

Should we compute intervals?

coverage : float

Coverage for intervals, if computed.

Returns

pivots : list

List of (variable, pvalue) for selected model.
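For example, a sketch of selective p-values for the variables selected by step 3, reusing the FS object from the sketches above (which is assumed to have taken at least three steps):

    pivots = FS.model_pivots(3,
                             alternative='twosided',
                             saturated=True,
                             ndraw=5000,
                             burnin=2000)
    for variable, pvalue in pivots:
        print(f"variable {variable}: p-value {pvalue}")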

model_quadratic(which_step)[source]

Functions

selectinf.algorithms.forward_step.data_carving_IC(y, X, sigma, cost=2.0, stage_one=None, split_frac=0.9, coverage=0.95, ndraw=8000, burnin=2000, saturated=False, splitting=False, compute_intervals=True)[source]

Fit a model using forward stepwise with an information-criterion stopping rule (cost per parameter equal to cost) on a proportion (split_frac) of the data, then form carved (selective) p-values and intervals.

Parameters

y : ndarray

Shape (n,) – the response vector.

X : ndarray

Shape (n,p) – the design matrix.

sigma : float

Noise variance.

stage_one : [np.array(np.int), None] (optional)

Index of data points to be used in first stage. If None, a randomly chosen set of entries is used based on split_frac.

split_frac : float (optional)

What proportion of the data to use in the first stage? Defaults to 0.9.

coverage : float

Coverage for selective intervals. Defaults to 0.95.

ndraw : int (optional)

How many draws to keep from Gibbs hit-and-run sampler. Defaults to 8000.

burnin : int (optional)

Defaults to 2000.

splitting : bool (optional)

If True, also return splitting pvalues and intervals.

Returns

results : list of (variable, pvalue, interval)

Indices of active variables, selected (two-sided) p-value, and selective interval. If splitting, each entry also includes a (split_pvalue, split_interval) computed using stage_two for inference.
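A usage sketch with the synthetic X, Y from the earlier example (sigma=1.0 matches the simulated noise there); iterating over results as per-variable tuples follows the return description above.

    from selectinf.algorithms.forward_step import data_carving_IC

    results = data_carving_IC(Y, X, sigma=1.0,
                              cost=2.0,        # AIC-like penalty per parameter
                              split_frac=0.9,  # 90% of cases in stage one
                              splitting=True)  # also report split quantities
    for entry in results:
        variable, pvalue, interval = entry[:3]  # splitting appends further items
        print(variable, pvalue, interval)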

selectinf.algorithms.forward_step.info_crit_stop(Y, X, sigma, cost=2, subset=None)[source]

Fit model using forward stepwise, stopping using a rule like AIC or BIC.

The error variance must be supplied; in that case AIC is essentially Mallows' \(C_p\).

Parameters

Y : ndarray

Shape (n,) – the response vector.

X : ndarray

Shape (n,p) – the design matrix.

sigma : float

Error variance.

cost : float

Cost per parameter. For BIC use cost=log(X.shape[0])

subset : ndarray (optional)

Shape (n,) – boolean indicator of which cases to use. Defaults to np.ones(n, np.bool)

Returns

FS : forward_step

Instance of forward stepwise stopped at the corresponding step. Constraints of FS will reflect the minimum Z score requirement.
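A sketch of BIC-style stopping with the same synthetic data; since the result is a forward_step instance, the methods documented above apply to it directly.

    import numpy as np
    from selectinf.algorithms.forward_step import info_crit_stop

    # BIC corresponds to a per-parameter cost of log(n).
    FS_bic = info_crit_stop(Y, X, sigma=1.0, cost=np.log(X.shape[0]))

    # Per the note above, these constraints also reflect the minimum
    # Z score requirement imposed by the stopping rule.
    con_bic = FS_bic.constraints()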

selectinf.algorithms.forward_step.mcmc_test(fs_obj, step, variable=None, nstep=100, ndraw=20, method='parallel', burnin=1000)[source]