learning.core

Module: learning.core

Core module for learning selection

Functions

selectinf.learning.core.cross_inference(learning_data, nuisance, direction, fit_probability, nref=200, fit_args={}, verbose=False)[source]
selectinf.learning.core.infer_full_target(algorithm, observed_set, features, observed_sampler, dispersion, fit_probability=<function gbm_fit_sk>, fit_args={'n_estimators': 500}, hypothesis=[0], alpha=0.1, success_params=(1, 1), B=500, learner_klass=<class 'selectinf.learning.learners.mixture_learner'>, learning_npz=None, single=False)[source]

Compute a p-value (or pivot) for a target having observed outcome of algorithm(observed_sampler).

Parameters

algorithm : callable

Selection algorithm that takes a noise source as its only argument.

observed_set : set(int)

The purported value of algorithm(observed_sampler), i.e. the result of the run with the original seed.

features : [int]

List of the elements of observed_set.

observed_sampler : normal_source

Representation of the data used in the selection procedure.

dispersion : float

Scalar dispersion of the covariance of observed_sampler. In OLS problems this is \(\sigma^2\).

fit_probability : callable

Function to learn a probability model P(Y=1|T) based on [T, Y].

hypothesis : np.float # 1-dimensional targets for now

Hypothesized value of target.

alpha : np.float

Significance level, i.e. 1 - confidence (alpha=0.1 corresponds to 90% confidence).

B : int

How many queries?

Notes

This function assumes that the covariance in observed_sampler is the true covariance of S, and that we are looking for inference about coordinates of the mean of

np.linalg.inv(covariance).dot(S)

This allows us to compute the required observed_target, cross_cov and target_cov.
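As a rough illustration of the note above in the OLS case, the sketch below takes S = X.T.dot(y), whose covariance is \(\sigma^2\) times X.T.dot(X), and checks that applying the inverse of X.T.dot(X) to S recovers the least-squares estimate. The names X, y, beta, information and the choice to factor the dispersion out of the covariance are assumptions made for this example; the scaling used inside infer_full_target may differ.

```python
import numpy as np

# Simulated OLS problem (all names here are illustrative, not part of selectinf).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, 0.0, -2.0])
sigma = 1.0                      # dispersion is sigma ** 2
y = X @ beta + sigma * rng.standard_normal(n)

# Sufficient statistic S = X^T y; its covariance is sigma^2 * X^T X.
S = X.T @ y
information = X.T @ X            # covariance of S with the dispersion factored out

# Applying the inverse to S gives the OLS estimate, whose mean is beta.
beta_hat = np.linalg.inv(information).dot(S)

# Reference least-squares solution for comparison.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Covariance of the OLS estimate: sigma^2 * (X^T X)^{-1}.
target_cov = sigma ** 2 * np.linalg.inv(information)
```

Restricting beta_hat and target_cov to the indices in features would then give a target and its covariance for the selected coordinates, under the same assumptions.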

selectinf.learning.core.infer_general_target(observed_outcome, observed_target, target_cov, learner, fit_probability=<function gbm_fit_sk>, fit_args={'n_estimators': 500}, hypothesis=0, alpha=0.1, success_params=(1, 1), B=500, learning_npz=None)[source]

Compute a p-value (or pivot) for a target, having observed the outcome of an algorithm run on the original data.

Parameters

observed_outcome : object

The purported observed value, i.e. the outcome of the run with the original seed.

target_cov : np.float((1, 1)) # 1 for 1-dimensional targets for now

Covariance of target estimator.

learner :

Object that generates perturbed data.

observed_target : np.float # 1-dimensional targets for now

Observed value of target estimator.

fit_probability : callable

Function to learn a probability model P(Y=1|T) based on [T, Y].

hypothesis : np.float # 1-dimensional targets for now

Hypothesized value of target.

alpha : np.float

Significance level, i.e. 1 - confidence (alpha=0.1 corresponds to 90% confidence).

B : int

How many queries?
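To make the role of fit_probability concrete, here is a minimal sketch of a function that learns P(Y=1|T) from pairs (T, Y) using a one-dimensional logistic regression fitted by gradient ascent. It is not the library's gbm_fit_sk, and the calling convention shown (lists in, a prediction function out) is a hypothetical one chosen for the example; the exact interface the library expects is not stated here.

```python
import math

def fit_logistic_probability(T, Y, n_steps=2000, lr=0.1):
    """Hypothetical helper: fit P(Y=1|T) = sigmoid(a + b * T) by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(T)
    for _ in range(n_steps):
        grad_a = 0.0
        grad_b = 0.0
        for t, y in zip(T, Y):
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            grad_a += y - p          # gradient of the log-likelihood in a
            grad_b += (y - p) * t    # gradient of the log-likelihood in b
        a += lr * grad_a / n
        b += lr * grad_b / n

    def predict(t):
        # Estimated probability that Y = 1 at target value t.
        return 1.0 / (1.0 + math.exp(-(a + b * t)))

    return predict

# Toy data: Y = 1 tends to occur for larger values of T.
T = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
Y = [0, 0, 0, 1, 0, 1, 1, 1, 1]
prob = fit_logistic_probability(T, Y)
```

Any model that maps target values to estimated probabilities could play the same role; the default listed in the signatures above is gbm_fit_sk with n_estimators=500.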

selectinf.learning.core.infer_set_target(observed_set, features, observed_target, target_cov, learner, fit_probability=<function gbm_fit_sk>, fit_args={'n_estimators': 500}, hypothesis=[0], alpha=0.1, success_params=(1, 1), B=500, learning_npz=None, single=False)[source]

Compute a p-value (or pivot) for a target having observed outcome of algorithm on original data.

Parameters

observed_set : set(int)

The purported value observed when run with the original seed.

features : [int]

List of the elements of observed_set.

observed_target : np.ndarray

Statistic on which inference is based.

target_cov : np.ndarray

(Pre-selection) covariance matrix of observed_target.

learner :

Object that generates perturbed data.

fit_probability : callable

Function to learn a probability model P(Y=1|T) based on [T, Y].

hypothesis : np.float # 1-dimensional targets for now

Hypothesized value of target.

alpha : np.float

Significance level, i.e. 1 - confidence (alpha=0.1 corresponds to 90% confidence).

B : int

How many queries?

Notes

This function assumes that the covariance in the observed sampler is the true covariance of S, and that we are looking for inference about coordinates of the mean of

np.linalg.inv(covariance).dot(S)

This allows us to compute the required observed_target, cross_cov and target_cov.

selectinf.learning.core.repeat_selection(base_algorithm, sampler, min_success, num_tries)[source]

Repeat a set-returning selection algorithm num_tries times, returning all elements that appear at least min_success times.
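The behaviour described above can be sketched as follows. This is an illustrative reimplementation based only on the one-line description, not the library's source; repeat_selection_sketch and toy_algorithm are hypothetical names, and a random.Random instance stands in for the sampler.

```python
import random
from collections import Counter

def repeat_selection_sketch(base_algorithm, sampler, min_success, num_tries):
    """Keep the elements selected in at least min_success of num_tries runs."""
    counts = Counter()
    for _ in range(num_tries):
        # Each run returns a set; count every element at most once per run.
        counts.update(set(base_algorithm(sampler)))
    return {elem for elem, count in counts.items() if count >= min_success}

def toy_algorithm(rng):
    """Hypothetical noisy selection: always picks 0, picks 1 about half the time."""
    selected = {0}
    if rng.random() < 0.5:
        selected.add(1)
    return selected

rng = random.Random(0)
stable = repeat_selection_sketch(toy_algorithm, rng, min_success=3, num_tries=3)
```

With min_success equal to num_tries, only elements chosen on every run survive, so element 0 is always kept while element 1 is kept only if it happens to be selected in all three runs.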