hidimstat.dcrt_zero

hidimstat.dcrt_zero(X, y, estimated_coef=None, sigma_X=None, params_lasso_screening={'alpha': None, 'alpha_max_fraction': 0.5, 'alphas': None, 'cv': 5, 'fit_intercept': False, 'max_iter': 1000, 'n_alphas': 10, 'selection': 'cyclic', 'tol': 1e-06}, params_lasso_distillation_x=None, params_lasso_distillation_y=None, refit=False, screening=True, screening_threshold=0.1, statistic='residual', centered=True, n_jobs=1, joblib_verbose=0, fit_y=False, n_tree=100, problem_type='regression', random_state=2022)

Implements the distilled conditional randomization test (dCRT) without interactions.

A faster version of the Conditional Randomization Test (Candes et al. [1]) that uses the distillation process from Liu et al. [2]. Based on the original implementation at moleibobliu/Distillation-CRT.
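To make the distillation idea concrete, the following minimal sketch residualizes one feature and the response on the remaining features, then compares the two residuals. It is not the library's code: ordinary least squares stands in for the Lasso, and the helper name distill_one_feature and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def distill_one_feature(X, y, j):
    """Residualize feature j and y on the other features (OLS stand-in for Lasso)."""
    X_minus_j = np.delete(X, j, axis=1)
    # Distill X_j: regress it on the remaining features and keep the residual.
    beta_x, *_ = np.linalg.lstsq(X_minus_j, X[:, j], rcond=None)
    x_res = X[:, j] - X_minus_j @ beta_x
    # Distill y the same way, without using feature j.
    beta_y, *_ = np.linalg.lstsq(X_minus_j, y, rcond=None)
    y_res = y - X_minus_j @ beta_y
    return x_res, y_res


# Synthetic data: only feature 0 influences y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + rng.standard_normal(200)

x_res_0, y_res_0 = distill_one_feature(X, y, j=0)  # relevant feature
x_res_1, y_res_1 = distill_one_feature(X, y, j=1)  # irrelevant feature

# The residuals stay correlated only when the feature carries extra signal.
corr_relevant = np.corrcoef(x_res_0, y_res_0)[0, 1]
corr_irrelevant = np.corrcoef(x_res_1, y_res_1)[0, 1]
print(corr_relevant, corr_irrelevant)
```

Because both regressions exclude feature j, any association left between the two residuals can be attributed to that feature conditionally on the others, which is what the test statistic measures.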

Parameters:
X : array-like of shape (n_samples, n_features)

Training data

y : array-like of shape (n_samples,)

Target values

estimated_coef : array-like of shape (n_features,), optional

Pre-computed feature coefficients

sigma_X : array-like of shape (n_features, n_features), optional

Covariance matrix of X

params_lasso_screening : dict

Parameters for the main Lasso estimation or cross-validated Lasso, including:

- alpha : float, optional - L1 regularization strength. If None, determined by cross-validation.
- n_alphas : int, default=10 - Number of alphas for cross-validation.
- alphas : array-like, default=None - List of alpha values to try in cross-validation.
- alpha_max_fraction : float, default=0.5 - Scale factor for alpha_max.

For the other parameters, see scikit-learn's LassoCV. The suggested configuration is:

- cv : int, default=5 - Number of cross-validation folds.
- tol : float, default=1e-6 - Tolerance for optimization.
- max_iter : int, default=1000 - Maximum iterations.
- fit_intercept : bool, default=False - Whether to fit an intercept.
- selection : str, default='cyclic' - Order in which coefficients are updated ('cyclic' or 'random').
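As an illustration of how such a dictionary might be assembled, the sketch below copies the defaults from the signature and overrides a few entries. The helper make_lasso_params is hypothetical and not part of hidimstat. This page does not say whether a partial dictionary is merged with the defaults, so the sketch always builds the full set of keys.

```python
# Defaults copied from the dcrt_zero signature shown above.
DEFAULT_SCREENING_PARAMS = {
    "alpha": None,
    "alpha_max_fraction": 0.5,
    "alphas": None,
    "cv": 5,
    "fit_intercept": False,
    "max_iter": 1000,
    "n_alphas": 10,
    "selection": "cyclic",
    "tol": 1e-06,
}


def make_lasso_params(**overrides):
    """Return a full parameter dict, rejecting keys not in the defaults."""
    unknown = set(overrides) - set(DEFAULT_SCREENING_PARAMS)
    if unknown:
        raise KeyError(f"Unknown Lasso parameter(s): {sorted(unknown)}")
    # Later entries win, so the overrides replace the matching defaults.
    return {**DEFAULT_SCREENING_PARAMS, **overrides}


# Example: more folds and a larger iteration budget, everything else default.
params = make_lasso_params(cv=10, max_iter=5000)
print(params)
```

The resulting dictionary could then be passed as params_lasso_screening, or reused for params_lasso_distillation_x and params_lasso_distillation_y.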

params_lasso_distillation_x : dict, optional

Parameters for X distillation Lasso. Defaults to params_lasso_screening.

params_lasso_distillation_y : dict, optional

Parameters for y distillation Lasso. Defaults to params_lasso_screening.

refit : bool, default=False

Whether to refit on estimated support set

screening : bool, default=True

Whether to screen variables

screening_threshold : float, default=0.1

Threshold for variable screening (0-100)
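One plausible reading of a 0-100 threshold is a percentile cut on the absolute screening coefficients. The sketch below illustrates that reading only; the function screen_by_percentile is hypothetical, and the exact rule used by the library is an assumption here, not something this page states.

```python
import numpy as np


def screen_by_percentile(coef, screening_threshold=0.1):
    """Keep features whose |coef| exceeds the (100 - threshold)th percentile."""
    abs_coef = np.abs(np.asarray(coef, dtype=float))
    cutoff = np.percentile(abs_coef, 100 - screening_threshold)
    # Strict inequality: coefficients equal to the cutoff are screened out.
    return abs_coef > cutoff


coef = [0.0, 0.1, -3.0, 0.5, 2.0]
# With a threshold of 40, the cutoff is the 60th percentile of |coef|,
# so only the two largest coefficients (-3.0 and 2.0) are kept.
mask = screen_by_percentile(coef, screening_threshold=40)
print(mask)
```

Under this reading, a small value such as the default 0.1 keeps only the very largest coefficients, while larger values retain more features.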

statistic : {'residual', 'random_forest'}, default='residual'

Learning method for outcome distillation

centered : bool, default=True

Whether to standardize features

n_jobs : int, default=1

Number of parallel jobs

joblib_verbose : int, default=0

Verbosity level

fit_y : bool, default=False

Whether to fit y using selected features

n_tree : int, default=100

Number of trees for random forest

problem_type : {'regression', 'classification'}, default='regression'

Type of learning problem

random_state : int, default=2022

Random seed

Returns:
selection_features : ndarray of shape (n_features,)

Boolean mask of selected features

X_res : ndarray of shape (n_selected, n_samples)

Residuals after X distillation

sigma2 : ndarray of shape (n_selected,)

Estimated residual variances

y_res : ndarray of shape (n_selected, n_samples)

Response residuals
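The returned arrays are the ingredients of a per-feature test statistic. The sketch below shows how arrays with these shapes could be combined into a normalized statistic and a two-sided Gaussian p-value. The normalization is a common choice for residual-based statistics and is an assumption here, as are the function name residual_statistics and the synthetic inputs that stand in for real dcrt_zero output.

```python
import math

import numpy as np


def residual_statistics(X_res, y_res, sigma2):
    """Combine residuals of shape (n_selected, n_samples) into statistics."""
    n_samples = X_res.shape[1]
    # Inner product of the two residuals for each selected feature.
    numerator = np.sum(X_res * y_res, axis=1)
    # Scale so the statistic is roughly standard normal under the null.
    denominator = np.sqrt(n_samples * sigma2 * np.mean(y_res**2, axis=1))
    ts = numerator / denominator
    # Two-sided p-value under a standard normal reference distribution.
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in ts])
    return ts, pvals


# Synthetic stand-ins: 3 selected features, 500 samples.
rng = np.random.default_rng(0)
X_res = rng.standard_normal((3, 500))
y_res = rng.standard_normal((3, 500))
y_res[0] += 0.5 * X_res[0]  # only the first feature carries signal
sigma2 = X_res.var(axis=1)

ts, pvals = residual_statistics(X_res, y_res, sigma2)
print(ts, pvals)
```

Only the first row was given a dependence between the two residuals, so it is the one expected to produce a large statistic and a small p-value.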

References

Examples using hidimstat.dcrt_zero

Distilled Conditional Randomization Test (dCRT) using Lasso vs Random Forest learners

Variable Selection Under Model Misspecification