hidimstat.dcrt_zero

hidimstat.dcrt_zero(X, y, estimated_coef=None, sigma_X=None, params_lasso_screening={'alpha': None, 'alpha_max_fraction': 0.5, 'alphas': None, 'cv': 5, 'fit_intercept': False, 'max_iter': 1000, 'n_alphas': 10, 'selection': 'cyclic', 'tol': 1e-06}, params_lasso_distillation_x=None, params_lasso_distillation_y=None, refit=False, screening=True, screening_threshold=0.1, statistic='residual', centered=True, n_jobs=1, joblib_verbose=0, fit_y=False, n_tree=100, problem_type='regression', random_state=2022)

Implements the distilled conditional randomization test (dCRT) without interactions.

A faster version of the Conditional Randomization Test of Candes et al. [1], using the distillation process from Liu et al. [2]. Based on the original implementation at moleibobliu/Distillation-CRT.
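To make the distillation idea concrete, here is a minimal NumPy sketch for one feature at a time. It is not the hidimstat implementation: ordinary least squares stands in for the Lasso fits, the helper name `distill_feature` and the simulated data are hypothetical, and the normalisation of the final statistic is an assumption chosen for illustration.

```python
import numpy as np


def distill_feature(X, y, j):
    """Return residuals of X[:, j] and y after regressing each on the other columns.

    Hypothetical helper: plain least squares replaces the Lasso used by dCRT.
    """
    X_minus_j = np.delete(X, j, axis=1)
    # Distill x_j: remove the part of column j explained by the other columns.
    coef_x, *_ = np.linalg.lstsq(X_minus_j, X[:, j], rcond=None)
    x_res = X[:, j] - X_minus_j @ coef_x
    # Distill y: remove the part of y explained by the other columns.
    coef_y, *_ = np.linalg.lstsq(X_minus_j, y, rcond=None)
    y_res = y - X_minus_j @ coef_y
    return x_res, y_res


rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.standard_normal((n_samples, n_features))
# Only feature 0 influences y in this toy example.
y = 2.0 * X[:, 0] + rng.standard_normal(n_samples)

stats = []
for j in range(n_features):
    x_res, y_res = distill_feature(X, y, j)
    sigma2 = np.mean(x_res**2)  # residual variance of the distilled feature
    # Assumed normalisation: roughly standard normal when feature j is null.
    stats.append(x_res @ y_res / np.sqrt(n_samples * sigma2 * np.mean(y_res**2)))
stats = np.array(stats)
```

In this toy setting the statistic for feature 0 should stand out clearly from the others, since only that feature carries signal once the remaining columns have been regressed out.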

Parameters:

X : array-like of shape (n_samples, n_features)
    Training data
y : array-like of shape (n_samples,)
    Target values
estimated_coef : array-like of shape (n_features,), optional
    Pre-computed feature coefficients
sigma_X : array-like of shape (n_features, n_features), optional
    Covariance matrix of X
params_lasso_screening : dict
    Parameters for the main Lasso estimation or cross-validated Lasso, including:
    - alpha : float, optional - L1 regularization strength. If None, determined by cross-validation.
    - n_alphas : int, default=10 - Number of alphas for cross-validation.
    - alphas : array-like, default=None - List of alpha values to try in cross-validation.
    - alpha_max_fraction : float, default=0.5 - Scale factor for alpha_max.
    For other parameters, see sklearn.linear_model.LassoCV. The suggested configuration is:
    - cv : int, default=5 - Number of cross-validation folds.
    - tol : float, default=1e-6 - Tolerance for optimization.
    - max_iter : int, default=1000 - Maximum iterations.
    - fit_intercept : bool, default=False - Whether to fit an intercept.
    - selection : str, default='cyclic' - Order in which coefficients are updated during coordinate descent ('cyclic' or 'random').
params_lasso_distillation_x : dict, optional
    Parameters for the X distillation Lasso. Defaults to params_lasso_screening.
params_lasso_distillation_y : dict, optional
    Parameters for the y distillation Lasso. Defaults to params_lasso_screening.
refit : bool, default=False
    Whether to refit on the estimated support set
screening : bool, default=True
    Whether to screen variables
screening_threshold : float, default=0.1
    Threshold for variable screening, on a 0-100 scale
statistic : {'residual', 'random_forest'}, default='residual'
    Learning method for outcome distillation
centered : bool, default=True
    Whether to standardize features
n_jobs : int, default=1
    Number of parallel jobs
joblib_verbose : int, default=0
    Verbosity level
fit_y : bool, default=False
    Whether to fit y using selected features
n_tree : int, default=100
    Number of trees for the random forest
problem_type : {'regression', 'classification'}, default='regression'
    Type of learning problem
random_state : int, default=2022
    Random seed
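
The Lasso parameter dictionaries can be assembled before calling the function. The sketch below is plain Python and does not import hidimstat. The default values are copied from the signature above, while the helper name `make_lasso_params` is hypothetical. It shows one way to override a few screening settings while leaving the distillation parameters at their fallback.

```python
# Defaults copied from the documented signature of dcrt_zero.
DEFAULT_LASSO_SCREENING = {
    "alpha": None,
    "alpha_max_fraction": 0.5,
    "alphas": None,
    "cv": 5,
    "fit_intercept": False,
    "max_iter": 1000,
    "n_alphas": 10,
    "selection": "cyclic",
    "tol": 1e-06,
}


def make_lasso_params(**overrides):
    """Hypothetical helper: return a copy of the defaults with some keys replaced."""
    unknown = set(overrides) - set(DEFAULT_LASSO_SCREENING)
    if unknown:
        raise KeyError(f"unknown Lasso parameter(s): {sorted(unknown)}")
    return {**DEFAULT_LASSO_SCREENING, **overrides}


# Tighter tolerance and more folds for the screening Lasso.
params_screening = make_lasso_params(cv=10, tol=1e-8)

# Keyword arguments that would be passed to dcrt_zero alongside X and y.
dcrt_kwargs = {
    "params_lasso_screening": params_screening,
    "params_lasso_distillation_x": None,  # None falls back to the screening parameters
    "params_lasso_distillation_y": None,  # None falls back to the screening parameters
    "screening": True,
    "screening_threshold": 0.1,
    "statistic": "residual",
    "random_state": 2022,
}
```

With hidimstat installed, these keyword arguments would be passed as `hidimstat.dcrt_zero(X, y, **dcrt_kwargs)`.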

Returns:

selection_features : ndarray of shape (n_features,)
    Boolean mask of selected features
X_res : ndarray of shape (n_selected, n_samples)
    Residuals after X distillation
sigma2 : ndarray of shape (n_selected,)
    Estimated residual variances
y_res : ndarray of shape (n_selected, n_samples)
    Response residuals
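
The returned residuals can be combined into one test statistic per selected feature. The sketch below uses synthetic arrays with the documented shapes in place of a real `dcrt_zero` call. The helper name `residual_pvalues`, the normalisation of the statistic, and the two-sided Gaussian p-value are assumptions for illustration, not a description of the library's downstream code.

```python
import math

import numpy as np


def residual_pvalues(X_res, sigma2, y_res):
    """Hypothetical helper: per-feature statistics and two-sided Gaussian p-values.

    X_res, y_res: arrays of shape (n_selected, n_samples); sigma2: shape (n_selected,).
    """
    n_samples = X_res.shape[1]
    # Row-wise inner product of feature residuals and response residuals.
    inner = np.sum(X_res * y_res, axis=1)
    stats = inner / np.sqrt(n_samples * sigma2 * np.mean(y_res**2, axis=1))
    # Two-sided p-value under a standard normal null: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in stats])
    return stats, pvals


# Synthetic stand-ins for the outputs of dcrt_zero (3 selected features, 500 samples).
rng = np.random.default_rng(0)
X_res = rng.standard_normal((3, 500))
y_res = rng.standard_normal((3, 500))
y_res[0] += 0.5 * X_res[0]  # make the first feature look relevant
sigma2 = np.mean(X_res**2, axis=1)

stats, pvals = residual_pvalues(X_res, sigma2, y_res)
```

Here only the first synthetic feature has correlated residuals, so its p-value should be very small while the other two behave like draws under the null.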

References

Examples using `hidimstat.dcrt_zero`

Distilled Conditional Randomization Test (dCRT) using Lasso vs Random Forest learners

Variable Selection Under Model Misspecification
