d0crt_importance

hidimstat.d0crt_importance(estimator, X, y, cv=None, method='predict', estimated_coef=None, sigma_X=None, lasso_screening=LassoCV(fit_intercept=False, n_alphas=10, random_state=0, tol=1e-06), model_distillation_x=LassoCV(n_alphas=10, n_jobs=1, random_state=0), refit=False, screening_threshold=10, centered=True, n_jobs=1, joblib_verbose=0, fit_y=False, scaled_statistics=False, random_state=None, reuse_screening_model=True, k_lowest=None, percentile=None, threshold_min=None, threshold_max=None, alternative_hypothesis=False)

Implements the distilled conditional randomization test (dCRT) without interactions.

This function provides a fast implementation of the Conditional Randomization Test of Candes et al. [1], using the distillation process from Liu et al. [2]. The approach accelerates variable selection by combining Lasso-based screening and residual-based test statistics. It is based on the original implementation at moleibobliu/Distillation-CRT. The y-distillation is based on a given estimator and the x-distillation is based on a Lasso estimator.

Parameters:
estimator : sklearn estimator
    The base estimator used for y-distillation and prediction (e.g., Lasso, RandomForest, …).
method : str, default="predict"
    Method of the estimator to use for predictions ("predict", "predict_proba", "decision_function").
estimated_coef : array-like of shape (n_features,) or None, default=None
    Pre-computed feature coefficients. If None, coefficients are estimated via Lasso.
estimated_intercept : float or None, default=None
    Pre-computed intercept. If None, the intercept is estimated via Lasso.
sigma_X : array-like of shape (n_features, n_features) or None, default=None
    Covariance matrix of X. If None, Lasso is used for X distillation.
lasso_screening : sklearn estimator, default=LassoCV(n_alphas=10, tol=1e-6, fit_intercept=False)
    Estimator for variable screening (typically LassoCV or Lasso).
model_distillation_x : sklearn estimator, default=LassoCV(n_alphas=10)
    Estimator for X distillation (typically LassoCV or Lasso).
refit : bool, default=False
    Whether to refit the model on selected features after screening.
screening_threshold : float, default=10
    Percentile threshold for screening (0-100). Larger values include more variables at screening; screening_threshold=100 keeps all variables.
centered : bool, default=True
    Whether to center and scale features using StandardScaler.
n_jobs : int, default=1
    Number of parallel jobs.
joblib_verbose : int, default=0
    Verbosity level for parallel jobs.
fit_y : bool, default=False
    Controls y-distillation behavior:
    - If False and the estimator is linear, the sub-model predicting y from X^{-j} is created by simply removing the j-th coefficient from the full model (no fitting is performed).
    - If True, fits a clone of estimator on (X^{-j}, y).
    - For non-linear estimators, always fits a clone of estimator on (X^{-j}, y), regardless of fit_y.
scaled_statistics : bool, default=False
    Whether to use scaled statistics when computing importance.
random_state : int, default=None
    Random seed for reproducibility.
reuse_screening_model : bool, default=True
    Whether to reuse the screening model for y-distillation.
X : array-like of shape (n_samples, n_features)
    Training data matrix.
y : array-like of shape (n_samples,)
    Target values.
cv : None or int, default=None
    Not used; included for compatibility. A warning is issued if it is provided.
k_lowest : int, default=None
    Selects the k features with the lowest p-values.
percentile : float, default=None
    Selects features based on a specified percentile of p-values.
threshold_max : float, default=None
    Selects features with p-values below the specified maximum threshold (0 to 1).
threshold_min : float, default=None
    Selects features with p-values above the specified minimum threshold (0 to 1).
alternative_hypothesis : bool, default=False
    If True, selects based on 1 - p-values instead of p-values.
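
The fit_y=False shortcut and the residual-based statistic described above can be illustrated with a rough sketch. This is not hidimstat's code: the helper `residual_statistic` is a hypothetical name, ordinary least squares stands in for the Lasso x-distillation, the coefficients of the full linear model are assumed to be known, and the normalization of the statistic is an assumption chosen so that it is roughly standard normal under the null.

```python
import numpy as np


def residual_statistic(X, y, coef, j):
    """Illustrative residual-based statistic for feature j (not hidimstat's code)."""
    n_samples = X.shape[0]
    X_minus_j = np.delete(X, j, axis=1)

    # x-distillation: predict x_j from the remaining features.
    # Ordinary least squares is a stand-in for the Lasso used by the library.
    beta_x, *_ = np.linalg.lstsq(X_minus_j, X[:, j], rcond=None)
    x_residual = X[:, j] - X_minus_j @ beta_x

    # y-distillation in the spirit of fit_y=False: drop the j-th coefficient
    # of the full linear model instead of refitting on X^{-j}.
    coef_minus_j = np.delete(coef, j)
    y_residual = y - X_minus_j @ coef_minus_j

    # Normalized inner product of the two residuals (assumed form).
    scale = np.sqrt(n_samples * np.mean(x_residual**2) * np.mean(y_residual**2))
    return float(x_residual @ y_residual / scale)


# Synthetic data in which only feature 0 influences y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
coef = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ coef + 0.5 * rng.standard_normal(200)

stats = [residual_statistic(X, y, coef, j) for j in range(X.shape[1])]
```

In this toy setting the statistic for feature 0 should be large in absolute value, while the statistics of the four null features should stay on the scale of a standard normal draw.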

Returns:
selection : ndarray of shape (n_features,)
    Boolean array indicating selected features (True = selected).
importances : ndarray of shape (n_features,)
    Feature importance scores/test statistics. For features not selected
    during screening, scores are set to 0.
pvalues : ndarray of shape (n_features,)
    Two-sided p-values for each feature under a Gaussian null hypothesis.
    For features not selected during screening, p-values are set to 1.
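
To make the relationship between the returned arrays concrete, the following sketch converts statistics into two-sided Gaussian p-values and applies threshold-style selection. It uses only the standard library. The helper names `two_sided_gaussian_pvalue` and `select` are hypothetical, the statistics are made-up values, and the way the two selection rules are combined here is an assumption rather than the library's behavior.

```python
import math


def two_sided_gaussian_pvalue(statistic):
    """Two-sided p-value of a statistic assumed to be N(0, 1) under the null."""
    # P(|Z| >= |t|) = erfc(|t| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(statistic) / math.sqrt(2.0))


def select(pvalues, threshold_max=None, k_lowest=None):
    """Illustrative selection rules (not hidimstat's implementation)."""
    selected = [True] * len(pvalues)
    if threshold_max is not None:
        # Keep features whose p-value is below the maximum threshold.
        selected = [s and p < threshold_max for s, p in zip(selected, pvalues)]
    if k_lowest is not None:
        # Keep only the k features with the smallest p-values.
        order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
        keep = set(order[:k_lowest])
        selected = [s and i in keep for i, s in enumerate(selected)]
    return selected


# Hypothetical statistics. A screened-out feature has importance 0,
# which corresponds to a p-value of 1.
importances = [4.2, 0.0, -2.5, 0.3]
pvalues = [two_sided_gaussian_pvalue(t) for t in importances]
selection = select(pvalues, threshold_max=0.05)
```

With these made-up values, the first and third features fall below the 0.05 threshold, and the feature with importance 0 receives a p-value of exactly 1, which matches the convention stated for features dropped at screening.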