desparsified_lasso_importance

hidimstat.desparsified_lasso_importance(X, y, estimator=LassoCV(cv=KFold(n_splits=5, random_state=0, shuffle=True), eps=0.01, fit_intercept=False, max_iter=5000, random_state=0, tol=0.001), centered=True, dof_ajdustement=False, model_x=Lasso(), preconfigure_model_x_path=True, alpha_max_fraction=0.01, save_model_x=False, random_state=None, tolerance_reid=0.0001, noise_method='AR', order=1, stationary=True, confidence=0.95, distribution='norm', epsilon_pvalue=1e-14, test='chi2', covariance=None, n_jobs=1, memory=None, verbose=0, k_lowest=None, percentile=None, threshold_min=None, threshold_max=None)

Desparsified Lasso Estimator (also known as Debiased Lasso)

Statistical inference in high-dimensional regression using the desparsified Lasso. Provides debiased coefficient estimates, confidence intervals, and p-values. Based on Algorithm 1 of d-Lasso and d-MTLasso in Chevalier [1].

Parameters:
estimator : LassoCV or MultiTaskLassoCV instance, default=LassoCV()
    Initial model for selecting relevant features. Must implement fit and predict. For single task use LassoCV, for multi-task use MultiTaskLassoCV.
model_x : Lasso or MultiTaskLasso instance, default=Lasso()
    Base model for nodewise regressions.
centered : bool, default=True
    Whether to center X and y before fitting.
dof_ajdustement : bool, default=False
    Whether to apply degrees of freedom adjustment for small samples.
preconfigure_model_x_path : bool, default=True
    Whether to preconfigure model_x with n_jobs and random_state.
alpha_max_fraction : float, default=0.01
    Only used if preconfigure_model_x_path is True. Fraction of maximum alpha to use when alphas=None.
random_state : int or RandomState, default=None
    Controls randomization.
save_model_x : bool, default=False
    Whether to save fitted nodewise regression models.
tolerance_reid : float, default=1e-4
    Convergence tolerance for noise estimation.
noise_method : {'AR', 'median'}, default='AR'
    Method for noise covariance estimation:
    - 'AR': Autoregressive model
    - 'median': Median correlation
order : int, default=1
    Order of AR model if noise_method='AR'.
stationary : bool, default=True
    Whether to assume stationary noise.
confidence : float, default=0.95
    Confidence level for intervals.
distribution : str, default='norm'
    Distribution for p-values. Only 'norm' is supported.
epsilon_pvalue : float, default=1e-14
    Small constant to avoid numerical issues.
test : {'chi2', 'F'}, default='chi2'
    Test statistic for p-values:
    - 'chi2': Chi-squared test (large samples)
    - 'F': F-test (small samples)
covariance : ndarray or None, default=None
    Pre-specified noise covariance matrix.
n_jobs : int, default=1
    Number of parallel jobs.
memory : str or Memory, default=None
    Cache for intermediate results.
verbose : int, default=0
    Verbosity level.
X : array-like of shape (n_samples, n_features)
    Training data matrix.
y : array-like of shape (n_samples,) or (n_samples, n_task)
    Target values. For single task, y should be 1D or (n_samples, 1). For multi-task, y should be (n_samples, n_task).
k_lowest : int, default=None
    Selects the k features with the lowest p-values.
percentile : float, default=None
    Selects features based on a specified percentile of p-values.
threshold_max : float, default=None
    Selects features with p-values below the specified maximum threshold (0 to 1).
threshold_min : float, default=None
    Selects features with p-values above the specified minimum threshold (0 to 1).
alternative_hypothesis : bool, default=False
    If True, selects based on 1 - p-values instead of p-values.
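As background for the parameters above (estimator supplies the initial Lasso fit, model_x runs the nodewise regressions, confidence sets the interval level), the single-task desparsified Lasso is usually written in the literature as shown below. This is the textbook form, given here as an assumption about what the function computes rather than a transcription of its code. The symbols are illustrative: \hat{\beta} is the initial Lasso estimate, x_j is column j of X, z_j is the residual from regressing x_j on the remaining columns X_{-j} with coefficients \hat{\gamma}_j, \hat{\sigma} is the estimated noise standard deviation, and 1 - \alpha is the confidence level.

```latex
% Nodewise residual for feature j
z_j = x_j - X_{-j}\,\hat{\gamma}_j

% Debiased (desparsified) coefficient
\hat{b}_j = \hat{\beta}_j + \frac{z_j^{\top}\,(y - X\hat{\beta})}{z_j^{\top} x_j}

% Approximate (1 - \alpha) confidence interval under Gaussian noise
\hat{b}_j \;\pm\; \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)\,
  \hat{\sigma}\,\frac{\lVert z_j \rVert_2}{\lvert z_j^{\top} x_j \rvert}
```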

Returns:
selection : ndarray of shape (n_features,)
    Boolean array indicating selected features (True = selected).
importances : ndarray of shape (n_features,)
    Feature importance scores/test statistics. For features not selected during screening, scores are set to 0.
pvalues : ndarray of shape (n_features,)
    Two-sided p-values for each feature under Gaussian null hypothesis. For features not selected during screening, p-values are set to 1.
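
To make the relationship between test statistics, two-sided Gaussian p-values, and the boolean selection mask concrete, here is a minimal sketch in plain Python. It does not call hidimstat and is not its implementation: the helper names (two_sided_pvalues, select_by_threshold, select_k_lowest) are hypothetical, the clipping role given to epsilon_pvalue is an assumption, and the z-scores are made-up illustration values.

```python
import math


def two_sided_pvalues(z_scores, epsilon_pvalue=1e-14):
    """Two-sided p-values for z-scores under a standard normal null."""
    pvalues = []
    for z in z_scores:
        # P(|Z| >= |z|) for Z ~ N(0, 1), written with erfc for accuracy.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        # Assumption: epsilon_pvalue keeps p-values strictly inside (0, 1).
        pvalues.append(min(max(p, epsilon_pvalue), 1.0 - epsilon_pvalue))
    return pvalues


def select_by_threshold(pvalues, threshold_max):
    """Mask of features whose p-value is below threshold_max."""
    return [p < threshold_max for p in pvalues]


def select_k_lowest(pvalues, k_lowest):
    """Mask of the k_lowest features with the smallest p-values."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    keep = set(order[:k_lowest])
    return [i in keep for i in range(len(pvalues))]


# Illustrative z-scores for four features (not real data).
z_scores = [0.0, 1.96, -3.5, 0.5]
pvalues = two_sided_pvalues(z_scores)
selection = select_by_threshold(pvalues, threshold_max=0.05)
top_one = select_k_lowest(pvalues, k_lowest=1)
```

With these inputs, the second and third features fall below the 0.05 threshold, and the third feature alone has the smallest p-value. How the actual function behaves when several selection arguments are set together is not stated in this page, so the sketch keeps the rules as separate helpers.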