loco_importance

hidimstat.loco_importance(estimator, X, y, method: str = 'predict', loss: callable = mean_squared_error, features_groups=None, test_statistic='ttest', k_best=None, percentile=None, threshold_min=None, threshold_max=None, n_jobs: int = 1)

Leave-One-Covariate-Out (LOCO) algorithm

This method is presented in Lei et al.[1] and Verdinelli and Wasserman[2]. The model is refitted once for each feature (or group of features) with that feature/group removed. The importance of a feature/group is the increase in loss caused by its removal, i.e. the loss of the model refitted without the feature/group minus the loss of the full model.

Parameters:
estimator : sklearn compatible estimator
The estimator to use for the prediction.
X : array-like of shape (n_samples, n_features)
Training data.
y : array-like of shape (n_samples,)
Target values.
method : str, default="predict"
The method to use for the prediction. This determines the predictions passed to the loss function. Supported methods are "predict", "predict_proba" or "decision_function".
loss : callable, default=mean_squared_error
The loss function used to compare the model refitted without a feature/group to the full model.
features_groups : dict or None, default=None
A dictionary whose keys are the group names and whose values are the lists of column names belonging to each group of features. If None, the groups are identified based on the columns of X.
test_statistic : callable or str, default="ttest"
Statistical test function for computing p-values of importance scores.
k_best : int, default=None
Selects the top k features based on importance scores.
percentile : float, default=None
Selects features based on a specified percentile of importance scores.
threshold_min : float, default=None
Selects features with importance scores above the specified minimum threshold.
threshold_max : float, default=None
Selects features with importance scores below the specified maximum threshold.
n_jobs : int, default=1
The number of jobs to run in parallel. Parallelization is done over the variables or groups of variables.

Returns:
selection : ndarray of shape (n_features,)
Boolean array indicating selected features (True = selected).
importances : ndarray of shape (n_features,)
Feature importance scores/test statistics.
pvalues : ndarray of shape (n_features,)
None because there is no p-value for this method.

Notes

Williamson et al.[3] also presented a LOCO method with an additional data splitting strategy.
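The LOCO procedure (refit without each feature/group, then compare losses) can be sketched from scratch with scikit-learn. This is a minimal illustration, not hidimstat's implementation: the helper `loco_sketch` is hypothetical, and it assumes squared-error loss evaluated on a held-out split, groups given as column indices rather than column names, and a one-sided one-sample t-test on the per-sample loss differences as a stand-in for the `"ttest"` option.

```python
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def loco_sketch(estimator, X_train, y_train, X_test, y_test, groups):
    """Hypothetical LOCO sketch (not the hidimstat code).

    `groups` maps a group name to a list of column *indices* to drop.
    Returns per-group importances and one-sided t-test p-values.
    """
    full = clone(estimator).fit(X_train, y_train)
    # Per-sample squared errors of the full model on held-out data.
    loss_full = (y_test - full.predict(X_test)) ** 2
    importances, pvalues = {}, {}
    for name, cols in groups.items():
        keep = np.setdiff1d(np.arange(X_train.shape[1]), cols)
        # Refit a fresh copy of the estimator without the group's columns.
        reduced = clone(estimator).fit(X_train[:, keep], y_train)
        loss_reduced = (y_test - reduced.predict(X_test[:, keep])) ** 2
        diff = loss_reduced - loss_full  # > 0 when removing the group hurts
        importances[name] = diff.mean()
        # Assumed test: is the mean loss increase greater than zero?
        pvalues[name] = stats.ttest_1samp(diff, 0.0, alternative="greater").pvalue
    return importances, pvalues


# Synthetic data: only column 0 influences y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
imp, pval = loco_sketch(
    LinearRegression(), X_tr, y_tr, X_te, y_te,
    {"x0": [0], "x1": [1], "x1_x2": [1, 2]},  # single features and one group
)
```

Each refit uses `clone` so the full model and every reduced model start from the same unfitted hyperparameters; this refitting loop is what `n_jobs` would parallelize over.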
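The selection parameters (`k_best`, `percentile`, `threshold_min`, `threshold_max`) turn importance scores into the boolean `selection` array. The helper below is a hypothetical sketch of such rules, not hidimstat's code; it assumes that all criteria that are set must hold together (logical AND) and that `percentile` keeps the top `percentile` percent of scores, similar to scikit-learn's `SelectPercentile`. The library's exact semantics may differ.

```python
import numpy as np


def select_features(importances, k_best=None, percentile=None,
                    threshold_min=None, threshold_max=None):
    """Hypothetical selection helper returning a boolean mask.

    Assumes every criterion that is not None must be satisfied (AND).
    """
    importances = np.asarray(importances, dtype=float)
    mask = np.ones(importances.shape, dtype=bool)
    if k_best is not None:
        # Indices of the k largest scores.
        top = np.argsort(importances)[::-1][:k_best]
        k_mask = np.zeros_like(mask)
        k_mask[top] = True
        mask &= k_mask
    if percentile is not None:
        # Keep scores at or above the (100 - percentile)-th percentile.
        cutoff = np.percentile(importances, 100 - percentile)
        mask &= importances >= cutoff
    if threshold_min is not None:
        mask &= importances > threshold_min
    if threshold_max is not None:
        mask &= importances < threshold_max
    return mask


scores = np.array([0.9, 0.05, 0.4, -0.01])
by_k = select_features(scores, k_best=2)
by_pct = select_features(scores, percentile=50)
by_band = select_features(scores, threshold_min=0.0, threshold_max=0.5)
```

Here `by_k` and `by_pct` both keep the first and third features, while the band `0.0 < score < 0.5` keeps the second and third.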