pfi_importance

hidimstat.pfi_importance(estimator, X, y, method: str = 'predict', loss: callable = <function mean_squared_error>, n_permutations: int = 50, test_statistic='ttest', features_groups=None, k_best=None, percentile=None, threshold_min=None, threshold_max=None, random_state: int = None, n_jobs: int = 1)

Permutation Feature Importance algorithm

This is the method presented in Breiman[1]. For each variable or group of variables, the importance is the difference between the loss of the model on data in which that variable/group has been permuted and the loss of the initial model on the original data, so larger values indicate more important variables. The method was also used in Mi et al.[2]

Parameters:
estimator : sklearn compatible estimator
The estimator to use for the prediction.
X : array-like of shape (n_samples, n_features)
Training data.
y : array-like of shape (n_samples,)
Target values.
method : str, default="predict"
The method to use for the prediction. This determines the predictions passed to the loss function. Supported methods are "predict", "predict_proba" or "decision_function".
loss : callable, default=mean_squared_error
The loss function used to compare the perturbed model with the full model.
n_permutations : int, default=50
The number of permutations to perform. For each variable/group of variables, the mean of the losses over the n_permutations permutations is computed.
test_statistic : callable or str, default="ttest"
Statistical test function for computing p-values of importance scores.
features_groups : dict or None, default=None
A dictionary where the keys are the group names and the values are the lists of column names belonging to each feature group. If None, the groups are identified based on the columns of X.
k_best : int, default=None
Selects the top k features based on importance scores.
percentile : float, default=None
Selects features based on a specified percentile of importance scores.
threshold_min : float, default=None
Selects features with importance scores above the specified minimum threshold.
threshold_max : float, default=None
Selects features with importance scores below the specified maximum threshold.
random_state : int or None, default=None
The random state to use for sampling.
n_jobs : int, default=1
The number of jobs to run in parallel. Parallelization is done over the variables or groups of variables.

Returns:
selection : ndarray of shape (n_features,)
Boolean array indicating selected features (True = selected).
importances : ndarray of shape (n_features,)
Feature importance scores/test statistics.
pvalues : ndarray of shape (n_features,)
P-values for importance scores.
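The permutation procedure described for this function can be illustrated with a minimal NumPy sketch. It is not hidimstat's implementation, and several choices are assumptions: `permutation_importance_sketch` is a hypothetical helper, `groups` maps names to column indices (whereas `features_groups` uses column names), the "estimator" is a plain least-squares fit via `np.linalg.lstsq`, and all columns of a group are shuffled with one shared row permutation. The importance of each group is the loss on permuted data minus the reference loss, averaged over the permutations.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, loss, groups,
                                  n_permutations=50, random_state=None):
    """Hypothetical helper: mean loss increase per group over n_permutations."""
    rng = np.random.default_rng(random_state)
    # Loss of the fitted model on the original, unpermuted data
    loss_reference = loss(y, predict(X))
    # diffs[i, k]: loss increase for group i under the k-th permutation
    diffs = np.empty((len(groups), n_permutations))
    for i, cols in enumerate(groups.values()):
        for k in range(n_permutations):
            perm = rng.permutation(X.shape[0])
            X_perm = X.copy()
            # Assumption: shuffle the group's columns with one shared row
            # permutation, so dependence within the group is preserved
            X_perm[:, cols] = X[np.ix_(perm, cols)]
            diffs[i, k] = loss(y, predict(X_perm)) - loss_reference
    # Importance = average loss increase across permutations
    return diffs.mean(axis=1), diffs


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


# Toy data: only column 0 influences y
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

# Stand-in "estimator": ordinary least squares without intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(Z):
    return Z @ coef


importances, diffs = permutation_importance_sketch(
    predict, X, y, mse, {"x0": [0], "x1_x2": [1, 2]},
    n_permutations=20, random_state=0,
)
print(importances)
```

In this toy setup only column 0 drives `y`, so permuting it raises the MSE sharply, while permuting columns 1 and 2 (whose fitted coefficients are close to zero) leaves the loss almost unchanged.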
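The `k_best`, `percentile`, `threshold_min` and `threshold_max` parameters turn importance scores into the boolean `selection` array. The sketch below shows one plausible way to do this and is not taken from hidimstat. Its assumptions: the hypothetical helper `select_features_sketch` combines all supplied criteria with a logical AND, `percentile` keeps the top `percentile` percent of scores, and the thresholds are strict inequalities.

```python
import numpy as np


def select_features_sketch(importances, k_best=None, percentile=None,
                           threshold_min=None, threshold_max=None):
    """Hypothetical helper: combine all given criteria with a logical AND."""
    importances = np.asarray(importances, dtype=float)
    selection = np.ones(importances.shape, dtype=bool)
    if k_best is not None:
        # Keep the k_best largest scores
        top = np.argsort(importances)[::-1][:k_best]
        mask = np.zeros(importances.shape, dtype=bool)
        mask[top] = True
        selection &= mask
    if percentile is not None:
        # Assumption: keep the top `percentile` percent of scores
        cutoff = np.percentile(importances, 100 - percentile)
        selection &= importances >= cutoff
    if threshold_min is not None:
        # Assumption: strictly above the minimum threshold
        selection &= importances > threshold_min
    if threshold_max is not None:
        # Assumption: strictly below the maximum threshold
        selection &= importances < threshold_max
    return selection


importances = np.array([0.5, 2.0, -0.1, 1.0])
print(select_features_sketch(importances, k_best=2).tolist())
# [False, True, False, True]
print(select_features_sketch(importances, threshold_min=0.0).tolist())
# [True, True, False, True]
print(select_features_sketch(importances, threshold_min=0.0,
                             threshold_max=1.5).tolist())
# [True, False, False, True]
```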
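The text does not spell out what the default `test_statistic="ttest"` computes. One plausible reading, offered here only as an assumption, is a one-sided one-sample t-test per feature on the loss increases observed across the `n_permutations` permutations, testing whether their mean exceeds zero. The sketch uses `scipy.stats.ttest_1samp` (its `alternative` argument requires SciPy 1.6 or later) on synthetic, hypothetical loss increases rather than on output from `pfi_importance`.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
# Hypothetical per-permutation loss increases
# (rows: features, columns: permutations)
diffs = np.vstack([
    rng.normal(loc=2.0, scale=0.5, size=50),  # informative feature
    rng.normal(loc=0.0, scale=0.5, size=50),  # uninformative feature
])

# Importance score: mean loss increase over permutations
importances = diffs.mean(axis=1)
# One-sided test per row: H0 mean == 0 against H1 mean > 0
result = ttest_1samp(diffs, popmean=0.0, axis=1, alternative="greater")
pvalues = result.pvalue
print(importances, pvalues)
```

A small p-value for the first row indicates that its loss increases are consistently positive across permutations; the second row's p-value is not expected to be small.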