cfi_importance
- hidimstat.cfi_importance(estimator, X, y, method: str = 'predict', loss: callable = <function mean_squared_error>, n_permutations: int = 50, imputation_model_continuous=RidgeCV(), imputation_model_categorical=LogisticRegressionCV(), features_groups=None, feature_types='auto', categorical_max_cardinality: int = 10, test_statistic='ttest', k_best=None, percentile=None, threshold_max=None, threshold_min=None, random_state: int = None, n_jobs: int = 1)
Conditional Feature Importance (CFI) algorithm. See Chamma et al. [1]; for the group-level variant, see Chamma et al. [2].
- Parameters:
- estimator : sklearn compatible estimator
- The estimator to use for the prediction.
- method : str, default="predict"
- The method to use for the prediction. This determines the predictions passed to the loss function. Supported methods are "predict", "predict_proba" or "decision_function".
- loss : callable, default=mean_squared_error
- The loss function to use when comparing the perturbed model to the full model.
- n_permutations : int, default=50
- The number of permutations to perform. For each variable/group of variables, the mean of the losses over the n_permutations is computed.
- imputation_model_continuous : sklearn compatible estimator, default=RidgeCV()
- The model used to estimate the conditional distribution of a given continuous variable/group of variables given the others.
- imputation_model_categorical : sklearn compatible estimator, default=LogisticRegressionCV()
- The model used to estimate the conditional distribution of a given categorical variable/group of variables given the others. Binary is considered a special case of categorical.
- features_groups : dict or None, default=None
- A dictionary where the keys are the group names and the values are the lists of column names corresponding to each feature group. If None, the features_groups are identified based on the columns of X.
- feature_types : str or list, default="auto"
- The feature type. Supported types include "auto", "continuous", and "categorical". If "auto", the type is inferred from the cardinality of the unique values passed to the fit method.
- categorical_max_cardinality : int, default=10
- The maximum cardinality of a variable to be considered categorical when the variable type is inferred (set to "auto" or not provided).
- test_statistic : callable or str, default="ttest"
- Statistical test function for computing p-values of importance scores.
- random_state : int or None, default=None
- The random state to use for sampling.
- n_jobs : int, default=1
- The number of jobs to run in parallel. Parallelization is done over the variables or groups of variables.
- X : array-like of shape (n_samples, n_features)
- Training data.
- y : array-like of shape (n_samples,)
- Target values.
- k_best : int, default=None
- Selects the top k features based on importance scores.
- percentile : float, default=None
- Selects features based on a specified percentile of importance scores.
- threshold_max : float, default=None
- Selects features with importance scores below the specified maximum threshold.
- threshold_min : float, default=None
- Selects features with importance scores above the specified minimum threshold.
- Returns:
- selection : ndarray of shape (n_features,)
- Boolean array indicating selected features (True = selected)
- importances : ndarray of shape (n_features,)
- Feature importance scores/test statistics.
- pvalues : ndarray of shape (n_features,)
- P-values for importance scores.
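The conditional permutation idea behind the loss, n_permutations and imputation_model_continuous parameters can be illustrated with a small NumPy-only sketch. This is not hidimstat's implementation: `fit_linear`, `mse` and `cfi_sketch` are hypothetical helpers, plain least squares stands in for both the estimator and the imputation model, and grouping, categorical features, the statistical test and feature selection are all omitted. It also assumes the importance is the mean perturbed loss minus the loss of the unperturbed model.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def mse(y_true, y_pred):
    """Mean squared error, standing in for the default loss."""
    return float(np.mean((y_true - y_pred) ** 2))


def cfi_sketch(predict, X, y, n_permutations=50, seed=0):
    """Hypothetical helper: conditional permutation importance per feature."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # Imputation step: predict feature j from the remaining features.
        x_j_hat = fit_linear(others, X[:, j])(others)
        residuals = X[:, j] - x_j_hat
        losses = []
        for _ in range(n_permutations):
            X_perturbed = X.copy()
            # Shuffle only the part of feature j not explained by the others.
            X_perturbed[:, j] = x_j_hat + rng.permutation(residuals)
            losses.append(mse(y, predict(X_perturbed)))
        # Assumed definition: mean perturbed loss minus unperturbed loss.
        importances[j] = np.mean(losses) - baseline
    return importances


# Synthetic data: y depends on x0 only.
rng = np.random.default_rng(42)
n = 500
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)  # correlated with x0, not used by y
x2 = rng.normal(size=n)             # independent noise feature
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + 0.1 * rng.normal(size=n)

predict = fit_linear(X, y)
importances = cfi_sketch(predict, X, y, n_permutations=20)
print(importances)
```

In this sketch only the residual of a feature, given the others, is shuffled, so the perturbed column keeps its relationship with the remaining features. The real function additionally returns a boolean selection mask and p-values, which this sketch does not attempt to reproduce.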