hidimstat.conditional_sampling.ConditionalSampler

class hidimstat.conditional_sampling.ConditionalSampler(model_regression=None, model_categorical=None, data_type: str = 'auto', categorical_max_cardinality=10)

Bases: object

__init__(model_regression=None, model_categorical=None, data_type: str = 'auto', categorical_max_cardinality=10)

Class used to sample from the conditional distribution $p(X^j | X^{-j})$.

Parameters:
model_regression : sklearn-compatible estimator, optional

The model to use for continuous data.

model_categorical : sklearn-compatible estimator, optional

The model to use for categorical data. Binary data is treated as a special case of categorical data.

data_type : str, default="auto"

The variable type. Supported types include “auto”, “continuous”, and “categorical”. If “auto”, the type is inferred from the cardinality of the unique values passed to the fit method.

categorical_max_cardinality : int, default=10

The maximum cardinality of a variable to be considered as categorical when data_type is “auto”.
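The "auto" rule described above can be illustrated with a minimal sketch. The helper `infer_data_type` is hypothetical and not part of hidimstat, and the inclusive comparison (`<=`) is an assumption read off the phrase "maximum cardinality"; the library's actual check may differ.

```python
import numpy as np


def infer_data_type(y, categorical_max_cardinality=10):
    """Hypothetical helper: guess the variable type from its unique-value count.

    Assumption: a variable with at most `categorical_max_cardinality`
    distinct values is treated as categorical, otherwise as continuous.
    """
    n_unique = np.unique(y).size
    if n_unique <= categorical_max_cardinality:
        return "categorical"
    return "continuous"


rng = np.random.default_rng(0)
binary = rng.integers(0, 2, size=200)  # at most 2 distinct values
gaussian = rng.normal(size=200)        # 200 distinct float values

print(infer_data_type(binary))
print(infer_data_type(gaussian))
```

Under this reading, an integer-coded variable with more than 10 levels would be routed to the regression model unless data_type is set explicitly to "categorical".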

fit(X: ndarray, y: ndarray)

Fit the model that estimates $\mathbb{E}[y | X]$.

Parameters:
X : ndarray

The variables used to predict the group of variables $y$.

y : ndarray

The group of variables to predict.

sample(X: ndarray, y: ndarray, n_samples: int = 1, random_state=None) → ndarray

Sample from the conditional distribution $p(X^j | X^{-j})$.

Parameters:
X : ndarray

The complement of the considered set of variables, $X^{-j}$.

y : ndarray

The group of variables to sample, $X^j$.

n_samples : int, default=1

The number of samples to draw.

random_state : int, default=None

The random state to use for sampling.

Returns:
y_conditional : ndarray

The samples from the conditional distribution.
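One common way to approximate draws from $p(X^j | X^{-j})$ for a continuous variable is to fit a regression of $X^j$ on $X^{-j}$ and add shuffled residuals to the predictions. The sketch below shows that idea with NumPy only. It is an illustration under stated assumptions, not a description of hidimstat's internals: the helpers `fit_linear`, `predict_linear`, and `sample_conditional` are hypothetical, a plain least-squares model stands in for `model_regression`, and the `(n_samples, n_observations)` output shape is a choice made for this sketch.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept; returns the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predicted conditional mean, an estimate of E[y | X]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


def sample_conditional(coef, X, y, n_samples=1, random_state=None):
    """Hypothetical sampler: prediction plus independently permuted residuals."""
    rng = np.random.default_rng(random_state)
    y_hat = predict_linear(coef, X)
    residual = y - y_hat
    # Each draw keeps the part of y explained by X and shuffles the rest.
    draws = [y_hat + rng.permutation(residual) for _ in range(n_samples)]
    return np.stack(draws)  # shape: (n_samples, n_observations)


# Synthetic data: x_j depends linearly on the two remaining variables.
rng = np.random.default_rng(0)
X_minus_j = rng.normal(size=(500, 2))
x_j = X_minus_j @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=500)

coef = fit_linear(X_minus_j, x_j)
samples = sample_conditional(coef, X_minus_j, x_j, n_samples=3, random_state=0)
print(samples.shape)  # (3, 500)
```

Because only the residuals are shuffled, each draw preserves the dependence of $X^j$ on $X^{-j}$ while breaking any link between $X^j$ and a downstream target, which is the property that conditional permutation importance relies on.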

Examples using hidimstat.conditional_sampling.ConditionalSampler

Pitfalls of Permutation Feature Importance (PFI) on the California Housing Dataset