ModelXKnockoff#

class hidimstat.ModelXKnockoff(estimator=LassoCV(cv=KFold(n_splits=5, random_state=0, shuffle=True), max_iter=200000, n_jobs=1, random_state=1, tol=1e-06, verbose=0), ko_generator=GaussianKnockoffs(), n_repeats=1, centered=True, preconfigure_lasso_path=True, random_state=None, joblib_verbose=0, memory=None, n_jobs=1)[source]#

Bases: BaseVariableImportance

Model-X Knockoff

This class implements the Model-X knockoff inference procedure, an approach to control the False Discovery Rate (FDR) based on Candès et al. [1]. The original implementation can be found at msesia/knockoff-filter. The knockoff variables are generated as second-order Gaussian knockoffs using the equi-correlated method.

In addition, the class can generate multiple sets of Gaussian knockoff variables and calculate the test statistics for each set. It then aggregates the test statistics across the sets to improve stability and power.
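As a toy illustration (not the library's internal code), the feature-wise aggregation of per-repeat knockoff statistics by averaging can be pictured as:

```python
import numpy as np

# Toy sketch (not hidimstat's internals): test statistics from several
# knockoff draws are averaged feature-wise to stabilize them.
W_repeats = np.array([
    [0.8, -0.1, 0.3],   # statistics from knockoff draw 1
    [0.6,  0.1, 0.5],   # statistics from knockoff draw 2
])                      # shape (n_repeats, n_features)
W_aggregated = W_repeats.mean(axis=0)   # one statistic per feature
```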

Parameters:
estimator : estimator, default=LassoCV(…)

Estimator used to compute knockoff statistics. Must expose coefficients via coef_ (or best_estimator_.coef_ for CV wrappers) after fit.

ko_generator : object

Knockoff generator implementing fit(X) and sample(n_repeats, random_state).

n_repeats : int, default=1

Number of knockoff draws to average over.

centered : bool, default=True

If True, standardize X before fitting the generator and computing statistics.

preconfigure_lasso_path : bool, default=True

If True, configure the LassoCV estimator's regularization path before fitting. The maximum alpha is computed as alpha_max = max(X_ko.T @ y) / (2 * n_features), and an alpha grid of length n_alphas is created between alpha_max * exp(-n_alphas) and alpha_max.
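The grid construction above can be sketched in NumPy as follows (an illustration of the stated formula, not the library's code; X_ko denotes the concatenated [X, X_tilde] design):

```python
import numpy as np

# Illustrative sketch of the regularization path described above,
# assuming X_ko is the concatenated [X, X_tilde] design and y the target.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 10
X_ko = rng.standard_normal((n_samples, 2 * n_features))
y = rng.standard_normal(n_samples)

alpha_max = np.max(X_ko.T @ y) / (2 * n_features)
n_alphas = 20
# Geometric grid from alpha_max * exp(-n_alphas) up to alpha_max.
alphas = alpha_max * np.exp(np.linspace(-n_alphas, 0, n_alphas))
```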

random_state : int or None, default=None

Random seed forwarded to the knockoff generator sampling.

joblib_verbose : int, default=0

Verbosity level for parallel jobs.

memory : str, joblib.Memory or None, default=None

Caching backend for expensive operations.

n_jobs : int, default=1

Number of parallel jobs (automatically capped to n_repeats).

Attributes:
importances_ : ndarray, shape (n_repeats, n_features)

Test statistics for each repeat.

pvalues_ : ndarray, shape (n_repeats, n_features)

Empirical p-values for each repeat.

threshold_fdr_ : float

Threshold computed by the FDR selection procedure.

aggregated_pval_ : ndarray or None

Aggregated p-values (when using p-value aggregation).

aggregated_eval_ : ndarray or None

Aggregated e-values (when using e-value aggregation).

estimators_ : list of estimators

List of estimators fitted on the concatenated design matrices, one per repeat.

n_features_ : int

Number of features on which the model was fitted.

Notes

Use the model_x_knockoff function for a functional interface that wraps this class. The class handles generator fitting, repeated knockoff sampling, statistic computation, and FDR-based selection.

__init__(estimator=LassoCV(cv=KFold(n_splits=5, random_state=0, shuffle=True), max_iter=200000, n_jobs=1, random_state=1, tol=1e-06, verbose=0), ko_generator=GaussianKnockoffs(), n_repeats=1, centered=True, preconfigure_lasso_path=True, random_state=None, joblib_verbose=0, memory=None, n_jobs=1)[source]#
fit(X, y)[source]#

Fit the knockoff generator and estimators to the data.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data matrix, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target values.

Returns:
self : object

Returns the instance itself.

importance(X=None, y=None)[source]#

Calculate feature importance scores using Model-X knockoffs.

This method generates knockoff variables and computes test statistics to measure feature importance. For multiple repeats, the scores are averaged across repeats to improve stability.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data matrix, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target values.

Returns:
importances_ : ndarray of shape (n_features,)

Feature importance scores for each feature. Higher absolute values indicate higher importance.

Notes

The method generates knockoff variables that satisfy the exchangeability property and computes test statistics comparing original features against their knockoffs. When n_repeats > 1, multiple sets of knockoffs are generated and results are averaged.

fit_importance(X, y)[source]#

Fits the model to the data and computes feature importance.

Parameters:
X : array-like of shape (n_samples, n_features)

The input data matrix, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

The target values.

cv : None or cross-validation generator, default=None

Cross-validation parameter. Not used in this method. A warning will be issued if provided.

Returns:
importances_ : ndarray of shape (n_features,)

Feature importance scores (p-values) for each feature. Lower values indicate higher importance. Values range from 0 to 1.

See also

fit

Method for fitting the generator only

importance

Method for computing importance scores only

Notes

This method combines the fit and importance computation steps. It first fits the generator to X and then computes importance scores by comparing the test statistics of the original features against those of their knockoff counterparts.

fdr_selection(fdr, fdr_control='bhq', evalues=False, reshaping_function=None, adaptive_aggregation=False, gamma=0.5)[source]#

Performs feature selection based on False Discovery Rate (FDR) control.

This method selects features by controlling the FDR using either p-values or e-values derived from test scores. It supports different FDR control methods and optional adaptive aggregation of the statistical values.

Parameters:
fdr : float

The target false discovery rate level (between 0 and 1).

fdr_control : str, default="bhq"

The FDR control method to use. Options are:
- "bhq": Benjamini-Hochberg procedure
- "bhy": Benjamini-Hochberg-Yekutieli procedure
- "ebh": e-BH procedure (only for e-values)

evalues : bool, default=False

If True, uses e-values for selection. If False, uses p-values.

reshaping_function : callable, default=None

Reshaping function for the BHY method; by default uses the sum of reciprocals.

adaptive_aggregation : bool, default=False

If True, uses adaptive weights for p-value aggregation. Only applicable when evalues=False.

gamma : float, default=0.5

The gamma parameter for quantile aggregation of p-values. Only used when evalues=False.

Returns:
numpy.ndarray

Boolean array indicating selected features (True for selected, False for not selected).

Raises:
AssertionError

If importances_ is None or if incompatible combinations of parameters are provided
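For intuition, a minimal sketch of the Benjamini-Hochberg ("bhq") step on a vector of p-values (an illustration, not hidimstat's implementation) could look like:

```python
import numpy as np

# Minimal Benjamini-Hochberg sketch: reject the k smallest p-values,
# where k is the largest rank with p_(k) <= fdr * k / n.
def bhq_selection(pvalues, fdr):
    n = len(pvalues)
    order = np.argsort(pvalues)
    thresholds = fdr * np.arange(1, n + 1) / n
    below = pvalues[order] <= thresholds
    selected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        selected[order[:k + 1]] = True
    return selected

sel = bhq_selection(np.array([0.001, 0.04, 0.2, 0.5]), fdr=0.05)
```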

static lasso_coefficient_difference_statistic(estimators, n_features)[source]#

Compute the Lasso Coefficient-Difference (LCD) statistic from fitted estimators. Given a list of estimators fitted on the concatenated design matrix [X, X_tilde], this function computes the knockoff statistic for each original feature across repeats:

\[W_j = |\beta_j| - |\beta_j'|\]

where \(\beta_j\) and \(\beta_j'\) are the fitted coefficients for the original feature j and its knockoff counterpart j’.

Parameters:
estimators : list of estimators

List of fitted estimators on the concatenated design matrix [X, X_tilde]. Each estimator must expose coefficients via coef_ or best_estimator_.coef_.

n_features : int

Number of original features (not including knockoffs).

Returns:
test_statistic : ndarray, shape (n_repeats, n_features)

Knockoff statistics \(W_j\) for each original feature across repeats. The number of repeats corresponds to the length of the estimators list.
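Under the [X, X_tilde] coefficient layout described above, the statistic can be sketched directly from stacked coefficient vectors (an illustration, not the library's code):

```python
import numpy as np

# Sketch of W_j = |beta_j| - |beta_j'| for coefficients fitted on
# [X, X_tilde]: the first n_features entries are the original features,
# the remaining n_features entries their knockoff counterparts.
def lcd_statistic(coefs, n_features):
    coefs = np.atleast_2d(coefs)               # (n_repeats, 2 * n_features)
    return np.abs(coefs[:, :n_features]) - np.abs(coefs[:, n_features:])

coefs = np.array([[1.0, -0.5, 0.1, 0.2, 0.3, -0.05]])  # one repeat, 3 features
W = lcd_statistic(coefs, n_features=3)
```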

static knockoff_threshold(test_score, fdr=0.1)[source]#

Calculate the knockoff threshold following the procedure of Candès et al. [1].

Original code: msesia/knockoff-filter

Parameters:
test_score : 1D ndarray, shape (n_features,)

Vector of test statistics.

fdr : float

Desired controlled FDR (false discovery rate) level.

Returns:
threshold : float or np.inf

Threshold level.
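A hedged sketch of the knockoff+ threshold of Candès et al. [1], the smallest t among the |W_j| with (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr (an illustration, not hidimstat's exact code):

```python
import numpy as np

# Knockoff+ threshold sketch: scan candidate thresholds |W_j| in
# increasing order and return the smallest one whose estimated FDP
# is at or below the target fdr; np.inf if none qualifies.
def knockoff_threshold(test_score, fdr=0.1):
    candidates = np.sort(np.abs(test_score[test_score != 0]))
    for t in candidates:
        fdp = (1.0 + np.sum(test_score <= -t)) / max(1, np.sum(test_score >= t))
        if fdp <= fdr:
            return t
    return np.inf

W = np.array([2.0, 1.5, 1.2, -0.3, 0.9, 0.8, -0.1, 1.1])
tau = knockoff_threshold(W, fdr=0.5)
selected = W >= tau   # features whose statistic clears the threshold
```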

fwer_selection(fwer, procedure='bonferroni', n_tests=None, two_tailed_test=False)[source]#

Performs feature selection based on Family-Wise Error Rate (FWER) control.

Parameters:
fwer : float

The target family-wise error rate level (between 0 and 1).

procedure : {'bonferroni'}, default='bonferroni'

The FWER control method to use:
- 'bonferroni': Bonferroni correction

n_tests : int or None, default=None

Correction factor for multiple testing. If None, uses the number of clusters if available, otherwise the number of features.

two_tailed_test : bool, default=False

If True, uses the sign of the importance scores to indicate whether the selected features have positive or negative effects.

Returns:
selected : ndarray of int

Integer array indicating the selected features: 1 for selected features with positive effects, -1 for selected features with negative effects, 0 for non-selected features.
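The Bonferroni step itself reduces to a single per-test threshold; a minimal sketch on a p-value vector (illustrative, not the class's exact code):

```python
import numpy as np

# Bonferroni sketch: control FWER at level `fwer` by testing each of
# n_tests p-values against the corrected threshold fwer / n_tests.
pvalues = np.array([0.001, 0.02, 0.4])
fwer = 0.05
n_tests = len(pvalues)
selected = pvalues <= fwer / n_tests   # per-test threshold ~0.0167
```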

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

importance_selection(k_best=None, percentile=None, threshold_max=None, threshold_min=None)[source]#

Selects features based on variable importance.

Parameters:
k_best : int, default=None

Selects the top k features based on importance scores.

percentile : float, default=None

Selects features based on a specified percentile of importance scores.

threshold_max : float, default=None

Selects features with importance scores below the specified maximum threshold.

threshold_min : float, default=None

Selects features with importance scores above the specified minimum threshold.

Returns:
selection : array-like of shape (n_features,)

Binary array indicating the selected features.

plot_importance(ax=None, ascending=False, feature_names=None, **seaborn_barplot_kwargs)[source]#

Plot feature importances as a horizontal bar plot.

Parameters:
ax : matplotlib.axes.Axes or None, default=None

Axes object to draw the plot onto; otherwise uses the current Axes.

ascending : bool, default=False

Whether to sort features by ascending importance.

feature_names : list of str or None, default=None

Names of the features to display on the plot.

**seaborn_barplot_kwargs : additional keyword arguments

Additional arguments passed to seaborn.barplot (https://seaborn.pydata.org/generated/seaborn.barplot.html).

Returns:
ax : matplotlib.axes.Axes

The Axes object with the plot.

pvalue_selection(k_lowest=None, percentile=None, threshold_max=0.05, threshold_min=None, alternative_hypothesis=False)[source]#

Selects features based on p-values.

Parameters:
k_lowest : int, default=None

Selects the k features with the lowest p-values.

percentile : float, default=None

Selects features based on a specified percentile of p-values.

threshold_max : float, default=0.05

Selects features with p-values below the specified maximum threshold (0 to 1).

threshold_min : float, default=None

Selects features with p-values above the specified minimum threshold (0 to 1).

alternative_hypothesis : bool, default=False

If True, selects based on 1 - pvalues instead of p-values.

Returns:
selection : array-like of shape (n_features,)

Binary array indicating the selected features (True for selected).
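For example, selecting the k_lowest features by p-value can be sketched as (illustration only, not the library's code):

```python
import numpy as np

# Sketch of k-lowest p-value selection: mark the k smallest p-values.
pvalues = np.array([0.30, 0.01, 0.20, 0.04])
k_lowest = 2
selection = np.zeros(pvalues.shape, dtype=bool)
selection[np.argsort(pvalues)[:k_lowest]] = True
```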

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

Examples using hidimstat.ModelXKnockoff#

Knockoff aggregation

Controlled multiple variable selection on the Wisconsin breast cancer dataset