model_x_knockoff_importance

hidimstat.model_x_knockoff_importance(X, y, estimator=LassoCV(max_iter=200000), generator=<hidimstat.samplers.gaussian_knockoffs.GaussianKnockoffs object>, n_repeats=1, centered=True, random_state=None, preconfigure_lasso_path=True, joblib_verbose=0, memory=None, n_jobs=1, fdr=0.1, fdr_control='bhq', evalues=False, reshaping_function=None, adaptive_aggregation=False, gamma=0.5)

Model-X Knockoff

This function implements the Model-X knockoff inference procedure, an approach to controlling the False Discovery Rate (FDR) based on Candes et al. [1]. The original implementation can be found at msesia/knockoff-filter. The knockoff (noise) variables are generated as second-order Gaussian knockoffs using the equi-correlated method.
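The equi-correlated construction can be sketched in a few lines of NumPy. This is an independent illustration, not hidimstat's GaussianKnockoffs class: the helper name equicorrelated_gaussian_knockoffs and the shrink safety factor are hypothetical, and the mean and covariance are estimated empirically from X. With Sigma the covariance of X, s = min(1, 2 * lambda_min) where lambda_min is the smallest eigenvalue of the correlation matrix, and D = diag(s * Sigma_jj), the knockoffs are drawn from the Gaussian conditional distribution with mean mu + (X - mu)(I - Sigma^-1 D) and covariance 2D - D Sigma^-1 D.

```python
import numpy as np


def equicorrelated_gaussian_knockoffs(X, rng=None, shrink=0.999):
    """Draw one set of second-order Gaussian knockoffs for X (equi-correlated construction).

    Hypothetical helper for illustration. Mean and covariance are estimated from X,
    which requires n_samples > n_features so that the covariance is invertible.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)

    # Equi-correlated choice on the correlation scale: s = min(1, 2 * lambda_min).
    scale = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(scale, scale)
    # shrink < 1 keeps the conditional covariance strictly positive definite.
    s = min(1.0, 2.0 * np.linalg.eigvalsh(corr).min()) * shrink
    D = np.diag(s * scale**2)  # D = diag(s * Sigma_jj) on the covariance scale

    # Joint covariance of (X, X_ko) is [[Sigma, Sigma - D], [Sigma - D, Sigma]];
    # condition on X to get the knockoff distribution.
    sigma_inv_D = np.linalg.solve(sigma, D)
    cond_mean = mu + (X - mu) @ (np.eye(n_features) - sigma_inv_D)
    cond_cov = 2.0 * D - D @ sigma_inv_D
    cond_cov = (cond_cov + cond_cov.T) / 2.0  # symmetrize against round-off
    chol = np.linalg.cholesky(cond_cov)

    # Knockoffs = conditional mean + Gaussian noise with the conditional covariance.
    return cond_mean + rng.standard_normal((n_samples, n_features)) @ chol.T
```

By construction the knockoffs reproduce the covariance of X and its off-diagonal cross-covariances, while each knockoff is as decorrelated from its own original feature as the equi-correlated constraint allows. Because the covariance is estimated empirically, this sketch needs more samples than features.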

In addition, when n_repeats > 1, this function generates multiple sets of Gaussian knockoff variables, computes the test statistics for each set, and then aggregates the statistics across sets to improve stability and power.

Parameters:
X : array-like of shape (n_samples, n_features)
The input data matrix, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,)
The target values.
estimator : estimator, default=LassoCV(max_iter=200000)
Estimator used to compute the knockoff statistics. After fitting, it must expose its coefficients via coef_ (or best_estimator_.coef_ for cross-validation wrappers).
generator : object, default=a hidimstat.samplers.gaussian_knockoffs.GaussianKnockoffs instance
Knockoff generator implementing fit(X) and sample(n_repeats, random_state).
n_repeats : int, default=1
Number of knockoff draws to average over.
centered : bool, default=True
If True, standardize X before fitting the generator and computing the statistics.
random_state : int or None, default=None
Random seed forwarded to the knockoff generator when sampling.
preconfigure_lasso_path : bool, default=True
If True, a helper function configures the regularization path of the LassoCV estimator. The maximum alpha is computed as alpha_max = max(X_ko.T @ y) / (2 * n_features), and a grid of n_alphas values is created between alpha_max * exp(-n_alphas) and alpha_max.
joblib_verbose : int, default=0
Verbosity level of the parallel jobs.
memory : str, joblib.Memory or None, default=None
Caching backend for expensive operations.
n_jobs : int, default=1
Number of parallel jobs (automatically capped at n_repeats).
fdr : float, default=0.1
The target false discovery rate level (between 0 and 1).
fdr_control : str, default="bhq"
The FDR control method to use. Options are:
- "bhq": Benjamini-Hochberg procedure
- "bhy": Benjamini-Hochberg-Yekutieli procedure
- "ebh": e-BH procedure (only for e-values)
evalues : bool, default=False
If True, uses e-values for selection; if False, uses p-values.
reshaping_function : callable, default=None
Reshaping function for the BHY method. If None, the sum of reciprocals is used.
adaptive_aggregation : bool, default=False
If True, uses adaptive weights for p-value aggregation. Only applicable when evalues=False.
gamma : float, default=0.5
The gamma parameter for quantile aggregation of p-values. Only used when evalues=False.
cv : None or cross-validation generator, default=None
Cross-validation parameter. Not used in this method; a warning is issued if it is provided.
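The gamma-quantile aggregation and the three fdr_control options can be illustrated with NumPy. This sketch assumes the standard textbook forms of these procedures: quantile aggregation as min(1, gamma-quantile of p_k / gamma) across repeats, Benjamini-Hochberg, the BHY variant with the sum-of-reciprocals correction, and e-BH. The helper names and the method argument are hypothetical, and hidimstat's internals may differ (for example in adaptive aggregation or custom reshaping functions).

```python
import numpy as np


def quantile_aggregation(pvals, gamma=0.5):
    """Aggregate p-values of shape (n_repeats, n_features) feature by feature:
    min(1, gamma-quantile over repeats of p_k / gamma)."""
    pvals = np.asarray(pvals, dtype=float)
    return np.minimum(1.0, np.quantile(pvals / gamma, gamma, axis=0))


def bh_select(pvals, fdr=0.1, method="bhq"):
    """Benjamini-Hochberg selection; method="bhy" divides the level by sum_{i<=m} 1/i."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    level = fdr
    if method == "bhy":
        level = fdr / np.sum(1.0 / np.arange(1, m + 1))  # sum-of-reciprocals correction
    sorted_p = np.sort(pvals)
    below = sorted_p <= level * np.arange(1, m + 1) / m
    if not below.any():
        return np.zeros(m, dtype=bool)
    # Reject every hypothesis whose p-value is at most the largest p_(k) passing the test.
    threshold = sorted_p[np.nonzero(below)[0].max()]
    return pvals <= threshold


def ebh_select(evalues, fdr=0.1):
    """e-BH: reject the k largest e-values, with k = max{k : e_(k) >= m / (fdr * k)}."""
    evalues = np.asarray(evalues, dtype=float)
    m = evalues.size
    sorted_e = np.sort(evalues)[::-1]  # descending order
    passes = sorted_e >= m / (fdr * np.arange(1, m + 1))
    if not passes.any():
        return np.zeros(m, dtype=bool)
    k = np.nonzero(passes)[0].max() + 1
    return evalues >= sorted_e[k - 1]
```

Dividing p_k by gamma before taking the gamma-quantile is what keeps the aggregated value a valid p-value; the BHY branch is more conservative than BH because it guards against arbitrary dependence between the p-values.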

Returns:
selection : binary array-like of shape (n_features,)
Binary array indicating the selected features.
importance : array-like of shape (n_features,)
The computed feature importance scores.
pvalues : array-like of shape (n_features,)
The computed significance (p-value) of each feature for the prediction.
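For a single knockoff draw, the original filter of Candes et al. turns knockoff statistics W_j into a selection with the knockoff+ threshold, and multi-draw variants first convert W into intermediate p-values that can then be aggregated and passed to an FDR procedure. The sketch below shows both steps, assuming that a larger positive W_j means stronger evidence for the original feature (for example W_j = |beta_j| - |beta_j_knockoff| from a lasso fit on [X, X_ko]). It is not necessarily how this function computes importance and pvalues, and all function names are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, fdr=0.1):
    """Knockoff+ threshold: smallest t in {|W_j| > 0} with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= fdr:
            return t
    return np.inf  # no threshold satisfies the bound: select nothing


def knockoff_select(W, fdr=0.1):
    """Binary selection of the features whose statistic clears the knockoff+ threshold."""
    return np.asarray(W, dtype=float) >= knockoff_plus_threshold(W, fdr)


def intermediate_pvalues(W):
    """One way to turn statistics into p-values:
    (1 + #{k : W_k <= -W_j}) / n_features if W_j > 0, else 1."""
    W = np.asarray(W, dtype=float)
    counts = np.array([np.sum(W <= -w) for w in W])
    return np.where(W > 0, (1 + counts) / W.size, 1.0)
```

The "+1" in the numerator of the knockoff+ estimate is what gives exact FDR control; without it the plain knockoff filter only controls a modified FDR.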