GaussianKnockoffs
- class hidimstat.samplers.GaussianKnockoffs(cov_estimator=LedoitWolf(assume_centered=True), tol=1e-14)
Bases: object
Generator for second-order Gaussian variables using the equi-correlated method. Creates synthetic variables that preserve the covariance structure of the original variables while ensuring conditional independence between the original and synthetic data.
- Parameters:
- cov_estimator : object, default=LedoitWolf(assume_centered=True)
Estimator for computing the covariance matrix. Must implement fit and have a covariance_ attribute after fitting.
- tol : float, default=1e-14
Tolerance threshold. While the smallest eigenvalue of \(2\Sigma - \mathrm{diag}(S)\) is smaller than this threshold, S is incrementally shrunk, since decreasing S raises the eigenvalues of that matrix.
- Attributes:
- mu_tilde_ : ndarray of shape (n_samples, n_features)
Mean matrix for generating synthetic variables.
- sigma_tilde_decompose_ : ndarray of shape (n_features, n_features)
Cholesky decomposition of the synthetic covariance matrix.
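As a rough illustration of how the tol parameter interacts with the equi-correlated construction, the sketch below computes the diagonal S with NumPy only. The helper name `equicorrelated_s`, the starting value min(2 * lambda_min, 1) on the correlation scale, and the tenfold shrink schedule are assumptions made for illustration; they are not taken from the hidimstat source and may differ from what the library does.

```python
import numpy as np


def equicorrelated_s(sigma, tol=1e-14):
    """Hypothetical helper: equi-correlated diagonal S for a covariance matrix."""
    n_features = sigma.shape[0]
    # Work on the correlation scale so that one common value fits all features.
    std = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(std, std)
    lambda_min = np.linalg.eigvalsh(corr)[0]  # eigvalsh sorts in ascending order
    if lambda_min <= 0:
        raise ValueError("The covariance matrix must be positive definite.")

    s0 = min(2.0 * lambda_min, 1.0)  # assumed starting value
    shrink = 0.0
    # Shrink the common value until 2 * corr - diag(S) clears the tolerance.
    while np.linalg.eigvalsh(
        2.0 * corr - s0 * (1.0 - shrink) * np.eye(n_features)
    )[0] <= tol:
        shrink = np.finfo(float).eps if shrink == 0.0 else 10.0 * shrink

    # Rescale to the covariance scale: one entry of S per feature.
    return s0 * (1.0 - shrink) * np.diag(sigma)


sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
s = equicorrelated_s(sigma)
```

With the default tol the shrinkage is tiny, so S stays very close to its starting value; a larger tol trades some knockoff power for a better-conditioned matrix.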
- fit(X)
Fit the Gaussian synthetic variable generator. This method estimates, from the input data, the parameters needed to generate second-order Gaussian synthetic variables that preserve the covariance structure.
- Parameters:
- X : array-like of shape (n_samples, n_features)
The input samples used to estimate the parameters for synthetic variable generation. The data is assumed to follow a Gaussian distribution.
- Returns:
- self : object
Returns the instance itself.
Notes
The method implements the following steps:
1. Centers and scales the data if specified.
2. Estimates the mean and covariance of the input data.
3. Computes the parameters for synthetic variable generation.
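The estimation and parameter steps can be sketched with NumPy using the standard conditional formulas for Gaussian knockoffs. This is an illustration under stated assumptions, not the library's code: the empirical covariance from `np.cov` stands in for the configurable cov_estimator, the optional centering and scaling step is skipped, and the fixed 0.99 shrink factor replaces the tolerance loop. The variable names mirror the documented attributes but are local to this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 3
true_cov = np.array([[1.0, 0.4, 0.2],
                     [0.4, 1.0, 0.4],
                     [0.2, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(n_features), true_cov, size=n_samples)

# Estimate mean and covariance (plain empirical estimates, an assumption here).
mu = X.mean(axis=0)
sigma = np.cov(X, rowvar=False)

# Equi-correlated diagonal S, computed on the correlation scale and shrunk
# by a fixed factor (assumed) so that 2 * sigma - diag(S) is positive definite.
std = np.sqrt(np.diag(sigma))
corr = sigma / np.outer(std, std)
lambda_min = np.linalg.eigvalsh(corr)[0]
s = 0.99 * min(2.0 * lambda_min, 1.0) * np.diag(sigma)
diag_s = np.diag(s)

# Conditional mean and covariance of the knockoffs given X.
sigma_inv_s = np.linalg.solve(sigma, diag_s)       # Sigma^{-1} diag(S)
mu_tilde = X - (X - mu) @ sigma_inv_s              # (n_samples, n_features)
sigma_tilde = 2.0 * diag_s - diag_s @ sigma_inv_s  # (n_features, n_features)
sigma_tilde_decompose = np.linalg.cholesky(sigma_tilde)
```

Here `mu_tilde` has one row per sample because the conditional mean depends on each observation, while the conditional covariance is shared by all samples and is factorised once.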
- sample(n_repeats: int = 1, random_state=None)
Generate synthetic variables. This function generates synthetic variables that preserve the covariance structure of the original data while ensuring conditional independence.
- Parameters:
- n_repeats : int, default=1
The number of sets of Gaussian knockoff variables to generate.
- random_state : int or None, default=None
The random state to use for sampling.
- Returns:
- X_tilde : ndarray of shape (n_repeats, n_samples, n_features)
The synthetic variables.
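Given a mean matrix and a Cholesky factor like the fitted attributes described above, drawing the synthetic variables amounts to shifting and colouring standard normal noise. The function `sample_knockoffs` below is a hypothetical stand-in written for illustration, and the use of `np.random.default_rng` for seeding is an assumption; the library's own random state handling may differ.

```python
import numpy as np


def sample_knockoffs(mu_tilde, sigma_tilde_decompose, n_repeats=1, random_state=None):
    """Hypothetical sketch: draw n_repeats sets of Gaussian synthetic variables."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = mu_tilde.shape
    # Independent standard normal noise for every repeat, sample and feature.
    z = rng.standard_normal((n_repeats, n_samples, n_features))
    # Each noise row z_i is mapped to mu_tilde_i + L z_i, so every row has
    # covariance L L^T; mu_tilde broadcasts over the repeat axis.
    return mu_tilde + z @ sigma_tilde_decompose.T


mu_tilde = np.zeros((5, 2))
chol = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))
X_tilde = sample_knockoffs(mu_tilde, chol, n_repeats=4, random_state=0)
X_tilde_again = sample_knockoffs(mu_tilde, chol, n_repeats=4, random_state=0)
```

Passing the same integer seed twice yields identical draws, which is the usual reason to set random_state when results must be reproducible.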
Examples using hidimstat.samplers.GaussianKnockoffs
Controlled multiple variable selection on the Wisconsin breast cancer dataset