GaussianKnockoffs

class hidimstat.samplers.GaussianKnockoffs(cov_estimator=LedoitWolf(assume_centered=True), tol=1e-14)

Bases: object

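Generator for second-order Gaussian knockoff variables using the equi-correlated method (all entries of S are equal). It creates synthetic variables whose covariance matches that of the original variables and whose cross-covariance with them is \(\Sigma - \operatorname{diag}(S)\). Because the synthetic variables are built from the original variables alone, they are conditionally independent of the response given the original variables.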
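A minimal NumPy sketch of the equi-correlated choice, assuming \(\Sigma\) has already been rescaled to a correlation matrix (unit diagonal): every entry of S is set to \(\min(2\lambda_{\min}(\Sigma), 1)\), and the joint covariance of the original and synthetic variables is then checked for positive semidefiniteness. The helper names and the AR(1) test matrix are illustrative assumptions, not part of hidimstat.

```python
import numpy as np

def equicorrelated_s(corr):
    """Equi-correlated knockoff parameter S for a correlation matrix (unit diagonal)."""
    lambda_min = np.linalg.eigvalsh(corr)[0]  # eigvalsh returns eigenvalues in ascending order
    if lambda_min <= 0:
        raise ValueError("correlation matrix must be positive definite")
    return np.full(corr.shape[0], min(2.0 * lambda_min, 1.0))

def joint_covariance(corr, s):
    """Covariance of the stacked vector [X, X_tilde] for second-order knockoffs."""
    D = np.diag(s)
    return np.block([[corr, corr - D], [corr - D, corr]])

# Toy AR(1)-style correlation matrix: corr[i, j] = rho ** |i - j|
p, rho = 5, 0.6
idx = np.arange(p)
corr = rho ** np.abs(idx[:, None] - idx[None, :])

s = equicorrelated_s(corr)
G = joint_covariance(corr, s)
# The construction is valid only if G is positive semidefinite
print(s[0], np.linalg.eigvalsh(G).min())
```

The eigenvalues of the joint matrix are those of \(2\Sigma - \operatorname{diag}(S)\) and of \(\operatorname{diag}(S)\), so with the equi-correlated value the smallest one sits at (numerically) zero, which is why a tolerance is needed in practice.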

Parameters:
cov_estimator : object, default=LedoitWolf(assume_centered=True)

Estimator for computing the covariance matrix. Must implement fit and have a covariance_ attribute after fitting.
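Any object with this `fit` / `covariance_` contract should work; scikit-learn estimators such as `sklearn.covariance.LedoitWolf` or `EmpiricalCovariance` already follow it. The class below is a hypothetical example (not part of hidimstat or scikit-learn) that shrinks the empirical covariance toward a scaled identity with a fixed weight.

```python
import numpy as np

class FixedShrinkageCovariance:
    """Hypothetical covariance estimator exposing the fit / covariance_ contract."""

    def __init__(self, shrinkage=0.1, assume_centered=False):
        self.shrinkage = shrinkage
        self.assume_centered = assume_centered

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if not self.assume_centered:
            X = X - X.mean(axis=0)
        emp_cov = X.T @ X / X.shape[0]
        # Shrinkage target: identity scaled by the average variance
        target_scale = np.trace(emp_cov) / emp_cov.shape[0]
        n_features = emp_cov.shape[0]
        self.covariance_ = (
            (1.0 - self.shrinkage) * emp_cov
            + self.shrinkage * target_scale * np.eye(n_features)
        )
        return self

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
est = FixedShrinkageCovariance(shrinkage=0.2).fit(X)
print(est.covariance_.shape)
```

Under the stated contract, such an object could be passed as `GaussianKnockoffs(cov_estimator=FixedShrinkageCovariance(shrinkage=0.2))`; the fixed weight is a simplification of the data-driven weight that Ledoit-Wolf estimates.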

tol : float, default=1e-14

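Tolerance threshold. While the smallest eigenvalue of \(2\Sigma - \operatorname{diag}(S)\) is smaller than this threshold, the entries of S are scaled down by a progressively larger factor. Shrinking S raises the eigenvalues of \(2\Sigma - \operatorname{diag}(S)\), so the loop stops once the matrix is safely positive definite.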
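The tolerance loop can be sketched as follows, assuming S is multiplied by \(1 - \varepsilon\), where \(\varepsilon\) starts at machine epsilon and grows tenfold per step; the exact schedule used inside the library is an assumption here, and `shrink_until_psd` is a hypothetical name. In the worst case \(\varepsilon\) reaches 1 (so S = 0), and the loop then succeeds whenever the smallest eigenvalue of \(2\Sigma\) exceeds `tol`.

```python
import numpy as np

def shrink_until_psd(sigma, s, tol=1e-14, max_iter=100):
    """Scale s down until the smallest eigenvalue of 2*sigma - diag(s) exceeds tol."""
    eps = 0.0
    for _ in range(max_iter):
        s_try = s * (1.0 - eps)
        if np.linalg.eigvalsh(2.0 * sigma - np.diag(s_try)).min() > tol:
            return s_try
        # Grow the shrinkage factor geometrically, starting from machine epsilon
        eps = np.finfo(float).eps if eps == 0.0 else min(10.0 * eps, 1.0)
    raise RuntimeError("could not make 2*sigma - diag(s) positive definite")

p, rho = 5, 0.6
idx = np.arange(p)
corr = rho ** np.abs(idx[:, None] - idx[None, :])
lambda_min = np.linalg.eigvalsh(corr)[0]
s = np.full(p, 2.0 * lambda_min)  # boundary value: 2*corr - diag(s) is singular
s_safe = shrink_until_psd(corr, s)
print(s_safe[0], np.linalg.eigvalsh(2.0 * corr - np.diag(s_safe)).min())
```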

Attributes:
mu_tilde_ : ndarray of shape (n_samples, n_features)

Conditional mean of the synthetic variables given the original samples (one row per sample).

sigma_tilde_decompose_ : ndarray of shape (n_features, n_features)

Cholesky factor of the covariance of the synthetic variables conditional on the original variables.
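For reference, the standard second-order (model-X) Gaussian knockoff construction draws \(\tilde{X}\) from the conditional law below, with \(\mu\) and \(\Sigma\) the estimated mean and covariance and \(D = \operatorname{diag}(S)\). Reading each row of mu_tilde_ as \(\mathbb{E}[\tilde{X} \mid X]\) for one sample, and sigma_tilde_decompose_ as the lower-triangular factor \(L\), is an assumption based on the attribute descriptions above.

```latex
\begin{aligned}
\operatorname{Cov}\begin{pmatrix} X \\ \tilde{X} \end{pmatrix}
  &= \begin{pmatrix} \Sigma & \Sigma - D \\ \Sigma - D & \Sigma \end{pmatrix},
  \qquad D = \operatorname{diag}(S), \\
\mathbb{E}[\tilde{X} \mid X]
  &= \mu + (\Sigma - D)\Sigma^{-1}(X - \mu) = X - D\Sigma^{-1}(X - \mu), \\
\operatorname{Cov}(\tilde{X} \mid X)
  &= \Sigma - (\Sigma - D)\Sigma^{-1}(\Sigma - D) = 2D - D\Sigma^{-1}D = L L^{\top}.
\end{aligned}
```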


__init__(cov_estimator=LedoitWolf(assume_centered=True), tol=1e-14)

fit(X)

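Fit the Gaussian synthetic variable generator. This method estimates, from the input data, the parameters needed to generate Gaussian synthetic (knockoff) variables. It follows the second-order knockoff construction: the synthetic variables have the same covariance as the original variables, and the cross-covariance between the original and synthetic variables is \(\Sigma - \operatorname{diag}(S)\).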

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples used to estimate the parameters for synthetic variable generation. The data is assumed to follow a Gaussian distribution.

Returns:
self : object

Returns the instance itself.

Notes

The method implements the following steps:

1. Centers and scales the data if specified.
2. Estimates the mean and covariance of the input data.
3. Computes the parameters for synthetic variable generation.
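The three steps can be sketched end to end in NumPy. This is a hypothetical `fit_gaussian_knockoffs` function, not the library's code, and it makes several assumptions: it only centers (no scaling), uses the plain empirical covariance instead of the LedoitWolf default, computes the equi-correlated S on the correlation scale before mapping it back to the covariance scale, and defaults to `tol=1e-10` rather than `1e-14` to leave numerical headroom for the Cholesky factorization.

```python
import numpy as np

def fit_gaussian_knockoffs(X, tol=1e-10):
    """Hypothetical sketch of fit(); returns (mu_tilde, chol) for sampling X_tilde | X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    # Steps 1-2: estimate the mean, center, and estimate the covariance
    # (plain empirical covariance stands in for the LedoitWolf default).
    mu = X.mean(axis=0)
    Xc = X - mu
    sigma = Xc.T @ Xc / n_samples
    # Step 3a: equi-correlated S on the correlation scale.
    scale = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(scale, scale)
    lambda_min = np.linalg.eigvalsh(corr)[0]
    if 2.0 * lambda_min <= tol:
        raise ValueError("estimated covariance is too close to singular")
    s = np.full(n_features, min(2.0 * lambda_min, 1.0))
    # Shrink S until every eigenvalue of 2 * corr - diag(S) exceeds tol.
    eps = 0.0
    while np.linalg.eigvalsh(2.0 * corr - np.diag(s * (1.0 - eps)))[0] <= tol:
        eps = np.finfo(float).eps if eps == 0.0 else min(10.0 * eps, 1.0)
    s = s * (1.0 - eps) * scale**2  # map S back to the covariance scale
    # Step 3b: conditional mean and Cholesky factor of Cov(X_tilde | X).
    D = np.diag(s)
    sigma_inv_D = np.linalg.solve(sigma, D)  # Sigma^{-1} D
    mu_tilde = X - (X - mu) @ sigma_inv_D
    sigma_tilde = 2.0 * D - D @ sigma_inv_D
    chol = np.linalg.cholesky((sigma_tilde + sigma_tilde.T) / 2.0)
    return mu_tilde, chol

rng = np.random.default_rng(0)
mixing = np.eye(4) + 0.5 * np.eye(4, k=1)  # induces correlation between neighbours
X = rng.standard_normal((200, 4)) @ mixing + 3.0
mu_tilde, chol = fit_gaussian_knockoffs(X)
print(mu_tilde.shape, chol.shape)
```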

sample(n_repeats: int = 1, random_state=None)

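Generate synthetic variables. This method draws Gaussian knockoff variables from the distribution estimated by fit. The draws preserve the covariance structure of the original data and, since they are generated from the original variables only, are conditionally independent of the response given the original variables.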

Parameters:
n_repeats : int, default=1

The number of independent sets of Gaussian knockoff variables to draw.

random_state : int or None, default=None

Seed controlling the random draws; pass an int for reproducible output.

Returns:
X_tilde : 3D ndarray of shape (n_repeats, n_samples, n_features)

The synthetic variables.
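Given a conditional mean \(\tilde{\mu}\) and a lower Cholesky factor \(L\), one knockoff copy is the Gaussian draw \(\tilde{X} = \tilde{\mu} + Z L^{\top}\) with Z standard normal; repeating this n_repeats times yields the 3D output described above. The sketch below is hypothetical (`sample_knockoffs` is not a library function) and uses `numpy.random.default_rng`, so its draws will not match the library's for the same random_state.

```python
import numpy as np

def sample_knockoffs(mu_tilde, chol, n_repeats=1, random_state=None):
    """Draw n_repeats independent knockoff copies: X_tilde = mu_tilde + Z @ chol.T."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = mu_tilde.shape
    # One standard-normal matrix per repeat; each row then has covariance chol @ chol.T
    z = rng.standard_normal((n_repeats, n_samples, n_features))
    return mu_tilde[None, :, :] + z @ chol.T

mu_tilde = np.zeros((20000, 2))
chol = np.array([[1.0, 0.0], [0.5, 0.5]])
X_tilde = sample_knockoffs(mu_tilde, chol, n_repeats=3, random_state=0)
print(X_tilde.shape)
```

Drawing all repeats in one batched matrix product keeps the sampling vectorized; the repeats differ only through their independent Z.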

Examples using hidimstat.samplers.GaussianKnockoffs

Controlled multiple variable selection on the Wisconsin breast cancer dataset