CluDL

class hidimstat.CluDL(clustering, desparsified_lasso=DesparsifiedLasso(), cluster_boostrap_size=1.0, bootstrap_groups=None, random_state=None, memory=None)

Bases: BaseVariableImportance

Clustered inference with desparsified lasso.

This algorithm computes a single clustered inference on groups of features using the desparsified lasso method for statistical inference.

Parameters:
clustering: sklearn.cluster.FeatureAgglomeration

An instance of a clustering method that operates on features.

desparsified_lasso: DesparsifiedLasso

An instance of the DesparsifiedLasso class for statistical inference.

cluster_boostrap_size: float, optional (default=1.0)

Fraction of samples used for computing the clustering. When cluster_boostrap_size=1.0, all samples are used.

bootstrap_groups: ndarray, shape (n_samples,), optional (default=None)

Sample group labels for stratified subsampling.

random_state: int, optional (default=None)

Random seed for reproducible subsampling.

memory: joblib.Memory or str, optional (default=None)

Used to cache the output of the clustering and inference computation. By default, no caching is done. If provided, it should be the path to the caching directory or a joblib.Memory object.
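The interplay of cluster_boostrap_size and random_state can be illustrated with a small standard-library sketch. It assumes sampling without replacement and uses a hypothetical helper, subsample_indices, that is not part of hidimstat; the library's actual subsampling, including how bootstrap_groups is handled, may differ.

```python
import random


def subsample_indices(n_samples, fraction, seed=None):
    """Hypothetical helper: draw a fraction of sample indices.

    Assumption: indices are drawn without replacement.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, int(n_samples * fraction))
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(n_samples), n_keep))


# fraction=1.0 keeps every sample, mirroring cluster_boostrap_size=1.0
all_idx = subsample_indices(10, 1.0, seed=0)

# The same seed yields the same subset on repeated calls
half_a = subsample_indices(10, 0.5, seed=0)
half_b = subsample_indices(10, 0.5, seed=0)
```

With a fraction below 1.0, the clustering would be computed on the selected rows only, which is what the clustering_samples_ attribute records.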

Attributes:
desparsified_lasso_: DesparsifiedLasso

Fitted desparsified lasso estimator.

clustering_: sklearn.cluster.FeatureAgglomeration

Fitted clustering object.

clustering_samples_: ndarray, shape (n_samples * cluster_boostrap_size,)

Indices of samples used for clustering.

importances_: ndarray, shape (n_clusters,) or (n_clusters, n_tasks)

Estimated coefficients at cluster level.

pvalues_: ndarray, shape (n_clusters,)

P-values for each cluster.

n_features_: int

Number of features in the original data.

__init__(clustering, desparsified_lasso=DesparsifiedLasso(), cluster_boostrap_size=1.0, bootstrap_groups=None, random_state=None, memory=None)

fit(X, y)

Fit the clustering and desparsified lasso on the data.

Parameters:
X: ndarray, shape (n_samples, n_features)

Input data matrix.

y: ndarray, shape (n_samples,) or (n_samples, n_tasks)

Target variable(s).

Returns:
self: CluDL

Fitted estimator.

importance(X=None, y=None)

Compute feature importance using desparsified lasso. Then map the importance scores from cluster level back to feature level.

Parameters:
X

Not used, present for API consistency by convention.

y

Not used, present for API consistency by convention.
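The step of mapping cluster-level results back to features can be pictured as a lookup through the cluster labels. The sketch below is a simplified illustration with a hypothetical helper, broadcast_to_features, and made-up values; it assumes each feature simply inherits the value of its cluster, whereas the library may additionally rescale coefficients.

```python
def broadcast_to_features(cluster_values, labels):
    """Hypothetical helper: give each feature the value of its cluster.

    labels[j] is the cluster index assigned to feature j.
    """
    return [cluster_values[label] for label in labels]


# Six features grouped into three clusters (illustrative values only)
labels = [0, 0, 1, 2, 2, 2]
cluster_pvalues = [0.01, 0.40, 0.03]

feature_pvalues = broadcast_to_features(cluster_pvalues, labels)
# One entry per original feature, repeated within each cluster
```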

fit_importance(X, y)

Fit the model and compute feature importance.

Parameters:
X: ndarray, shape (n_samples, n_features)

Input data matrix.

y: ndarray, shape (n_samples,) or (n_samples, n_tasks)

Target variable(s).

Returns:
self: CluDL

Fitted estimator with computed importances.

fdr_selection(fdr, fdr_control='bhq', reshaping_function=None, two_tailed_test=True)

Overrides the signature to set two_tailed_test=True by default.
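For readers unfamiliar with FDR control, the following is a minimal standard-library sketch of the Benjamini-Hochberg step-up rule. It assumes that fdr_control='bhq' refers to this standard procedure; the helper benjamini_hochberg is hypothetical, ignores reshaping_function and two_tailed_test, and is not the library's implementation.

```python
def benjamini_hochberg(pvalues, fdr):
    """Hypothetical helper: boolean rejection mask under the BH rule."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank k such that p_(k) <= k * fdr / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            k_max = rank

    # Reject the k_max smallest p-values
    selected = [False] * m
    for idx in order[:k_max]:
        selected[idx] = True
    return selected


# Illustrative p-values only
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
mask = benjamini_hochberg(pvals, fdr=0.05)
```

Here only the two smallest p-values fall under their rank-dependent thresholds (0.01 and 0.02), so only the first two entries are selected.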

fwer_selection(fwer, procedure='bonferroni', n_tests=None, two_tailed_test=False)

Performs feature selection based on Family-Wise Error Rate (FWER) control.

Parameters:
fwer: float

The target family-wise error rate level (between 0 and 1).

procedure: {'bonferroni'}, default='bonferroni'

The FWER control method to use:

- 'bonferroni': Bonferroni correction

n_tests: int or None, default=None

Factor for multiple testing correction. If None, uses the number of clusters if available, otherwise the number of features.

two_tailed_test: bool, default=False

If True, uses the sign of the importance scores to indicate whether the selected features have positive or negative effects.

Returns:
selected: ndarray of int

Integer array indicating the selected features. 1 indicates selected features with positive effects, -1 indicates selected features with negative effects, 0 indicates non-selected features.
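The signed output described above can be sketched with a plain Bonferroni threshold of fwer / n_tests. This is an illustration under stated assumptions rather than the library's code: the helper bonferroni_signed_selection is hypothetical, the p-values are taken to be two-sided already, and the sign comes directly from the importance scores.

```python
def bonferroni_signed_selection(pvalues, importances, fwer, n_tests=None):
    """Hypothetical helper: signed selection under a Bonferroni threshold.

    Returns 1 (selected, positive effect), -1 (selected, negative effect)
    or 0 (not selected) for each entry.
    """
    n_tests = len(pvalues) if n_tests is None else n_tests
    threshold = fwer / n_tests  # Bonferroni-corrected level

    selected = []
    for p, score in zip(pvalues, importances):
        if p < threshold:
            selected.append(1 if score > 0 else -1)
        else:
            selected.append(0)
    return selected


# Illustrative values only: threshold is 0.05 / 4 = 0.0125
pvals = [0.001, 0.2, 0.004, 0.03]
scores = [1.5, 0.1, -2.0, 0.7]
result = bonferroni_signed_selection(pvals, scores, fwer=0.05)
```

Note that the last entry has a p-value below 0.05 but above the corrected threshold, so it is not selected.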

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing: MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params: dict

Parameter names mapped to their values.

importance_selection(k_best=None, percentile=None, threshold_max=None, threshold_min=None)

Selects features based on variable importance.

Parameters:
k_best: int, default=None

Selects the top k features based on importance scores.

percentile: float, default=None

Selects features based on a specified percentile of importance scores.

threshold_max: float, default=None

Selects features with importance scores below the specified maximum threshold.

threshold_min: float, default=None

Selects features with importance scores above the specified minimum threshold.

Returns:
selection: array-like of shape (n_features,)

Binary array indicating the selected features.
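To make the selection rules concrete, here is a small sketch of two of them, k_best and threshold_min, using only the standard library. The helpers select_k_best and select_above are hypothetical, and the strict inequality and the use of raw (not absolute) scores are assumptions; the library's behaviour on ties and signs may differ.

```python
def select_k_best(importances, k_best):
    """Hypothetical helper: mask of the k highest importance scores."""
    order = sorted(
        range(len(importances)), key=lambda i: importances[i], reverse=True
    )
    keep = set(order[:k_best])
    return [i in keep for i in range(len(importances))]


def select_above(importances, threshold_min):
    """Hypothetical helper: mask of scores strictly above threshold_min."""
    return [score > threshold_min for score in importances]


# Illustrative scores only
scores = [0.2, 1.4, 0.05, 0.9]
top_two = select_k_best(scores, k_best=2)
above = select_above(scores, threshold_min=0.1)
```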

plot_importance(ax=None, ascending=False, feature_names=None, **seaborn_barplot_kwargs)

Plot feature importances as a horizontal bar plot.

Parameters:
ax: matplotlib.axes.Axes or None, default=None

Axes object to draw the plot onto, otherwise uses the current Axes.

ascending: bool, default=False

Whether to sort features by ascending importance.

**seaborn_barplot_kwargs: additional keyword arguments

Additional arguments passed to seaborn.barplot; see https://seaborn.pydata.org/generated/seaborn.barplot.html

Returns:
ax: matplotlib.axes.Axes

The Axes object with the plot.

pvalue_selection(k_lowest=None, percentile=None, threshold_max=0.05, threshold_min=None, alternative_hypothesis=False)

Selects features based on p-values.

Parameters:
k_lowest: int, default=None

Selects the k features with lowest p-values.

percentile: float, default=None

Selects features based on a specified percentile of p-values.

threshold_max: float, default=0.05

Selects features with p-values below the specified maximum threshold (0 to 1).

threshold_min: float, default=None

Selects features with p-values above the specified minimum threshold (0 to 1).

alternative_hypothesis: bool, default=False

If True, selects based on 1-pvalues instead of p-values.

Returns:
selection: array-like of shape (n_features,)

Binary array indicating the selected features (True for selected).
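The effect of threshold_max and alternative_hypothesis can be sketched as follows. The helper select_by_pvalue is hypothetical and covers only these two options; the strict inequality is an assumption.

```python
def select_by_pvalue(pvalues, threshold_max=0.05, alternative_hypothesis=False):
    """Hypothetical helper: mask of entries whose (1 - )p-value is small."""
    if alternative_hypothesis:
        # Work with 1 - p instead of p, as described for this option
        values = [1.0 - p for p in pvalues]
    else:
        values = list(pvalues)
    return [v < threshold_max for v in values]


# Illustrative p-values only
pvals = [0.01, 0.20, 0.97, 0.049]
null_sel = select_by_pvalue(pvals)
alt_sel = select_by_pvalue(pvals, alternative_hypothesis=True)
```

With the default settings the first and last entries are selected; with alternative_hypothesis=True only the entry with p-value 0.97 is, since 1 - 0.97 falls below 0.05.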

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params: dict

Estimator parameters.

Returns:
self: estimator instance

Estimator instance.
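The <component>__<parameter> naming convention can be illustrated by splitting a parameter name on its first double underscore. The helper split_nested_param is hypothetical and only demonstrates the naming scheme; it does not reproduce the estimator's set_params logic. The example name assumes the clustering component is a FeatureAgglomeration, which has an n_clusters parameter.

```python
def split_nested_param(name):
    """Hypothetical helper: split 'component__parameter' at the first '__'.

    Returns (component, parameter); component is None for a top-level name.
    """
    component, sep, sub_name = name.partition("__")
    if sep:
        return component, sub_name
    return None, component


# A nested name targets a parameter of the clustering component
nested = split_nested_param("clustering__n_clusters")

# A plain name targets a parameter of the estimator itself
top_level = split_nested_param("random_state")
```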

Examples using hidimstat.CluDL

Support Recovery on fMRI Data

Ensemble Clustered Inference on 2D Data