CluDL
- class hidimstat.CluDL(clustering, desparsified_lasso=DesparsifiedLasso(), cluster_boostrap_size=1.0, bootstrap_groups=None, random_state=None, memory=None)
Bases: BaseVariableImportance
Clustered inference with desparsified lasso.
This algorithm computes a single clustered inference on groups of features using the desparsified lasso method for statistical inference.
- Parameters:
- clustering: sklearn.cluster.FeatureAgglomeration
An instance of a clustering method that operates on features.
- desparsified_lasso: DesparsifiedLasso
An instance of the DesparsifiedLasso class for statistical inference.
- cluster_boostrap_size: float, optional (default=1.0)
Fraction of samples used for computing the clustering. When cluster_boostrap_size=1.0, all samples are used.
- bootstrap_groups: ndarray, shape (n_samples,), optional (default=None)
Sample group labels for stratified subsampling.
- random_state: int, optional (default=None)
Random seed for reproducible subsampling.
- memory: joblib.Memory or str, optional (default=None)
Used to cache the output of the clustering and inference computation. By default, no caching is done. If provided, it should be the path to the caching directory or a joblib.Memory object.
- Attributes:
- desparsified_lasso_: DesparsifiedLasso
Fitted desparsified lasso estimator.
- clustering_: sklearn.cluster.FeatureAgglomeration
Fitted clustering object.
- clustering_samples_: ndarray, shape (n_samples*cluster_boostrap_size,)
Indices of samples used for clustering.
- importances_: ndarray, shape (n_clusters,) or (n_clusters, n_tasks)
Estimated coefficients at cluster level.
- pvalues_: ndarray, shape (n_clusters,)
P-values for each cluster.
- n_features_: int
Number of features in the original data.
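As a rough illustration of how cluster_boostrap_size relates to the length of clustering_samples_, the sketch below draws a fraction of the sample indices with a seeded generator. It is plain Python, not hidimstat's code: the helper name subsample_indices, sampling without replacement, and rounding the subset size down are all assumptions, and stratification by bootstrap_groups is left out.

```python
import random


def subsample_indices(n_samples, fraction=1.0, seed=None):
    """Return sorted indices of a random subset of the samples.

    Hypothetical helper: sampling without replacement and flooring the
    subset size are assumptions, not documented hidimstat behavior.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    size = max(1, int(n_samples * fraction))
    rng = random.Random(seed)  # seeded generator, so results are reproducible
    return sorted(rng.sample(range(n_samples), size))


full = subsample_indices(10, fraction=1.0, seed=0)  # every sample is kept
half = subsample_indices(10, fraction=0.5, seed=0)  # five of ten samples
```

With the same seed, repeated calls return the same indices, which is the role random_state plays in the parameter list above.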
- __init__(clustering, desparsified_lasso=DesparsifiedLasso(), cluster_boostrap_size=1.0, bootstrap_groups=None, random_state=None, memory=None)
- fit(X, y)
Fit the clustering and desparsified lasso on the data.
- Parameters:
- X: ndarray, shape (n_samples, n_features)
Input data matrix.
- y: ndarray, shape (n_samples,) or (n_samples, n_tasks)
Target variable(s).
- Returns:
- self: CluDL
Fitted estimator.
- importance(X=None, y=None)
Compute feature importance using desparsified lasso. Then map the importance scores from cluster level back to feature level.
- Parameters:
- X
Not used, present for API consistency by convention.
- y
Not used, present for API consistency by convention.
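The mapping from cluster level back to feature level can be pictured as a lookup: each feature receives the score of the cluster it was assigned to. The sketch below shows only that lookup in plain Python. The function name and the labels layout are hypothetical, and the library may additionally rescale scores (for example by cluster size), which is not shown here.

```python
def broadcast_to_features(cluster_scores, labels):
    """Give every feature the score of the cluster it belongs to.

    labels[j] is the cluster index of feature j (hypothetical layout).
    """
    return [cluster_scores[label] for label in labels]


# Three clusters, five features: features 0 and 1 share cluster 0,
# feature 2 is alone in cluster 1, features 3 and 4 share cluster 2.
cluster_scores = [0.8, -0.1, 0.3]
labels = [0, 0, 1, 2, 2]
feature_scores = broadcast_to_features(cluster_scores, labels)
```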
- fit_importance(X, y)
Fit the model and compute feature importance.
- Parameters:
- X: ndarray, shape (n_samples, n_features)
Input data matrix.
- y: ndarray, shape (n_samples,) or (n_samples, n_tasks)
Target variable(s).
- Returns:
- self: CluDL
Fitted estimator with computed importances.
- fdr_selection(fdr, fdr_control='bhq', reshaping_function=None, two_tailed_test=True)
Overrides the signature to set two_tailed_test=True by default.
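For readers unfamiliar with FDR control, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, on the assumption that fdr_control='bhq' refers to it. It is a plain-Python illustration, not the library's implementation; the function name is hypothetical, and reshaping_function and two_tailed_test are ignored.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return a 0/1 list marking hypotheses rejected at level `fdr`."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k such that p_(k) <= k / n * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / n * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    selected = [0] * n
    for idx in order[:cutoff_rank]:
        selected[idx] = 1
    return selected


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
picked = benjamini_hochberg(pvals, fdr=0.05)
```

Because the procedure is step-up, a p-value that fails its own rank threshold can still be selected when a larger p-value passes a later one.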
- fwer_selection(fwer, procedure='bonferroni', n_tests=None, two_tailed_test=False)
Performs feature selection based on Family-Wise Error Rate (FWER) control.
- Parameters:
- fwer: float
The target family-wise error rate level (between 0 and 1).
- procedure: {'bonferroni'}, default='bonferroni'
The FWER control method to use:
- 'bonferroni': Bonferroni correction
- n_tests: int or None, default=None
Factor for multiple testing correction. If None, uses the number of clusters if available, otherwise the number of features.
- two_tailed_test: bool, default=False
If True, uses the sign of the importance scores to indicate whether the selected features have positive or negative effects.
- Returns:
- selected: ndarray of int
Integer array indicating the selected features. 1 indicates selected features with positive effects, -1 indicates selected features with negative effects, 0 indicates non-selected features.
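The Bonferroni rule and the signed output described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not hidimstat's code: the helper name is hypothetical, the `signed` flag stands in for two_tailed_test, and the p-values are assumed to be already suitable for comparison against fwer / n_tests.

```python
def bonferroni_select(pvalues, importances, fwer, n_tests=None, signed=False):
    """Mark hypotheses whose p-value passes the Bonferroni threshold.

    Hypothetical helper: returns 0 for non-selected entries and, when
    `signed` is True, +1 or -1 according to the sign of the importance.
    """
    if n_tests is None:
        n_tests = len(pvalues)  # default correction factor (assumption)
    threshold = fwer / n_tests
    selected = []
    for p, score in zip(pvalues, importances):
        if p > threshold:
            selected.append(0)
        elif signed and score < 0:
            selected.append(-1)
        else:
            selected.append(1)
    return selected


pvals = [0.001, 0.2, 0.004, 0.03]
scores = [1.5, 0.1, -2.0, 0.7]
unsigned = bonferroni_select(pvals, scores, fwer=0.05)
signed = bonferroni_select(pvals, scores, fwer=0.05, signed=True)
```

With four tests and fwer=0.05 the threshold is 0.0125, so only the first and third entries are selected; the third carries a negative sign in the signed variant.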
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
- Returns:
- routing: MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params: dict
Parameter names mapped to their values.
- importance_selection(k_best=None, percentile=None, threshold_max=None, threshold_min=None)
Selects features based on variable importance.
- Parameters:
- k_best: int, default=None
Selects the top k features based on importance scores.
- percentile: float, default=None
Selects features based on a specified percentile of importance scores.
- threshold_max: float, default=None
Selects features with importance scores below the specified maximum threshold.
- threshold_min: float, default=None
Selects features with importance scores above the specified minimum threshold.
- Returns:
- selection: array-like of shape (n_features,)
Binary array indicating the selected features.
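Two of the criteria above, k_best and threshold_min, can be sketched as follows. This is plain Python for illustration only: the function names are hypothetical, tie handling and the strict inequality are assumptions, and how the library combines several criteria at once is not covered.

```python
def select_k_best(importances, k):
    """Return a 0/1 list marking the k largest importance scores."""
    n = len(importances)
    order = sorted(range(n), key=lambda i: importances[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(n)]


def select_above(importances, threshold_min):
    """Return a 0/1 list marking scores strictly above threshold_min."""
    return [1 if score > threshold_min else 0 for score in importances]


scores = [0.2, 0.9, 0.05, 0.4]
top_two = select_k_best(scores, k=2)
above = select_above(scores, threshold_min=0.1)
```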
- plot_importance(ax=None, ascending=False, feature_names=None, **seaborn_barplot_kwargs)
Plot feature importances as a horizontal bar plot.
- Parameters:
- ax: matplotlib.axes.Axes or None, default=None
Axes object to draw the plot onto. If None, the current Axes is used.
- ascending: bool, default=False
Whether to sort features by ascending importance.
- **seaborn_barplot_kwargs: additional keyword arguments
Additional arguments passed to seaborn.barplot. https://seaborn.pydata.org/generated/seaborn.barplot.html
- Returns:
- ax: matplotlib.axes.Axes
The Axes object with the plot.
- pvalue_selection(k_lowest=None, percentile=None, threshold_max=0.05, threshold_min=None, alternative_hypothesis=False)
Selects features based on p-values.
- Parameters:
- k_lowest: int, default=None
Selects the k features with lowest p-values.
- percentile: float, default=None
Selects features based on a specified percentile of p-values.
- threshold_max: float, default=0.05
Selects features with p-values below the specified maximum threshold (0 to 1).
- threshold_min: float, default=None
Selects features with p-values above the specified minimum threshold (0 to 1).
- alternative_hypothesis: bool, default=False
If True, selects based on 1-pvalues instead of p-values.
- Returns:
- selection: array-like of shape (n_features,)
Binary array indicating the selected features (True for selected).
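The interaction between threshold_max and alternative_hypothesis can be sketched in plain Python as below. The helper name is hypothetical and the strict inequality is an assumption; the other criteria (k_lowest, percentile, threshold_min) are left out.

```python
def select_by_pvalue(pvalues, threshold_max=0.05, alternative_hypothesis=False):
    """Return a boolean list, True where the tested value is below threshold_max.

    When alternative_hypothesis is True, 1 - p is tested instead of p.
    """
    if alternative_hypothesis:
        values = [1.0 - p for p in pvalues]
    else:
        values = list(pvalues)
    return [value < threshold_max for value in values]


pvals = [0.01, 0.2, 0.98]
default_choice = select_by_pvalue(pvals)
flipped_choice = select_by_pvalue(pvals, alternative_hypothesis=True)
```

In this example the default call keeps only the first entry, while the flipped call keeps only the last one, since 1 - 0.98 falls below 0.05.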
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params: dict
Estimator parameters.
- Returns:
- self: estimator instance
Estimator instance.
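The <component>__<parameter> naming can be illustrated by splitting each key at the first double underscore. The sketch below is plain Python with a hypothetical helper name; it shows only the key convention, not how set_params applies the values. The key clustering__n_clusters is an example chosen for illustration.

```python
from collections import defaultdict


def split_nested_params(params):
    """Separate top-level parameters from `<component>__<parameter>` ones."""
    top_level = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # partition splits at the first "__"; delim is empty if none is found
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            top_level[name] = value
    return top_level, dict(nested)


top, nested = split_nested_params(
    {"random_state": 0, "clustering__n_clusters": 50}
)
```

Here random_state would be set on the estimator itself, while n_clusters would be forwarded to its clustering component.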