.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "generated/gallery/examples/plot_fmri_data_example.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_generated_gallery_examples_plot_fmri_data_example.py: Support Recovery on fMRI Data ============================= This example compares methods based on Desparsified Lasso (DL) to estimate voxel activation maps associated with behavior, specifically decoder map support. All methods presented here provide statistical guarantees. To demonstrate these methods, we use the Haxby dataset, focusing on the 'face vs house' contrast. We analyze labeled activation maps from a single subject to produce a brain map showing the discriminative pattern between these two conditions. This example illustrates that in high-dimensional settings (many voxels), DL becomes impractical due to memory constraints. However, we can overcome this limitation using feature aggregation methods that leverage the spatial structure of the data (high correlation between neighboring voxels). We introduce two feature aggregation methods that maintain statistical guarantees, though with a small spatial tolerance in support detection (i.e., they may identify null covariates "close" to non-null covariates): * Clustered Desparsified Lasso (CLuDL): combines clustering (parcellation) with statistical inference * Ensemble Clustered Desparsified Lasso (EnCluDL): adds randomization to the clustering process EnCluDL is particularly powerful as it doesn't rely on a single clustering choice. As demonstrated in :footcite:t:`chevalier2021decoding`, it produces relevant predictive regions across various tasks. .. GENERATED FROM PYTHON SOURCE LINES 31-35 fetch and preprocess Haxby dataset ---------------------------------------------- We define a function that gathers and preprocesses the Haxby dataset for a given subject. Only the 'face' and 'house' conditions are kept for this example. .. GENERATED FROM PYTHON SOURCE LINES 35-84 .. code-block:: Python import numpy as np import pandas as pd from nilearn import datasets from nilearn.image import mean_img from nilearn.maskers import NiftiMasker from sklearn.utils import Bunch def preprocess_haxby(subject=2, memory=None): """Gathering and preprocessing Haxby dataset for a given subject.""" # Gathering data haxby_dataset = datasets.fetch_haxby(subjects=[subject]) fmri_filename = haxby_dataset.func[0] behavioral = pd.read_csv(haxby_dataset.session_target[0], sep=" ") conditions = behavioral["labels"].values session_label = behavioral["chunks"].values condition_mask = np.logical_or(conditions == "face", conditions == "house") groups = session_label[condition_mask] # Loading anatomical image (back-ground image) if haxby_dataset.anat[0] is None: bg_img = None else: bg_img = mean_img(haxby_dataset.anat, copy_header=True) # Building target where '1' corresponds to 'face' and '-1' to 'house' y = np.asarray((conditions[condition_mask] == "face") * 2 - 1) # Loading mask mask_img = haxby_dataset.mask masker = NiftiMasker( mask_img=mask_img, standardize="zscore_sample", smoothing_fwhm=None, memory=memory, ) # Computing masked data fmri_masked = masker.fit_transform(fmri_filename) X = np.asarray(fmri_masked)[condition_mask, :] return Bunch(X=X, y=y, groups=groups, bg_img=bg_img, masker=masker) .. GENERATED FROM PYTHON SOURCE LINES 85-92 Gathering and preprocessing Haxby dataset for a given subject ------------------------------------------------------------- The `preprocess_haxby` function make the preprocessing of the Haxby dataset, it outputs the preprocessed activation maps for the two conditions 'face' or 'house' (contained in `X`), the conditions (in `y`), the session labels (in `groups`) and the mask (in `masker`). You may choose a subject in [1, 2, 3, 4, 5, 6]. By default subject=2. .. GENERATED FROM PYTHON SOURCE LINES 94-107 .. code-block:: Python import warnings # Remove warnings during loading data warnings.filterwarnings( "ignore", message="The provided image has no sform in its header." ) data = preprocess_haxby(subject=2) X, y, groups, masker = data.X, data.y, data.groups, data.masker mask = masker.mask_img_.get_fdata().astype(bool) .. rst-class:: sphx-glr-script-out .. code-block:: none [fetch_haxby] Dataset found in /home/circleci/nilearn_data/haxby2001 .. GENERATED FROM PYTHON SOURCE LINES 108-111 Initializing FeatureAgglomeration object that performs the clustering --------------------------------------------------------------------- For fMRI data taking 500 clusters is generally a good default choice. .. GENERATED FROM PYTHON SOURCE LINES 111-125 .. code-block:: Python from sklearn.cluster import FeatureAgglomeration from sklearn.feature_extraction import image n_clusters = 500 # Deriving voxels connectivity. shape = mask.shape n_x, n_y, n_z = shape[0], shape[1], shape[2] connectivity = image.grid_to_graph(n_x=n_x, n_y=n_y, n_z=n_z, mask=mask) # Initializing FeatureAgglomeration object. ward = FeatureAgglomeration(n_clusters=n_clusters, connectivity=connectivity) .. GENERATED FROM PYTHON SOURCE LINES 126-128 Making the inference with several algorithms -------------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 128-144 .. code-block:: Python from sklearn.linear_model import LassoCV from sklearn.model_selection import KFold # Default estimator estimator = LassoCV( eps=1e-2, fit_intercept=False, cv=KFold(n_splits=5, shuffle=True, random_state=0), tol=1e-2, max_iter=6000, random_state=1, n_jobs=1, ) .. GENERATED FROM PYTHON SOURCE LINES 145-148 Due to the very high dimensionality of the problem (close to 40,000 voxels) computing a p-value map with Desparsified Lasso would be very memory consuming and may also be inadequate since it does not take into account the spatial structure of the data. .. GENERATED FROM PYTHON SOURCE LINES 151-153 Now, the clustered inference algorithm which combines parcellation and high-dimensional inference (c.f. References). .. GENERATED FROM PYTHON SOURCE LINES 153-176 .. code-block:: Python from copy import deepcopy from sklearn.base import clone from hidimstat.desparsified_lasso import DesparsifiedLasso from hidimstat.ensemble_clustered_inference import CluDL # number of worker n_jobs = 3 desparsified_lasso_1 = DesparsifiedLasso( noise_method="median", estimator=clone(estimator), random_state=0, n_jobs=1, ) cludl = CluDL( clustering=ward, desparsified_lasso=deepcopy(desparsified_lasso_1), random_state=0 ) cludl.fit_importance(X, y) .. rst-class:: sphx-glr-script-out .. code-block:: none array([ 5.90998411e-06, 5.90998411e-06, 5.90998411e-06, ..., -1.44788141e-03, -1.44788141e-03, -1.44788141e-03], shape=(39912,)) .. GENERATED FROM PYTHON SOURCE LINES 177-184 Below, we run the ensemble clustered inference algorithm which adds a randomization step over the clustered inference algorithm (c.f. References). To make the example as short as possible we take `n_bootstraps=5` which means that 5 different parcellations are considered and then 5 statistical maps are produced and aggregated into one. However you might benefit from clustering randomization taking `n_bootstraps=25` or `n_bootstraps=100`, also we set `n_jobs=n_jobs`. .. GENERATED FROM PYTHON SOURCE LINES 184-198 .. code-block:: Python from hidimstat.ensemble_clustered_inference import EnCluDL encludl = EnCluDL( clustering=ward, desparsified_lasso=deepcopy(desparsified_lasso_1), n_jobs=n_jobs, n_bootstraps=10, cluster_boostrap_size=0.75, random_state=0, ) encludl.fit_importance(X, y) .. rst-class:: sphx-glr-script-out .. code-block:: none Fitting clustered inferences: 0%| | 0/10 [00:00` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_fmri_data_example.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_fmri_data_example.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_