HiDimStat: High-dimensional statistical inference tool for Python#

The HiDimStat package provides statistical inference methods to solve the problem of support recovery in the context of high-dimensional and spatially structured data.
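
To make "support recovery" concrete: the support of a linear model is the set of features with nonzero coefficients, and an inference procedure returns an estimate of that set. The sketch below is a minimal plain-Python illustration of how such an estimate can be scored. It is not HiDimStat code: the helper names `support`, `false_discovery_proportion`, and `power`, as well as the toy vectors, are hypothetical and chosen only for this example.

```python
def support(coefficients, tol=0.0):
    """Return the indices whose coefficient magnitude exceeds tol."""
    return {i for i, c in enumerate(coefficients) if abs(c) > tol}


def false_discovery_proportion(selected, true_support):
    """Fraction of selected features that are not in the true support."""
    if not selected:
        return 0.0  # convention: no selection means no false discovery
    return len(selected - true_support) / len(selected)


def power(selected, true_support):
    """Fraction of truly relevant features that were selected."""
    if not true_support:
        return 0.0  # convention used here when nothing is relevant
    return len(selected & true_support) / len(true_support)


# Toy example (assumed values): 8 features, 3 of which are truly relevant.
true_coefficients = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 0.7, 0.0]
true_support = support(true_coefficients)

# Hypothetical output of some inference procedure.
selected = {1, 4, 5}

fdp = false_discovery_proportion(selected, true_support)
pw = power(selected, true_support)
print(f"FDP = {fdp:.2f}, power = {pw:.2f}")
```

Here one of the three selected features (index 5) is a false discovery, and two of the three relevant features are recovered.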

Installation#

HiDimStat works only with Python 3, ideally Python 3.10+. To install it, run the following from a terminal:

pip install hidimstat

Alternatively, to install the latest development version (for example, to contribute to this project):

git clone https://github.com/mind-inria/hidimstat.git
cd hidimstat
pip install -e .

Dependencies#

HiDimStat depends on the following packages:

joblib
numpy
pandas
scipy
scikit-learn
tqdm

Running the examples additionally requires seaborn, and running the tests requires pytest.
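
One possible way to check an environment against the packages named above is sketched below, using only the standard library. The helper names `installed_version` and `report` are hypothetical, and the script only reports what it finds; it does not enforce any minimum versions, since none are stated here.

```python
import sys
from importlib import metadata

# Distribution names as listed in this section.
REQUIRED = ["joblib", "numpy", "pandas", "scipy", "scikit-learn", "tqdm"]
OPTIONAL = ["seaborn", "pytest"]  # examples and tests, respectively


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if sys.version_info < (3, 10):
    print("Warning: Python 3.10+ is recommended")

for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    for name, version in report(names).items():
        print(f"{label:8} {name:13} {version or 'MISSING'}")
```

The lookup uses distribution names (for example `scikit-learn`), not import names (`sklearn`), which is why `importlib.metadata` is used rather than attempting imports.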

Documentation & Examples#

Documentation of the main HiDimStat functions and accompanying examples are available online.

Three example scripts currently illustrate how to use the main HiDimStat functions, each on a different kind of dataset:

  • plot_2D_simulation_example.py handles a simulated dataset with a 2D spatial structure.

  • plot_fmri_data_example.py solves the decoding problem on the Haxby fMRI dataset.

  • plot_meg_data_example.py tackles the source localization problem on several MEG/EEG datasets.

# For example run the following command in terminal
python plot_2D_simulation_example.py
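
To run all three examples in one go, a small driver script along the following lines could be used. This is only a sketch: the helper names `example_commands` and `run_examples` are hypothetical, and the script assumes the example files sit in the directory passed in (the current directory by default), which may not match the repository layout.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in this section.
EXAMPLE_SCRIPTS = [
    "plot_2D_simulation_example.py",
    "plot_fmri_data_example.py",
    "plot_meg_data_example.py",
]


def example_commands(directory="."):
    """Build one command per example script that exists in `directory`."""
    directory = Path(directory)
    return [
        [sys.executable, str(directory / name)]
        for name in EXAMPLE_SCRIPTS
        if (directory / name).is_file()
    ]


def run_examples(directory="."):
    """Run each available example and return {script name: exit code}."""
    results = {}
    for command in example_commands(directory):
        completed = subprocess.run(command, check=False)
        results[Path(command[1]).name] = completed.returncode
    return results


if __name__ == "__main__":
    for script, code in run_examples().items():
        status = "ok" if code == 0 else f"failed ({code})"
        print(f"{script}: {status}")
```

Using `sys.executable` keeps the examples in the same interpreter (and virtual environment) as the driver; scripts that are not found are skipped rather than treated as errors.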

Build the documentation#

To build the documentation you will need to run:

pip install -U '.[doc]'
cd docs
make html

References#

The algorithms implemented in this package are detailed in several conference and journal articles, which can be downloaded at https://team.inria.fr/mind/publications/.

Main references#

  • Ensemble of Clustered desparsified Lasso (ECDL): Chevalier et al. [2018], Chevalier et al. [2022]

  • Aggregation of multiple Knockoffs (AKO): Nguyen et al. [2020]

  • Application to decoding (fMRI data): Chevalier et al. [2021]

  • Application to source localization (MEG/EEG data): Chevalier et al. [2020]

  • Single/Group statistically validated importance using conditional permutations: Chamma et al. [2023], Chamma et al. [2024]

If you use this package, we would appreciate citations of the relevant papers listed above.

Other useful references#

  • For de-sparsified (or de-biased) Lasso: Javanmard and Montanari [2014], Zhang and Zhang [2014], van de Geer et al. [2014]

  • For Knockoffs Inference: Barber and Candès [2015], Candès et al. [2018]

References#

[BCandes15]

Rina Foygel Barber and Emmanuel J Candès. Controlling the false discovery rate via knockoffs. The Annals of Statistics, pages 2055–2085, 2015.

[CFJL18]

Emmanuel Candès, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3):551–577, 2018.

[CET23]

Ahmad Chamma, Denis A. Engemann, and Bertrand Thirion. Statistically valid variable importance assessment through conditional permutations. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, 67662–67685. Curran Associates, Inc., 2023.

[CTE24]

Ahmad Chamma, Bertrand Thirion, and Denis Engemann. Variable importance in high-dimensional settings requires grouping. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10):11195–11203, 2024. doi:10.1609/aaai.v38i10.28997.

[CNS+21]

Jérôme-Alexis Chevalier, Tuan-Binh Nguyen, Joseph Salmon, Gaël Varoquaux, and Bertrand Thirion. Decoding with confidence: statistical control on decoder maps. NeuroImage, 234:117921, 2021.

[CNTS22]

Jérôme-Alexis Chevalier, Tuan-Binh Nguyen, Bertrand Thirion, and Joseph Salmon. Spatially relaxed inference on high-dimensional linear models. Statistics and Computing, 32(5):83, 2022.

[CSGT20]

Jérôme-Alexis Chevalier, Joseph Salmon, Alexandre Gramfort, and Bertrand Thirion. Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task lasso. Advances in Neural Information Processing Systems, 33:1759–1770, 2020.

[CST18]

Jérôme-Alexis Chevalier, Joseph Salmon, and Bertrand Thirion. Statistical inference with ensemble of clustered desparsified lasso. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 638–646. Springer, 2018.

[JM14]

Adel Javanmard and Andrea Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1):2869–2909, 2014.

[NCTA20]

Tuan-Binh Nguyen, Jérôme-Alexis Chevalier, Bertrand Thirion, and Sylvain Arlot. Aggregation of multiple knockoffs. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, 7283–7293. PMLR, 2020.

[vdGBuhlmannRD14]

Sara van de Geer, Peter Bühlmann, Ya'acov Ritov, and Ruben Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, pages 1166–1202, 2014.

[ZZ14]

Cun-Hui Zhang and Stephanie S Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(1):217–242, 2014.