Post hoc inference via multiple testing. Project ANR-16-CE40-0019 (2016-2021).
Rationale of the project
The number and size of available data sets of different types have increased dramatically over the past twenty years. This “data deluge” has been accompanied by a shift from hypothesis-driven research to data-driven research in many scientific fields, including astronomy, biology, genetics, and medicine. Analyzing and interpreting such data require innovative approaches for the simultaneous testing of a large number of biological hypotheses.
This project gathers specialists in multiple testing theory, high-dimensional data analysis, and genomics. It aims to fill the gap between the statistical guarantees provided by state-of-the-art multiple testing procedures and the actual needs of practitioners.
We propose to develop “post hoc” procedures (in the sense of Goeman and Solari, Statistical Science, 2011), which provide confidence statements on the number or proportion of false positives among any subset of hypotheses chosen by the user after analyzing the data. Both theoretical and applied aspects of post hoc multiple testing will be covered.
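To make the notion of a post hoc bound concrete, here is a minimal Python sketch of one classical instance: the bound derived from the Simes inequality, which holds simultaneously over all subsets with probability at least 1 - alpha when the Simes inequality is valid for the null p-values (for example under independence). The function name `simes_post_hoc_bound` and its arguments are hypothetical, chosen for illustration; this is not the interface of the project's software, and the procedures developed in the project are more general than this example.

```python
import numpy as np


def simes_post_hoc_bound(p_values, subset, alpha=0.05):
    """Upper bound on the number of false positives in `subset`.

    Illustrative sketch (hypothetical helper, not a package API).
    Assumes the Simes inequality holds for the true null p-values.
    On the corresponding event of probability >= 1 - alpha, at most
    k - 1 true nulls have a p-value <= alpha * k / m, for every k.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    p_sel = p[np.asarray(subset, dtype=int)]
    s = p_sel.size
    if s == 0:
        return 0

    k = np.arange(1, m + 1)
    thresholds = alpha * k / m
    # For each k: number of selected p-values lying above the k-th threshold.
    above = (p_sel[None, :] > thresholds[:, None]).sum(axis=1)
    # False positives in S are at most (#selected above threshold k) + (k - 1);
    # take the best k, and never exceed the size of the subset.
    bound = np.min(above + k - 1)
    return int(min(bound, s))


# Usage: the subset may be chosen after looking at the data.
p = [0.001, 0.012, 0.013, 0.014, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
print(simes_post_hoc_bound(p, [0, 1, 2, 3]))
print(simes_post_hoc_bound(p, [0, 4, 5]))
```

Because the guarantee is simultaneous over all subsets, the user may evaluate the bound on as many candidate selections as desired without any further correction.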
Main events
Mar 3-4, 2022: Workshop “Post-selection inference for genomic and neuroimaging data”, Toulouse. With A. Blain, G. Blanchard, S. Davenport, G. Durand, N. Enjalbert-Courrech, J. Gonzales-Delgado, A. Marandon, C. Maugis-Rabusseau, I. Meah, P. Neuvial, L. Risser, E. Roquain, M. Perrot-Dockès, B. Thirion.
Jun 15-19, 2020: Participation in the scientific committee of the Mathematical Methods of Modern Statistics 2 conference at CIRM (Luminy, France). This conference was held online.
Mar 10-12, 2020: ANR meeting, Paris. With G. Blanchard, M. Perrot-Dockès, P. Neuvial, E. Roquain.
Dec 12-15, 2019: Participation of M. Perrot-Dockès, P. Neuvial, E. Roquain and F. Villers in MCP 2019 in Taiwan. Organization of a session on post-selection inference and multiple testing.
Apr 8, 2019: ANR meeting, Paris. With G. Blanchard, G. Durand, M. Perrot-Dockès, P. Neuvial, G. Rigaill, E. Roquain, B. Sadacca.
Feb 7-9, 2018: Workshop “Post-selection inference and multiple testing” in Toulouse. This event is part of a thematic semester Mathematics and Computer Science for biology organized by CIMI, the International Centre for Mathematics and Computer Science in Toulouse.
Jan 6, 2017: Kick-off meeting, Evry.
Preprints
Notip: Non-parametric True Discovery Proportion estimation for brain imaging
@misc{enjalbert-courrech2022powerful,title={{Powerful and interpretable control of false discoveries in differential expression studies}},author={Enjalbert-Courrech, Nicolas and Neuvial, Pierre},url={https://hal.archives-ouvertes.fr/hal-03601095},year={2022},month=mar,doi={10.1101/2022.03.08.483449},hal_id={hal-03601095},hal_version={v1},category={submitted}}
False clustering rate control in mixture models
Ariane Marandon, Tabea Rebafka, Etienne Roquain, and Nataliya Sokolovska
@misc{marandon2022false,title={False clustering rate control in mixture models},author={Marandon, Ariane and Rebafka, Tabea and Roquain, Etienne and Sokolovska, Nataliya},year={2022},category={submitted}}
Online multiple testing with super-uniformity reward
@misc{doehler2021online,title={Online multiple testing with super-uniformity reward},author={D{\"o}hler, Sebastian and Meah, Iqraa and Roquain, Etienne},year={2021},category={submitted}}
Sharp multiple testing boundary for sparse sequences
@misc{abraham2021sharp,title={Sharp multiple testing boundary for sparse sequences},author={Abraham, Kweku and Castillo, Ismael and Roquain, Etienne},year={2021},category={submitted}}
Selective inference for the false discovery proportion in a Hidden Markov Model.
@misc{PBNR,title={Selective inference for the false discovery proportion in a Hidden Markov Model.},author={Perrot-Dockès, Marie and Blanchard, Gilles and Neuvial, Pierre and Roquain, Etienne},year={2021},hal_id={hal-03214472},category={submitted}}
DiscreteFDR: An R package for controlling the false discovery rate for discrete test statistics
@misc{durand2019discretefdr,title={DiscreteFDR: An R package for controlling the false discovery rate for discrete test statistics},author={Durand, Guillermo and Junge, Florian and D{\"o}hler, Sebastian and Roquain, Etienne},journal={arXiv preprint},year={2019},category={submitted}}
@article{mary2021semi-supervised,title={Semi-supervised multiple testing},author={Mary, David and Roquain, Etienne},journal={arXiv preprint arXiv:2106.13501},year={2021}}
@article{rebafka2019graph,title={Graph inference with clustering and false discovery rate control},author={Rebafka, Tabea and Roquain, Etienne and Villers, Fanny},journal={Electronic Journal of Statistics},year={2022},category={to appear},}
@article{roquain2019false,title={False discovery rate control with unknown null distribution: is it possible to mimic the oracle?},author={Roquain, Etienne and Verzelen, Nicolas},journal={Annals of Statistics},year={to appear},}
Error rate control for classification rules in multiclass mixture models
Tristan Mary-Huard, Vittorio Perduca, Marie Laure Martin-Magniette, and Gilles Blanchard
@article{mary-huard2021error,title={{Error rate control for classification rules in multiclass mixture models}},author={Mary-Huard, Tristan and Perduca, Vittorio and Martin-Magniette, Marie Laure and Blanchard, Gilles},url={https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03357461},journal={{International Journal of Biostatistics}},publisher={{De Gruyter}},year={2021},doi={10.1515/ijb-2020-0105},hal_id={hal-03357461},hal_version={v1}}
@article{doehler2020controlling,title={Controlling the false discovery exceedance for heterogeneous tests},author={D{\"o}hler, Sebastian and Roquain, Etienne},journal={Electronic Journal of Statistics},volume={14},number={2},pages={4244--4272},year={2020},publisher={Institute of Mathematical Statistics and Bernoulli Society},category={published},}
@article{carpentier2018estimating,title={Estimating minimum effect with outlier selection},author={Carpentier, Alexandra and Delattre, Sylvain and Roquain, Etienne and Verzelen, Nicolas},journal={The Annals of Statistics},volume={49},number={1},pages={272--294},year={2021},publisher={Institute of Mathematical Statistics},hal_id={hal-03173293v1}}
@article{castillo2020spike,title={On spike and slab empirical Bayes multiple testing},author={Castillo, Ismael and Roquain, Etienne},journal={Annals of Statistics},year={2020},url={https://projecteuclid.org/euclid.aos/1600480923},}
@article{durand20post-hoc,title={{Post hoc false positive control for structured hypotheses}},author={Durand, Guillermo and Blanchard, Gilles and Neuvial, Pierre and Roquain, Etienne},url={https://dx.doi.org/10.1111/sjos.12453},year={2020},keywords={ multiple testing ; Simes inequality ; post hoc inference ; selective inference ; Forest structure ; DKW inequality},hal_id={hal-01829037},hal_version={v1},journal={Scandinavian Journal of Statistics},}
@article{blanchard20post-hoc,title={Post Hoc Confidence Bounds on False Positives Using Reference Families},author={Blanchard, Gilles and Neuvial, Pierre and Roquain, Etienne},url={https://projecteuclid.org/euclid.aos/1594972818},journal={Annals of Statistics},year={2020},volume={48},number={3},pages={1281--1303},hal_id={hal-01483585},}
@article{bachoc18posi,author={Bachoc, Fran\c{c}ois and Blanchard, Gilles and Neuvial, Pierre},title={On the post selection inference constant under restricted isometry properties},journal={Electron. J. Statist.},fjournal={Electronic Journal of Statistics},year={2018},volume={12},number={2},pages={3736-3757},issn={1935-7524},doi={10.1214/18-EJS1490},sici={1935-7524(2018)12:2<3736:OTPSIC>2.0.CO;2-0},url={https://hal.archives-ouvertes.fr/hal-01772538},hal_id={hal-01772538},hal_version={v2},month=nov,}
@article{picard18continuous,title={Continuous testing for Poisson process intensities: a new perspective on scanning statistics},author={Picard, Franck and Reynaud-Bouret, Patricia and Roquain, Etienne},journal={Biometrika},volume={105},number={4},pages={931--944},year={2018},publisher={Oxford University Press},}
@article{doehler2018,author={D{\"o}hler, Sebastian and Durand, Guillermo and Roquain, Etienne},doi={10.1214/18-EJS1441},fjournal={Electronic Journal of Statistics},journal={Electron. J. Statist.},number={1},pages={1867--1900},publisher={The Institute of Mathematical Statistics and the Bernoulli Society},title={New {FDR} bounds for discrete and heterogeneous tests},url={https://doi.org/10.1214/18-EJS1441},volume={12},year={2018},}
Université Paris-Saclay, Institut de Mathématiques d’Orsay
Open source software
The R package sanssouci implements most of the methods developed in the course of the project. The R package sanssouci.data stores examples of genomic and neuroimaging data that can be used within the main package. A Python implementation has also been available since 2021.