Florent Leclercq

Last update: 03-02-2023

pySELFI is a statistical software package which implements the Simulator Expansion for Likelihood-Free Inference (SELFI) algorithm.

pySELFI is written in Python 3 and licensed under GPLv3. The code is publicly available on GitHub, and the documentation is available on Read the Docs.

Limited user support can be requested by email.

The SELFI algorithm

SELFI (Leclercq et al. 2019) is part of the family of simulation-based inference (SBI) methods, which replace the use of the likelihood function with a data-generating "black-box" simulator. It builds upon a Gaussian effective likelihood and upon the linearisation of the black-box model around an expansion point. Further assuming that the prior is Gaussian, the effective posterior is also Gaussian, with mean and covariance given by two "filter equations" reproduced below. The computational workload is fixed a priori by the user and is perfectly parallel.

The SELFI "filter equations" (top), and a summary of the statistical variables appearing in the equations and their interpretation in the context of galaxy survey data analysis (bottom).
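The Gaussian update described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard result for a model linearised around an expansion point, with a Gaussian effective likelihood and a Gaussian prior centred on that expansion point. The function name and the argument names (theta_0, f_0, grad_f_0, C_0, prior_cov, phi_obs) are hypothetical and are not the pySELFI API.

```python
import numpy as np


def linearised_gaussian_posterior(theta_0, f_0, grad_f_0, C_0, prior_cov, phi_obs):
    """Posterior mean and covariance for a simulator linearised around theta_0.

    Assumed shapes (illustrative only): theta_0 (S,), f_0 and phi_obs (P,),
    grad_f_0 (P, S), C_0 (P, P), prior_cov (S, S).
    """
    theta_0 = np.asarray(theta_0, dtype=float)
    grad_f_0 = np.asarray(grad_f_0, dtype=float)
    residual = np.asarray(phi_obs, dtype=float) - np.asarray(f_0, dtype=float)

    # C_0^{-1} grad_f_0, via a linear solve rather than an explicit inverse
    c_inv_grad = np.linalg.solve(np.asarray(C_0, dtype=float), grad_f_0)

    # Posterior precision = data term + prior precision
    prior_precision = np.linalg.inv(np.asarray(prior_cov, dtype=float))
    post_cov = np.linalg.inv(grad_f_0.T @ c_inv_grad + prior_precision)

    # Posterior mean: expansion point shifted by the filtered residual
    # (c_inv_grad.T equals grad_f_0^T C_0^{-1} because C_0 is symmetric)
    post_mean = theta_0 + post_cov @ (c_inv_grad.T @ residual)
    return post_mean, post_cov


# Toy one-parameter, one-summary example with made-up numbers
mean, cov = linearised_gaussian_posterior(
    theta_0=[0.0], f_0=[0.0], grad_f_0=[[1.0]],
    C_0=[[1.0]], prior_cov=[[1.0]], phi_obs=[2.0],
)
```

In practice f_0, C_0 and the gradient would be estimated from simulations run at and around the expansion point, which is why the number of simulations can be fixed in advance and the runs distributed independently.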

An article on the Aquila Consortium's website, "Algorithms for likelihood-free cosmological data analysis", discusses SELFI, in particular in comparison to BOLFI.

SELFI: primordial power spectrum inference

Primordial matter power spectrum inference with pySELFI, using a Gaussian random field data model. The measurements of the final galaxy power spectrum in BOSS (0.2 < z < 0.5 bin, SGC, pre-recon, Beutler et al. 2016) are overplotted.

Check for model misspecification

As discussed in Leclercq (2022), SELFI can be used within a two-step framework to perform SBI of a general class of Bayesian hierarchical models (BHMs), while checking for model misspecification. First, the latent function that appears as the second layer of the BHM is inferred with SELFI and used to diagnose possible model misspecification. Second, the target parameters of the trusted model are inferred via SBI.

Simulations used in the first step are recycled for score compression, which is necessary for the second step.
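As an illustration of what score compression computes, here is a minimal sketch. It assumes a Gaussian likelihood whose covariance does not depend on the parameters, an assumption made here for simplicity and not a statement about how pySELFI implements this step. Under that assumption, the compressed summaries are the gradient of the log-likelihood at a fiducial point, and the Fisher matrix follows from the same ingredients. All names (score_compress, mu_fid, grad_mu) are hypothetical.

```python
import numpy as np


def score_compress(data, mu_fid, grad_mu, cov):
    """Compress a data vector to one summary per parameter.

    Assumed shapes (illustrative only): data and mu_fid (P,),
    grad_mu (P, n_params), cov (P, P) symmetric and parameter-independent.
    """
    data = np.asarray(data, dtype=float)
    mu_fid = np.asarray(mu_fid, dtype=float)
    grad_mu = np.asarray(grad_mu, dtype=float)

    # cov^{-1} grad_mu via a linear solve
    c_inv_grad = np.linalg.solve(np.asarray(cov, dtype=float), grad_mu)

    # Score of the Gaussian log-likelihood at the fiducial point
    score = c_inv_grad.T @ (data - mu_fid)

    # Fisher matrix for a parameter-independent covariance
    fisher = grad_mu.T @ c_inv_grad
    return score, fisher


# Toy example: two data points, one parameter, made-up numbers
score, fisher = score_compress(
    data=[1.0, 2.0], mu_fid=[0.0, 0.0], grad_mu=[[1.0], [1.0]], cov=np.eye(2)
)
```

The mean, its gradient and the covariance would all be estimated from simulations, which is where simulations from the first step can be reused.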

The data model used to test the framework, described in section III of Leclercq (2022), is implemented in a separate Python package, lotkavolterra_simulator, which is also available.

Key features of pySELFI

Current key features of pySELFI are:

implementation of the core "filter equations" of SELFI,
support of different black-box simulators,
input and output from simulation pools stored as hdf5 files,
optimisation of the prior hyperparameters for primordial power spectrum inference,
check for model misspecification using the Mahalanobis distance,
score compression and calculation of Fisher-Rao distances.
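The Mahalanobis distance mentioned in the list above has a simple closed form, sketched here with NumPy. Which vectors and which covariance enter the check is a modelling choice (for example, an inferred latent function compared with its prior mean); the function below is a generic, hypothetical helper and not the pySELFI interface.

```python
import numpy as np


def mahalanobis_distance(x, mean, cov):
    """Distance between x and mean, in units set by the covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # diff^T cov^{-1} diff, using a linear solve instead of an explicit inverse
    squared = diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(squared))


# With an identity covariance this reduces to the Euclidean distance
d = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], np.eye(2))
```

A distance that is unusually large compared with what the assumed covariance allows is the kind of signal that can flag a possible model misspecification.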

Reference

To acknowledge the use of pySELFI in research papers, please cite its doi:10.5281/zenodo.3341588 (or for the latest version, see the badge above), as well as the papers Leclercq et al. (2019) and Leclercq (2022):

Primordial power spectrum and cosmology from black-box galaxy surveys
F. Leclercq, W. Enzi, J. Jasche, A. Heavens
MNRAS 490, 4237 (2019), arXiv:1902.10149 [astro-ph.CO]

Simulation-based inference of Bayesian hierarchical models while checking for model misspecification
F. Leclercq
Proceedings of the 41st International Conference on Bayesian and Maximum Entropy methods in Science and Engineering (MaxEnt2022), 18-22 July 2022, Paris, France
Physical Sciences Forum 5, 4 (2022), arXiv:2209.11057 [stat.ME]

Florent Leclercq

Chargé de recherche CNRS, Institut d'Astrophysique de Paris
