*pySELFI* is a statistical software package which implements the *Simulator Expansion for Likelihood-Free Inference* (*SELFI*) algorithm.

pySELFI is written in Python 3 and licensed under GPLv3. The code is publicly available on GitHub, and the documentation is available on Read the Docs.

Limited user support is available by email.

### The SELFI algorithm

SELFI (Leclercq *et al*. 2019) is part of the family of *simulation-based inference* (SBI) methods, which replace the use of the likelihood function with a data-generating *"black-box" simulator*. It builds upon a Gaussian effective likelihood, and upon the linearisation of black-box models around an expansion point. Further assuming that the prior is Gaussian, the effective posterior is Gaussian, with mean and covariance given by two *"filter equations"* reproduced below. The computational workload is fixed *a priori* by the user, and perfectly parallel.

The SELFI filter equations (*top*), and a summary of the statistical variables appearing in the equations and their interpretation in the context of galaxy survey data analysis (*bottom*).
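The linear-Gaussian update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not pySELFI's API: the function names, argument names, and the forward finite-difference scheme are assumptions made for the example. It assumes that the expansion point `theta_0` is also the prior mean, that `S` is the prior covariance, and that the hypothetical `simulator(theta, rng)` returns a vector of summaries. The simulation budget is fixed in advance at `n_0 + n_s * len(theta_0)` independent calls, which is what makes the workload parallel.

```python
import numpy as np

def estimate_expansion(simulator, theta_0, n_0, n_s, h, rng):
    """Estimate f_0, C_0 and grad f_0 around theta_0 (requires n_0 >= 2)."""
    theta_0 = np.asarray(theta_0, dtype=float)
    # n_0 independent runs at the expansion point give the mean and
    # covariance of the simulated summaries.
    sims_0 = np.array([simulator(theta_0, rng) for _ in range(n_0)])
    f_0 = sims_0.mean(axis=0)
    C_0 = np.atleast_2d(np.cov(sims_0, rowvar=False))
    # Forward finite differences (an assumption of this sketch):
    # n_s runs per parameter direction, each with step size h.
    grad_f_0 = np.empty((f_0.size, theta_0.size))
    for s in range(theta_0.size):
        theta_s = theta_0.copy()
        theta_s[s] += h
        f_s = np.mean([simulator(theta_s, rng) for _ in range(n_s)], axis=0)
        grad_f_0[:, s] = (f_s - f_0) / h
    return f_0, C_0, grad_f_0

def linear_gaussian_posterior(theta_0, f_0, grad_f_0, C_0, S, phi_obs):
    """Posterior mean and covariance for a linearised model with a
    Gaussian effective likelihood and a Gaussian prior N(theta_0, S)."""
    # C_0^{-1} grad f_0, computed without forming the inverse of C_0.
    C0_inv_grad = np.linalg.solve(C_0, grad_f_0)
    # Posterior precision = data precision + prior precision.
    precision = grad_f_0.T @ C0_inv_grad + np.linalg.inv(S)
    cov = np.linalg.inv(precision)
    # C_0 is symmetric, so C0_inv_grad.T equals grad_f_0.T @ C_0^{-1}.
    mean = theta_0 + cov @ (C0_inv_grad.T @ (phi_obs - f_0))
    return mean, cov
```

In practice the sample covariance is only well conditioned when `n_0` is comfortably larger than the number of summaries; the sketch leaves any regularisation to the caller.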

An article on the Aquila Consortium's website, *Algorithms for likelihood-free cosmological data analysis*, discusses SELFI, in particular in comparison to BOLFI.

Primordial matter power spectrum inference with pySELFI, using a Gaussian random field data model. The measurements of the final galaxy power spectrum in BOSS (0.2 < *z* < 0.5 bin, SGC, pre-recon, Beutler *et al.* 2016) are overplotted.

### Check for model misspecification

As discussed in Leclercq (2022), SELFI can be used within a two-step framework to perform SBI of a general class of Bayesian hierarchical models (BHMs), while *checking for model misspecification*. First, the latent function that appears as the second layer of the BHM is inferred with SELFI and used to diagnose possible model misspecification. Second, target parameters of the trusted model are inferred via SBI.

Simulations used in the first step are recycled for *score compression*, which is required for the second step.
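As a rough illustration of score compression, the sketch below compresses a data vector into one number per target parameter, assuming a Gaussian likelihood whose covariance `C_0` does not depend on the parameters. The function names are hypothetical and do not come from pySELFI. The mean `mu_0`, its gradient `grad_mu_0`, and `C_0` are the quantities that would be estimated from the recycled simulations. The distance function uses the Fisher matrix as a fixed metric in parameter space; whether this coincides with the Fisher-Rao distance computed by pySELFI is an assumption of this example.

```python
import numpy as np

def score_compress(d, mu_0, grad_mu_0, C_0):
    """Compress data d (length P) into a score vector t (length S).

    Assumes a Gaussian likelihood with mean mu_0 and gradient grad_mu_0
    (shape P x S) at the fiducial point, and a fixed covariance C_0.
    Returns the score t and the Fisher matrix F.
    """
    C0_inv_grad = np.linalg.solve(C_0, grad_mu_0)   # C_0^{-1} grad mu_0
    t = C0_inv_grad.T @ (d - mu_0)                  # score at the fiducial point
    F = grad_mu_0.T @ C0_inv_grad                   # Fisher matrix
    return t, F

def quasi_mle(theta_0, t, F):
    """Turn a score vector into a parameter estimate around theta_0."""
    return theta_0 + np.linalg.solve(F, t)

def fisher_distance(theta_a, theta_b, F):
    """Distance between two parameter estimates, with F as a fixed metric."""
    diff = np.asarray(theta_a, dtype=float) - np.asarray(theta_b, dtype=float)
    return float(np.sqrt(diff @ F @ diff))
```

Compressing both the observed data and each simulation with the same `mu_0`, `grad_mu_0` and `C_0` gives summaries of the same dimension as the target parameters, which can then be compared with `fisher_distance`.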

The data model used to test the framework, described in section III of Leclercq (2022), is available as a Python package, *lotkavolterra_simulator*.

### Key features of pySELFI

Current *key features* of pySELFI are:

- implementation of the core "filter equations" of SELFI,
- support of different black-box simulators,
- input from and output to simulation pools stored as HDF5 files,
- optimisation of the prior hyperparameters for primordial power spectrum inference,
- check for model misspecification using the Mahalanobis distance,
- score compression and calculation of Fisher-Rao distances.
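The Mahalanobis-distance check listed above can be illustrated generically. The sketch below is not pySELFI's implementation: which vectors and which covariance matrix are compared is left to the caller, and the idea of ranking the observed distance against a reference ensemble of distances (for example from simulated data sets) is an assumption made for the example.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Distance of x from mean, in units set by the covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def exceedance_fraction(d_obs, d_ref):
    """Fraction of reference distances at least as large as the observed one.

    A very small fraction means the observed distance is unusually large
    compared with the reference ensemble.
    """
    return float(np.mean(np.asarray(d_ref, dtype=float) >= d_obs))
```

For instance, `mahalanobis_distance(posterior_mean, prior_mean, prior_cov)` would measure how far a reconstruction sits from the prior mean, where the three argument names are placeholders for arrays supplied by the user.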

### Reference

To acknowledge the use of pySELFI in research papers, please cite its doi:10.5281/zenodo.3341588 (or for the latest version, see the badge above), as well as the papers Leclercq *et al*. (2019) and Leclercq (2022):

- F. Leclercq, W. Enzi, J. Jasche, A. Heavens, *Primordial power spectrum and cosmology from black-box galaxy surveys*, MNRAS **490**, 4237 (2019), arXiv:1902.10149 [astro-ph.CO]
- F. Leclercq, *Simulation-based inference of Bayesian hierarchical models while checking for model misspecification*, Proceedings of the 41st International Conference on Bayesian and Maximum Entropy methods in Science and Engineering (MaxEnt2022), 18-22 July 2022, Paris, France, Physical Sciences Forum **5**, 4 (2022), arXiv:2209.11057 [stat.ME]