Computational study protocol: leveraging synthetic data to validate a benchmark study for differential abundance tests for 16S microbiome sequencing data

Abstract: Background
The utility of synthetic data in benchmark studies depends on its ability to closely mimic real-world conditions and to reproduce results obtained from experimental data. Building on the study by Nearing et al. (1), which assessed 14 differential abundance tests on 38 experimental 16S rRNA datasets in a case-control design, we are generating synthetic datasets that mimic the experimental data in order to verify their findings. We will use statistical tests to rigorously assess the similarity between synthetic and experimental data and to validate the conclusions drawn by Nearing et al. (1) about the performance of these tests. This protocol adheres to the SPIRIT guidelines, demonstrating how established reporting frameworks can support robust, transparent, and unbiased study planning.

Methods
We replicate the methodology of Nearing et al. (1), incorporating synthetic data simulated with two distinct tools to mirror the 38 experimental datasets. Equivalence tests comparing synthetic and experimental data will be conducted on a non-redundant subset of 46 data characteristics, complemented by principal component analysis to assess overall similarity. The 14 differential abundance tests will be applied to both the synthetic and the experimental datasets to evaluate the consistency of significant feature identification and the number of significant features per tool. Correlation analysis and multiple regression will explore how differences in data characteristics between synthetic and experimental data may affect the results.
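
Equivalence tests reverse the usual burden of proof: similarity is concluded only if the difference between synthetic and experimental values is shown to lie inside a margin of ±δ, which is the logic of the two one-sided tests (TOST) procedure. The sketch below is a minimal illustration, assuming that each data characteristic (for example the proportion of zero counts) is summarized as one value per dataset and paired between its experimental and synthetic version, and assuming an analyst-chosen margin `delta`. The helper name `paired_tost`, the example values, the margin, and the large-sample normal approximation are all illustrative assumptions, not the procedure specified in the protocol.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_tost(experimental, synthetic, delta, alpha=0.05):
    """Paired two one-sided tests (TOST) for equivalence.

    Hypothetical helper: compares one data characteristic measured on
    paired experimental/synthetic datasets. Uses a large-sample normal
    approximation instead of the t distribution (an assumption made to
    stay within the standard library). Returns (p_value, equivalent).
    """
    if len(experimental) != len(synthetic) or len(experimental) < 2:
        raise ValueError("need at least two paired observations")
    if delta <= 0:
        raise ValueError("equivalence margin delta must be positive")

    # Paired differences: synthetic minus experimental, per dataset
    diffs = [s - e for e, s in zip(experimental, synthetic)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))

    if se == 0:
        # All differences identical: equivalent iff they lie inside the margin
        inside = -delta < d_bar < delta
        return (0.0 if inside else 1.0), inside

    std_normal = NormalDist()
    # One-sided test 1, H0: mean difference <= -delta
    p_lower = 1 - std_normal.cdf((d_bar + delta) / se)
    # One-sided test 2, H0: mean difference >= +delta
    p_upper = std_normal.cdf((d_bar - delta) / se)
    # Equivalence is claimed only if BOTH one-sided nulls are rejected
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical values, e.g. proportion of zero counts in 8 datasets
    experimental = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
    synthetic = [0.60, 0.73, 0.57, 0.78, 0.66, 0.75, 0.61, 0.68]
    p, equivalent = paired_tost(experimental, synthetic, delta=0.05)
    print(f"TOST p = {p:.3g}, equivalent within ±0.05: {equivalent}")
```

Taking the maximum of the two one-sided p-values is what makes TOST conservative: a single test that fails to reject is enough to withhold the equivalence claim. With few paired datasets, a t-distribution-based variant would be more conservative than this normal approximation; the abstract does not state which margins or test variant the protocol uses.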

Conclusions
Synthetic data enables the validation of findings through controlled experiments. We assess how well synthetic data replicates experimental data, attempt to validate previous findings with the most recent versions of the differential abundance methods, and delineate the strengths and limitations of synthetic data in benchmark studies. Moreover, to our knowledge, this is the first computational benchmark study to systematically incorporate synthetic data for validating differential abundance methods while strictly adhering to a pre-specified study protocol following the SPIRIT guidelines, contributing to transparency, reproducibility, and unbiased research.

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English
Notes
F1000Research 13 (2024), 1180. ISSN: 2046-1402

Classification
Computer science

Event
Publication
(where)
Freiburg
(who)
University
(when)
2025
Authors
Kohnert, Eva
Kreutz, Clemens

DOI
10.12688/f1000research.155230.1
URN
urn:nbn:de:bsz:25-freidok-2613322
Rights information
Open access; access to the object is unrestricted.
Last updated
15.08.2025, 07:33 CEST

Data partner

This object is provided by:
Deutsche Nationalbibliothek. For questions about the object, please contact the data partner.

Created

  • 2025
