Computational study protocol: leveraging synthetic data to validate a benchmark study for differential abundance tests for 16S microbiome sequencing data


Abstract: Background
The utility of synthetic data in benchmark studies depends on its ability to closely mimic real-world conditions and to reproduce results obtained from experimental data. Building on the study by Nearing et al. (1), who assessed 14 differential abundance tests using 38 experimental 16S rRNA datasets in a case-control design, we generate synthetic datasets that mimic the experimental data in order to verify their findings. We will employ statistical tests to rigorously assess the similarity between synthetic and experimental data and to validate the conclusions on the performance of these tests drawn by Nearing et al. (1). This protocol adheres to the SPIRIT guidelines, demonstrating how established reporting frameworks can support robust, transparent, and unbiased study planning.

Methods
We replicate Nearing et al.’s (1) methodology, incorporating synthetic data simulated using two distinct tools, mirroring the 38 experimental datasets. Equivalence tests will be conducted on a non-redundant subset of 46 data characteristics comparing synthetic and experimental data, complemented by principal component analysis for overall similarity assessment. The 14 differential abundance tests will be applied to synthetic and experimental datasets, evaluating the consistency of significant feature identification and the number of significant features per tool. Correlation analysis and multiple regression will explore how differences between synthetic and experimental data characteristics may affect the results.
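To make the idea of an equivalence test concrete, the following is a minimal sketch of the two one-sided tests (TOST) procedure for a single data characteristic. It is not the protocol's actual implementation: the function name `tost_equivalence`, the equivalence margin, the example values, the treatment of the two samples as independent, and the large-sample normal approximation are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(x, y, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two means.

    Illustrative only: uses a normal approximation and treats x and y
    as independent samples. `margin` is the largest absolute difference
    in means that is still regarded as equivalent (an assumption the
    analyst has to choose).
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference in means (unequal variances).
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist()
    # Null hypothesis 1: diff <= -margin. Small p if diff is clearly above -margin.
    p_lower = 1.0 - z.cdf((diff + margin) / se)
    # Null hypothesis 2: diff >= +margin. Small p if diff is clearly below +margin.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided nulls are rejected.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical values of one data characteristic (for example, the
# proportion of zeros) in six experimental and six synthetic datasets.
experimental = [0.81, 0.77, 0.85, 0.79, 0.83, 0.80]
synthetic = [0.80, 0.78, 0.84, 0.80, 0.82, 0.81]

diff, p_value, equivalent = tost_equivalence(experimental, synthetic, margin=0.05)
print(f"difference={diff:.4f}, p={p_value:.4g}, equivalent={equivalent}")
```

In contrast to an ordinary difference test, a small p-value here supports similarity: the characteristic is declared equivalent only when the observed difference lies significantly inside the interval from -margin to +margin. With small samples, a t-distribution-based TOST would be the more usual choice.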

Conclusions
Synthetic data enables the validation of findings through controlled experiments. We assess how well synthetic data replicates experimental data, attempt to validate previous findings with the most recent versions of the differential abundance (DA) methods, and delineate the strengths and limitations of synthetic data in benchmark studies. To our knowledge, this is the first computational benchmark study to systematically incorporate synthetic data for validating differential abundance methods while strictly adhering to a pre-specified study protocol following SPIRIT guidelines, contributing to transparency, reproducibility, and unbiased research.

Location: Deutsche Nationalbibliothek Frankfurt am Main

Extent: Online resource

Language: English

Notes: F1000Research. - 13 (2024), 1180, ISSN: 2046-1402

Classification: Computer science

Event: Publication

(where): Freiburg

(who): University

(when): 2025

Creator: Kohnert, Eva
Kreutz, Clemens

DOI: 10.12688/f1000research.155230.1

URN: urn:nbn:de:bsz:25-freidok-2613322

Rights: Open Access; access to the object is unrestricted.

Last update: 15.08.2025, 7:33 AM CEST

Data provider

This object is provided by:
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.



Other Objects (12)

Thesis

Failure algebra to validate sensor data

Leveraging Georeferenced Open Government Data

Clinlabomics: leveraging clinical laboratory data by data mining strategies

The potential of ground gravity measurements to validate GRACE data

Two-dimensional moving image

Leveraging Big Geo Data through Metadata

Thesis

Leveraging data science for marketing-finance

LEVERAGING INFORMATION RETRIEVAL OVER LINKED DATA

Thesis

Leveraging tagging data for recommender systems

Two-dimensional moving image

Leveraging Linked Data using Python and SPARQL

Leveraging big data in population health management

Leveraging Flexible Data Management with Graph Databases

Leveraging Data Science for a Personalized Haemodialysis


