Development of bioinformatic methods for the prediction and understanding of biosynthesis and activity of natural products

Abstract: Natural products represent a valuable source for novel drugs and therapeutics. Rapid progress in computer technology allows knowledge about natural product biosynthesis and activity to be generated by investigating large biological datasets. In this thesis, bioinformatic know-how and machine learning algorithms were applied to develop methods for the prediction of secondary metabolite scaffolds based on the biosynthetic gene clusters that encode them. Furthermore, the potential to predict the B cell and T cell epitope activity of non-peptidic molecules was explored. In order to connect biosynthetic gene clusters with the secondary metabolites they produce, the Secondary Metabolite Prediction and Identification pipeline (SeMPI) was developed. SeMPI v1 could predict modular type I polyketides (PKs). The predicted scaffolds were screened against StreptomeDB v2 in order to identify similar known secondary metabolites. For this purpose, a novel algorithm was designed that extracts the putative initially biosynthesized carbon chain of a secondary metabolite. In a benchmark based on the ranking power of annotated natural products, SeMPI v1 outperformed state-of-the-art software for scaffold prediction from biosynthetic gene clusters. The update, SeMPI v2, was extended by nonribosomal peptide (NRP) and PK-NRP hybrid predictions. The bottleneck in NRP scaffold generation is the prediction of the correct adenylation (A) domain substrate. To increase the prediction performance, a large selection of annotated A domains with known substrates was collected. The database scope was extended by seven publicly available natural-compound-related libraries, which allows the screening of almost 190,000 compounds. Additionally, SeMPI v2 includes the prediction of post-synthetic modifications, which were added to the screening process.
Furthermore, the database screening was optimized using a benchmark based on 559 biosynthetic gene clusters with annotated secondary metabolites. The same benchmark was applied to compare SeMPI v2 to the secondary metabolite scaffold prediction server antiSMASH v5. SeMPI v2 performed similarly or better in all compared categories. SeMPI v2 provides a sophisticated web server, including a genome browser, a molecular workbench and a preprocessed database. The genome browser gives a visual overview of biosynthetic domains, modules and clusters. The molecular workbench enables the modification of predicted scaffolds before submission to the database screening. It can also be used to submit scaffolds to the screening without prior processing of a biosynthetic gene cluster. The preprocessed database includes biosynthetic gene clusters from the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database as well as a selection of Streptomyces genomes. In order to identify novel A domain specificities based on the products of so far uncharacterized A domains, a cooperation project with the group of Prof. Dr. Helge Bode at the University of Frankfurt was initiated. Bode et al. developed a novel NRP production system with the potential to rapidly identify the substrate specificities of A domains for the genera Photorhabdus and Xenorhabdus. In order to use this system to identify so far uncharacterized A domain specificities, the available space of Photorhabdus and Xenorhabdus A domains was collected. The sequences were investigated phylogenetically, and promising domains with a high potential to encode novel specificities were selected. The results of the production experiments are pending. Different functionalities of protein subfamilies, such as the substrate specificity of A and acyltransferase (AT) domains, are associated with subfamily specific residues (SSRs).
In order to allow researchers a thorough analysis of protein subfamilies, the Subfamily Specific Residue visualization toolbox (SSR-viz) was developed. SSR-viz uses a novel algorithm that detects SSRs based on different detection strategies. The performance of the tool was benchmarked using a dataset of 20 protein subfamilies with experimentally validated SSRs. SSR-viz performed comparably to state-of-the-art software and outperformed all other tools in 4 cases. The graphical user interface of SSR-viz combines various features for the detection and visualization of SSRs. The expertise in cheminformatics and machine learning gained during the work on the aforementioned projects was applied in a methodically related cooperation project conducted in the work group of Prof. Dr. Björn Peters at the La Jolla Institute for Immunology (LJI) in California. The adaptive immune system relies on the identification of pathogens based on the recognition of epitopes by T cell receptors, B cell receptors and antibodies. Apart from peptidic epitopes, various non-peptidic epitopes have been described. In order to analyze the potential of non-peptidic molecules to induce an immune response, a tool was developed for the prediction of non-peptidic epitopes. The machine learning models that were built were thoroughly benchmarked, and the prediction logic was investigated in an immunological context.

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English
Notes
University of Freiburg, dissertation, 2020

Keyword
Natural products
Biosynthesis
Comprehension

Event
Publication
(where)
Freiburg
(who)
University
(when)
2021
Creator

DOI
10.6094/UNIFR/194304
URN
urn:nbn:de:bsz:25-freidok-1943045
Rights
Access to the object is unrestricted.
Last update
25.03.2025, 1:43 PM CET

Data provider

This object is provided by:
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.

Time of origin

  • 2021
