PubMedPortable: a framework for supporting the development of text mining applications

Abstract: Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or to an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
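
The core idea in the abstract, loading PubMed XML citations into a relational database and building a text index over them, can be illustrated with a minimal sketch. This is not PubMedPortable's own code or schema: SQLite from the Python standard library is swapped in as the database, the full text index is reduced to a simple term-to-PMID table, and the table names, function names, and sample records are all hypothetical. Only the XML element names (PubmedArticleSet, MedlineCitation, PMID, ArticleTitle, AbstractText) follow the PubMed XML layout.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the PubMed XML layout (illustrative, not real data).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title about pancreatic cancer</ArticleTitle>
        <Abstract><AbstractText>Gemcitabine is discussed.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Another example title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_citations(xml_text):
    """Yield (pmid, title, abstract) tuples from PubMed XML text."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle", default="")
        # An abstract may consist of several AbstractText sections; join them.
        parts = [
            "".join(node.itertext())
            for node in citation.findall("Article/Abstract/AbstractText")
        ]
        yield pmid, title, " ".join(parts)


def build_database(xml_text):
    """Store citations in SQLite and fill a simple term-to-PMID index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.execute(
        "CREATE TABLE term_index (term TEXT, pmid INTEGER, PRIMARY KEY (term, pmid))"
    )
    for pmid, title, abstract in parse_citations(xml_text):
        conn.execute("INSERT INTO citation VALUES (?, ?, ?)", (pmid, title, abstract))
        # Lower-cased alphanumeric tokens; a real index would also handle
        # stemming, stop words, and phrase queries.
        terms = set(re.findall(r"[a-z0-9]+", f"{title} {abstract}".lower()))
        conn.executemany(
            "INSERT INTO term_index VALUES (?, ?)", [(term, pmid) for term in terms]
        )
    conn.commit()
    return conn


def search(conn, term):
    """Return the sorted PMIDs whose title or abstract contains the term."""
    rows = conn.execute(
        "SELECT pmid FROM term_index WHERE term = ? ORDER BY pmid", (term.lower(),)
    )
    return [row[0] for row in rows]
```

Calling `build_database(SAMPLE_XML)` and then `search(conn, "gemcitabine")` looks up the term in the index table and returns the matching PMIDs, which can be joined back to the `citation` table for titles and abstracts.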

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online-Ressource
Language
English
Notes
PLoS ONE. 11, 19 (2016), e0163794, DOI 10.1371/journal.pone.0163794, ISSN 1932-6203
In Copyright: http://rightsstatements.org/page/InC/1.0

Keyword
Medical technology

Event
Publication
(where)
Freiburg
(who)
University
(when)
2016
Creator
Contributor
Pharmazeutische Bioinformatik
Albert-Ludwigs-Universität Freiburg
Professur für Bioinformatik
Technische Fakultät
Fakultät für Chemie und Pharmazie

DOI
10.1371/journal.pone.0163794
URN
urn:nbn:de:bsz:25-freidok-121793
Rights
Access to the object is unrestricted.
Last update
14.08.2025, 10:53 AM CEST

Data provider

This object is provided by:
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.
