Towards precise and convenient semantic search on text and knowledge bases
Abstract: In this dissertation, we consider the problem of making semantic search on text and knowledge bases more precise and convenient. In a nutshell, semantic search is search with meaning. In this respect, text and knowledge bases have different advantages and disadvantages. Large amounts of text are easily available on the web, and they contain a wealth of information in natural language. However, text represents information in an unstructured form. It follows no pre-defined schema, and without further processing, a machine can understand its meaning only on a superficial level. Knowledge bases, on the other hand, contain structured information in the form of subject-predicate-object triples. The meaning of triples is well defined, and triples can be retrieved precisely via a query language. However, formulating queries in this language is inconvenient, and compared to text, only a small fraction of information is currently available in knowledge bases.
In this document, we summarize our contributions on making semantic search on text and knowledge bases more precise and convenient. For knowledge bases, we introduce an approach to answer natural language questions. A user can pose questions conveniently in natural language and ask, for example, "who is the ceo of apple?", instead of having to learn and use a specific query language. Our approach applies learning-to-rank strategies and improved the state of the art on two widely used benchmarks at the time of publication. For knowledge bases, we also describe a novel approach to compute relevance scores for triples from type-like relations such as profession and nationality. For example, on a large knowledge base, a query for "american actors" can return a list of more than 60 thousand actors in no particular order. Relevance scores make it possible to sort this list so that, e.g., frequent lead actors appear before those who only had single cameo roles. On a benchmark that we generated via crowdsourcing, we show that our rankings are closer to human judgments than approaches from the literature. Finally, for text, we introduce a novel natural language processing technique that identifies which words in a sentence "semantically belong together". For example, in the sentence "Bill Gates, founder of Microsoft, and Jeff Bezos, founder of Amazon, are among the wealthiest persons in the world", the words "Bill Gates", "founder", and "Amazon" do not belong together, but the words "Bill Gates", "founder", and "Microsoft" do. We show that when query keywords are required to belong together in order to match, search results become more precise.
Given the characteristics of text and knowledge bases outlined above, it is promising to consider a search that combines both. For example, for the query "CEOs of U.S. companies who advocate cryptocurrencies", a list of CEOs of U.S. companies can be retrieved from a knowledge base. Information about who advocates cryptocurrencies is rather specific and changes frequently. It is, therefore, better found in full text. As part of this thesis, we describe how a combined search could be achieved, and we present and evaluate a fully functional prototype. All of our approaches are accompanied by extensive evaluations that show their practicability and, where available, compare them to established approaches from the literature.
- Location
-
Deutsche Nationalbibliothek Frankfurt am Main
- Extent
-
Online resource
- Language
-
English
- Notes
-
Universität Freiburg, Dissertation, 2017
- Keyword
-
Questions and answers
- Event
-
Publication
- (where)
-
Freiburg
- (who)
-
University
- (when)
-
2018
- Creator
- DOI
-
10.6094/UNIFR/16031
- URN
-
urn:nbn:de:bsz:25-freidok-160318
- Rights
-
No Open Access; access to the object is unrestricted.
- Last update
-
25.03.2025, 1:51 PM CET
Data provider
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.
Associated
Time of origin
- 2018