Conference paper
Language Independent Named Entity Recognition using Distant Supervision
Although supervised named entity recognition (NER) achieves good results, little or no labelled data is available for low-resource languages and less-studied domains. Because NER is a crucial preprocessing step for many natural language processing tasks, overcoming this lack of data remains of great interest. We propose a distant supervision approach to NER that is both language- and domain-independent: we automatically generate labelled training data using gazetteers that we previously extracted from Wikipedia. We test our approach on English, German and Estonian data sets and further contribute several successful methods for reducing the noise in the generated training data. The tested models beat baseline systems, and our results show that distant supervision can be a promising approach for NER when no labelled data is available. For English, we also compare the distant supervision model against a supervised model on a different test set and show that the distant supervision model generalizes better within the same domain of news texts.
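The core idea of the abstract, labelling raw text automatically from gazetteers, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `GAZETTEER` entries, the `distant_label` function name, the BIO tagging scheme, and the greedy longest-match strategy. The paper's Wikipedia extraction and its noise-reduction methods are not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer (assumption): maps token sequences to entity types.
# The paper extracts its gazetteers from Wikipedia; these entries are made up.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Angela", "Merkel"): "PER",
    ("Berlin",): "LOC",
    ("European", "Union"): "ORG",
}

# Longest gazetteer entry, in tokens; bounds the match window.
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def distant_label(tokens: List[str]) -> List[str]:
    """Assign BIO tags by greedy longest match against the gazetteer.

    Tokens that match no entry are tagged "O". This is one simple way to
    generate (noisy) training labels without manual annotation.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Angela Merkel visited Berlin".split()
    for token, tag in zip(sentence, distant_label(sentence)):
        print(token, tag)
```

Labels produced this way are noisy: ambiguous names are mislabelled and entities missing from the gazetteer are tagged as non-entities, which is why the abstract stresses noise reduction in the generated training data.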
- Language
-
English
- Subject
-
Machine learning
Information Extraction
Computational linguistics
Text Mining
Name
Language
- Event
-
Intellectual creation
- (who)
-
Dembowski, Julia
Wiegand, Michael
Klakow, Dietrich
- Event
-
Publication
- (who)
-
Poznań : Fundacja Uniwersytetu im. Adama Mickiewicza
- (when)
-
2019-03-19
- URN
-
urn:nbn:de:bsz:mh39-86198
- Last updated
-
06.03.2025, 09:00 CET
Data partner
Leibniz-Institut für Deutsche Sprache - Bibliothek. For questions about this object, please contact the data partner.