Conference paper

Transparent, efficient, and robust word embedding access with WOMBAT

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Authors: Müller, Mark-Christoph; Strube, Michael

Attribution 4.0 International

Language
English

Subject
Python (programming language)
Automatic language analysis
Code
Computational linguistics
Language

Event
Intellectual creation
(who)
Müller, Mark-Christoph
Strube, Michael
Event
Publication
(who)
Stroudsburg, Pennsylvania : Association for Computational Linguistics
Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
(when)
2022-06-14

URN
urn:nbn:de:bsz:mh39-110862
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.

Object type

  • Conference paper

Associated

  • Müller, Mark-Christoph
  • Strube, Michael
  • Stroudsburg, Pennsylvania : Association for Computational Linguistics
  • Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Time of origin

  • 2022-06-14
