Conference paper

Processing and querying large web corpora with the COW14 architecture

In this paper, I present the COW14 tool chain, which comprises a web corpus creation tool called texrex, wrappers for existing linguistic annotation tools, and online query software called Colibri2. Through detailed descriptions of the implementation and systematic evaluations of the software's performance on different types of systems, I show that the COW14 architecture is capable of handling the creation of corpora of up to at least 100 billion tokens. I also introduce our running demo system, which currently serves corpora of up to roughly 20 billion tokens in Dutch, English, French, German, Spanish, and Swedish.

Language
English

Subject
Corpus <linguistics>
Annotation
Database system
Linguistics

Event
Intellectual creation
(who)
Schäfer, Roland
Event
Publication
(who)
Mannheim : Institut für Deutsche Sprache
(when)
2015-07-02

URN
urn:nbn:de:bsz:mh39-38367
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.

Object type

  • Conference paper

Associated

  • Schäfer, Roland
  • Mannheim : Institut für Deutsche Sprache

Time of origin

  • 2015-07-02
