Working paper

The SearchEngine: A holistic approach to matching

The SearchEngine is an open source project providing an integrated framework for diverse matching activities, especially the linkage of large-scale firm data by fuzzy criteria such as company names and addresses. At its core, it uses an efficient candidate retrieval mechanism implementing a word- or token-driven heuristic. Every record in one table becomes a search term for retrieving similar candidate records from the base table, following a search strategy that replaces the blocking strategies of conventional matching efforts. Because the candidate selection inherently establishes similarity, the only remaining task is to filter out false positives, for which the metadata export file derived from the matching heuristic can be used to implement a machine learning approach. This paper discusses the general foundations of the heuristic and the algorithm, while two detailed walkthroughs of company linkages provide practical examples.
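The token-driven candidate retrieval summarized above can be illustrated with a small inverted index. This is a minimal sketch under stated assumptions, not the SearchEngine's actual implementation. The IDF-style token weighting, the normalized score, and the names `TokenIndex`, `match_tables`, `score` and `shared_tokens` are all hypothetical. The exported rows only stand in for the kind of metadata a false-positive classifier could be trained on.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a company name or address and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TokenIndex:
    """Inverted index over the base table: token -> ids of base records containing it."""

    def __init__(self, base_records):
        self.postings = defaultdict(set)
        for rid, text in base_records.items():
            for tok in set(tokenize(text)):
                self.postings[tok].add(rid)
        n = len(base_records)
        # Assumed IDF-style weighting (hypothetical): rare tokens such as
        # distinctive name parts count more than frequent ones like "gmbh".
        self.idf = {tok: math.log(n / len(ids)) + 1.0
                    for tok, ids in self.postings.items()}

    def search(self, query, top_k=5):
        """Use one search-table record as a query and return its best candidates."""
        q_tokens = set(tokenize(query))
        # Query tokens absent from the base table cannot retrieve anything.
        q_weight = sum(self.idf.get(t, 0.0) for t in q_tokens)
        scores, shared = Counter(), Counter()
        for tok in q_tokens:
            for rid in self.postings.get(tok, ()):
                scores[rid] += self.idf[tok]
                shared[rid] += 1
        return [
            {"base_id": rid,
             "score": score / q_weight,  # share of indexed query weight covered
             "shared_tokens": shared[rid]}
            for rid, score in scores.most_common(top_k)
        ]


def match_tables(search_records, base_records, top_k=3):
    """Turn every search record into a query and export one row per candidate pair.

    The row fields are hypothetical metadata, not the SearchEngine's export format.
    """
    index = TokenIndex(base_records)
    rows = []
    for sid, text in search_records.items():
        for cand in index.search(text, top_k):
            rows.append({"search_id": sid, **cand})
    return rows


if __name__ == "__main__":
    base = {"b1": "Siemens AG München", "b2": "Siemens Energy GmbH Erlangen",
            "b3": "BASF SE Ludwigshafen", "b4": "Müller GmbH Mannheim"}
    search = {"s1": "Siemens Aktiengesellschaft München",
              "s2": "BASF Ludwigshafen am Rhein"}
    for row in match_tables(search, base):
        print(row)
```

No comparison of all record pairs is needed. Only base records sharing at least one token with the query are scored, which takes over the role a blocking key plays in conventional matching. In the usage example, the top candidate for `s1` is `b1`. `b2` is retrieved only through the shared token "siemens" and receives a lower score, so it is the kind of false positive the downstream classifier would be expected to remove.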

Language
English

Published in
Series: ZEW Discussion Papers ; No. 23-001

Classification
Economics
Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
Subject
data linkage
firm matching
entity resolution
machine learning

Event
Intellectual creation
(who)
Doherr, Thorsten
Event
Publication
(who)
ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
(where)
Mannheim
(when)
2023

Last updated
10.03.2025, 11:43 CET

Data partner

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft.
