Working paper

The SearchEngine: A holistic approach to matching

The SearchEngine is an open-source project providing an integrated framework for diverse matching activities, especially the linkage of large-scale firm data by fuzzy criteria such as company names and addresses. At its core, it uses an efficient candidate retrieval mechanism that implements a word-driven (that is, token-driven) heuristic. Every record in one table becomes a search term used to retrieve similar candidate records from the base table, following a search strategy that replaces the blocking strategies of conventional matching efforts. Because candidate selection itself establishes similarity, the only remaining task is to filter out false positives, which is done with a machine learning approach based on the metadata export file derived from the matching heuristic. This paper discusses the general foundation of the heuristic and the algorithm, and two detailed walkthroughs of company linkages provide practical examples.
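
The token-driven candidate retrieval described in the abstract can be illustrated with a minimal Python sketch. It is not the SearchEngine's actual heuristic: the tokenizer, the rarity weighting, the 0.5 threshold, the function names (`tokenize`, `build_index`, `retrieve_candidates`) and the sample records are all assumptions made for illustration. The sketch only shows the general idea of using each record as a search term against an inverted index of the base table instead of defining blocks.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters (assumed rule)."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def build_index(base_records):
    """Map every token to the ids of the base records that contain it."""
    index = defaultdict(set)
    for rec_id, text in base_records.items():
        for token in tokenize(text):
            index[token].add(rec_id)
    return index


def retrieve_candidates(query, index, n_records, threshold=0.5):
    """Return (record id, score) pairs, best first, with score >= threshold.

    The score is the rarity-weighted share of query tokens found in a base
    record. The weighting is a hypothetical stand-in, not the paper's formula.
    """
    weights = {}
    for token in tokenize(query):
        doc_freq = len(index.get(token, ()))
        # Rare tokens weigh more; tokens unseen in the base table weigh most.
        weights[token] = math.log(1 + n_records / max(doc_freq, 1))
    total = sum(weights.values())
    if total == 0:
        return []
    scores = defaultdict(float)
    for token, weight in weights.items():
        for rec_id in index.get(token, ()):
            scores[rec_id] += weight
    ranked = sorted(
        ((rec_id, score / total) for rec_id, score in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [(rec_id, score) for rec_id, score in ranked if score >= threshold]


# Invented sample data: a tiny base table and one search record.
base_records = {
    1: "Acme Software GmbH, Mannheim",
    2: "Acme Hardware AG, Berlin",
    3: "Beta Software GmbH, Mannheim",
}
index = build_index(base_records)
candidates = retrieve_candidates("ACME Software Mannheim", index, len(base_records))
print(candidates)
```

In this toy run, records 1 and 3 pass the threshold and record 2 does not. Record 3 is a false positive of the kind the abstract says must be filtered afterwards; the per-candidate scores play the role that the metadata export plays as input to a downstream classifier.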

Language
English

Bibliographic citation
Series: ZEW Discussion Papers ; No. 23-001

Classification
Economics
Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
Subject
data linkage
firm matching
entity resolution
machine learning

Event
Intellectual creation
(who)
Doherr, Thorsten
Event
Publication
(who)
ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
(where)
Mannheim
(when)
2023

Last update
2025-03-10T11:43:00+0100

Data provider

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
