Working paper
The SearchEngine: A holistic approach to matching
The SearchEngine is an open-source project that provides an integrated framework for diverse matching tasks, especially the linkage of large-scale firm data by fuzzy criteria such as company names and addresses. At its core is an efficient candidate retrieval mechanism that implements a heuristic driven by words or, more generally, tokens. Every record in one table becomes a search term used to retrieve similar candidate records from the base table according to a search strategy, which replaces the blocking strategies of conventional matching efforts. Because similarity is inherently established by the candidate selection, the only remaining step is to filter out false positives, which is done by applying a machine learning approach to the metadata export file derived from the matching heuristic. This paper discusses the general foundation of the heuristic and the algorithm, and two detailed walkthroughs of company linkages provide practical examples.
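The token-driven candidate retrieval described in the abstract can be illustrated with a minimal sketch. This is not the SearchEngine's actual implementation: the function names (`tokenize`, `build_index`, `retrieve_candidates`), the IDF-style token weighting, and the `threshold` parameter are all assumptions chosen for illustration. The sketch only shows the general idea of using each search record as a query against an inverted token index of the base table instead of a blocking key.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(base_records):
    """Map each token to the set of base-record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in base_records.items():
        for token in set(tokenize(text)):
            index[token].add(rec_id)
    return index


def retrieve_candidates(query, index, n_records, threshold=0.5):
    """Return (rec_id, score) pairs, best first.

    The score is the share of the query's total token weight that is
    found in the candidate. Rare tokens weigh more (an illustrative
    IDF-style weighting, assumed here, not taken from the paper).
    """
    tokens = set(tokenize(query))
    weights = {}
    for t in tokens:
        freq = len(index.get(t, ()))
        # Tokens unknown to the base table are treated like very rare ones.
        weights[t] = math.log(1 + n_records / max(freq, 1))
    total = sum(weights.values())
    if total == 0:
        return []
    scores = defaultdict(float)
    for t in tokens:
        for rec_id in index.get(t, ()):
            scores[rec_id] += weights[t]
    hits = [(rid, s / total) for rid, s in scores.items() if s / total >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])


# Hypothetical base table and search record, made up for this example.
base = {
    1: "Acme Machine Tools GmbH Mannheim",
    2: "Acme Software AG Berlin",
    3: "Baker Machine Works GmbH Hamburg",
}
index = build_index(base)
candidates = retrieve_candidates("ACME Machine Tools, Mannheim", index, len(base))
```

In this sketch, records that share only a common token with the search term fall below the threshold, so the candidate list stays short without any explicit blocking key. A subsequent classifier for filtering false positives, as the abstract describes, would operate on scores and match details exported from a step like this one.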
- Language: English
- Bibliographic citation: Series: ZEW Discussion Papers ; No. 23-001
- Classification:
  Economics
  Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
  Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
- Subject:
  data linkage
  firm matching
  entity resolution
  machine learning
- Event: Intellectual creation
- (who): Doherr, Thorsten
- Event: Publication
- (who): ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
- (where): Mannheim
- (when): 2023
- Handle
- Last update: 2025-03-10T11:43:00+0100
Data provider
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
Object type
- Working paper
Associated
- Doherr, Thorsten
- ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
Time of origin
- 2023