Working paper

O desafio do pareamento de grandes bases de dados: Mapeamento de métodos de record linkage probabilístico e diagnóstico de sua viabilidade empírica

This paper assesses the predictive performance of probabilistic record linkage algorithms for integrating large real-world databases, evaluating how the choice of blocking key, string metric functions, and phonetic-code matching algorithms affects prediction quality and computational complexity. It also surveys the main deterministic and probabilistic record linkage methods, recent advances that combine them with machine learning techniques, and the main packages and implementations available in the open-source R language. The results can provide heuristics for integrating administrative records at the national level and have potential value for the formulation and evaluation of public policies.

Language
Portuguese

Bibliographic citation
Series: Texto para Discussão ; No. 2420

Classification
Economics
Model Evaluation, Validation, and Selection
Large Data Sets: Modeling and Analysis
Miscellaneous Mathematical Tools
Data Collection and Data Estimation Methodology; Computer Programs: General
Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
Subject
pairs linking
blocking
administrative records
Big Data

Event
Intellectual creation
(who)
Peng, Yaohao
Mation, Lucas Ferreira
Event
Publication
(who)
Instituto de Pesquisa Econômica Aplicada (IPEA)
(where)
Brasília
(when)
2018

Handle
Last update
10.03.2025, 11:43 AM CET

Data provider

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.

