Working paper

Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)

Popular approaches to building data from unstructured text have limitations in scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets that are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, and expert classification of any documents with any scheme. To demonstrate this process for building data from text with machine learning, we publish open-source resources: the software, a new public document corpus, and a replicable analysis to build an interpretable classifier of suspected "no poach" clauses in franchise documents.

Language
English

Bibliographic citation
Series: GLO Discussion Paper ; No. 1214

Classification
Economics
Economic Methodology
Multiple or Simultaneous Equation Models: Classification Methods; Cluster Analysis; Principal Components; Factor Models
Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
Labor Economics Policies
Labor Contracts
Monopsony; Segmented Labor Markets
Coercive Labor Markets
Labor-Management Relations; Industrial Jurisprudence
Economic Sociology; Economic Anthropology; Language; Social and Economic Stratification
Subject
machine learning
natural language processing
text classification
big data

Event
Intellectual creation
(who)
Meisenbacher, Stephen
Norlander, Peter
Event
Publication
(who)
Global Labor Organization (GLO)
(where)
Essen
(when)
2022

Last update
10.03.2025, 11:43 AM CET

Data provider

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft.
