Journal article

Three Methods for Occupation Coding Based on Statistical Learning

Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
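The contrast between exact-match and n-gram-based duplicates, and the idea of a duplicate lookup backed by a learned model, can be illustrated with a minimal sketch. This is a simplified reading of the abstract and not the authors' implementation: the function names, the use of word-level n-grams, the majority vote among duplicates, and the fallback rule are all assumptions made for illustration, and the answers and codes are invented placeholders.

```python
from collections import Counter, defaultdict


def ngram_key(text, n=1):
    """Assumed duplicate definition: the set of word n-grams in a lowercased answer."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return frozenset(grams)


def build_duplicate_index(answers, codes, key_fn):
    """Map each key to a tally of the codes seen for training answers with that key."""
    index = defaultdict(Counter)
    for text, code in zip(answers, codes):
        index[key_fn(text)][code] += 1
    return index


def hybrid_predict(text, index, key_fn, fallback):
    """Use the most frequent code among duplicates if any exist, else ask the fallback model."""
    votes = index.get(key_fn(text))
    if votes:
        return votes.most_common(1)[0][0]
    return fallback(text)


# Invented toy data; "A" and "B" are placeholder codes, not real occupation codes.
train_answers = ["Nurse", "truck driver"]
train_codes = ["A", "B"]


def fallback_model(text):
    """Stand-in for any statistical learning algorithm."""
    return "MODEL"


ngram_index = build_duplicate_index(train_answers, train_codes, ngram_key)
exact_index = build_duplicate_index(train_answers, train_codes, str.strip)

# The n-gram key ignores case and word order; the exact-match key does not.
print(hybrid_predict("driver truck", ngram_index, ngram_key, fallback_model))
print(hybrid_predict("driver truck", exact_index, str.strip, fallback_model))
```

Under these assumptions, the n-gram key finds a duplicate for the reordered answer, while the exact string match finds none and defers to the fallback model.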

Author(s): Gweon, Hyukjun; Schonlau, Matthias; Kaczmirek, Lars; Blohm, Michael; Steiner, Stefan

Attribution - NonCommercial - NoDerivatives 4.0 International

ISSN
2001-7367
Extent
Page(s): 101-122
Language
English
Notes
Status: Published version; peer reviewed

Bibliographic citation
Journal of Official Statistics, 33(1)

Subject
Social sciences, sociology
Survey techniques and analysis techniques of the social sciences
Coding
Occupation
Algorithm
ALLBUS
Official statistics
Method

Event
Intellectual creation
(who)
Gweon, Hyukjun
Schonlau, Matthias
Kaczmirek, Lars
Blohm, Michael
Steiner, Stefan
Event
Publication
(where)
Germany
(when)
2017

DOI
Rights
GESIS - Leibniz-Institut für Sozialwissenschaften. Bibliothek Köln
Last update
21.06.2024, 4:27 PM CEST

Data provider

This object is provided by:
GESIS - Leibniz-Institut für Sozialwissenschaften. Bibliothek Köln. If you have any questions about the object, please contact the data provider.

Object type

  • Journal article

Associated

  • Gweon, Hyukjun
  • Schonlau, Matthias
  • Kaczmirek, Lars
  • Blohm, Michael
  • Steiner, Stefan

Time of origin

  • 2017
