Journal article

Semi-automated categorization of open-ended questions

"Text data from open-ended questions in surveys are difficult to analyze and are frequently ignored. Yet open-ended questions are important because they do not constrain respondents' answer choices. Where open-ended questions are necessary, sometimes multiple human coders hand-code answers into one of several categories. At the same time, computer scientists have made impressive advances in text mining that may allow automation of such coding. Automated algorithms do not achieve an overall accuracy high enough to entirely replace humans. We categorize open-ended questions soliciting narrative responses using text mining for easy-to-categorize answers and humans for the remainder using expected accuracies to guide the choice of the threshold delineating between 'easy' and 'hard'. Employing multinomial boosting avoids the common practice of converting machine learning 'confidence scores' into pseudo-probabilities. This approach is illustrated with examples from open-ended questions related to respondents' advice to a patient in a hypothetical dilemma, a follow-up probe related to respondents' perception of disclosure/privacy risk, and from a question on reasons for quitting smoking from a follow-up survey from the Ontario Smoker's Helpline. Targeting 80% combined accuracy, we found that 54%-80% of the data could be categorized automatically in research surveys." (author's abstract)
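The threshold idea in the abstract can be illustrated with a minimal sketch. It assumes a classifier (such as multinomial boosting) that returns a genuine probability for each category, so that an answer's largest predicted probability can stand in for its expected accuracy if it is coded automatically. It also assumes a fixed accuracy for human coders (the hypothetical parameter `human_accuracy`, defaulting to 1.0). The function name `choose_threshold` and its parameters are illustrative and are not taken from the article.

```python
def choose_threshold(top_probs, target=0.80, human_accuracy=1.0):
    """Hypothetical sketch: find the largest share of answers to code automatically.

    top_probs: for each answer, the classifier's largest predicted category
        probability, used here as the expected accuracy of coding it automatically.
    target: desired expected combined accuracy (automatic plus human coding).
    human_accuracy: assumed accuracy of human coders for the remaining answers.

    Returns (threshold, fraction_auto). Answers with a top probability at or
    above the threshold are coded automatically; (None, 0.0) means no
    automatic coding meets the target.
    """
    probs = sorted(top_probs, reverse=True)  # most confident ("easy") answers first
    n = len(probs)
    expected_hits = 0.0
    best_threshold, best_fraction = None, 0.0
    for k, p in enumerate(probs, start=1):
        expected_hits += p
        # A threshold of p would also auto-code every tied answer, so only
        # evaluate positions where the next probability is strictly smaller.
        if k < n and probs[k] == p:
            continue
        # First k answers coded automatically, remaining n - k coded by humans.
        combined = (expected_hits + human_accuracy * (n - k)) / n
        if combined >= target:
            best_threshold, best_fraction = p, k / n
    return best_threshold, best_fraction
```

For example, with top probabilities `[0.95, 0.9, 0.6, 0.4, 0.3]` and an 80% target, the sketch returns a threshold of 0.6 and auto-codes 60% of the answers. Coding those three automatically gives an expected combined accuracy of (0.95 + 0.9 + 0.6 + 2) / 5 = 0.89. Adding the fourth would lower it to 0.77.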

ISSN
1864-3361
Extent
Pages: 143-152
Language
English
Notes
Status: Published version; peer reviewed

Bibliographic citation
Survey Research Methods, 10(2)

Subject
Sozialwissenschaften, Soziologie
Erhebungstechniken und Analysetechniken der Sozialwissenschaften
Datengewinnung
qualitative Methode
Fragebogen
Codierung
Automatisierung
Datenqualität
Umfrageforschung
Grundlagenforschung
Methodenentwicklung

Event
Intellectual creation
(who)
Schonlau, Matthias
Couper, Mick P.
Event
Publication
(where)
Germany
(when)
2016

Last update
21.06.2024, 4:27 PM CEST

Data provider

This object is provided by:
GESIS - Leibniz-Institut für Sozialwissenschaften, Library Cologne.

Object type

  • Journal article
