Working paper
Topic Modeling for Analyzing Open-Ended Survey Responses
Open-ended responses are widely used in market research studies. Processing of such responses requires labor-intensive human coding. This paper focuses on unsupervised topic models and tests their ability to automate the analysis of open-ended responses. Since state-of-the-art topic models struggle with the shortness of open-ended responses, the paper considers three novel short text topic models: Latent Feature Latent Dirichlet Allocation, Biterm Topic Model and Word Network Topic Model. The models are fitted and evaluated on a set of real-world open-ended responses provided by a market research company. Multiple components such as topic coherence and document classification are quantitatively and qualitatively evaluated to appraise whether topic models can replace human coding. The results suggest that topic models are a viable alternative for open-ended response coding. However, their usefulness is limited when a correct one-to-one mapping of responses and topics or the exact topic distribution is needed.
- Language
-
English
- Bibliographic citation
-
Series: IRTG 1792 Discussion Paper ; No. 2018-054
- Classification
-
Economics
Mathematical and Quantitative Methods: General
- Subject
-
Market research
open-ended responses
text analytics
short text topic models
- Event
-
Intellectual creation
- (who)
-
Pietsch, Andra-Selina
Lessmann, Stefan
- Event
-
Publication
- (who)
-
Humboldt-Universität zu Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series"
- (where)
-
Berlin
- (when)
-
2018
- Handle
- Last update
-
10.03.2025, 11:46 AM CET
Data provider
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
Object type
- Working paper
Associated
- Pietsch, Andra-Selina
- Lessmann, Stefan
- Humboldt-Universität zu Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series"
Time of origin
- 2018