Evaluating the Language Abilities of Large Language Models vs. Humans: Three Caveats

Abstract: We identify and analyze three caveats that may arise when evaluating the linguistic abilities of Large Language Models (LLMs). The problem of unlicensed generalizations refers to the danger of interpreting performance on one task as predictive of a model’s overall capabilities, based on the assumption that because performance on a specific task is indicative of certain underlying capabilities in humans, the same association holds for models. The human-like paradox refers to the problem of lacking human comparisons while at the same time attributing human-like abilities to the models. Last, the problem of double standards refers to the use of tasks and methodologies that either cannot be applied to humans or are evaluated differently in models vs. humans. While we recognize the impressive linguistic abilities of LLMs, we conclude that specific claims about the models’ human-likeness in the grammatical domain are premature.
https://bioling.psychopen.eu/index.php/bioling/article/view/14391

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online-Ressource
Language
English

Bibliographic citation
Leivada, E., Dentella, V., & Günther, F. (2024). Evaluating the Language Abilities of Large Language Models vs. Humans: Three Caveats. Biolinguistics, 18 (19.04.2024)

Creator
Leivada, Evelina
Dentella, Vittoria
Günther, Fritz

DOI
10.5964/bioling.14391
URN
urn:nbn:de:101:1-2404270511160.314212757568
Rights
Open Access; access to the object is unrestricted.
Last update
14.08.2025, 10:56 AM CEST

Data provider

This object is provided by: Deutsche Nationalbibliothek.
