Enhancing chatbot performance for imaging recommendations: Leveraging GPT-4 and context-awareness for trustworthy clinical guidance
Abstract: Purpose
To investigate whether GPT-4 improves the accuracy, consistency, and trustworthiness of a context-aware chatbot that provides personalized imaging recommendations from American College of Radiology (ACR) appropriateness criteria documents using semantic similarity processing. In addition, we sought to enable auditability of the output by revealing the information source the decision relies on.
Material and Methods
We refined an existing chatbot that incorporated specialized knowledge of the ACR guidelines by upgrading GPT-3.5-Turbo to its successor GPT-4 by OpenAI, using the latest version of LlamaIndex, and improving the prompting strategy. This chatbot was compared to the previous version, generic GPT-3.5-Turbo and GPT-4, and general radiologists regarding the performance in applying the ACR appropriateness guidelines.
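The context-aware step described above can be illustrated with a minimal sketch. It is not the authors' implementation: toy bag-of-words vectors stand in for a learned embedding model, the passages and file names are hypothetical placeholders rather than ACR content, and the function names (`vectorize`, `cosine`, `retrieve`, `build_prompt`) are invented for illustration. It shows only the general idea of picking the most similar guideline passage and returning its source alongside the prompt so the output can be audited.

```python
import math
from collections import Counter

def vectorize(text):
    """Toy bag-of-words vector; an assumed stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(p["text"])), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]

def build_prompt(query, passages, k=1):
    """Build a prompt from retrieved context and report which sources were used."""
    hits = retrieve(query, passages, k)
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in hits)
    prompt = (f"Context:\n{context}\n\nCase: {query}\n"
              "Recommend imaging and cite the source.")
    return prompt, [p["source"] for p in hits]

# Hypothetical placeholder passages, not real guideline text.
PASSAGES = [
    {"source": "guideline_headache.txt",
     "text": "sudden severe headache imaging of the head"},
    {"source": "guideline_knee.txt",
     "text": "acute knee trauma imaging of the knee"},
]

prompt, sources = build_prompt("patient with sudden severe headache", PASSAGES)
print(sources)
```

Returning the source list next to the prompt is what makes the answer traceable: a reader can check which document the recommendation was drawn from.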
Results
The refined context-aware chatbot outperformed the previous version using GPT-3.5-Turbo, the generic chatbots GPT-3.5-Turbo and GPT-4, and general radiologists in providing “usually or may be appropriate” recommendations according to the ACR guidelines (all p < 0.001). It also outperformed GPT-3.5-Turbo and general radiologists with respect to “usually appropriate” recommendations (both p < 0.001). Moreover, the consistency in correct answers was higher, with 78 % consistent correct “usually appropriate” answers and 94 % for “usually or may be appropriate” recommendations. In all cases, the same source documents were chosen, ensuring transparency.
Conclusion
Our study demonstrates the significance of context awareness in ensuring the use of appropriate knowledge and proposes a strategy to enhance trust in chatbot-based outputs by providing transparency. The improvements in accuracy, consistency, and source transparency address trust issues and enhance the clinical decision support process.
Abbreviations: ACR, American College of Radiology; accGPT, appropriateness criteria context aware GPT; accGPT-4, appropriateness criteria context aware GPT using GPT-4; GPT, generative pre-trained transformer; LLM, Large Language Model
- Location
-
Deutsche Nationalbibliothek Frankfurt am Main
- Extent
-
Online resource
- Language
-
English
- Notes
-
European journal of radiology. - 181 (2024) , 111756, ISSN: 1872-7727
- Event
-
Publication
- (where)
-
Freiburg
- (who)
-
University
- (when)
-
2024
- Creator
-
Rau, Alexander
Bamberg, Fabian
Fink, Anna
Tran, Phuong Hien
Reisert, Marco
Russe, Maximilian Frederik
- DOI
-
10.1016/j.ejrad.2024.111756
- URN
-
urn:nbn:de:bsz:25-freidok-2572736
- Rights
-
Open Access; access to the object is unrestricted.
- Last update
-
25.03.2025, 1:53 PM CET
Data provider
Deutsche Nationalbibliothek