Using Vision Transformers for Classifying Surgical Tools in Computer Aided Surgeries
Abstract: Automated laparoscopic video analysis is essential for assisting surgeons during computer-aided medical procedures, yet it remains challenging due to complex surgical scenes and limited annotated data. Most existing methods for classifying surgical tools in laparoscopic surgery rely on conventional deep learning architectures such as convolutional and recurrent neural networks. This paper explores the use of purely self-attention-based models, Vision Transformers, for classifying both single-label (SL) and multi-label (ML) frames in laparoscopic surgeries. The proposed SL and ML models were comprehensively evaluated on the Cholec80 surgical workflow dataset using 5-fold cross-validation. Experimental results showed excellent classification performance, with a mean average precision (mAP) of 95.8% that outperforms conventional deep learning multi-label models developed in previous studies. Our results open new avenues for further research on deep transformer models for surgical tool detection in modern operating theaters.
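The abstract describes the approach only at a high level. As a minimal illustration (not the authors' code), the setup it outlines, a Vision Transformer backbone with a multi-label head scored by mean average precision, could look as follows; the backbone choice (torchvision's ViT-B/16), the seven-tool Cholec80 label set, and all other details here are assumptions.

```python
# Hypothetical sketch (not the authors' code): a ViT backbone with a
# multi-label head for surgical-tool classification, scored by mAP.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16
from sklearn.metrics import average_precision_score

NUM_TOOLS = 7  # Cholec80 annotates seven surgical tools per frame

# Swap the single-label classification head for a multi-label one.
# In practice the backbone would be ImageNet-pretrained
# (e.g. weights=ViT_B_16_Weights.IMAGENET1K_V1); None keeps this offline.
model = vit_b_16(weights=None)
model.heads = nn.Linear(model.hidden_dim, NUM_TOOLS)

# Multi-label training treats each tool as an independent binary decision,
# so the loss is sigmoid binary cross-entropy rather than softmax CE.
criterion = nn.BCEWithLogitsLoss()

def map_score(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Mean average precision: per-tool APs averaged over classes."""
    probs = torch.sigmoid(logits).detach().numpy()
    return average_precision_score(targets.numpy(), probs, average="macro")

# Smoke test on synthetic frames standing in for Cholec80 video frames.
x = torch.randn(4, 3, 224, 224)  # batch of RGB frames
y = torch.zeros(4, NUM_TOOLS)    # multi-hot tool-presence labels
y[::2] = 1.0                     # tools present in alternating frames
logits = model(x)
print(f"loss={criterion(logits, y).item():.3f}  mAP={map_score(logits, y):.3f}")
```

To match the evaluation protocol stated in the abstract, this training and scoring step would sit inside a 5-fold cross-validation loop (e.g. sklearn.model_selection.KFold), with mAP averaged across folds.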
- Location: Deutsche Nationalbibliothek Frankfurt am Main
- Extent: Online resource
- Language: English
- Published in: Current Directions in Biomedical Engineering, vol. 10, no. 4 (2024), pp. 232-235 (4 pages)
- DOI: 10.1515/cdbme-2024-2056
- URN: urn:nbn:de:101:1-2412181802205.276932104545
- Rights information: Open Access; access to the object is unrestricted.
- Last updated: 15.08.2025, 07:37 CEST
Data partner
Deutsche Nationalbibliothek. For questions about the object, please contact the data partner.
Contributors
- El Moaqet, Hisham
- Janini, Rami
- Abdulbaki Alshirbaji, Tamer
- Aldeen Jalal, Nour
- Möller, Knut