Using Vision Transformers for Classifying Surgical Tools in Computer Aided Surgeries
Abstract: Automated laparoscopic video analysis is essential for assisting surgeons during computer-aided medical procedures, yet it remains challenging due to complex surgical scenes and limited annotated data. Most existing methods for classifying surgical tools in laparoscopic surgeries rely on conventional deep learning architectures such as convolutional and recurrent neural networks. This paper explores the use of pure self-attention based models, namely Vision Transformers, for classifying both single-label (SL) and multi-label (ML) frames in laparoscopic surgeries. The proposed SL and ML models were comprehensively evaluated on the Cholec80 surgical workflow dataset using 5-fold cross-validation. Experimental results showed excellent classification performance, with a mean average precision (mAP) of 95.8% that outperforms the conventional deep learning multi-label models developed in previous studies. Our results open new avenues for further research on the use of deep transformer models for surgical tool detection in modern operating theaters.
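As a concrete illustration of the kind of multi-label pipeline the abstract describes, the sketch below fine-tunes a pretrained Vision Transformer with a sigmoid head over the seven tool classes annotated in Cholec80 and scores it with mean average precision. This is a minimal sketch, not the authors' implementation: the backbone (vit_base_patch16_224 from timm), optimizer, learning rate, and the helper names train_step and evaluate_map are illustrative assumptions.

```python
# A minimal sketch, assuming a timm ViT backbone; not the paper's exact setup.
import torch
import torch.nn as nn
import timm
from sklearn.metrics import average_precision_score

NUM_TOOLS = 7  # Cholec80 annotates seven surgical tools per frame

# Pretrained ViT backbone; timm swaps in a fresh 7-logit classification head.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_TOOLS)

# Multi-label (ML) setting: each tool is an independent binary target, so the
# loss is per-class binary cross-entropy on the logits rather than softmax.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step. frames: (B, 3, 224, 224); labels: (B, 7) multi-hot."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def evaluate_map(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """Mean average precision (mAP): per-tool average precision, macro-averaged."""
    model.eval()
    probs = torch.sigmoid(model(frames)).cpu().numpy()
    return average_precision_score(labels.cpu().numpy(), probs, average="macro")
```

For the single-label (SL) case, the same backbone would instead end in a softmax head trained with cross-entropy, since exactly one class applies per frame.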
- Location: Deutsche Nationalbibliothek Frankfurt am Main
- Extent: Online resource
- Language: English
- Bibliographic citation: Current Directions in Biomedical Engineering, vol. 10, no. 4 (2024), pp. 232-235
- Creator
- DOI: 10.1515/cdbme-2024-2056
- URN: urn:nbn:de:101:1-2412181802205.276932104545
- Rights: Open Access; access to the object is unrestricted.
- Data provider: Deutsche Nationalbibliothek
- Associated: El Moaqet, Hisham; Janini, Rami; Abdulbaki Alshirbaji, Tamer; Aldeen Jalal, Nour; Möller, Knut