Towards hierarchical regional transformer-based multiple instance learning

Cersovsky, J.; Mohammadi, S.; Kainmueller, D.; Hoehne, J.

Item Type:	Conference or Workshop Item
Title:	Towards hierarchical regional transformer-based multiple instance learning
Creators Name:	Cersovsky, J., Mohammadi, S., Kainmueller, D. and Hoehne, J.
Abstract:	The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning method that replaces the traditional learned attention mechanism with a regional, Vision Transformer-inspired self-attention mechanism. We additionally propose a method that fuses regional patch information to derive slide-level predictions. We then show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we also suggest a method to focus the image processing on high-attention regions during inference. Our approach significantly improves performance over the baseline on two histopathology datasets and points towards promising directions for further research.
Keywords:	Multiple Instance Learning, Attention Mechanism, High Attention, Digital Pathology, Directions for further Research, Vision Transformer, Effective Approach, Convolutional Neural Network, Computer Vision, Size Of Region, Model Architecture, Hyperparameter Tuning, Image Patches, Slide Images, Improve Model Performance, Memory Footprint, Small Region of Interest, Individual Patches, Single-level Model, Sample Patches, Input Tokens, Patch Region
Source:	2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
ISSN:	2473-9944
ISBN:	979-8-3503-0744-3
Publisher:	IEEE
Page Range:	3954-3962
Date:	25 December 2023
Additional Information:	Copyright ©2023 IEEE. This ICCV workshop paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.
Official Publication:	https://doi.org/10.1109/iccvw60793.2023.00427
