"Analyzing Components of a Transformer under Different Data Scales in 3D Prostate CT Segmentation"

Yicong Tan, Prerak Mody, Viktor van der Valk, Marius Staring and Jan van Gemert

Abstract

Literature on medical imaging segmentation claims that hybrid UNet models containing both Transformer and convolutional blocks perform better than purely convolutional UNet models. This recently touted success of Transformers warrants an investigation into which of their components contribute to that performance. Also, previous work is limited to analysis at fixed data scales and makes unfair comparisons with other models whose parameter counts are not equivalent. This work investigates the performance of the window-based Transformer for prostate CT Organ-at-Risk (OAR) segmentation at different data scales in the context of replacing its various components. To compare with literature, the first experiment replaces the window-based Transformer block with convolution. Results show that convolution prevails as the data scale increases. In the second experiment, to reduce complexity, the self-attention mechanism is replaced with an equivalent, albeit simpler, spatial mixing operation, i.e., max-pooling. We observe improved performance for max-pooling at smaller data scales, indicating that the window-based Transformer may not be the best choice at either small or large data scales. Finally, since convolution has an inherent local inductive bias of positional information, we conduct a third experiment to imbue the Transformer with such a property by exploring two kinds of positional encodings. The results show insignificant improvements after adding positional encoding, indicating the Transformer's deficiency in capturing positional information given our data scales. We hope that our approach can serve as a framework for others evaluating the utility of Transformers for their tasks. Code is available via GitHub.
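
To illustrate the second experiment, the following minimal Python sketch shows what a parameter-free max-pooling spatial mixer might look like on one feature channel of a 3D window. It is not the authors' implementation (their code is on GitHub); the function name `max_pool_mix`, the kernel size of 3, the stride of 1, and the clipping at the borders are all assumptions made for illustration.

```python
from itertools import product


def max_pool_mix(volume, k=3):
    """Hypothetical spatial mixer: each voxel takes the maximum over its
    k x k x k neighbourhood (stride 1, neighbourhood clipped at the borders).

    The output has the same shape as the input and, unlike self-attention,
    the operation has no learnable parameters.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    r = k // 2  # neighbourhood radius (assumed odd kernel size)
    out = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    for z, y, x in product(range(depth), range(height), range(width)):
        out[z][y][x] = max(
            volume[zz][yy][xx]
            for zz in range(max(0, z - r), min(depth, z + r + 1))
            for yy in range(max(0, y - r), min(height, y + r + 1))
            for xx in range(max(0, x - r), min(width, x + r + 1))
        )
    return out


# Toy input: a 3 x 3 x 3 window with a single strong activation in one corner.
window = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
window[0][0][0] = 1.0
mixed = max_pool_mix(window)
```

In this toy case the corner activation spreads to every voxel whose neighbourhood contains it, which is the sense in which pooling mixes spatial information without computing attention weights.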

Download
PDF (5 pages, 15979 kB)
From publisher	link
Copyright © 2023 by the authors. Published version © 2023 by SPIE. Personal use of this material is permitted. However, permission to reprint or republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the copyright holder.

BibTeX entry

@inproceedings{Tan:2023,
author	= {Tan, Yicong and Mody, Prerak and van der Valk, Viktor and Staring, Marius and van Gemert, Jan},
title	= {Analyzing Components of a Transformer under Different Data Scales in 3D Prostate CT Segmentation},
booktitle	= {SPIE Medical Imaging: Computer-Aided Diagnosis},
editor	= {Colliot, Olivier and Išgum, Ivana},
address	= {San Diego, CA, USA},
series	= {Proceedings of SPIE},
volume	= {12464},
pages	= {1246408},
month	= {February},
year	= {2023},
}

last modified: 21-04-2023
