Technology Enablers for Big Data, Multi-Stage Analysis in Medical Image Processing
Shunxing Bao, Prasanna Parvathaneni, Yuankai Huo, Yogesh Barve, Andrew J. Plassard, Yuang Yao, Hongyang Sun, Ilwoo Lyu, David H. Zald, Bennett A. Landman and Aniruddha Gokhale. “Technology Enablers for Big Data, Multi-Stage Analysis in Medical Image Processing.” 2018 IEEE International Conference on Big Data (Big Data). Accepted (acceptance rate 18.9%).
Full text: TBD
Abstract
Big data medical image processing applications involving multi-stage analysis often exhibit significant variability in processing times, ranging from a few seconds to several days. Moreover, because traditional software technologies and platforms execute the analysis stages sequentially, errors in the pipeline are detected only at the later stages, even though they predominantly originate in the highly compute-intensive first stage. This wastes precious computing resources and incurs prohibitively high costs for re-executing the application. The medical image processing community remains, to date, largely unaware of these issues and continues to use traditional high-performance computing clusters, which incur a high operating cost due to the use of dedicated resources and expensive centralized file systems. To overcome these challenges, this paper proposes an alternative approach for medical image processing multi-stage analysis by using the Apache Hadoop ecosystem and offering it as a service in the cloud. Specifically, we make the following contributions. First, we propose a concurrent pipeline execution framework and an associated semi-automatic, real-time monitoring and checkpointing framework that can detect outliers and achieve quality assurance without having to completely execute the expensive first stage of processing, thereby expediting the entire multi-stage analysis. Second, we present a simulator to rapidly estimate the execution time for a given multi-stage analysis, which can aid the user in deciding the appropriate approach for their use case. We conduct an empirical evaluation of our framework and show that it requires 76.75% less wall time and 29.22% less resource time than the traditional approach without the quality assurance mechanism.
Keywords: Hadoop, Medical image processing, Big Data multi-stage analysis, Simulator.
Classical medical image processing multi-level analyses: Tract-Based Spatial Statistics (TBSS) and Gray Matter Surface-based Spatial Statistics. The target pipeline of this work is the first example, the dtiQA & TBSS pipeline. The purple box indicates the focus area of our second contribution.
Our proposed semi-automatic, real-time monitoring and checkpointing framework for (1) early error detection and (2) drawing conclusions early via quality assurance (QA) of intermediate group analysis results.