Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine
Shunxing Bao, Frederick D. Weitendorf, Andrew J. Plassard, Yuankai Huo, Aniruddha Gokhale, Bennett A. Landman. “Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine”. Orlando, Florida, February 2017. Oral presentation.
Traditional large-scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit controlled by a job scheduler. Data transfer from storage to processing nodes can saturate the network when data are frequently uploaded to or retrieved from Network File System (NFS) storage. An alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. Theoretical models of wall-clock time and resource time for both approaches are introduced and empirically validated. A comparative analysis identifies when the Hadoop framework is relevant for medical imaging.
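To make the distinction between wall-clock time and resource time concrete, the following is a minimal toy sketch, not the models from the paper. It assumes that jobs run in waves of one job per core, that all jobs in a wave on the shared-storage cluster split a single network link evenly, that co-located jobs read from local disk at a fixed rate, and that transfer and computation do not overlap. All function and parameter names are hypothetical.

```python
import math


def shared_storage_times(n_jobs, cores, data_bytes, compute_s, network_bytes_per_s):
    """Toy estimate for a scheduler-driven cluster reading from shared storage.

    Assumption: the jobs active in a wave share one network link evenly,
    so each transfer in the wave takes active * data_bytes / bandwidth.
    Returns (wall_clock_seconds, resource_seconds).
    """
    wall = 0.0
    resource = 0.0
    remaining = n_jobs
    for _ in range(math.ceil(n_jobs / cores)):
        active = min(cores, remaining)
        transfer = active * data_bytes / network_bytes_per_s
        wall += transfer + compute_s                  # the wave ends when its jobs end
        resource += active * (transfer + compute_s)   # each busy core is charged
        remaining -= active
    return wall, resource


def colocated_times(n_jobs, cores, data_bytes, compute_s, disk_bytes_per_s):
    """Toy estimate when each job reads its input from local disk.

    Assumption: local reads do not contend with one another.
    Returns (wall_clock_seconds, resource_seconds).
    """
    waves = math.ceil(n_jobs / cores)
    per_job = data_bytes / disk_bytes_per_s + compute_s
    return waves * per_job, n_jobs * per_job
```

Under these assumptions, the gap between the two estimates grows with the ratio of per-job transfer time to compute time, and it shrinks toward zero for compute-heavy jobs. Wall-clock time measures how long the whole batch takes, while resource time sums how long each core is held, including time spent waiting on data.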