Empirical Assessment of the Assumptions of ComBat with Diffusion Tensor Imaging

Posted on Friday, May 24, 2024 in News.

Michael E. Kim, Chenyu Gao, Leon Y. Cai, Qi Yang, Nancy R. Newlin, Karthik Ramadass, Angela Jefferson, Derek Archer, Niranjana Shashikumar, Kimberly R. Pechman, Katherine A. Gifford, Timothy J. Hohman, Lori L. Beason-Held, Susan M. Resnick, Stefan Winzeck, Kurt G. Schilling, Panpan Zhang, Daniel Moyer, Bennett A. Landman, Empirical assessment of the assumptions of ComBat with diffusion tensor imaging, J. Med. Imag. 11(2), 024011 (2024), doi: 10.1117/1.JMI.11.2.024011.



Figure 1. After registration of the JHU EVE-III Atlas, mean FA values were calculated in all the regions for each participant in the silver standard cohort. A point in the experimental space is “feasible” if the sample size for each site meets the required minimum, the imbalance level does not require more participants from either site than are available for that site, and the sampling of participants yields a covariate shift within 1 year of the target age difference between sites. For each feasible point in the experimental space, 10 bootstraps were subsampled from the silver standard cohort, and the FA values for the subsamples were harmonized by ComBat. The resulting parameters were then compared to those from the silver standard to determine the reliability of ComBat at that location in the experimental space.



Figure 2. The root mean squared error (RMSE) of standardized β_AGE estimates for mean FA vs. age, compared to the silver standard, indicates that ComBat is not stable under all experimental permutations considered, as the error increases when the cohort is changed to have a mean age difference between VMAP and BLSA of (A) 0 years, (B) 2 years, (C) 4 years, (D) 6 years, (E) 8 years, and (F) 10 years. The values represent the mean normalized RMSE across EVE Type-III Atlas regions, averaged across 10 iterations of each feasible point in the experimental space. For each subplot, the total sample size of the cohort is on the x-axis and the sample size imbalance is on the y-axis, where Y:10 represents Y participants at VMAP for every 10 at BLSA. Non-feasible experimental permutations are shown in gray.


Purpose: Diffusion tensor imaging (DTI) is a magnetic resonance imaging technique that provides unique information about white matter microstructure in the brain but is susceptible to confounding effects introduced by scanner or acquisition differences. ComBat is a leading approach for addressing these site biases. However, despite its frequent use for harmonization, ComBat’s robustness to site dissimilarities and overall cohort size has not yet been evaluated for DTI.
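
For reference, ComBat is commonly written as a location and scale model. The notation below follows the usual formulation in the ComBat literature rather than anything stated in this post; the indices are an assumption chosen to match the symbols used in the abstract, with s for site, j for participant, and f for feature (here, a regional mean FA or MD value).

```latex
% Assumed model for feature f of participant j at site s
y_{sjf} = \alpha_f + x_{sj}^{\top}\beta_f + \gamma_{sf} + \delta_{sf}\,\varepsilon_{sjf},
\qquad \varepsilon_{sjf} \sim \mathcal{N}(0, \sigma_f^2)

% Harmonized value, using the empirical Bayes estimates \gamma^{*}_{sf} and \delta^{*}_{sf}
y_{sjf}^{\mathrm{ComBat}}
  = \frac{y_{sjf} - \hat{\alpha}_f - x_{sj}^{\top}\hat{\beta}_f - \gamma^{*}_{sf}}{\delta^{*}_{sf}}
  + \hat{\alpha}_f + x_{sj}^{\top}\hat{\beta}_f
```

Here γ_sf is the additive site shift and δ_sf the multiplicative site scaling; the normality of the residuals ε_sjf is the assumption whose violation the study examines.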

Approach: As a baseline, we match N = 358 participants from two sites to create a “silver standard” that simulates a cohort for multi-site harmonization. Across sites, we harmonize mean fractional anisotropy (FA) and mean diffusivity, calculated using participant DTI data, for the regions of interest defined by the JHU EVE-Type III atlas. We bootstrap 10 iterations at 19 levels of total sample size, 10 levels of sample size imbalance between sites, and 6 levels of mean age difference between sites to quantify (i) β_AGE, the linear regression coefficient of the relationship between FA and age; (ii) γ*_sf, the ComBat-estimated site-shift; and (iii) δ*_sf, the ComBat-estimated site-scaling. We characterize the reliability of ComBat by evaluating the root mean squared error in these three metrics and examine whether there is a correlation between the reliability of ComBat and a violation of assumptions.
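
The bootstrap comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `fit_age_slope` and `bootstrap_rmse` are hypothetical, the slope is fit by ordinary least squares for a single region, and the ComBat harmonization step and the controls for site imbalance and age shift are omitted for brevity.

```python
import numpy as np


def fit_age_slope(age, fa):
    """Ordinary least squares slope of FA on age (beta_AGE in the text)."""
    age = np.asarray(age, dtype=float)
    fa = np.asarray(fa, dtype=float)
    design = np.column_stack([np.ones_like(age), age])  # intercept + age
    coef, *_ = np.linalg.lstsq(design, fa, rcond=None)
    return float(coef[1])


def bootstrap_rmse(age, fa, n_sub, n_boot=10, seed=0):
    """RMSE of subsampled slope estimates against the full-cohort slope.

    Assumption: the full cohort plays the role of the "silver standard".
    In the study, ComBat would be applied to each subsample before fitting;
    that step is left out of this sketch.
    """
    age = np.asarray(age, dtype=float)
    fa = np.asarray(fa, dtype=float)
    rng = np.random.default_rng(seed)
    reference = fit_age_slope(age, fa)
    estimates = []
    for _ in range(n_boot):
        # Draw n_sub participants without replacement from the cohort.
        idx = rng.choice(age.size, size=n_sub, replace=False)
        estimates.append(fit_age_slope(age[idx], fa[idx]))
    errors = np.asarray(estimates) - reference
    return float(np.sqrt(np.mean(errors ** 2)))
```

Calling `bootstrap_rmse(age, fa, n_sub=100)` on one region's values returns a single error figure; repeating the call over a grid of sample sizes gives one axis of the kind of map shown in Figure 2.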

Results: ComBat remains well behaved for β_AGE when N > 162 and when the mean age difference is less than 4 years. The assumptions of the ComBat model regarding the normality of residual distributions are not violated even as the model becomes unstable.

Conclusion: Prior to harmonization of DTI data with ComBat, the input cohort should be examined for size and covariate distributions of each site. Direct assessment of residual distributions is less informative about stability than bootstrap analysis. We caution against the use of ComBat in situations that do not conform to the above thresholds.
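
A cohort check along these lines might look like the sketch below. The function name `cohort_within_thresholds` and its parameters are hypothetical; the defaults simply restate the thresholds reported for β_AGE (N > 162, mean age difference under 4 years), and reading N as the total participants across two sites is an assumption.

```python
import numpy as np


def cohort_within_thresholds(ages_site_a, ages_site_b,
                             min_total_n=162, max_age_gap_years=4.0):
    """Summarize a two-site cohort against the reported stability thresholds.

    Assumption: thresholds apply to total sample size and to the absolute
    difference in mean age between the two sites.
    """
    ages_a = np.asarray(ages_site_a, dtype=float)
    ages_b = np.asarray(ages_site_b, dtype=float)
    total_n = int(ages_a.size + ages_b.size)
    age_gap = float(abs(ages_a.mean() - ages_b.mean()))
    return {
        "total_n": total_n,
        "mean_age_gap_years": age_gap,
        "site_ratio": ages_a.size / ages_b.size,  # sample size imbalance
        "within_thresholds": bool(
            total_n > min_total_n and age_gap < max_age_gap_years
        ),
    }
```

A result with `within_thresholds` set to `False` does not show that harmonization failed; it only flags a cohort outside the range where the study found ComBat to be stable, where a bootstrap analysis would be the more informative follow-up.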