Learning site-invariant features of connectomes to harmonize complex network measures
Figure 1. Previous research has shown that connectomes suffer from confounding site effects. In this work, we propose a data-driven model that learns disjoint site features (š = {1,2}) and biological features (siteless z) for BIOCARD (orange) and VMAP (blue) (left). We then inject a prescribed site, cā, into the learned representations to compute harmonized connectome modularity, Q (right).
Abstract
Multi-site diffusion MRI data is often acquired on different scanners and with distinct protocols. Differences in hardware and acquisition result in data that contains site-dependent information, which confounds connectome analyses that aim to combine such multi-site data. We propose a data-driven solution that isolates site-invariant information whilst maintaining relevant features of the connectome. We construct a latent space that is uncorrelated with the imaging site and highly correlated with patient age and a connectome summary measure. Here, we focus on network modularity. The proposed model is a conditional variational autoencoder with three additional prediction tasks: one for patient age, and two for modularity, each trained exclusively on data from one site. This model enables us to 1) isolate site-invariant biological features, 2) learn site context, and 3) re-inject site context and project biological features to desired site domains. We tested these hypotheses by projecting 77 connectomes from two studies and protocols (Vanderbilt Memory and Aging Project (VMAP) and Biomarkers of Cognitive Decline Among Normal Individuals (BIOCARD)) to a common site. We find that the resulting dataset of modularity has statistically similar means (p-value < 0.05) across sites. In addition, we fit a linear model to the joint dataset and find that positive correlations between age and modularity were preserved.
Submitted to SPIE: Medical Imaging 2024
Keywords: Diffusion MRI, connectome, multi-site analysis, site-invariance, complex network measures
Figure 2. The data are 84 by 84 adjacency matrices in which each edge is weighted by the number of streamlines connecting the corresponding pair of brain regions. Each matrix is flattened and passed as input to the model. The reconstruction task has three components: encoding block, latent space, and decoding block. The prediction tasks are shallow networks that learn the calculated, unharmonized modularity of their respective site, š. Separate model layers are used for each site. To ground the latent space with biological information, we also predict patient age.
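The data flow described in Figure 2 can be sketched as a single untrained forward pass. This is a minimal illustration, not the authors' implementation: the hidden and latent sizes, the ReLU activations, the one-hot site encoding, the zero-based site indices, and all names (`dense`, `forward`, `params`) are assumptions introduced here. Only the 84 by 84 input size, the flattening step, the encoder/latent/decoder structure, and the three prediction heads come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 84                  # from the caption: 84 x 84 adjacency matrices
IN_DIM = N_NODES * N_NODES    # length of the flattened input (7056)
HID_DIM, Z_DIM, N_SITES = 128, 16, 2   # assumed sizes, not taken from the paper


def dense(in_dim, out_dim):
    """Random weights and zero bias for one linear layer (untrained)."""
    return rng.normal(0.0, 0.01, (in_dim, out_dim)), np.zeros(out_dim)


# Hypothetical layer names; one modularity head per site, plus an age head.
params = {
    "enc": dense(IN_DIM, HID_DIM),
    "mu": dense(HID_DIM, Z_DIM),
    "logvar": dense(HID_DIM, Z_DIM),
    "dec_hidden": dense(Z_DIM + N_SITES, HID_DIM),
    "dec_out": dense(HID_DIM, IN_DIM),
    "age": dense(Z_DIM, 1),
    "mod_site0": dense(Z_DIM, 1),
    "mod_site1": dense(Z_DIM, 1),
}


def linear(x, name):
    w, b = params[name]
    return x @ w + b


def forward(connectomes, site_ids):
    """Reconstruction branch plus the three prediction heads."""
    x = connectomes.reshape(len(connectomes), -1)        # flatten each matrix
    h = np.maximum(linear(x, "enc"), 0.0)                # encoding block
    mu, logvar = linear(h, "mu"), linear(h, "logvar")
    # Reparameterization: sample z from N(mu, exp(logvar)).
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    site = np.eye(N_SITES)[site_ids]                     # one-hot site context
    # The decoder is conditioned on the site by concatenating it to z.
    d = np.maximum(linear(np.concatenate([z, site], axis=1), "dec_hidden"), 0.0)
    return {
        "recon": linear(d, "dec_out").reshape(connectomes.shape),
        "z": z,
        "age": linear(z, "age")[:, 0],
        "mod_site0": linear(z, "mod_site0")[:, 0],
        "mod_site1": linear(z, "mod_site1")[:, 0],
    }


# Synthetic stand-in for streamline-count matrices; not real data.
batch = rng.poisson(5.0, (4, N_NODES, N_NODES)).astype(float)
out = forward(batch, np.array([0, 1, 0, 1]))
```

Under these assumptions, reading `out["mod_site0"]` for every subject, regardless of acquisition site, corresponds to projecting all subjects to one common site. Training losses (reconstruction, KL divergence, and prediction errors) are omitted from the sketch.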
Figure 3. Comparing distributions of modularity from BIOCARD (orange) and VMAP (blue) generated by A) computing modularity using the formula on the raw connectomes generated from tractography, B) predicting from the site-invariant latent space with the predictor trained on BIOCARD data, and C) predicting from the site-invariant latent space with the predictor trained on VMAP data. P-values correspond to t-test results comparing the means of the VMAP and BIOCARD distributions.
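For reference, panel A relies on a closed-form modularity computation. The sketch below uses the standard Newman definition for a weighted, undirected graph with a fixed community assignment. The source does not state which modularity variant or community detection method was used, so the definition, the function name `modularity`, and the toy graph are all assumptions made for illustration.

```python
import numpy as np


def modularity(adjacency, communities):
    """Newman modularity Q for a weighted, undirected graph and a fixed partition.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    a = np.asarray(adjacency, dtype=float)
    c = np.asarray(communities)
    k = a.sum(axis=1)                 # weighted degree (strength) of each node
    two_m = k.sum()                   # twice the total edge weight
    same = c[:, None] == c[None, :]   # True where nodes i and j share a community
    return float(((a - np.outer(k, k) / two_m) * same).sum() / two_m)


# Toy graph (not connectome data): two disconnected triangles,
# each treated as one community.
triangle = np.ones((3, 3)) - np.eye(3)
toy = np.block([[triangle, np.zeros((3, 3))], [np.zeros((3, 3)), triangle]])
q = modularity(toy, [0, 0, 0, 1, 1, 1])   # Q = 0.5 for this partition
```

On a real connectome the community assignment would first have to be estimated, which this sketch leaves out.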