Electronic health records (EHRs) are increasingly used and form a unique source of extensive data gathered during routine clinical care. Using codified data together with concepts extracted from free text by clinical informatics tools, disease labels can be assigned with high accuracy. Analyses linking such EHR-assigned disease labels to a biospecimen repository have demonstrated that genetic associations identified in prospective cohorts can be replicated with adequate statistical power, and that novel phenotypic associations can be identified.