Informatics tools for proteomics
A major strength of our research is a robust bioinformatics infrastructure that supports work in proteomics and the integration of proteomic and genomic data. This infrastructure was developed by the research groups of David Tabb and Bing Zhang of the Department of Biomedical Informatics in collaboration with the Liebler lab and an information technology team built in partnership with the Jim Ayers Institute, the Mass Spectrometry Research Center and the Vanderbilt-Ingram Cancer Center.
We use an informatics pipeline and a supporting tool suite for protein identification. Both are built entirely from open-source applications developed at Vanderbilt and use resources from the ProteoWizard project. Protein identifications from MS/MS data are made by database searching (MyriMatch), sequence tag searching (TagRecon) and peptide spectral library searching (Pepitome). Posttranslationally modified and chemically adducted peptide sequences are identified with TagRecon. LC-MS system quality control is monitored with QuaMeter. MS/MS spectral quality is assessed with ScanRanker. Peptide identification filtering and parsimonious protein assembly are done with the IDPicker utility. Finally, shotgun proteome datasets are compared on the basis of spectral count data using quasi-likelihood modeling with QuasiTel.
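To illustrate what parsimonious protein assembly means, here is a minimal Python sketch that picks a small set of proteins accounting for every identified peptide. It uses a simple greedy set-cover heuristic and is not IDPicker's actual implementation; the function name `parsimonious_proteins` and the peptide and protein labels are hypothetical.

```python
from collections import defaultdict


def parsimonious_proteins(peptide_to_proteins):
    """Greedily choose a small protein list that explains every peptide.

    peptide_to_proteins maps each identified peptide to the proteins
    whose sequences contain it (hypothetical input format).
    """
    # Invert the mapping so each protein points at its peptides.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    unexplained = set(peptide_to_proteins)
    selected = []
    while unexplained:
        # Pick the protein that explains the most remaining peptides;
        # sorting first makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
            default=None,
        )
        covered = protein_to_peptides[best] & unexplained if best else set()
        if not covered:
            break  # remaining peptides map to no protein at all
        selected.append(best)
        unexplained -= covered
    return selected


if __name__ == "__main__":
    example = {
        "PEPA": ["P1", "P2"],
        "PEPB": ["P1"],
        "PEPC": ["P2", "P3"],
    }
    print(parsimonious_proteins(example))
```

In this toy input, P3 is supported only by a peptide that P2 already explains, so a parsimonious list does not need it. A greedy choice is a heuristic and is not guaranteed to find the smallest possible protein set.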
Another suite of tools developed by the Zhang lab employs integrative bioinformatics approaches to improve protein identification and to facilitate a systems-level understanding of proteomics data. A bioinformatics workflow that incorporates the CanProVar database allows the detection of cancer-related protein sequence variants from MS/MS datasets. A new pipeline enables the use of RNA-Seq data to construct sample-specific databases for proteomic analyses (R package to be released). For systems biology studies, a set of tools is available for pathway and gene set enrichment analysis (WebGestalt), for network-based gene and protein prioritization (NetWalker), and for the detection of co-expressed gene and protein modules (ICE).
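As a rough illustration of gene set enrichment analysis, the sketch below computes an over-representation p-value with a one-sided hypergeometric test, a common choice for this kind of analysis. It is not taken from WebGestalt, whose statistics are not described here; the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, set_size, hits_size, overlap):
    """One-sided hypergeometric p-value for over-representation.

    background_size: number of genes/proteins in the reference list
    set_size:        members of the pathway or gene set in that reference
    hits_size:       size of the list of interest (e.g. changed proteins)
    overlap:         members of the list of interest that fall in the set
    Returns P(X >= overlap) when hits_size items are drawn at random.
    """
    total = comb(background_size, hits_size)
    upper = min(set_size, hits_size)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 reference proteins, 5 in the pathway, 5 in the
    # list of interest, all 5 of which fall in the pathway.
    print(enrichment_p_value(10, 5, 5, 5))
```

A small p-value suggests the set is over-represented in the list of interest. In practice many sets are tested at once, so the p-values would also need multiple-testing correction.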
Representative references from our work and collaborations
Tabb, D. L., Fernando, C. G., and Chambers, M. C. (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res., 6, 654-661. PubMed
Zhang, B., Chambers, M. C., and Tabb, D. L. (2007) Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J. Proteome Res., 6, 3549-3557. PubMed
Tabb, D. L., Ma, Z. Q., Martin, D. B., Ham, A. J., and Chambers, M. C. (2008) DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J. Proteome Res., 7, 3838-3846. PubMed
Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics, 24, 2534-2536. PubMed
Ma, Z. Q., Dasari, S., Chambers, M. C., Litton, M. D., Sobecki, S. M., Zimmerman, L. J., Halvey, P. J., Schilling, B., Drake, P. M., Gibson, B. W., and Tabb, D. L. (2009) IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J. Proteome Res., 8, 3872-3881. PubMed
Dasari, S., Chambers, M. C., Slebos, R. J., Zimmerman, L. J., Ham, A. J., and Tabb, D. L. (2010) TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res., 9, 1716-1726. PubMed
Li, J., Duncan, D. T., and Zhang, B. (2010) CanProVar: a human cancer proteome variation database. Hum. Mutat., 31, 219-228. PubMed
Li, M., Gray, W., Zhang, H., Chung, C. H., Billheimer, D., Yarbrough, W. G., Liebler, D. C., Shyr, Y., and Slebos, R. J. (2010) Comparative shotgun proteomics using spectral count data and quasi-likelihood modeling. J. Proteome Res., 9, 4295-4305. PubMed
MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M., Finney, G. L., Frewen, B., Kern, R., Tabb, D. L., Liebler, D. C., and MacCoss, M. J. (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics, 26, 966-968. PubMed
Dasari, S., Chambers, M. C., Codreanu, S. G., Liebler, D. C., Collins, B. C., Pennington, S. R., Gallagher, W. M., and Tabb, D. L. (2011) Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem. Res. Toxicol., 24, 204-216. PubMed
Li, J., Su, Z., Ma, Z. Q., Slebos, R. J., Halvey, P., Tabb, D. L., Liebler, D. C., Pao, W., and Zhang, B. (2011) A bioinformatics workflow for variant peptide detection in shotgun proteomics. Mol. Cell. Proteomics, 10, M110.006536. PubMed
Ma, Z. Q., Tabb, D. L., Burden, J., Chambers, M. C., Cox, M. B., Cantrell, M. J., Ham, A. J., Litton, M. D., Oreto, M. R., Schultz, W. C., Sobecki, S. M., Tsui, T. Y., Wernke, G. R., and Liebler, D. C. (2011) Supporting tool suite for production proteomics. Bioinformatics, 27, 3214-3215. PubMed
Wang, X., Slebos, R. J., Wang, D., Halvey, P. J., Tabb, D. L., Liebler, D. C., and Zhang, B. (2011) Protein identification using customized protein sequence databases derived from RNA-Seq data. J. Proteome Res., 11, 1009-1017. PubMed
Zhang, B., Shi, Z., Duncan, D. T., Prodduturi, N., Marnett, L. J., and Liebler, D. C. (2011) Relating protein adduction to gene expression changes: a systems approach. Molecular Biosystems, 7, 2118-2127. PubMed
Dasari, S., Chambers, M. C., Martinez, M. A., Carpenter, K. L., Ham, A. J., Vega-Montoto, L. J., and Tabb, D. L. (2012) Pepitome: Evaluating Improved Spectral Library Search for Identification Complementarity and Quality Assessment. J. Proteome Res. PubMed