
Informatics tools for proteomics

A major strength of our research is a robust bioinformatics infrastructure that supports proteomics work and the integration of proteomic and genomic data. This infrastructure was developed by the research groups of David Tabb and Bing Zhang of the Department of Biomedical Informatics, in collaboration with the Liebler lab and an information technology team built in partnership with the Jim Ayers Institute, the Mass Spectrometry Research Center and the Vanderbilt-Ingram Cancer Center.

We use an informatics pipeline and a supporting tool suite for protein identification, built entirely from open-source applications developed at Vanderbilt and drawing on resources from the ProteoWizard project. Proteins are identified from MS/MS data by database searching (MyriMatch), sequence tag searching (TagRecon) and peptide spectral library searching (Pepitome). Posttranslationally modified and chemically adducted peptide sequences are identified with TagRecon. LC-MS system quality control is monitored with QuaMeter. MS/MS spectral quality is assessed with ScanRanker. Peptide identification filtering and parsimonious protein assembly are done with the IDPicker utility. Finally, shotgun proteome datasets are compared on the basis of spectral count data, using quasi-likelihood modeling with QuasiTel.
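
To illustrate what parsimonious protein assembly means in practice, here is a minimal Python sketch, offered as an illustration only. It uses a simple greedy set-cover heuristic, not the bipartite graph analysis that IDPicker implements. The function name `parsimonious_proteins` and the toy protein and peptide labels are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Pick a small set of proteins that explains every observed peptide.

    protein_to_peptides maps a protein name to the set of identified
    peptides it could have produced. This is a greedy set-cover
    approximation (an assumption made for illustration), so it is not
    guaranteed to find the smallest possible protein list.
    """
    # Every peptide that still needs a protein to explain it.
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # Choose the protein covering the most unexplained peptides;
        # sorting the names makes tie-breaking deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & remaining),
        )
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


# Hypothetical toy data: four candidate proteins, four peptides.
example = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepD"},
    "P4": {"pepC", "pepD"},
}
print(parsimonious_proteins(example))
```

In this toy case, P2 is dropped because P1 already explains all of its peptides, which is the essence of a parsimonious protein list.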

 

Our principal tool for designing MRM analyses and analyzing MRM data is Skyline, which was developed by Brendan MacLean in the MacCoss laboratory with support from the Vanderbilt CPTAC program.

Another suite of tools, developed by the Zhang lab, employs integrative bioinformatics approaches to improve protein identification and to facilitate a systems-level understanding of proteomics data. A bioinformatics workflow that incorporates the CanProVar database allows the detection of cancer-related protein sequence variants from MS/MS datasets. A new pipeline enables the use of RNA-Seq data to construct sample-specific databases for proteomic analyses (R package to be released). For systems biology studies, a set of tools is available for pathway and gene set enrichment analysis (WebGestalt), for network-based gene and protein prioritization (NetWalker), and for the detection of co-expressed gene and protein modules (ICE).
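
As a rough illustration of the idea behind gene set enrichment analysis, the Python sketch below computes a one-sided hypergeometric p-value for the overlap between a list of identified proteins and a pathway. This is a common over-representation statistic and is shown here as an assumption for illustration; it is not a description of how WebGestalt is implemented. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) under the hypergeometric distribution.

    population: total number of genes/proteins in the background
    annotated:  number of background members belonging to the pathway
    sample:     size of the list of interest
    overlap:    members of the list of interest that are in the pathway
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `overlap` or more pathway members.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background proteins, 4 in the pathway,
# 3 proteins of interest, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

A small p-value suggests the pathway is over-represented in the list; in a real analysis the p-values for many pathways would also need multiple-testing correction.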

Representative references from our work and collaborations

Tabb, D. L., Fernando, C. G., and Chambers, M. C. (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res., 6, 654-661.  PubMed

Zhang, B., Chambers, M. C., and Tabb, D. L. (2007) Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J. Proteome Res., 6, 3549-3557.  PubMed

Tabb, D. L., Ma, Z. Q., Martin, D. B., Ham, A. J., and Chambers, M. C. (2008) DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J. Proteome Res., 7, 3838-3846.  PubMed

Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics, 24, 2534-2536.  PubMed

Ma, Z. Q., Dasari, S., Chambers, M. C., Litton, M. D., Sobecki, S. M., Zimmerman, L. J., Halvey, P. J., Schilling, B., Drake, P. M., Gibson, B. W., and Tabb, D. L. (2009) IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J. Proteome Res., 8, 3872-3881.  PubMed

Dasari, S., Chambers, M. C., Slebos, R. J., Zimmerman, L. J., Ham, A. J., and Tabb, D. L. (2010) TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res., 9, 1716-1726.  PubMed

Li, J., Duncan, D. T., and Zhang, B. (2010) CanProVar: a human cancer proteome variation database. Hum. Mutat., 31, 219-228.  PubMed

Li, M., Gray, W., Zhang, H., Chung, C. H., Billheimer, D., Yarbrough, W. G., Liebler, D. C., Shyr, Y., and Slebos, R. J. (2010) Comparative shotgun proteomics using spectral count data and quasi-likelihood modeling. J. Proteome Res., 9, 4295-4305.  PubMed

MacLean, B., Tomazela, D. M., Shulman, N., Chambers, M., Finney, G. L., Frewen, B., Kern, R., Tabb, D. L., Liebler, D. C., and MacCoss, M. J. (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics, 26, 966-968.  PubMed

Dasari, S., Chambers, M. C., Codreanu, S. G., Liebler, D. C., Collins, B. C., Pennington, S. R., Gallagher, W. M., and Tabb, D. L. (2011) Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem. Res. Toxicol., 24, 204-216.  PubMed

Li, J., Su, Z., Ma, Z. Q., Slebos, R. J., Halvey, P., Tabb, D. L., Liebler, D. C., Pao, W., and Zhang, B. (2011) A bioinformatics workflow for variant peptide detection in shotgun proteomics. Mol. Cell. Proteomics, 10, M110.006536.  PubMed

Ma, Z. Q., Tabb, D. L., Burden, J., Chambers, M. C., Cox, M. B., Cantrell, M. J., Ham, A. J., Litton, M. D., Oreto, M. R., Schultz, W. C., Sobecki, S. M., Tsui, T. Y., Wernke, G. R., and Liebler, D. C. (2011) Supporting tool suite for production proteomics. Bioinformatics, 27, 3214-3215.  PubMed

Wang, X., Slebos, R. J., Wang, D., Halvey, P. J., Tabb, D. L., Liebler, D. C., and Zhang, B. (2011) Protein identification using customized protein sequence databases derived from RNA-Seq data. J. Proteome Res., 11, 1009-1017.  PubMed

Zhang, B., Shi, Z., Duncan, D. T., Prodduturi, N., Marnett, L. J., and Liebler, D. C. (2011) Relating protein adduction to gene expression changes: a systems approach. Molecular Biosystems, 7, 2118-2127.  PubMed

Dasari, S., Chambers, M. C., Martinez, M. A., Carpenter, K. L., Ham, A. J., Vega-Montoto, L. J., and Tabb, D. L. (2012) Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment. J. Proteome Res.  PubMed