Samuel Schmitz

Restricting the sequence space in Rosetta antibody design to human-like antibodies

Project Abstract

Sam will work on three method development projects in year 1 of his training: 1. Introducing sequence space and somatic mutation biases into Rosetta Design. This is important to design antibodies that can be elicited with a vaccine. 2. Implement a pipeline that determines antibody sequence and structural liabilities for manufacturability and patentability. 3. Implement correlated mutations in Rosetta Design. This is important to capture correlated mutations in the heavy/light chain interface and between the complementarity determining regions (CDRs) that are important for paratope shape and conformation when designing antibodies with Rosetta.

Project Update

Fast and easy antibody sequence optimization for improved expression rates by highlighting and scoring potential liabilities


Humanization, which is also referred to as reshaping, complementarity determining region (CDR)-grafting, veneering, resurfacing, specificity-determining residue (SDR)-transfer, or DeImmunizationTM, comprises strategies for reducing the immunogenicity of monoclonal antibodies (mAbs) from animal sources and for improving their activation of the human immune system. For this purpose, I’ve developed a sequence processing framework AntiBodyLiabilities (ABL), to assess the human-likeness of antibody sequences. In its current form, it is capable of detecting 13 distinct liabilities known to contribute to either manufacturability issues or immunogenicity effects. Also, ABL is accessible as a webserver, called ABLweb. Thirteen liabilities, like oxidation, isomerization, deamidation, fragmentation, or glycosylation are detected using antibody nucleotide sequence information. For this purpose, ABL integrates specialized tools for sequence preprocessing: IgBLAST, to determine the V, D, and J germline sequence for each antibody. HTJoinSolver, a state-of-the art antibody sequence partitioning analysis tool, which is able to differentiate between the highly variable Complementary Determining Region (CDR) and the rigid framework (FW) area with high accuracy. With germline and V, D, J information collected, ABL executes the search for 13 liabilities either within the CDR region or the FW area depending on liability type and importance depending on its location in the sequence. This happens either with simple pattern matches or artificial neural networks like NetNGlyc for N-linked glycosylation prediction or APPNN, a neural network for aggregation prediction.

A final humanization score is returned by counting all found liabilities and multiplying each liability with its custom weight. By now, these weights are set manually based on qualitative experience with experimental expression studies and can be changed arbitrarily. Challenge of year two of my research will be to redesign the weight based scoring system using big data and to determine correlated residues in the antibody sequence as preparation step for a new comprehensive Rosetta antibody design technique with design for improved manufacturability and elicitation as vaccine in mind.

Year 2 Roadmap

Similar to existing humanization scores, ABL returns a global score per sequence. This however is unsuitable for the challenge of improving the humanization level while maintaining the antibodies binding characteristics during (back-)mutation to a more human (germline) sequence. To overcome this shortcoming, a position specific scoring matrix (PSSM) will be implemented to allow a per residue score of the sequences. Facilitated by next generation sequencing data (NGS) of human antibody sequences, position specific residue type information can be acquired and used to determine how human-like each position within an antibody appears. With a mutagenesis rate of at least 7% in the CDR, the underlying germline gene still heavily influences the sequence space of the antibody, for what reason separate PSSMs for each V, D, J, as well as FW regions will be created. Thus, a single antibody sequence relies on up to 4 different PSSMs for precise evaluation of the human likeness. However, the great challenge of identifying and measuring how strongly each liability affects manufacturability and vaccine eligibility. To evaluate and categorize liabilities depending on its effect, experimental data is essential and has to be assembled for this purpose by tight collaboration with experimentalists.

Ultimately, ABL will be optimized to integrate experimental and next generation sequencing (NGS) data into its scoring system. Information about allowed and favorable sequence information can then be integrated into Rosetta design, to direct Rosetta’s sampling algorithms towards a vaccine friendly sequence space.

It has been shown, that correlated mutation analysis of homologous sequences can enhance the computational protein folding process significantly. Existing methods can be applied on antibody sequence data and incorporated into Rosetta’s design process accordingly. The expected merit of this approach is to incorporate knowledge about correlations between residues within antibody sequences caused by its underlying V, D, and J gene ensemble. Specialized correlation analysis of antibody sequences depending on organism and antigen might further improve the Rosetta design performance of vaccines using knowledge about important residue constellations in the interface between antigen and antibody.


Primary: Jens Meiler

Secondary: James E. Crowe, Jr.

Type of Trainee

Graduate student