Game theoretic methods for antibody design
Infectious diseases pose an increasing threat due to a combination of globalization, poor treatment management, and evolutionary treatment escape (that is, mutations that escape drug therapies or antibodies elicited by vaccines). The goal of this proposal is to develop highly scalable methods for sequence-based design of antibodies for HIV that exhibit very broad recognition of diverse isolates, leveraging a combination of game theoretic modeling, computational structure design, scalable sequence search, and scalable bi-level combinatorial optimization. Computational design in the combined antibody and virus sequence space could establish a new paradigm for developing biologics by enabling the design of antibodies with increased breadth, high potency, and high similarity to antibodies in the human repertoire.
Co-crystal structure of the antibody VRC01 complexed with the HIV envelope protein GP120, i.e., the native antibody and virus (Left), and the game theoretically designed antibody with the escaping virus (Right). The designed and native antibodies have escape costs of 7 and 1, respectively. The designed antibody keeps binding to the virus even after 7 mutations from the native, whereas a single mutation is enough for the native antibody to stop binding to the virus.
The current goal is the sequence-based design of broadly binding antibodies (that bind to a high fraction of virus sequences in a given panel).
We start with the antibody VRC23 and a set of 180 virus sequences (the virus panel), similar to GP120. We identify a set of binding sites on the antibody and on the virus sequences. Next, we use Rosetta to generate binding and stability score data by making 5 random mutations on the antibody side and then computing scores against the 180 virus sequences.
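The mutant-sampling step can be illustrated with a minimal Python sketch. It only covers the random mutation of the antibody; the Rosetta scoring itself is not shown. The sequence fragment, the binding-site indices, and the function names are hypothetical, and the sketch assumes that the 5 mutations fall on distinct binding sites and always change the residue, which the description above does not specify.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def random_mutant(sequence, sites, n_mutations=5, rng=None):
    """Return a copy of `sequence` in which `n_mutations` distinct positions
    drawn from `sites` are each replaced by a different amino acid."""
    rng = rng or random.Random()
    chosen = rng.sample(sites, n_mutations)  # distinct positions (assumption)
    residues = list(sequence)
    for pos in chosen:
        alternatives = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(alternatives)
    return "".join(residues)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy antibody fragment and binding-site indices (illustration only).
native = "QVQLVQSGGQMKKPGESMRISCRASGYEF"
binding_sites = [3, 7, 10, 15, 18, 22, 26]
mutant = random_mutant(native, binding_sites, rng=random.Random(0))
print(mutant, hamming(native, mutant))
```

Each mutant produced this way would then be paired with every virus in the panel and scored, giving one binding score and one stability score per pair.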
We learn a linear SVM classifier to predict binding, using the amino acids at the antibody and virus binding sites as features. The classifier reaches ~70% accuracy on test data under 10-fold cross-validation. Similarly, we learn a linear regression model to predict stability scores; its predictions have a correlation coefficient of ~0.85 with the test data, again under 10-fold cross-validation.
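One plausible way to turn binding-site residues into features for a linear model is a one-hot encoding per site, sketched below in plain Python. The concatenated encoding, the function names, and the toy sequences are assumptions made for illustration; the actual feature construction may differ (for example, it may include antibody-virus pair features). The weights here are placeholders, not a trained SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sites(sequence, sites):
    """One-hot encode the residue at each position in `sites` (20 entries per site)."""
    vec = [0.0] * (len(sites) * len(AMINO_ACIDS))
    for k, pos in enumerate(sites):
        vec[k * len(AMINO_ACIDS) + AA_INDEX[sequence[pos]]] = 1.0
    return vec


def pair_features(antibody, ab_sites, virus, v_sites):
    """Concatenate antibody-site and virus-site encodings (an assumed design)."""
    return one_hot_sites(antibody, ab_sites) + one_hot_sites(virus, v_sites)


def linear_score(weights, bias, features):
    """Decision value of a linear model: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predicts_binding(weights, bias, features):
    """A linear classifier predicts binding when the decision value is non-negative."""
    return linear_score(weights, bias, features) >= 0.0


# Toy example with hypothetical sequences and site indices.
x = pair_features("ACDE", [0, 2], "KLMN", [1])
placeholder_weights = [1.0] * len(x)  # stand-in for learned SVM weights
print(len(x), predicts_binding(placeholder_weights, -1.0, x))
```

Because both the classifier and the stability regressor are linear in such indicator features, their predictions can be written as linear expressions over binary residue-choice variables, which is what makes a MILP formulation of the design problem possible.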
Using these models, we solve a mixed integer linear program (MILP) to design new antibodies with maximum breadth on the panel. Specifically, we train our models on a subset of the data (corresponding to a fraction of the 180 viruses, the “training dataset”) and design a breadth-maximizing antibody by solving the MILP with these models. We evaluate the effectiveness (breadth) of the designed antibody on the held-out data, using models learnt on the entire dataset (corresponding to all 180 virus sequences). We repeat this procedure for different training data sizes, and for multiple random samples at each training data size. Finally, we evaluate the set of designed breadth-maximizing antibodies using Rosetta modeling.
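To make the objective concrete, the sketch below computes breadth (the fraction of panel viruses predicted to bind) and finds a breadth-maximizing antibody on a tiny toy instance. It deliberately replaces the MILP with exhaustive enumeration, which is only feasible for a handful of sites, so it illustrates the objective and the stability constraint rather than the actual solver. The binding rule, the stability predicate, the sequences, and all function names are hypothetical.

```python
from itertools import product


def breadth(antibody, panel, binds):
    """Fraction of viruses in `panel` that `antibody` is predicted to bind."""
    return sum(binds(antibody, v) for v in panel) / len(panel)


def design_by_enumeration(native, sites, alphabet, panel, binds, is_stable):
    """Try every residue assignment at `sites` and keep the stable candidate
    with the highest breadth. A brute-force stand-in for the MILP."""
    best, best_breadth = native, breadth(native, panel, binds)
    for choice in product(alphabet, repeat=len(sites)):
        residues = list(native)
        for pos, aa in zip(sites, choice):
            residues[pos] = aa
        candidate = "".join(residues)
        if not is_stable(candidate):
            continue  # stability acts as a constraint, not an objective
        b = breadth(candidate, panel, binds)
        if b > best_breadth:
            best, best_breadth = candidate, b
    return best, best_breadth


# Toy binding rule (hypothetical): bind if any position carries the same residue.
def toy_binds(antibody, virus):
    return any(a == v for a, v in zip(antibody, virus))


panel = ["KKA", "KAK", "AKK", "KKK"]
native = "AAA"
designed, designed_breadth = design_by_enumeration(
    native, sites=[0, 1], alphabet="AK", panel=panel,
    binds=toy_binds, is_stable=lambda seq: True,
)
print(breadth(native, panel, toy_binds), designed, designed_breadth)
```

In the procedure described above, `binds` and `is_stable` would be backed by the learned linear models, the design would be computed with models trained on a subset of the viruses, and breadth would then be re-evaluated on the held-out viruses.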
The proposed algorithm yields stable antibodies with ~99% breadth, compared to 60% for the native VRC23 (as measured by Rosetta computations).
Primary: Yevgeniy Vorobeychik
Secondary: Jens Meiler