Towards mechanistic models of antiviral immunity
Simon Frost, M.A. D.Phil.
Dept. of Veterinary Medicine, and Institute of Public Health
University of Cambridge
Introduction
- Antiviral immunity
- Adaptive
- Humoral: phenotypic data, viral genotype, Ig genotype
- (Cellular)
- Innate
- Balance between viral control and immunosuppression/pathology
- How to model? What kind of data?
- Dynamic rather than steady state or quasi-steady state
Heilmeier's catechism
- What are you trying to do?
- Articulate your objectives using absolutely no jargon.
- How is it done today, and what are the limits of current practice?
- What's new in your approach and why do you think it will be successful?
- Who cares?
- If you're successful, what difference will it make?
- What are the risks and the payoffs?
- How much will it cost? How long will it take?
- What are the midterm and final "exams" to check for success?
Acknowledgements
- Cambridge
- Fundació irsiCaixa
- UCSD
- Sergei Kosakovsky Pond
- Art Poon
- Selene Zarate
- Michael Golinski
- Ben Murrell
- Susan Little
- Doug Richman
- Monogram Biosciences
- Terri Wrin
- Yang Liu
- Colombe Chappey
- Chris Petropoulos
- NYU
Dynamics of antibody responses in HIV
Frost et al. Curr. Opin. HIV AIDS (2008)
Primary infection cohort
- Since 1996, individuals with acute/early HIV infection have been prospectively invited to join studies at UCSD
- Susan Little, Doug Richman
- Based on molecular markers, the baseline visit is on average 6 weeks after infection
- Longitudinal follow-up
- Measurement of many biological markers
Measuring antibody responses
Between-host responses
Influenza HI data
Antigenic cartography
- Derek Smith popularised the use of multidimensional scaling to represent HI data
- The algorithm generates a single point estimate of the antigenic map
- Yet there may be significant uncertainty in the map due to measurement error etc.
Smith et al. Science (2004)
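To make the idea of a single point-estimate map concrete, here is a minimal sketch of classical (Torgerson) multidimensional scaling. This is a simpler, swapped-in technique, not the algorithm used by Smith et al.: it assumes a complete, symmetric distance matrix, whereas HI tables are rectangular (viruses by sera) and often censored. The function names `classical_mds` and `pairwise_distances` and the toy distances are illustrative assumptions.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Embed points in n_dims dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues (eigh returns them in ascending order).
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def pairwise_distances(coords):
    """Euclidean distances between all rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy example: three points forming a 3-4-5 right triangle.
toy_dist = np.array([[0.0, 3.0, 4.0],
                     [3.0, 0.0, 5.0],
                     [4.0, 5.0, 0.0]])
coords = classical_mds(toy_dist)
```

The output is one set of coordinates with no measure of uncertainty, which is the limitation the bullets above point to.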
Bayesian antigenic cartography
- The model estimates $X$, an $n \times p$ matrix of coordinates in antigenic space of viruses, and $Y$, a $k \times p$ matrix of coordinates in antigenic space of plasma samples.
- Let $y_{ij}$ denote the ${\rm log}_2$ transformed ${\rm IC}_{50}$ neutralization titer between virus $i$ and plasma $j$
- The observed dissimilarity measure, $d_{ij}$, is obtained by normalizing each titer against the maximum titer for the same virus, $d_{ij}=\max_{j'} y_{ij'} - y_{ij}$
- $d_{ij}$ is assumed to follow a truncated normal distribution, $d_{ij}\sim N (\delta_{ij}, \tau),\;I(d_{ij}>0)$, where $i=1, \ldots ,n,\;j=1,\ldots,k$
- $\delta_{ij}$ was calculated assuming a two-dimensional map
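The quantities defined above can be sketched in a few lines. This is a hedged illustration, not the fitted model: it assumes that $\delta_{ij}$ is the Euclidean distance between virus $i$ and plasma $j$ on the map, and it takes the spread of the normal distribution as a standard deviation `sigma` (if $\tau$ is a precision, as in BUGS-style notation, then `sigma = tau ** -0.5`). The function names and toy numbers are made up for the example.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def dissimilarity(log2_titers):
    """d_ij = (maximum titer for virus i) - y_ij, rows are viruses."""
    y = np.asarray(log2_titers, dtype=float)
    return y.max(axis=1, keepdims=True) - y


def map_distances(X, Y):
    """Assumed form of delta_ij: Euclidean distance between X[i] and Y[j]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def truncated_normal_loglik(d, delta, sigma):
    """Log-likelihood of d under N(delta, sigma^2) truncated to d > 0."""
    d = np.asarray(d, dtype=float)
    delta = np.asarray(delta, dtype=float)
    z = (d - delta) / sigma
    log_pdf = -0.5 * z ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    # Normalising constant: P(d > 0) = Phi(delta / sigma).
    log_mass = np.log(0.5 * (1.0 + _erf(delta / (sigma * math.sqrt(2.0)))))
    return float(np.sum(log_pdf - log_mass))


# Toy data: 2 viruses (rows) by 2 plasma samples (columns), log2 titers.
d_obs = dissimilarity([[8.0, 6.0], [5.0, 7.0]])
delta = map_distances([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [3.0, 0.0]])
loglik = truncated_normal_loglik(d_obs, delta, sigma=1.0)
```

In a Bayesian fit, a sampler would explore $X$, $Y$ and the spread parameter using this likelihood together with priors, giving a distribution of maps rather than one point estimate.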
Antigenic map of HIV
Map of influenza A
Focusing on 'important' mutations
Summary
- Bayesian MDS is a useful tool in the analysis of large multivariate datasets
- Probabilistic
- Interpretable
- Can include covariates
- Other applications
Within-host antibody responses
- We can also use neutralisation assays to investigate how neutralisation changes within a single HIV-infected individual
- We can perform a time-shift experiment, where virus and sera from different timepoints are combined
$
\log_{10}(y) \sim t_v + t_p + f(t_v, t_p)
$
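One way to read the time-shift model is as a two-way decomposition of a titer matrix. The sketch below assumes a complete grid of $\log_{10}$ titers with rows indexed by virus timepoint and columns by plasma timepoint, and treats timepoints as categories; the model above may well use smooth functions of time instead. The name `additive_decomposition` and the toy values are assumptions.

```python
import numpy as np


def additive_decomposition(log_titers):
    """Split a virus-by-plasma matrix into main effects and an interaction."""
    y = np.asarray(log_titers, dtype=float)
    grand_mean = y.mean()
    virus_effect = y.mean(axis=1) - grand_mean    # analogue of t_v
    plasma_effect = y.mean(axis=0) - grand_mean   # analogue of t_p
    # Whatever the main effects do not explain plays the role of f(t_v, t_p).
    interaction = (y - grand_mean
                   - virus_effect[:, None]
                   - plasma_effect[None, :])
    return grand_mean, virus_effect, plasma_effect, interaction


# Toy matrix that is purely additive, so the interaction should vanish.
toy = np.array([[1.0, 2.0, 3.0],
                [2.0, 3.0, 4.0]])
grand, v_eff, p_eff, inter = additive_decomposition(toy)
```

A non-zero interaction that depends on the gap between plasma and virus sampling times is the signature expected if antibodies neutralise earlier virus better than contemporaneous virus.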
Within-host antibody responses
Variation between individuals
Capturing selection pressure
- A common bioinformatic approach to detecting selection is to compare $dN$ and $dS$:
- $dN$: the rate of nonsynonymous or amino acid changing mutations
- $dS$: the rate of synonymous or amino acid preserving mutations
- The relative rates of $dN$ and $dS$ are informative about different types of selection
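As a minimal illustration of the distinction, the sketch below classifies codon differences between two aligned, in-frame sequences as synonymous or nonsynonymous using the standard genetic code. It returns raw counts only: proper $dN$ and $dS$ estimates also normalise by the number of synonymous and nonsynonymous sites and correct for multiple hits, which this sketch does not do. The name `count_substitutions` and the toy sequences are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of single-site codon changes."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    syn = nonsyn = 0
    for k in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[k:k + 3], seq_b[k:k + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue  # identical codons, or ambiguous multi-site changes
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# TTT -> TTC keeps phenylalanine; GCA -> GAA changes alanine to glutamate.
counts = count_substitutions("TTTGCA", "TTCGAA")
```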
Evolution rate and antibody escape
Modelling escape from antibody responses
$
\begin{align}
\dot{N}_i & = rN_i-\sum_{j}\beta_{i-j}N_iB_j+\frac{\mu}{2}(N_{i+1}+N_{i-1})-\mu N_i\\
\dot{B}_i & = \sum_{j}\alpha_{i-j}N_jB_i\\
\text{where}\\
\alpha_{d} & = \exp(-d^2/2l_\alpha^2) \; \text{cross-stimulation}\\
\beta_{d} & = \exp(-d^2/2l_\beta^2) \; \text{cross-reactivity}
\end{align}
$
Haraguchi and Sasaki (1997)
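The equations above can be integrated numerically on a finite lattice of antigenic types. The sketch below is one possible discretisation, under stated assumptions: strains outside the lattice have zero density, time stepping is forward Euler (a stiff solver may be preferable for long runs), and all parameter values are illustrative. The function names are not from the original paper.

```python
import numpy as np


def gaussian_kernel(n_sites, length_scale):
    """Matrix K[i, j] = exp(-(i - j)^2 / (2 l^2))."""
    idx = np.arange(n_sites)
    d = idx[:, None] - idx[None, :]
    return np.exp(-d ** 2 / (2.0 * length_scale ** 2))


def derivatives(N, B, r, mu, alpha, beta):
    """Right-hand side of the virus (N) and antibody (B) equations."""
    padded = np.pad(N, 1)  # assumption: zero density beyond the lattice ends
    mutation = 0.5 * mu * (padded[2:] + padded[:-2]) - mu * N
    dN = r * N - N * (beta @ B) + mutation  # growth, killing, mutation
    dB = B * (alpha @ N)                    # cross-stimulation by nearby strains
    return dN, dB


def simulate(N0, B0, r, mu, l_alpha, l_beta, dt, n_steps):
    """Forward-Euler integration; returns the final N and B."""
    N = np.array(N0, dtype=float)
    B = np.array(B0, dtype=float)
    alpha = gaussian_kernel(N.size, l_alpha)
    beta = gaussian_kernel(N.size, l_beta)
    for _ in range(n_steps):
        dN, dB = derivatives(N, B, r, mu, alpha, beta)
        N, B = N + dt * dN, B + dt * dB
    return N, B


# Illustrative run: one founder strain in the middle of 21 antigenic types.
N0 = np.zeros(21)
N0[10] = 1.0
B0 = np.full(21, 0.01)
N_end, B_end = simulate(N0, B0, r=1.0, mu=0.1, l_alpha=2.0, l_beta=2.0,
                        dt=0.01, n_steps=10)
```

Varying `l_alpha` and `l_beta` changes how far antibody stimulation and killing reach across antigenic space, which is what determines whether the virus can escape.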
Model results
Cross-reactivity and escape
Changing responses over time
Modeling temporal changes
Neutralization of viral clones: unique mutations
Neutralization of clones: convergent mutations
Neutralisation in HIV-1 infected identical twins
Viral phylogeny
Neutralization variation between viruses
Neutralization of early clones from TW2
Neutralization of late clones from TW2
Within-host phylogenies
Summary
- Data are consistent with a model whereby HIV escapes antibody responses through a large number of mutations of small effect
- Data support the use of neutralisation titre as a measure of fitness
- Why does a simple model (cf. Armita Nourmohammad's model) work?
- Virus responds to local selection pressure
- Evolution within an individual is like a walk in sequence space that can be 'unravelled'
- More detailed data reveal higher dimensionality
B cell repertoire data
- Ultimately, we would like to know how changes in phenotype (neutralisation) and viral genotype are linked to changes in B cell repertoires
- Selection pressures may manifest as:
- Changes in frequency of a clone
- Somatic hypermutation
Phylogeny of Ig clonal lineages
Frost et al. Phil Trans Roy Soc B (2015)
What can we do with clones?
- Analysis of tree shape of individual clones
- Diversification
- Asymmetry
- Local branching index
- Ancestral sequence reconstruction
- Dynamics of multiple lineages
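As one concrete example of a tree-shape statistic, the sketch below computes the Colless imbalance index, a standard measure of asymmetry, for a rooted binary tree. The nested-tuple tree representation and the function name are assumptions chosen to keep the example self-contained; a real analysis would read trees with a phylogenetics library.

```python
def colless_index(tree):
    """Return (number of tips, Colless imbalance) for a rooted binary tree.

    A tree is either a tip label (any non-tuple value) or a 2-tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, imbalance_left = colless_index(left)
    n_right, imbalance_right = colless_index(right)
    # Each internal node contributes the difference in tip counts of its children.
    here = abs(n_left - n_right)
    return n_left + n_right, imbalance_left + imbalance_right + here


balanced = (("a", "b"), ("c", "d"))
ladder = ((("a", "b"), "c"), "d")
balanced_result = colless_index(balanced)
ladder_result = colless_index(ladder)
```

A ladder-like clone tree (high imbalance) is the shape often associated with strong directional selection, while a balanced tree suggests more even diversification.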
Aims
- Primary focus is on somatic hypermutation and the evolution of clonal lineages
- Divergence from germline
- Diversity within clone
- Hence, need to identify germline correctly
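The two summary statistics named above can be sketched directly, which also shows why the germline call matters. The sketch assumes equal-length, aligned sequences and uses the proportion of differing sites (p-distance) with no evolutionary model; the function names and toy sequences are illustrative.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def divergence_from_germline(germline, clone_seqs):
    """Mean distance of clone members from the inferred germline."""
    return sum(p_distance(germline, s) for s in clone_seqs) / len(clone_seqs)


def diversity_within_clone(clone_seqs):
    """Mean pairwise distance among clone members."""
    pairs = list(combinations(clone_seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


clone = ["AAAT", "AATT"]
div_correct = divergence_from_germline("AAAA", clone)
div_wrong = divergence_from_germline("AATT", clone)  # misassigned germline
diversity = diversity_within_clone(clone)
```

Diversity within the clone is unaffected by the germline call, but divergence changes from 0.375 to 0.125 in this toy case when the wrong germline is used.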
Existing methods
- Alignment based methods
- Use sequence similarity
- Example
- Align sequence against V
- Align remainder against J
- Align remainder against D
- Model based methods
- Use a higher-level representation of similarity
- Hidden Markov Model
- Conditional random fields
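The sequential alignment example (V, then J, then D) can be sketched as a greedy procedure. This is a deliberately crude illustration: it uses ungapped match counts instead of a real alignment, assumes the V and J segments are untrimmed, and uses made-up toy germline sequences and names. It is not how any of the named tools is implemented.

```python
def prefix_matches(query, germline):
    """Matches when germline is laid against the 5' end of query."""
    return sum(q == g for q, g in zip(query, germline))


def suffix_matches(query, germline):
    """Matches when germline is laid against the 3' end of query."""
    return sum(q == g for q, g in zip(reversed(query), reversed(germline)))


def best_window(query, germline):
    """Best ungapped placement of germline anywhere inside query."""
    best = 0
    for start in range(len(query) - len(germline) + 1):
        window = query[start:start + len(germline)]
        best = max(best, sum(q == g for q, g in zip(window, germline)))
    return best


def assign_vdj(query, v_genes, d_genes, j_genes):
    """Greedy assignment: V first, then J on the remainder, then D."""
    v_name = max(v_genes, key=lambda name: prefix_matches(query, v_genes[name]))
    rest = query[len(v_genes[v_name]):]
    j_name = max(j_genes, key=lambda name: suffix_matches(rest, j_genes[name]))
    junction = rest[:max(0, len(rest) - len(j_genes[j_name]))]
    d_name = max(d_genes, key=lambda name: best_window(junction, d_genes[name]))
    return v_name, d_name, j_name


# Hypothetical toy germlines, far shorter than real gene segments.
V = {"V1": "ACGTACGTAC", "V2": "ACGTTTGTAC"}
D = {"D1": "GGGG", "D2": "CCCC"}
J = {"J1": "TTTAAA", "J2": "TTTCCC"}
query = V["V2"] + "A" + D["D2"] + "T" + J["J1"]
assignment = assign_vdj(query, V, D, J)
```

Because each step commits before the next begins, an error in the V call or its boundary propagates to the J and D calls.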
Phylogenetic approaches
- We modified a method previously used to assign viruses to genotypes to analyse immunoglobulin sequences
- A variant of phylogenetic placement (Matsen)
- Joint estimates of breakpoints and phylogenetic placement of V and J regions using an evolutionary model
- D region identified by the maximum local alignment score (using a codon alignment algorithm) to the junction region of the query sequence
- We search for the best models using a genetic algorithm
- By marginalising across models, we get a probabilistic assignment
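The marginalisation step can be illustrated with a small model-averaging sketch. The weighting scheme here is an assumption: each candidate model is given an Akaike weight, proportional to $\exp(-\Delta\mathrm{AIC}/2)$, and the weights of models that agree on a germline are summed. The source does not state which criterion is used, and the allele names and scores below are purely illustrative.

```python
import math
from collections import defaultdict


def assignment_probabilities(models):
    """Combine (germline, aic) pairs into a probability for each germline."""
    best_aic = min(aic for _, aic in models)
    # Subtracting the best score first keeps the exponentials well scaled.
    weights = [math.exp(-0.5 * (aic - best_aic)) for _, aic in models]
    total = sum(weights)
    probs = defaultdict(float)
    for (germline, _), weight in zip(models, weights):
        probs[germline] += weight / total
    return dict(probs)


# Hypothetical candidate models visited by the search.
candidates = [("IGHV3-23*01", 100.0),
              ("IGHV3-23*04", 100.0),
              ("IGHV3-23*01", 102.0)]
probabilities = assignment_probabilities(candidates)
```

The output is a probability for each germline, not a single best hit, so ambiguity between closely related alleles is reported and not hidden.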
Human IGHV (F+ORF) phylogeny
Frost et al. Phil Trans Roy Soc B (2015)
Evaluation: programs
- IMGT/HighV-QUEST v. 1.3.1
- IgBLAST v. 1.4.0
- iHMMune-Align (1-06-2007)
- SoDA v1.1
- vdjalign
- vdj
- Cloanalyst
Evaluation: real data
- Datasets from genotyped individuals
- Stanford S22 (from a single genotyped individual)
- A set of 6329 clonally unrelated IGH rearrangements, obtained from individuals homozygous for IGHV3-23*01 and IGHJ6*02 from Ohm-Laursen et al.
- Clonal data
- Two datasets derived from IgD+ IgM- CD38+ B cells (n=57 and n=106)
- 11 sequences from an HIV-infected individual (N152), the source of the broadly neutralizing antibody 10E8
Evaluation: simulations
- Simple rearrangements (n=12,060)
- IGHV, IGHD and IGHJ *01 alleles were concatenated
- Rearrangements plus insertions/deletions (n=10,000)
- Random selection of germlines
- Length distribution of indels taken from Jackson et al.
- Base distribution of N-nucleotides taken from Jackson et al.
- Verification process
- Free of stop codons
- Contained a CDR3 region recognizable by the regular expression proposed by D'Angelo et al.
- In-frame J region with ‘[FW]G.G’ and ‘TVSS’ motifs
- Rearrangements plus indels and mutations
- As above, with mutations introduced under the S5F model
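The simulation and verification steps can be sketched as follows. The sketch makes several simplifying assumptions: trimming lengths and N-nucleotide composition are drawn uniformly here, not from the empirical distributions of Jackson et al.; "in frame" is approximated as a total length divisible by three; the CDR3 regular-expression check and the S5F mutation step are omitted. All function names are illustrative.

```python
import random
import re
from itertools import product

AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate complete codons from the first nucleotide onwards."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[k:k + 3]] for k in range(0, usable, 3))


def simulate_rearrangement(v, d, j, rng, max_trim=3, max_n=6):
    """Join V, D and J with random end trimming and N-nucleotide additions."""
    def trim_3prime(s):
        return s[:len(s) - rng.randint(0, max_trim)]

    def trim_5prime(s):
        return s[rng.randint(0, max_trim):]

    def n_nucleotides():
        return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))

    return (trim_3prime(v) + n_nucleotides() + trim_3prime(trim_5prime(d))
            + n_nucleotides() + trim_5prime(j))


def passes_verification(seq):
    """Keep sequences that are in frame, stop-free and carry the J motifs."""
    protein = translate(seq)
    return (len(seq) % 3 == 0
            and "*" not in protein
            and re.search(r"[FW]G.G", protein) is not None
            and "TVSS" in protein)


# Hand-built sequence encoding CARWGQGTTVTVSS, which should pass.
good = "TGTGCGAGATGGGGCCAGGGCACCACCGTCACCGTCTCCTCA"
with_stop = good.replace("AGA", "TGA", 1)
simulated = simulate_rearrangement("ACGTACGTAC", "GGGGCCCC", "TTTAAACCC",
                                   random.Random(1))
```

Simulated sequences that fail the filter are discarded, so the final set contains only productive-looking rearrangements.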
Performance: simple rearrangements
Performance: insertions and deletions
Performance: 40 mutations
Performance: 80 mutations
Visualization
- The output of (any) germline assignment program is complex and multivariate
- To help interpret it, we developed visualisation tools
Improvements
- Germline identification is a combination of finding breakpoints and identifying similarities
- Our algorithm spends a lot of time finding the breakpoints
- It speeds up greatly when given 'presegmented' data
Hidden Markov Models
- Hidden Markov Models have been used for repertoire analysis in a number of studies
- iHMMune
- VDJfasta
- SoDA
- PARTIS
- repgenHMM
- Limitation is the Markov assumption
Simple pattern matching
- In contrast to the complexity of HMMs, simple pattern matching rules can be used
- CDR3:
(TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]
- Ab Mining Toolbox, D'Angelo et al. (2014)
- J region:
[FW]G[A-Z]G
and T[LMT]VTVSS
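The patterns above can be applied directly with a regular-expression engine. The sketch below uses Python's `re` module with the expressions copied from this slide; the CDR3 pattern is applied to nucleotides and the J-region patterns to amino acids. The helper names and the test sequences are illustrative assumptions.

```python
import re

CDR3_NT = re.compile(
    r"(TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])"
    r"(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]"
)
J_MOTIF_AA = re.compile(r"[FW]G[A-Z]G")
J_END_AA = re.compile(r"T[LMT]VTVSS")


def find_cdr3(nt_seq):
    """Return (start, end) of the first CDR3 match, or None if absent."""
    match = CDR3_NT.search(nt_seq.upper())
    return None if match is None else (match.start(), match.end())


def has_j_motifs(protein):
    """True when both J-region amino-acid motifs are present."""
    return (J_MOTIF_AA.search(protein) is not None
            and J_END_AA.search(protein) is not None)


# Hand-built nucleotide example: 2 flanking bases, a 33-base match, 2 more bases.
example_nt = "GG" + "TATTACTGTGCG" + "AGAGATCTTGACTAC" + "TGGGGC" + "AA"
cdr3_span = find_cdr3(example_nt)
j_ok = has_j_motifs("DYWGQGTLVTVSS")
```

Matching is fast and transparent, but a single mutation inside a required motif causes the whole pattern to miss, which is the trade-off against model-based methods.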
Conditional random fields (CRFs)
- Alternative to HMMs for segmenting data
- Rather than use a 'hidden' state, features are calculated from the data
- Downstream/upstream nucleotides/amino acids
- More complex motifs
- Malhotra et al. used conditional random fields to pre-segment IGH genes, and found that performance of iHMMune improved
CRFs and segmenting data
- Used simulations of 10000 sequences with rearrangements/indels as before
- Labelled each nucleotide by its corresponding amino acid, plus the previous amino acid
- Included the CDR3 regular expression, plus the J region regular expressions
- Used 90% for training, 10% for validation
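The labelling scheme described above can be sketched as a feature extractor that emits one feature dictionary per nucleotide, in the list-of-dicts form that CRF toolkits commonly accept. This is an assumption about the data layout, not the actual pipeline: the regular-expression features are left out for brevity, the reading frame is assumed to start at the first nucleotide, and the function names and the `"START"` placeholder are illustrative.

```python
import random
from itertools import product

AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def nucleotide_features(nt_seq):
    """One feature dict per nucleotide: its amino acid and the previous one."""
    usable = len(nt_seq) - len(nt_seq) % 3
    amino_acids = [CODON_TABLE[nt_seq[k:k + 3]] for k in range(0, usable, 3)]
    features = []
    for pos in range(usable):
        codon_index = pos // 3
        previous = amino_acids[codon_index - 1] if codon_index > 0 else "START"
        features.append({"nt": nt_seq[pos],
                         "aa": amino_acids[codon_index],
                         "prev_aa": previous})
    return features


def train_validation_split(items, rng, train_fraction=0.9):
    """Shuffle a copy of items and split it into training and validation sets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


feats = nucleotide_features("TGGGGC")  # codons TGG (W) and GGC (G)
train, validation = train_validation_split(range(20), random.Random(0))
```

Each nucleotide would also carry a segment label (for example V, N, D or J) as the training target, known exactly for simulated sequences.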
Performance
Summary
- Our method provides a probabilistic assignment of rearranged immunoglobulin sequences to germline genes that is more robust to hypermutation than other methods
- Can be used as a pre-processor for clone identification