Towards mechanistic models of antiviral immunity

Simon Frost, M.A. D.Phil.

Dept. of Veterinary Medicine, and Institute of Public Health

University of Cambridge

Introduction

Antiviral immunity
- Adaptive
  - Humoral: phenotypic data, viral genotype, Ig genotype
  - (Cellular)
- Innate
  - Balance between viral control and immunosuppression/pathology
  - How to model? What kind of data?
Dynamic rather than steady-state or quasi-steady-state models

Heilmeier's catechism

What are you trying to do?
- Articulate your objectives using absolutely no jargon.
How is it done today, and what are the limits of current practice?
What's new in your approach and why do you think it will be successful?
Who cares?
- If you're successful, what difference will it make?
What are the risks and the payoffs?
How much will it cost? How long will it take?
What are the midterm and final "exams" to check for success?

Acknowledgements

Cambridge
- Mukarram Hossain
Fundació irsiCaixa
- Javier Martinez-Picado
UCSD
- Sergei Kosakovsky Pond
- Art Poon
- Selene Zarate
- Michael Golinski
- Ben Murrell
- Susan Little
- Doug Richman
Monogram Biosciences
- Terri Wrin
- Yang Liu
- Colombe Chappey
- Chris Petropoulos
NYU
- Gregg Silverman

Dynamics of antibody responses in HIV

Frost et al. Curr. Opin. HIV AIDS (2008)

Primary infection cohort

Since 1996, individuals with acute/early HIV infection have been prospectively invited to join studies at UCSD
- Susan Little, Doug Richman
Based on molecular markers, the baseline visit is on average 6 weeks after infection
Longitudinal follow-up
- Measurement of many biological markers

Measuring antibody responses

Between-host responses

Influenza HI data

Antigenic cartography

Derek Smith popularised the use of multidimensional scaling (MDS) to represent haemagglutination inhibition (HI) data
The algorithm generates a single point estimate of the antigenic map
Yet the map may be substantially uncertain because of measurement error and other sources of noise

Smith et al. Science (2004)

Bayesian antigenic cartography

The model estimates $X$, an $n \times p$ matrix of the coordinates of viruses in antigenic space, and $Y$, a $k \times p$ matrix of the coordinates of plasma samples in antigenic space
Let $y_{ij}$ denote the $\log_2$-transformed ${\rm IC}_{50}$ neutralization titer of plasma $j$ against virus $i$
The observed dissimilarity, $d_{ij}$, is obtained by normalizing each virus by its maximum titer across plasma samples, $d_{ij}=\max y_{i \cdot} - y_{ij}$
$d_{ij}$ is assumed to follow a normal distribution truncated to positive values, $d_{ij}\sim N(\delta_{ij}, \tau)\,I(d_{ij}>0)$, for $i=1,\ldots,n$ and $j=1,\ldots,k$
$\delta_{ij}$, the distance between virus $i$ and plasma $j$ in antigenic space, was calculated assuming a two-dimensional map ($p=2$)
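As a minimal sketch of the likelihood described above, the code below builds $d_{ij}$ from a titer matrix, takes $\delta_{ij}$ to be the Euclidean distance between rows of $X$ and $Y$ (an assumption, since the slide only says "distance"), and sums truncated-normal log densities. It treats the spread parameter as a standard deviation `sigma`; whether the slide's $\tau$ is a variance or a precision is not stated, so that mapping is also an assumption. Priors and the MCMC sampler are omitted, and the titers are synthetic.

```python
import numpy as np
from scipy.stats import norm


def dissimilarities(y):
    """d_ij = max_j' y_ij' - y_ij: normalise each virus (row) by its strongest titer."""
    y = np.asarray(y, dtype=float)
    return y.max(axis=1, keepdims=True) - y


def map_distances(X, Y):
    """delta_ij: Euclidean distance (assumed) between virus i (row of X) and plasma j (row of Y)."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def log_likelihood(d, X, Y, sigma):
    """Sum of N(delta_ij, sigma^2) log densities truncated to d > 0.

    sigma is a standard deviation here; relating it to the slide's tau is an assumption.
    """
    delta = map_distances(X, Y)
    # log N(d; delta, sigma) - log P(d > 0), where P(d > 0) = Phi(delta / sigma)
    return float(np.sum(norm.logpdf(d, loc=delta, scale=sigma)
                        - norm.logcdf(delta / sigma)))


# Usage with synthetic data: 5 viruses x 4 plasma samples on a 2-D map (p = 2).
rng = np.random.default_rng(1)
y = rng.normal(8.0, 1.0, size=(5, 4))   # synthetic log2 titers
X = rng.normal(size=(5, 2))
Y = rng.normal(size=(4, 2))
print(log_likelihood(dissimilarities(y), X, Y, sigma=1.0))
```

In a full Bayesian fit this function would be one term of the posterior, evaluated repeatedly as a sampler proposes new coordinates for $X$ and $Y$.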

Antigenic map of HIV

Map of influenza A

Focusing on 'important' mutations

Summary

Bayesian MDS is a useful tool in the analysis of large multivariate datasets
- Probabilistic
- Interpretable
- Can include covariates
Other applications
- DELFIA data
- CyTOF

Within-host antibody responses

We can also use neutralisation assays to investigate how neutralisation changes within a single HIV-infected individual
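We can perform a time-shift experiment, in which viruses and sera from different time points are combined

$\log_{10}(y) \sim t_v + t_p + f(t_v, t_p)$, where $y$ is the neutralisation titer and $t_v$ and $t_p$ are the sampling times of the virus and the plasma, respectively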
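The regression above leaves $f(t_v, t_p)$ unspecified. As a minimal sketch, assuming $f$ is a single product term $\gamma\, t_v t_p$ (a simplification; a smooth surface would be more flexible), the model can be fitted by ordinary least squares on a time-shift grid. The sampling times and coefficients below are hypothetical, and the titers are generated without noise so that the fit should recover them.

```python
import numpy as np


def design_matrix(t_v, t_p):
    """Columns: intercept, t_v, t_p, and a product term standing in for f(t_v, t_p)."""
    t_v = np.asarray(t_v, dtype=float)
    t_p = np.asarray(t_p, dtype=float)
    return np.column_stack([np.ones_like(t_v), t_v, t_p, t_v * t_p])


def fit_time_shift(t_v, t_p, log10_titer):
    """Least-squares fit of log10(y) ~ t_v + t_p + t_v:t_p."""
    A = design_matrix(t_v, t_p)
    coef, *_ = np.linalg.lstsq(A, np.asarray(log10_titer, dtype=float), rcond=None)
    return coef


# Time-shift grid: every virus time point crossed with every plasma time point
# (hypothetical sampling times, in months).
times = np.array([0.0, 6.0, 12.0, 24.0])
tv, tp = (g.ravel() for g in np.meshgrid(times, times, indexing="ij"))

# Synthetic, noise-free titers generated from known (hypothetical) coefficients.
true_coef = np.array([2.0, -0.03, 0.05, -0.001])
log_y = design_matrix(tv, tp) @ true_coef
print(fit_time_shift(tv, tp, log_y))
```

Crossing every virus time point with every plasma time point is what makes the virus and plasma effects separately identifiable.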

Within-host antibody responses

Frost et al. PNAS (2005)

Variation between individuals

Frost et al. PNAS (2005)

Capturing selection pressure

A common bioinformatic approach to detecting selection is to compare $dN$ and $dS$:
- $dN$: the rate of nonsynonymous or amino acid changing mutations
- $dS$: the rate of synonymous or amino acid preserving mutations
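The ratio $dN/dS$ is informative about the type of selection: $dN/dS > 1$ indicates diversifying (positive) selection, $dN/dS < 1$ purifying selection, and $dN/dS \approx 1$ neutral evolution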
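A minimal counting sketch for a pair of aligned coding sequences, loosely in the style of Nei and Gojobori, shows how the two rates are estimated. Several simplifications are assumptions of this example only: codons that differ at more than one position contribute sites but no differences, changes to stop codons count as nonsynonymous, input sequences contain no stop codons, and no Jukes-Cantor correction is applied. The code therefore returns the raw proportions $p_N$ and $p_S$.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites: at each position, the fraction of the 3 possible changes that keep the amino acid."""
    aa = CODE[codon]
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == aa:   # a change to a stop codon never matches, so counts as nonsynonymous
                s += 1 / 3
    return s


def pn_ps(seq1, seq2):
    """Proportions of nonsynonymous (pN) and synonymous (pS) differences per site."""
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff == 1:   # simplification: multi-difference codons are not scored
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    return Nd / N, Sd / S


pn, ps = pn_ps("TTTCTG", "TTCCTG")   # one synonymous change (TTT -> TTC, both Phe)
print(pn, ps)
```

The ratio $p_N/p_S$ (or $dN/dS$ after correction for multiple hits) is then compared with 1, as described above.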

Evolution rate and antibody escape

Frost et al. PNAS (2005)

Evolution rate and antibody escape

Frost et al. PNAS (2005)

Modelling escape from antibody responses

$$ \begin{aligned} \dot{N}_i &= rN_i-\sum_{j}\beta_{i-j}N_iB_j+\frac{\mu}{2}(N_{i+1}+N_{i-1})-\mu N_i\\ \dot{B}_i &= \sum_{j}\alpha_{i-j}N_jB_i\\ \text{where}\quad \alpha_{d} &= \exp\left(-\frac{d^2}{2l_\alpha^2}\right) \quad \text{(cross-stimulation)}\\ \beta_{d} &= \exp\left(-\frac{d^2}{2l_\beta^2}\right) \quad \text{(cross-reactivity)} \end{aligned} $$

Haraguchi and Sasaki (1997)
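A minimal numerical sketch of the model above follows. It assumes the strain space is truncated to `n` variants on a line, with variants outside that range treated as absent. Here `N` holds viral strain densities and `B` the strain-specific B-cell (antibody) densities. All parameter values are made up for illustration. SciPy's `solve_ivp` with the stiff-capable `LSODA` method stands in for whatever integrator was originally used.

```python
import numpy as np
from scipy.integrate import solve_ivp


def make_rhs(n, r, mu, l_alpha, l_beta):
    """Right-hand side for n antigenic variants i = 0..n-1 on a line (truncation is an assumption)."""
    d = np.subtract.outer(np.arange(n), np.arange(n))    # d = i - j
    alpha = np.exp(-d ** 2 / (2 * l_alpha ** 2))          # cross-stimulation kernel
    beta = np.exp(-d ** 2 / (2 * l_beta ** 2))            # cross-reactivity kernel

    def rhs(t, z):
        N, B = z[:n], z[n:]
        # Mutational inflow from neighbours i-1 and i+1 (zero beyond the ends).
        inflow = np.zeros(n)
        inflow[1:] += N[:-1]
        inflow[:-1] += N[1:]
        dN = r * N - N * (beta @ B) + 0.5 * mu * inflow - mu * N
        dB = B * (alpha @ N)
        return np.concatenate([dN, dB])

    return rhs


# Usage with hypothetical values: virus starts at the centre of antigenic space.
n = 21
N0 = np.zeros(n)
N0[n // 2] = 1e-3
B0 = np.full(n, 1e-2)                 # small naive responses to every variant
rhs = make_rhs(n, r=1.0, mu=1e-2, l_alpha=1.0, l_beta=2.0)
sol = solve_ivp(rhs, (0.0, 30.0), np.concatenate([N0, B0]),
                method="LSODA", rtol=1e-6, atol=1e-12)
print(sol.success, sol.y[:n, -1].round(4))
```

Changing the ratio of `l_alpha` to `l_beta` changes how far an escape variant has to move before it is no longer cross-recognised.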

Model results

Frost et al. PNAS (2005)

Cross-reactivity and escape

Frost et al. PNAS (2005)

Changing responses over time

Modeling temporal changes

Neutralization of viral clones: unique mutations

Neutralization of clones: convergent mutations

Neutralisation in HIV-1 infected identical twins

Viral phylogeny

Neutralization variation between viruses

Neutralization of early clones from TW2

Neutralization of late clones from TW2

Within-host phylogenies

Summary

Data are consistent with a model whereby HIV escapes antibody responses through a large number of mutations of small effect
Data support the use of neutralisation titre as a measure of fitness
Why does a simple model (cf. Armita Nourmohammad's model) work?
- Virus responds to local selection pressure
- Evolution within an individual is like a walk in sequence space that can be 'unravelled'
More detailed data reveal higher dimensionality

B cell repertoire data

Ultimately, we would like to know how changes in phenotype (neutralisation) and viral genotype are linked to changes in B cell repertoires
Selection pressures may manifest as:
- Changes in frequency of a clone
- Somatic hypermutation

Phylogeny of Ig clonal lineages

Frost et al. Phil Trans Roy Soc B (2015)

What can we do with clones?

Analysis of tree shape of individual clones
- Diversification
- Asymmetry
- Local branching index
- Ancestral sequence reconstruction
Dynamics of multiple lineages
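To make "asymmetry" concrete, a minimal sketch of one standard measure follows. The Colless index sums, over internal nodes, the absolute difference in tip counts between the two daughter subtrees. The choice of this particular statistic, and the representation of a strictly bifurcating clone tree as nested 2-tuples with string leaves, are assumptions of this example.

```python
def n_tips(tree):
    """Number of tips in a tree given as nested 2-tuples (leaves are strings)."""
    if isinstance(tree, tuple):
        return sum(n_tips(child) for child in tree)
    return 1


def colless(tree):
    """Colless index: sum over internal nodes of |tips(left) - tips(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return abs(n_tips(left) - n_tips(right)) + colless(left) + colless(right)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced), colless(caterpillar))  # 0 3
```

A high index, as for the caterpillar tree, indicates a ladder-like clone in which one lineage keeps giving rise to the next, which is the pattern expected under sustained selection.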

Aims

Primary focus is on somatic hypermutation and the evolution of clonal lineages
- Divergence from germline
- Diversity within clone
Hence, we need to identify the germline correctly

Existing methods

Alignment based methods
- Use sequence similarity
- Example
  - Align sequence against V
  - Align remainder against J
  - Align remainder against D
Model based methods
- Use a higher-level representation of similarity
  - Hidden Markov Model
  - Conditional random fields
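The alignment-based example above (V first, then J on the remainder, then D on what is left) can be sketched as a toy greedy procedure. Each step is a Smith-Waterman local alignment that also tracks where the match starts in the query. The scoring values and the germline sequences below are made up for illustration and are not real IMGT alleles. Real tools use tuned scoring, allele databases and extra handling of the junction.

```python
def local_align(query, ref, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score plus the matched query span [start, end)."""
    prev = [(0, 0)] * (len(ref) + 1)        # (score, query start) for the previous row
    best = (0, 0, 0)                         # (score, start, end)
    for i in range(1, len(query) + 1):
        cur = [(0, i)]
        for j in range(1, len(ref) + 1):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            options = [(0, i),                                 # restart the local alignment
                       (prev[j - 1][0] + s, prev[j - 1][1]),   # (mis)match
                       (prev[j][0] + gap, prev[j][1]),         # gap in the germline
                       (cur[j - 1][0] + gap, cur[j - 1][1])]   # gap in the query
            cell = max(options, key=lambda o: o[0])
            cur.append(cell)
            if cell[0] > best[0]:
                best = (cell[0], cell[1], i)
        prev = cur
    return best


def best_hit(segment, genes):
    """Best-scoring germline for a segment: (name, score, start, end)."""
    name, (score, start, end) = max(((g, local_align(segment, s)) for g, s in genes.items()),
                                    key=lambda hit: hit[1][0])
    return name, score, start, end


def assign_vdj(query, v_genes, d_genes, j_genes):
    """Greedy assignment: V on the whole query, J on the remainder, D on what lies between."""
    v, _, _, v_end = best_hit(query, v_genes)
    rest = query[v_end:]
    j, _, j_start, _ = best_hit(rest, j_genes)
    d, _, _, _ = best_hit(rest[:j_start], d_genes)
    return v, d, j


# Hypothetical toy germlines.
V = {"V1": "GAGGTGCAGCTGGTGGAG", "V2": "CAGGTCACCTTGAAGGAA"}
D = {"D1": "GGTATAGCAGC", "D2": "CCCTTTAAA"}
J = {"J1": "TGGGGCCAAGGGACC", "J2": "CTTCGATCTCTAAAA"}
query = V["V1"] + "AC" + D["D1"] + "T" + J["J1"]
print(assign_vdj(query, V, D, J))
```

Because each step only sees what the previous one left behind, an error in the V boundary propagates to J and D. This is one reason the model-based methods listed above were developed.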

Phylogenetic approaches

We adapted a method previously used to assign viruses to genotypes for the analysis of immunoglobulin sequences
- A variant of phylogenetic placement (Matsen)
- Joint estimates of breakpoints and phylogenetic placement of V and J regions using an evolutionary model
- D region identified by the maximum local alignment score (using a codon alignment algorithm) to the junction region of the query sequence
We search for the best models using a genetic algorithm
By marginalising across models, we get a probabilistic assignment
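The slide does not say how models are weighted when marginalising. One common choice, and an assumption here, is to give each model visited by the search an Akaike weight $w_m \propto \exp(-\Delta{\rm AIC}_m/2)$ and sum the weights of the models that place the query on each germline. The AIC values below are hypothetical.

```python
import math
from collections import defaultdict


def assignment_probabilities(models):
    """Model-averaged support per germline from (germline, AIC) pairs via Akaike weights."""
    best = min(aic for _, aic in models)
    raw = [(g, math.exp(-(aic - best) / 2)) for g, aic in models]
    total = sum(w for _, w in raw)
    support = defaultdict(float)
    for g, w in raw:
        support[g] += w / total   # marginalise over models that share a germline
    return dict(support)


# Hypothetical search output: two breakpoint models favour one allele, one favours another.
models = [("IGHV3-23*01", 100.0), ("IGHV3-23*01", 102.0), ("IGHV3-30*01", 104.0)]
print(assignment_probabilities(models))
```

Summing over models with different breakpoints is what turns a set of point assignments into a probability for each germline.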

Human IGHV (F+ORF) phylogeny

Frost et al. Phil Trans Roy Soc B (2015)

Evaluation: programs

IMGT/HighV-QUEST v. 1.3.1
IgBLAST v. 1.4.0
iHMMune-Align (1-06-2007)
SoDA v1.1
vdjalign
vdj
Cloanalyst

Evaluation: real data

Datasets from genotyped individuals
- Stanford S22 (from a single genotyped individual)
- A set of 6329 clonally unrelated IGH rearrangements, obtained from individuals homozygous for IGHV3-23*01 and IGHJ6*02 from Ohm-Laursen et al.
Clonal data
- Two datasets derived from IgD+ IgM- CD38+ B cells (n=57 and n=106)
- 11 sequences from an HIV-infected individual (N152), the source of the broadly neutralizing antibody 10E8

Evaluation: simulations

Simple rearrangements (n=12,060)
- IGHV, IGHD and IGHJ *01 alleles were concatenated
Rearrangements plus insertions/deletions (n=10,000)
- Random selection of germlines
- Length distribution of indels taken from Jackson et al.
- Base distribution of N-nucleotides taken from Jackson et al.
- Verification process
  - Free of stop codons
  - Contained a CDR3 region recognizable by the regular expression proposed by D'Angelo et al.
  - In-frame J region with ‘[FW]G.G’ and ‘TVSS’ motifs
Rearrangements plus indels and mutations
- As above, with mutations introduced under the S5F model
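The verification checks for simulated rearrangements can be sketched as a single filter. This example assumes that the sequence is uppercase ACGT and starts in reading frame 0. The CDR3 pattern is the nucleotide regular expression quoted on the pattern-matching slide, and the J motifs are tested on the frame-0 translation, so a hit there also implies the J region is in frame. The test sequence is synthetic.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# CDR3 nucleotide pattern (D'Angelo et al.), as quoted in these slides.
CDR3 = re.compile(r"(TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])"
                  r"(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]")


def translate(seq):
    """Translate in frame 0, ignoring any trailing partial codon."""
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def passes_checks(seq):
    """No stop codons, a recognisable CDR3, and in-frame [FW]G.G and TVSS J motifs."""
    protein = translate(seq)
    return ("*" not in protein
            and CDR3.search(seq) is not None
            and re.search(r"[FW]G.G", protein) is not None
            and "TVSS" in protein)


# Synthetic example translating to YYCARDGYFDWGQGTTVTVSS.
seq = "TATTACTGTGCGAGAGATGGCTACTTCGACTGGGGCCAAGGGACCACGGTCACCGTCTCCTCA"
print(translate(seq), passes_checks(seq))
```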

antibodyo.me

Performance: simple rearrangements

Performance: insertions and deletions

Performance: 40 mutations

Performance: 80 mutations

Visualization

The output of any germline assignment program is complex and multivariate
To help interpret it, we developed tools to visualise the output

Improvements

Germline identification is a combination of finding breakpoints and identifying similarities
Our algorithm spends much of its running time finding the breakpoints
- It is much faster when run on 'presegmented' data

Hidden Markov Models

Hidden Markov Models have been used for repertoire analysis in a number of studies
- iHMMune
- VDJfasta
- SoDA
- PARTIS
- repgenHMM
A key limitation is the Markov assumption: the hidden state at each position depends only on the state at the previous position
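To illustrate what the Markov assumption means for segmentation, here is a toy left-to-right HMM with one state per segment (V, then D, then J), decoded with the Viterbi algorithm. The topology and all transition and emission probabilities are hypothetical. Real tools such as iHMMune-Align use far richer, germline-specific state spaces. The key point is that the best path into each state is computed only from the previous position's states.

```python
import math

STATES = ["V", "D", "J"]
# Hypothetical left-to-right topology: a segment can persist or hand over to the next one.
TRANS = {"V": {"V": 0.9, "D": 0.1}, "D": {"D": 0.8, "J": 0.2}, "J": {"J": 1.0}}
EMIT = {  # hypothetical base compositions, for illustration only
    "V": {"A": 0.4, "C": 0.1, "G": 0.4, "T": 0.1},
    "D": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
    "J": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def viterbi(seq, start="V"):
    """Most probable state path; each state depends only on the previous state (Markov)."""
    score = {s: (math.log(EMIT[s][seq[0]]) if s == start else -math.inf) for s in STATES}
    back = []
    for base in seq[1:]:
        new, ptr = {}, {}
        for s in STATES:
            # Only the previous position's scores are consulted: the Markov assumption.
            cands = [(score[p] + math.log(TRANS[p][s]), p) for p in STATES if s in TRANS[p]]
            best, arg = max(cands)
            new[s] = best + math.log(EMIT[s][base])
            ptr[s] = arg
        score = new
        back.append(ptr)
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(back):   # trace back from the best final state
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GAGAGACTCTTT"))
```

Longer-range signals, such as a conserved motif several codons away, cannot inform a state except through this chain of one-step dependencies.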

Simple pattern matching

In contrast to the complexity of HMMs, simple pattern-matching rules can be used
- CDR3: (TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]
  - Ab Mining Toolbox, D'Angelo et al. (2014)
- J region: [FW]G[A-Z]G and T[LMT]VTVSS
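The rules above can be applied directly with Python's `re` module. This sketch reports the span of the first CDR3-like match in a nucleotide sequence and the positions of the J-region motifs in an amino-acid sequence. The helper names and the example sequences are hypothetical.

```python
import re

# Patterns as listed on the slide: CDR3 is a nucleotide pattern, the J motifs are amino-acid patterns.
CDR3 = re.compile(r"(TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])"
                  r"(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]")
J_MOTIFS = [re.compile(r"[FW]G[A-Z]G"), re.compile(r"T[LMT]VTVSS")]


def find_cdr3(nucleotides):
    """Return the (start, end) span of the first CDR3-like match, or None."""
    m = CDR3.search(nucleotides)
    return m.span() if m else None


def find_j_motifs(protein):
    """Return (pattern, start) for every J-region motif hit in an amino-acid sequence."""
    return [(p.pattern, m.start()) for p in J_MOTIFS for m in p.finditer(protein)]


print(find_cdr3("GGTATTACTGTGCGAGAGATGGCTACTTCGACTGGGGCCAA"))
print(find_j_motifs("YYCARDGYFDWGQGTLVTVSS"))
```

Such rules are fast and transparent, but they return nothing at all when hypermutation destroys a motif, rather than a graded score.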

Conditional random fields (CRFs)

Alternative to HMMs for segmenting data
Rather than use a 'hidden' state, features are calculated from the data
- Downstream/upstream nucleotides/amino acids
- More complex motifs
Malhotra et al. used conditional random fields to pre-segment IGH genes and found that this improved the performance of iHMMune

CRFs and segmenting data

Used simulations of 10,000 sequences with rearrangements/indels, as before
Labelled each nucleotide by its corresponding amino acid, plus the previous amino acid
Included the CDR3 regular expression, plus the J region regular expressions
Used 90% for training, 10% for validation
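A minimal sketch of the data preparation step follows. It builds one feature dictionary per nucleotide (the base, its upstream and downstream neighbours, and flags for lying inside a regular-expression match) and makes a 90%/10% training/validation split. This per-token list-of-dicts layout is the format accepted by CRF toolkits such as sklearn-crfsuite, but the toolkit actually used is not stated. The window size and feature names are hypothetical. In the real setup the patterns would be the CDR3 and J-region regular expressions, and a short stand-in motif is used below.

```python
import random
import re


def nucleotide_features(seq, patterns, window=2):
    """One feature dict per nucleotide: neighbouring bases plus regex-overlap flags."""
    covered = [set() for _ in patterns]
    for k, pat in enumerate(patterns):
        for m in pat.finditer(seq):
            covered[k].update(range(m.start(), m.end()))
    feats = []
    for i, base in enumerate(seq):
        f = {"nt": base}
        for off in range(1, window + 1):
            f[f"nt-{off}"] = seq[i - off] if i - off >= 0 else "<start>"
            f[f"nt+{off}"] = seq[i + off] if i + off < len(seq) else "<end>"
        for k in range(len(patterns)):
            f[f"motif{k}"] = i in covered[k]   # inside a match of pattern k?
        feats.append(f)
    return feats


def train_validation_split(items, train_fraction=0.9, seed=0):
    """Shuffle reproducibly, then split into training and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(train_fraction * len(items)))
    return items[:cut], items[cut:]


feats = nucleotide_features("ACTGGGGT", [re.compile("TGGGG")])  # stand-in motif
train, val = train_validation_split(range(10000))
print(feats[2], len(train), len(val))
```

Unlike an HMM, each position can use features computed from anywhere in the sequence, which is the advantage of CRFs noted above.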

Performance

Summary

Our method provides a probabilistic assignment of rearranged immunoglobulin sequences to germline genes that is more robust to hypermutation than the other methods evaluated
- Can be used as a pre-processor for clone identification

Thanks!