Towards mechanistic models of antiviral immunity


Simon Frost, M.A. D.Phil.

Dept. of Veterinary Medicine, and Institute of Public Health

University of Cambridge

Introduction

  • Antiviral immunity
    • Adaptive
      • Humoral: phenotypic data, viral genotype, Ig genotype
      • (Cellular)
    • Innate
      • Balance between viral control and immunosuppression/pathology
      • How to model? What kind of data?
  • Dynamic rather than steady state or quasi-steady state

Heilmeier's catechism

  • What are you trying to do?
    • Articulate your objectives using absolutely no jargon.
  • How is it done today, and what are the limits of current practice?
  • What's new in your approach and why do you think it will be successful?
  • Who cares?
    • If you're successful, what difference will it make?
  • What are the risks and the payoffs?
  • How much will it cost? How long will it take?
  • What are the midterm and final "exams" to check for success?

Acknowledgements

  • Cambridge
    • Mukarram Hossain
  • Fundació irsiCaixa
    • Javier Martinez-Picado
  • UCSD
    • Sergei Kosakovsky Pond
    • Art Poon
    • Selene Zarate
    • Michael Golinski
    • Ben Murrell
    • Susan Little
    • Doug Richman
  • Monogram Biosciences
    • Terri Wrin
    • Yang Liu
    • Colombe Chappey
    • Chris Petropoulos
  • NYU
    • Gregg Silverman

Dynamics of antibody responses in HIV

Frost et al. Curr. Opin. HIV AIDS (2008)

Primary infection cohort

  • Since 1996, individuals with acute/early HIV infection have been prospectively invited to join studies at UCSD
    • Susan Little, Doug Richman
  • Based on molecular markers, the baseline visit is on average 6 weeks after infection
  • Longitudinal follow-up
    • Measurement of many biological markers

Measuring antibody responses

Between-host responses

Influenza HI data

Antigenic cartography


  • Derek Smith popularised the use of multidimensional scaling to represent HI data
  • The algorithm generates a single point estimate of the antigenic map
  • Yet there may be significant uncertainty in the map due to measurement error etc.

Smith et al. Science (2004)

Bayesian antigenic cartography

  • The model estimates $X$, an $n \times p$ matrix of coordinates in antigenic space of viruses, and $Y$, a $k \times p$ matrix of coordinates in antigenic space of plasma samples.
  • Let $y_{ij}$ denote the ${\rm log}_2$ transformed ${\rm IC}_{50}$ neutralization titer between virus $i$ and plasma $j$
  • The observed dissimilarity measure, $d_{ij}$, is obtained by normalizing the data using the maximum neutralization titer for each virus, $d_{ij}=\max y_{i \cdot} - y_{ij}$
  • $d_{ij}$ is assumed to follow a truncated normal distribution, $d_{ij}\sim N (\delta_{ij}, \tau),\;I(d_{ij}>0)$, where $i=1, \ldots ,n,\;j=1,\ldots,k$
  • $\delta_{ij}$ was calculated assuming a two-dimensional map ($p=2$)
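
A minimal sketch of the quantities above, assuming that $\delta_{ij}$ is the Euclidean distance between virus $i$ and plasma $j$ on the map and that $\tau$ is a precision (as in BUGS-style notation). Neither assumption is stated on the slide. The function names and the toy titers are illustrative only, and entries with $d_{ij}=0$ (the row maxima) are skipped because the truncation is strict.

```python
import math
import numpy as np


def dissimilarity(y):
    """d_ij = (maximum titer for virus i) - y_ij; rows are viruses, columns are plasma."""
    y = np.asarray(y, dtype=float)
    return y.max(axis=1, keepdims=True) - y


def map_distances(X, Y):
    """Assumed form of delta_ij: Euclidean distance between row i of X and row j of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def truncated_normal_loglik(d, delta, tau):
    """Log-likelihood of d ~ N(delta, tau) truncated to d > 0 (tau treated as a precision)."""
    sigma = 1.0 / math.sqrt(tau)
    z = (d - delta) / sigma
    log_pdf = -0.5 * z ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    # Normalising constant P(d > 0) = Phi(delta / sigma) for the untruncated normal.
    erf = np.vectorize(math.erf)
    log_norm = np.log(0.5 * (1.0 + erf(delta / (sigma * math.sqrt(2.0)))))
    keep = d > 0  # assumption: zero dissimilarities are excluded
    return float((log_pdf - log_norm)[keep].sum())


# Toy log2 titers for 2 viruses x 2 plasma samples (made-up numbers).
y = [[10.0, 8.0], [6.0, 9.0]]
X = [[0.0, 0.0], [3.0, 4.0]]  # virus coordinates, p = 2
Y = [[0.0, 0.0], [3.0, 0.0]]  # plasma coordinates, p = 2
d = dissimilarity(y)
delta = map_distances(X, Y)
loglik = truncated_normal_loglik(d, delta, tau=1.0)
```

In a full Bayesian analysis this log-likelihood would be combined with priors on $X$, $Y$ and $\tau$ and sampled, which is what yields uncertainty in the map rather than a single point estimate.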

Antigenic map of HIV


Map of influenza A


Focusing on 'important' mutations

Summary

  • Bayesian MDS is a useful tool in the analysis of large multivariate datasets
    • Probabilistic
    • Interpretable
    • Can include covariates
  • Other applications
    • Delfia data
    • CyToF

Within-host antibody responses

  • We can also use neutralisation assays to investigate how neutralisation changes within a single HIV-infected individual
  • We can perform a time-shift experiment, where virus and sera from different timepoints are combined


$ \log_{10}(y) \sim t_v + t_p + f(t_v, t_p) $
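
One way to read this model is as a regression of log titer on virus time $t_v$, plasma time $t_p$ and an interaction between them. The sketch below swaps the unspecified function $f$ for a simple product term $t_v t_p$ and fits it by ordinary least squares. The timepoints and coefficient values are invented, noise-free and purely illustrative.

```python
import numpy as np


def design_matrix(t_v, t_p):
    """Columns: intercept, virus time, plasma time, product term standing in for f."""
    t_v = np.asarray(t_v, dtype=float)
    t_p = np.asarray(t_p, dtype=float)
    return np.column_stack([np.ones_like(t_v), t_v, t_p, t_v * t_p])


def fit_time_shift(t_v, t_p, log10_titer):
    """Ordinary least-squares fit; returns the four coefficients."""
    coef, *_ = np.linalg.lstsq(design_matrix(t_v, t_p), log10_titer, rcond=None)
    return coef


# Synthetic time-shift grid: every virus timepoint against every plasma timepoint.
times = np.array([0.0, 3.0, 6.0, 12.0])  # months, made-up
t_v, t_p = [a.ravel() for a in np.meshgrid(times, times, indexing="ij")]
true_coef = np.array([2.0, -0.05, 0.08, 0.002])  # invented values
log10_titer = design_matrix(t_v, t_p) @ true_coef
estimated = fit_time_shift(t_v, t_p, log10_titer)
```

A smooth term (for example a spline surface) would be a closer match to $f(t_v, t_p)$; the product term is used here only to keep the sketch short.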

Within-host antibody responses

Frost et al. PNAS (2005)

Variation between individuals

Frost et al. PNAS (2005)

Capturing selection pressure

  • A common bioinformatic approach to detecting selection is to compare $dN$ and $dS$:
    • $dN$: the rate of nonsynonymous or amino acid changing mutations
    • $dS$: the rate of synonymous or amino acid preserving mutations
  • The relative rates of $dN$ and $dS$ are informative about different types of selection
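
As a rough illustration of the distinction, the sketch below classifies codon differences between two aligned, in-frame sequences as synonymous or nonsynonymous using the standard genetic code. It returns raw counts only: proper $dN$ and $dS$ estimates also normalise by the numbers of nonsynonymous and synonymous sites and are usually fitted with a codon model. The function name and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of single-nucleotide codon differences.

    Codons that differ at more than one position are skipped, because the
    order of the changes (and hence their classification) is ambiguous.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for k in range(0, len(seq_a), 3):
        a, b = seq_a[k:k + 3], seq_b[k:k + 3]
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# AAA -> AAG keeps lysine (synonymous); CTT -> CCT changes leucine to proline.
counts = count_substitutions("ATGAAACTT", "ATGAAGCCT")
```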

Evolution rate and antibody escape

Frost et al. PNAS (2005)

Evolution rate and antibody escape

Frost et al. PNAS (2005)

Modelling escape from antibody responses


$ \begin{align} \dot{N}_i & = rN_i-\sum_{j}\beta_{i-j}N_iB_j+\frac{\mu}{2}(N_{i+1}+N_{i-1})-\mu N_i\\ \dot{B}_i & = \sum_{j}\alpha_{i-j}N_jB_i\\ \text{where}\\ \alpha_{d} & = \exp(-d^2/2l_\alpha^2) \; \text{cross-stimulation}\\ \beta_{d} & = \exp(-d^2/2l_\beta^2) \; \text{cross-reactivity} \end{align} $

Haraguchi and Sasaki (1997)
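
The equations above can be explored numerically. The following is a minimal sketch using forward Euler integration on a finite one-dimensional antigenic space. The boundary treatment (no virus beyond either end), the step size and all parameter values are assumptions made for illustration and are not taken from the slides or the cited paper.

```python
import numpy as np


def gaussian_kernel(n, length):
    """Matrix K[i, j] = exp(-(i - j)^2 / (2 * length^2))."""
    idx = np.arange(n)
    d = idx[:, None] - idx[None, :]
    return np.exp(-(d ** 2) / (2.0 * length ** 2))


def derivatives(N, B, r, mu, alpha, beta):
    """Right-hand sides for virus densities N_i and antibody levels B_i."""
    padded = np.pad(N, 1)  # zeros beyond the ends of antigenic space (assumption)
    neighbours = padded[:-2] + padded[2:]  # N_{i-1} + N_{i+1}
    dN = r * N - N * (beta @ B) + 0.5 * mu * neighbours - mu * N
    dB = B * (alpha @ N)
    return dN, dB


def simulate(N0, B0, r, mu, l_alpha, l_beta, dt=0.001, steps=1000):
    """Forward Euler integration; returns the final N and B."""
    N = np.asarray(N0, dtype=float).copy()
    B = np.asarray(B0, dtype=float).copy()
    alpha = gaussian_kernel(len(N), l_alpha)  # cross-stimulation
    beta = gaussian_kernel(len(N), l_beta)    # cross-reactivity
    for _ in range(steps):
        dN, dB = derivatives(N, B, r, mu, alpha, beta)
        N = N + dt * dN
        B = B + dt * dB
    return N, B


# Single founder strain in the middle of a 21-strain space (illustrative values).
N0 = np.zeros(21)
N0[10] = 1.0
B0 = np.full(21, 0.01)
N_end, B_end = simulate(N0, B0, r=1.0, mu=0.1, l_alpha=1.0, l_beta=2.0)
```

For longer runs an adaptive ODE solver would be a safer choice than fixed-step Euler.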

Model results

Frost et al. PNAS (2005)

Cross-reactivity and escape

Frost et al. PNAS (2005)

Changing responses over time

Modeling temporal changes


Neutralization of viral clones: unique mutations

Neutralization of clones: convergent mutations

Neutralisation in HIV-1 infected identical twins

Viral phylogeny

Neutralization variation between viruses


Neutralization of early clones from TW2

Neutralization of late clones from TW2

Within-host phylogenies


Summary

  • Data are consistent with a model whereby HIV escapes antibody responses through a large number of mutations of small effect
  • Data support the use of neutralisation titre as a measure of fitness
  • Why does a simple model (cf. Armita Nourmohammad's model) work?
    • Virus responds to local selection pressure
    • Evolution within an individual is like a walk in sequence space that can be 'unravelled'
  • More detailed data reveal higher dimensionality

B cell repertoire data

  • Ultimately, we would like to know how changes in phenotype (neutralisation) and viral genotype are linked to changes in B cell repertoires
  • Selection pressures may manifest as:
    • Changes in frequency of a clone
    • Somatic hypermutation

Phylogeny of Ig clonal lineages

Frost et al. Phil Trans Roy Soc B (2015)

What can we do with clones?

  • Analysis of tree shape of individual clones
    • Diversification
    • Asymmetry
    • Local branching index
    • Ancestral sequence reconstruction
  • Dynamics of multiple lineages
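
As one concrete example of a tree-shape statistic, the sketch below computes the Colless index, a common measure of asymmetry: the sum over internal nodes of the absolute difference in tip counts between the two child subtrees. The slides do not say which asymmetry measure was used, so this choice is an assumption, as is the nested-tuple tree representation.

```python
def colless(tree):
    """Return (number of tips, Colless index) for a bifurcating tree.

    A tree is either a tip (any non-tuple object) or a 2-tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


balanced = (("a", "b"), ("c", "d"))
ladder = ((("a", "b"), "c"), "d")
balanced_result = colless(balanced)
ladder_result = colless(ladder)
```

A fully balanced tree scores zero, while a ladder-like tree of the same size scores the maximum.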

Aims

  • Primary focus is on somatic hypermutation and the evolution of clonal lineages
    • Divergence from germline
    • Diversity within clone
  • Hence, need to identify germline correctly

Existing methods

  • Alignment based methods
    • Use sequence similarity
    • Example
      • Align sequence against V
      • Align remainder against J
      • Align remainder against D
  • Model based methods
    • Use a higher-level representation of similarity
      • Hidden Markov Model
      • Conditional random fields

Phylogenetic approaches

  • We modified a method previously used to assign viruses to genotypes to analyse immunoglobulin sequences
    • A variant of phylogenetic placement (Matsen)
    • Joint estimates of breakpoints and phylogenetic placement of V and J regions using an evolutionary model
    • D region identified by the maximum local alignment score (using a codon alignment algorithm) to the junction region of the query sequence
  • We search for the best models using a genetic algorithm
  • By marginalising across models, we get a probabilistic assignment
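
To make the marginalisation step concrete, here is a minimal sketch that assumes each candidate model carries a germline label and an AIC-like score, and that models are weighted by Akaike weights. The slides do not state the weighting scheme, so this is an assumption; the scores below are invented.

```python
import math
from collections import defaultdict


def akaike_weights(scores):
    """Convert AIC-like scores (lower is better) to normalised model weights."""
    best = min(scores)
    raw = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]


def marginal_support(models):
    """Sum model weights per germline label.

    `models` is a list of (germline_label, score) pairs, for example the
    candidate models visited during a search.
    """
    weights = akaike_weights([score for _, score in models])
    support = defaultdict(float)
    for (label, _), weight in zip(models, weights):
        support[label] += weight
    return dict(support)


# Three hypothetical models: two agree on the allele, one disagrees.
models = [("IGHV3-23*01", 100.0), ("IGHV3-23*01", 102.0), ("IGHV3-23*04", 100.0)]
support = marginal_support(models)
```

The result is a probability-like support value per germline rather than a single hard call.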

Human IGHV (F+ORF) phylogeny

Frost et al. Phil Trans Roy Soc B (2015)

Evaluation: programs

  • IMGT/HighV-QUEST v. 1.3.1
  • IgBLAST v. 1.4.0
  • iHMMune-Align (1-06-2007)
  • SoDA v1.1
  • vdjalign
  • vdj
  • Cloanalyst

Evaluation: real data

  • Datasets from genotyped individuals
    • Stanford S22 (from a single genotyped individual)
    • A set of 6329 clonally unrelated IGH rearrangements, obtained from individuals homozygous for IGHV3-23*01 and IGHJ6*02 from Ohm-Laursen et al.
  • Clonal data
    • Two datasets derived from IgD+ IgM- CD38+ B cells (n=57 and n=106)
    • 11 sequences from an HIV-infected individual (N152), the source of the broadly neutralizing antibody 10E8

Evaluation: simulations

  • Simple rearrangements (n=12,060)
    • IGHV, IGHD and IGHJ *01 alleles were concatenated
  • Rearrangements plus insertions/deletions (n=10,000)
    • Random selection of germlines
    • Length distribution of indels taken from Jackson et al.
    • Base distribution of N-nucleotides taken from Jackson et al.
    • Verification process
      • Free of stop codons
      • Contained a CDR3 region recognizable by the regular expression proposed by D'Angelo et al.
      • In-frame J region with ‘[FW]G.G’ and ‘TVSS’ motifs
  • Rearrangements plus indels and mutations
    • As above, with mutations introduced under the S5F model

antibodyo.me

Performance: simple rearrangements


Performance: insertions and deletions


Performance: 40 mutations


Performance: 80 mutations


Visualization

  • The output of (any) germline assignment program is complex and multivariate
  • To help interpret it, we developed visualisation tools

Improvements

  • Germline identification is a combination of finding breakpoints and identifying similarities
  • Our algorithm spends a lot of time finding the breakpoints
    • It runs much faster on 'presegmented' data

Hidden Markov Models

  • Hidden Markov Models have been used for repertoire analysis in a number of studies
    • iHMMune
    • VDJfasta
    • SoDA
    • PARTIS
    • repgenHMM
  • A limitation is the Markov assumption

Simple pattern matching

  • In contrast to the complexity of HMMs, simple pattern matching rules can be used
    • CDR3: (TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]
      • Ab Mining Toolbox, D'Angelo et al. (2014)
    • J region: [FW]G[A-Z]G and T[LMT]VTVSS
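
The patterns above can be applied directly with Python's `re` module. In this sketch the CDR3 pattern is matched against nucleotides and the two J-region motifs against a protein translation. The helper names and the toy sequences are hypothetical and were constructed only to exercise the patterns.

```python
import re

# CDR3 nucleotide pattern as quoted above (Ab Mining Toolbox, D'Angelo et al.).
CDR3_DNA = re.compile(
    r"(TT[TC]|TA[CT])(TT[CT]|TA[TC]|CA[TC]|GT[AGCT]|TGG)(TG[TC])"
    r"(([GA][AGCT])|TC)[AGCT]([ACGT]{3}){5,32}TGGG[GCT][GCT]"
)
# J-region amino acid motifs as quoted above.
J_MOTIF_1 = re.compile(r"[FW]G[A-Z]G")
J_MOTIF_2 = re.compile(r"T[LMT]VTVSS")


def find_cdr3(nucleotides):
    """Return the (start, end) span of the first CDR3 match, or None."""
    match = CDR3_DNA.search(nucleotides.upper())
    return match.span() if match else None


def has_j_motifs(protein):
    """True if both J-region motifs occur in the protein sequence."""
    return bool(J_MOTIF_1.search(protein)) and bool(J_MOTIF_2.search(protein))


# Toy inputs built to satisfy the patterns; not real antibody sequences.
toy_dna = "TATTACTGTGCG" + "AAA" * 6 + "TGGGGC"
cdr3_span = find_cdr3(toy_dna)
j_ok = has_j_motifs("WGQGTLVTVSS")
```

Rules like these are fast and transparent, but they either match or fail, so they give no graded measure of confidence.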

Conditional random fields (CRFs)

  • Alternative to HMMs for segmenting data
  • Rather than use a 'hidden' state, features are calculated from the data
    • Downstream/upstream nucleotides/amino acids
    • More complex motifs
  • Malhotra et al. used conditional random fields to pre-segment IGH genes, and found that performance of iHMMune improved

CRFs and segmenting data

  • Used simulations of 10,000 sequences with rearrangements/indels as before
  • Labelled each nucleotide by its corresponding amino acid, plus the previous amino acid
  • Included the CDR3 regular expression, plus the J region regular expressions
  • Used 90% for training, 10% for validation
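
The labelling step might look like the sketch below, which builds one feature dictionary per nucleotide from its own codon's amino acid and the previous codon's amino acid, then makes a 90/10 split. It assumes the reading frame starts at the first nucleotide and does not train a CRF. The feature names, the start marker `^` and the unknown marker `X` are illustrative choices, not details from the slides.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nucleotide_features(seq):
    """One feature dict per nucleotide: base, its amino acid, the previous amino acid."""
    features = []
    for pos, base in enumerate(seq):
        start = pos - pos % 3  # first position of the codon containing pos
        codon = seq[start:start + 3]
        previous = seq[start - 3:start] if start >= 3 else ""
        features.append({
            "base": base,
            "aa": CODON_TABLE.get(codon, "X"),          # X: incomplete or unknown codon
            "prev_aa": CODON_TABLE.get(previous, "^"),  # ^: no previous codon
        })
    return features


def train_validation_split(items, train_fraction=0.9, seed=0):
    """Shuffle reproducibly and split into training and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(train_fraction * len(items)))
    return items[:cut], items[cut:]


features = nucleotide_features("ATGAAA")
train, validation = train_validation_split(range(100))
```

Feature dictionaries of this kind are the usual input format for CRF libraries, with the regular-expression matches added as extra binary features.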

Performance


Summary

  • Our method provides a probabilistic assignment of rearranged immunoglobulin sequences to germline genes that is more robust to hypermutation than other methods
    • Can be used as a pre-processor for clone identification

Thanks!