13:04:22 Awesome. Alright, well, thanks everyone for being here for the talk today. I guess I just want to start by thanking the organizers and the staff here at KITP for organizing a great program. I'm both really excited and, I think, a little nervous to be 13:04:35 giving an in-person talk today after about 18 months on Zoom, so try to bear with me as I try to regain my sea legs. 13:04:44 So my plan for today was to tell you a little bit about some of our recent work looking at evolutionary dynamics that take place in the human gut microbiome. 13:04:53 So, the gut microbiome is one of my favorite examples of a diverse and complex microbial community. I'm illustrating it here with some really beautiful fluorescence microscopy images produced by Jessica Mark Welch's lab, some of our collaborators. 13:05:16 This is showing us about 10 different species of bacteria growing in the lumen of a mouse, so you can see their spatial organization relative to each other and relative to food particles and other debris floating by in 13:05:22 the lumen, and so on. 13:05:25 Of course, as we got into last week, one of the main challenges in understanding large communities like the gut microbiome is that their structure and function emerge from the collective behavior of a large number of ecologically interacting 13:05:39 you know, in the human gut. Yeah, sorry, this is a picture of a mouse; we don't have good pictures from humans, unfortunately, but 13:05:46 yeah, it's a little bit bigger. I always try to be on the underestimate side here, but yes, 10 to the 13 to 10 to the 14, depending on how you count. Anyway, I think what I really like about these images is that they really highlight this enormous complexity 13:05:58 you see as soon as you zoom in and start to look at the microscopic level. I think it's pretty well illustrated by these images here.
13:06:05 But of course a second key challenge, which we talked a little bit about on Friday and which I want to get into in this talk, is that the individual actors in this game aren't necessarily fixed, but can actually evolve over time by acquiring heritable mutations 13:06:15 in their genomes. 13:06:17 And as we know, this process can happen extremely rapidly in microbes, with some of our favorite examples in pathogens and laboratory experiments showing evolutionary changes sweeping through local microbial populations on timescales ranging from years 13:06:29 to months and sometimes even days, and of course the recent Delta variant is a good recent example of this. 13:06:37 So if this sort of behavior is sufficiently widespread, you could imagine it would have really important implications for efforts to understand or someday manipulate diverse microbial communities like the gut microbiome. 13:06:48 It means that these ecological interactions that I'm drawing in here with arrows could be constantly shifting under our feet, right? And Jonathan provided some really nice examples of this effect occurring in the simple two- and three-species communities that he talked 13:07:01 about on Friday. But in contrast to the ecological dynamics that take place at the species level, which I'm illustrating in different colors here, we actually know relatively little about the evolutionary dynamics that occur within individual 13:07:12 species in really large microbial communities like the gut microbiome, to the extent that even many basic phenomenological questions still aren't really well understood. 13:07:22 So, to give you some examples: what are the relevant timescales for within-host evolution? A week, a month, a whole lifetime? Do we expect these strains, these resident populations, to actually evolve much at all?
13:07:34 Or, you can imagine that maybe a larger community is just better able to respond to environmental perturbations through purely ecological shifts in the abundances of the species that are already there, right, rather than waiting around for one of 13:07:44 the resident strains to acquire some genetic mutation. 13:08:00 How important are factors like genetic drift, horizontal gene transfer, or immigration of new strains from outside the host? And finally, how do these short-term processes occurring within hosts connect to the longer-term processes that shape genetic 13:08:14 variation across hosts in the bigger human population? Because I'd argue that without knowing the answers to these really, really basic questions, 13:08:22 it's hard to know what sorts of models we should even be playing around with in the first place, or how we'd go about rationally designing or interpreting data from experiments at this early stage. 13:08:34 I'd also argue that even some really rough order-of-magnitude estimates from these messy in situ settings in humans can be really useful for helping us prioritize future theoretical or experimental studies. 13:08:38 So, to give a specific example: if you wanted to use something like KC's synthetic community system to mimic the evolution of the gut microbiome, what are the main features we'd have to include? Right now we don't really know the answer to that question, 13:08:50 without being able to go in and look at least once. 13:08:55 So to try to address these types of questions, we've been developing approaches for going in and measuring short-term evolutionary dynamics in the gut microbiome using shotgun metagenomic sequencing of human fecal samples.
13:09:08 So for those of you who might not be familiar with this kind of data, this involves extracting essentially the entire collection of bacterial genomes that are present in a given fecal sample 13:09:22 and sequencing them in a big pool of short, roughly 100 base pair reads. Right. And so, in principle, the idea is that these data should contain an enormous amount of information about the genetic diversity present in each one of these samples, 13:09:29 since each read is effectively sampling from a unique cell in the population. 13:09:45 But the problem is that extracting this information is still a pretty major challenge. Ideally, we'd like to measure something like the evolutionary relationships between different bacterial genomes of the same species, let's say the ones that I've colored in 13:09:53 orange here, either those within the same fecal sample or between two different fecal samples over time or in different people. But as we all know, it's an intrinsically difficult problem to reconstruct such closely related genomes purely on the basis of these short metagenomic reads. 13:10:03 So to try to get around this problem, our general approach is to try to see if we can use theoretical models from population genetics, and some intuition from laboratory evolution experiments, to try to generate predictions directly for summary statistics 13:10:15 in the data that we can actually observe, right, typically frequencies of mutations in these big metagenomic samples. 13:10:22 The idea is that we can try to run this process in reverse and see if we can ultimately pull out some reasonably quantitative estimates about these evolutionary forces, without first having to solve this difficult metagenomic problem. So that's sort of the general idea. 13:10:37 So my plan for today is to tell you a little bit about some of our very first steps towards this goal.
13:10:43 This first story that I want to talk about is joint work with Nandita Garud, who now has her own group down at UCLA, as well as Katie Pollard at UCSF and Oskar Hallatschek, who's here in the audience today. 13:10:56 So as a first pass, we wanted to know what we could learn by analyzing a really large cohort of healthy human microbiomes that had been previously sequenced by the Human Microbiome Project, 13:11:08 and a few related studies, which we'll get to. 13:11:11 And so, this cohort consisted of on the order of a few hundred hosts, a subset of whom were sequenced at two different time points, roughly six months apart. 13:11:19 So the idea then is that with this sort of experimental design, these data should provide some information both about short-term evolution within hosts over this six-month sampling interval, as well as longer-term evolution across hosts in the broader 13:11:31 human population. From the cartoon I've drawn here, we can see that in order to extract this information, you first have to get a handle on the structure of the bacterial lineages that are present within a single species in just a single fecal sample. 13:11:46 This is a bit of a tricky problem since, in principle, the genetic structure of these resident populations is going to be shaped by a combination of different factors. 13:11:54 So, for one, individual bacterial lineages are going to enter the host during initial colonization sometime in childhood, and potentially through subsequent migration events later in life. 13:12:10 Then of course, on the between-host evolution, what is the range of timescales? Well, we'll talk about that. Right now, just sort of conceptually, what are the things that matter? Right, so we get initial lineages coming in during childhood, maybe through subsequent 13:12:22 migration events later in life.
13:12:24 Once these genomes are in there, they start to get shuffled around according to population genetic processes: mutation, selection, recombination, genetic drift. 13:12:32 And then of course these seeding lineages themselves are drawn from some broader global population, which is shaped by its own set of population genetic processes. We'll get into a little bit of what those timescales are in a second. 13:12:44 What this means is that the collection of bacterial genomes that we observe in any given fecal sample today can of course reflect a mixture of all of these different processes, right, and we just got done saying we don't understand so much about them yet, so it's 13:12:55 a little bit of a catch-22. 13:12:58 Moreover, as I mentioned, we rarely get to observe these bacterial genomes directly, but only their shadows, so to speak, filtered through these big pools of short metagenomic sequencing reads. 13:13:09 So, what we can measure really well in these data... 13:13:13 Yeah? 13:13:15 Oh yeah, microphone. 13:13:26 Yes, the question was, is anyone doing long-read sequencing to try to deal with these problems with short reads? And the answer is yes. There's been a little bit less of it done in the past, but it is becoming more and more common. You know, the study 13:13:38 I'll talk about in the middle here actually was done with one of these sort of long-read techniques. In this case it was a 10x sort of synthetic long-read sequencing, where you barcode individual DNA fragments in little droplets, then sequence them on an 13:13:51 Illumina machine, so you can sort of put them back together after the fact. It turned out, and we will sort of see in that case, that I would argue the long reads didn't help us as much there as you might expect, because these droplets of course tend to have multiple 13:14:05 long reads in them, and so it becomes kind of a computational problem of deconvolving them. More recent stuff with, you know, PacBio HiFi sequencing, which Ben Callahan's done a lot of cool work with, would, I think, be a lot
better there, as well as things like Hi-C, like 13:14:19 chromosome conformation capture, where you actually get sort of very long-range information. 13:14:25 It's been very helpful. 13:14:26 At the moment, in this study, you know, we're piggybacking on existing data, so we're sort of stuck with the old hundred base pair long fragments that were there. 13:14:34 And we'll see why it was sort of helpful to start with this existing data. 13:14:39 Yeah, just kind of a technical question: what do you do when you have species that have closely related fragments of their genome, so you can't phase the reads between species? Do you just throw that part out or something? 13:14:50 Yeah, exactly. Great question. So, before getting to that, let me just quickly review sort of the easy case to make sure everybody's on the same page. So what we can measure really well in these data are the frequencies of individual single nucleotide variants, 13:15:02 you know, for example, just by taking these short sequencing reads and aligning them against a reference genome of a given species, like this purple one here, and looking for a location where we observe a mixture of two different bases at that 13:15:13 location. 13:15:14 Right. In this example, the idea is, you know, there must have been a mutation between T and A at this particular site in the purple species, and because the A's are present in three out of the five reads that line up here, 13:15:25 we estimate that it must be present at about 60% frequency in this particular species in this particular fecal sample, with some counting noise here in how precisely we can estimate it. 13:15:37 But as you mentioned here, of course, this is only sort of the simplest scenario you can imagine, where you can precisely localize a read to a given species. Of course there are definitely many regions that are very difficult to localize to a species, 13:15:48 let's say an accessory gene or something like that.
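To make the "three out of five reads" estimate and its counting noise concrete, here is a minimal sketch. It assumes reads covering a site are independent draws from the population, so the count of reads carrying the variant is binomial; the function name `snv_frequency` is illustrative and not taken from the speaker's pipeline.

```python
import math


def snv_frequency(alt_reads, total_reads):
    """Estimate a variant's within-species frequency from read counts.

    Assumption (not from the talk): reads at the site are independent
    draws, so the count is binomial and the standard error of the
    estimate is sqrt(f * (1 - f) / depth).
    """
    if total_reads <= 0:
        raise ValueError("need at least one read covering the site")
    f = alt_reads / total_reads
    standard_error = math.sqrt(f * (1 - f) / total_reads)
    return f, standard_error


# The example from the talk: 3 of the 5 reads covering the site carry the A.
freq, stderr = snv_frequency(3, 5)
print(f"estimated frequency: {freq:.2f} +/- {stderr:.2f}")
```

With only five reads the standard error is roughly 0.2, which is one way to see why the later parts of the talk lean on coverage thresholds before trusting any frequency estimate.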
And so for this first part of the talk, in fact for most of what I'm talking about today, imagine we're only looking at the most boring, well-behaved core genes in these species, where we are more 13:16:02 certain that that kind of funny business isn't going on. 13:16:06 So you might argue, okay, when you're doing that you may be giving up a chance to observe maybe the things that are most likely to be under local selection. 13:16:14 And I think that's definitely true. But as we'll see, we'll be able to use these SNPs as sort of a tag or a marker to actually infer what selection is doing in other regions of the genome, even if we can't see them. Right, so that's the general idea. 13:16:27 And the ability to go in and look at those accessory genes is sort of a huge open problem. You know, culturing and isolate sequencing now provides one really good way to get at that, so we can do a better job there, but you just have 13:16:39 So we'll see their dynamics tell us quite a bit about what's actually happening. 13:16:51 like a list of genes that are unlikely to be undergoing horizontal transfer or something like that? Yeah, so we sort of have a whole list of filters and things we designed to kind of minimize these effects, one of which includes, you 13:17:05 know, looking at a bunch of isolates and seeing which genes you find in isolates of different species. Other ones are based more on patterns of coverage variation within a sample, and we can talk about these more towards the end. They're very important, 13:17:16 but I think with a few tricks you can eliminate a lot of the really egregious examples of this, 13:17:22 but not eliminate them completely. That's important. 13:17:26 Okay. 13:17:29 Sounds good. Alright, so what can we do with these SNP frequency estimates?
13:17:34 Well, one of the simplest things we can do is actually aggregate the mutations across the genome and examine how the distribution of these single nucleotide variants varies across different hosts in our large cohort. 13:17:49 And so when you do this, you end up observing just a huge variation in the overall levels of genetic diversity that are present in different resident populations, in different hosts and different species. 13:17:59 So, just to walk you through it, here I'm showing you the total number of intermediate-frequency single nucleotide variants in the Bacteroides vulgatus populations in about 700 different fecal samples from our panel. Right, each 13:18:12 one is a little line here, and the height is sort of telling you a little bit about our estimated uncertainty. 13:18:19 And for comparison I've gone in and shown you the distribution of frequencies of these mutations for just three particular hosts from our panel. 13:18:28 Right, and in case it isn't obvious, these are actually SNPs in the core genome, and actually only synonymous SNPs, so like the most boring, non-functional things you could hope to fish out here, 13:18:43 using this sort of mapping approach on the previous slide. Right. So, typically our definition of bacterial species is that they're more than 5% diverged from each other, which means on a typical short read of 100 base pairs they differ by about five SNPs. 13:18:57 So on average that's enough to help you figure out where a read goes. There are of course exceptions and corner cases that mess that up a lot, but the average region is pretty well localized by that kind of thing. 13:19:11 Okay. 13:19:17 So what are the divergence times between the bugs? Good, good question. Right, so that's exactly what I wanted to say next.
So, as Daniel's hinting at, you can actually do a relatively easy back-of-the-envelope calculation here, based 13:19:31 on what we know about typical mutation rates in bacteria, on the order of 10 to the minus 9 or 10 to the minus 10 per base pair per generation, and the typical generation time in the gut, which isn't super well known, but let's say it's between one and 10 generations 13:19:43 per day. If you work out this calculation, you can actually show that many of these hosts over here on the left have way too many intermediate-frequency variants to have accumulated over a single host lifetime. 13:19:57 Right, the exact age is a little difficult to say, because we have about an order of magnitude of uncertainty in both mutation rate and generation time, so you can shift your age in years quite a bit in one direction or another. 13:20:05 But essentially we know it has to be at least an order of magnitude longer than 100 years. 13:20:10 And this means that we can't actually treat these populations as being descended from a single founder strain. 13:20:17 Yeah? 13:20:23 These can be as much as 10 to the 5 single nucleotide variants, right, so that's, roughly, 10% of... Yeah, exactly, sorry, I was supposed to read this out for you guys. Right, so what I want you to pick out here is that a few hosts have very 13:20:35 few variants. Right, the whole distribution varies over three to four orders of magnitude. A few of them have like tens of variants per megabase. Typical bacterial genomes are maybe five megabases long. And some of them have tens of thousands, right, so 13:20:46 if you multiply it out, yeah, they're about 5% diverged across there. 13:20:53 So huge, huge variation. Like I said, some of these guys over here on the left have too many SNPs to have accumulated over a single host lifetime, suggesting that we can't treat each one of these hosts as being colonized by just a single founder strain.
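The back-of-the-envelope calculation described here can be sketched in a few lines. This is only an illustration under stated assumptions: the mutation-rate range (1e-10 to 1e-9 per base pair per generation) and the generation-time range (1 to 10 per day) are the rough values quoted in the talk, the 100-year host lifetime is a generous upper bound, and the function name is hypothetical. It counts mutations accumulated along a single lineage, ignoring selection and drift; differences between two lineages descended from one founder would be about twice as large.

```python
def expected_mutations_per_mb(mu_per_bp_per_gen, gens_per_day, years):
    """Expected mutations per megabase along one lineage after `years`.

    Neutral, clock-like accumulation is assumed; this is an
    order-of-magnitude sketch, not a population genetic model.
    """
    generations = gens_per_day * 365 * years
    return mu_per_bp_per_gen * generations * 1e6


HOST_LIFETIME_YEARS = 100  # assumed generous upper bound on a host lifetime

# Slowest and fastest corners of the quoted parameter ranges.
slow = expected_mutations_per_mb(1e-10, 1, HOST_LIFETIME_YEARS)
fast = expected_mutations_per_mb(1e-9, 10, HOST_LIFETIME_YEARS)

print(f"expected per Mb in one lifetime: {slow:.3g} to {fast:.3g}")
```

Even the fastest corner gives a few hundred mutations per megabase, well short of the tens of thousands of intermediate-frequency variants per megabase seen in the most diverse hosts, which is the sense in which those populations cannot descend from a single founder that arrived during the host's lifetime.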
13:21:06 Right, that's a bit of a bummer, because it would have made our lives much easier if we could just sort of treat every person here as having a single strain. 13:21:25 Unfortunately, the opposite extreme also doesn't really appear to be the case either. Right, so we can see from these two examples I've shown here, and more generally from the shape of this curve, that there's actually too much variability from host to host for us to be able to treat these as some 13:21:29 large random sample from the broader global population, right, like the analogy of, let's say, a city like New York in the field of population genetics, which you could also imagine would be a way that you get colonized. Instead, what I want you to 13:21:42 take away from this is that the data are forcing us to consider a sort of intermediate scenario, which Nandita and I have taken to calling oligo-colonization, somewhere between the single-colonization and extreme multi-colonization versions, in which individuals 13:22:03 can be colonized by just a few external strains, two, three, five, something like that, that enter and grow to reach appreciable frequencies within a host. And the idea here is that in this way, the large number of mutations that accumulated between these two strains 13:22:09 in the time before they entered the host are going to show up as these bumps in the mutation frequency spectrum. 13:22:15 Right, that's kind of the idea. 13:22:22 For the most part, no. 13:22:26 Is this corrected somehow for the relative abundance, like for the size of this population in these different samples? Yeah. So all of these are looking at... 13:22:36 Let's see if I can go back to my SNP page. Right, so it's the frequency of a SNP within the purple species, so of all the reads that line up to the purple species, the fraction of them that have an A.
Within each of your different communities, wouldn't you expect that the abundance of the population you're focusing on would be different? Precisely, right, so in this host it's maybe like 50%, in a different host it could be 10%, right, and in another one it could 13:23:17 be less still. Right, but since we're measuring the relative frequency, that estimate stays sort of stable as you change the abundance. The noise will change. But wouldn't you expect the diversity to track the abundance? Yeah, that's a 13:23:17 good question. You know, it's something that people have often brought up, it's something we looked into a little bit, and to a first approximation, no, there was not a strong correlation between the abundance of a species and whether it tended 13:23:28 to have sort of two strains, sorry, go back to my little cartoon, two strains versus one. Yeah. 13:23:33 Yeah, you might imagine the bigger species just tend to have more strains within them. Exactly, and we couldn't really see that. To be fair, there is an ascertainment bias there, which is that they have to be present in at least one read for us to see them 13:23:46 in the first place. So we can't tell whether there are things that are floating down at like a frequency of one out of 100,000. 13:23:54 And in this data, it's actually quite hard, even with the coverage threshold you have. If you have 10 reads, it's really hard to see something that's present in one read, just because of mapping issues, things like that. 13:24:03 But I can tell you that we've gone through and looked a little bit at some isolate data produced by Eric Alm's lab, where they might have, let's say, 100 Bacteroides vulgatus isolates from the same host. 13:24:15 And in those cases, you know, they have one or two examples of something like this, where there's two external strains that are sitting around, but we haven't observed any evidence of singleton strains sitting around.
13:24:26 Which is interesting, I would have thought there'd be some huge tail down there, at least at that sampling depth. Definitely, yeah. 13:24:47 Ben, mechanistically, 13:24:49 what do you think is leading to just two or three strains colonizing, versus say many, many more? How do you differentiate? Yeah, I think that's a big question, right. I think with most models you write down, if you write down some verbal model of how 13:25:08 hosts are colonized, you quickly either get to a single strain coming in first and dominating, or you get like a million of them coming in. The fact that you only have two is a bit puzzling to us. I'll show you some data a little bit later suggesting one 13:25:23 potential explanation. 13:25:25 Just a quick follow-up to that: does this suggest something about your idea of how often migrations happen versus the initial colonization from your global population, or, I guess equivalently, how large and diverse your global population is? 13:25:38 Yeah, in terms of that, wait like one or two slides and we'll see some data addressing that. 13:25:53 And so here you're saying two strains per species, and you said that you were picking what you called boring genes, where you somehow ensure or believe that that gene is only found in that species. I'm just wondering, is there an analogous statement 13:26:15 of only two species per genus, somehow? Interestingly, no. Right, so especially the Bacteroides are a great example. There's like, I don't know, 10 different Bacteroides species that hang out in people's guts, and actually many individual people will have 13:26:24 more than two Bacteroides species. Right, so it's a different story than, say, Josh and Alvaro Sanchez's experiment, where sort of a single species from each family occupies some niche. In the gut there's some degree of that, but with these Bacteroides, 13:26:39 multiple Bacteroides species hang out together.
13:26:46 I may have missed this, but how are you all accounting for sequencing errors in calling these SNPs? Yeah, exactly, so that is a tough problem. Here I'm doing it in a very dumb way, kind of as illustrated in these plots. 13:27:01 Basically, we're worried about sequencing errors, and so we say anything that's in this gray region here could maybe be a sequencing error or mapping error, and we say, let's get rid of it. Instead, what we're plotting over here is the number of things in 13:27:11 this intermediate range. And we pick this range precisely to try to minimize the effect of sequencing errors. You're not going to eliminate them entirely, but okay. 13:27:20 Right, so we're pretty sure. In fact, you can even tell from the data that this little peak here doesn't look like sequencing error; it's likely something real. It's actually not even getting counted in this particular number of SNPs per megabase, because it 13:27:32 falls in that gray region. 13:27:35 Okay. Thanks. Yeah. Do you see cases where you have two time points, right, where you have like only one strain of a species at time point one but now you have two strains at time point two, or even zero at time point one and one at time point two? 13:27:52 Yeah, well, we'll see some specific examples of that in just a sec. 13:27:56 Yeah, these are all great questions, you guys are right on track. Right, so as maybe you're getting at here, this sort of oligo-colonization scenario is kind of the worst possible scenario if you want to actually 13:28:07 detect genetic differences between lineages, because you have to assign these mutations here in this distribution to one of these two lineages. 13:28:19 Right. And this is going to be, you know, at best, a difficult inverse problem. Because, in general, we're rarely going to be able to catch these two mutations together on the same short read, unless we're really lucky.
13:28:27 Right, but I chose these examples sort of on purpose, because I want to try to convince you that even though this is a really hard problem in the general case, essentially unsolvable in my opinion, 13:28:37 it also has some really easy special cases. Right, and so you could imagine in a host like host C here that two mutations that we sample at 99% frequency, in totally different parts of the genome, are actually highly likely to be present in the same set of 13:28:51 cells, right, just sort of by the pigeonhole principle. Right, if two things are at 99% frequency, there's kind of no way to set it up such that a large number of cells don't contain both mutations. 13:29:00 And this motivates our general approach here, which is to try to leverage the considerable variation we observe across our large cohort and zero in only on those particularly easy samples, where we can infer at least the dominant lineage in a given 13:29:13 sample with a high degree of confidence. Right, so we're throwing away in this case like two thirds of the samples. 13:29:21 The important point is that we can use this kind of intuition to measure genetic differences between the dominant lineages in a way that we can start to make more precise mathematically. 13:29:30 Right. And the idea is to sort of model the process of sampling noise, which is generating the finite number of sequencing reads that line up to a given site, and then we look for really large shifts in frequency at that site from one sample to another. 13:29:41 And the idea is that if these are longitudinal samples from the same host, say over the six-month time interval, this will represent something like a host-wide sweep within the species of interest.
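Both ideas in this passage, the pigeonhole argument and the sampling-noise model for frequency shifts, can be illustrated with a short sketch. It is not the speaker's actual method: it assumes read counts at a site are binomial draws, and the 20% and 80% cutoffs for calling a "large shift" are hypothetical values chosen only for illustration, as are the function names.

```python
from math import ceil, comb, floor


def min_joint_fraction(f1, f2):
    """Pigeonhole lower bound on the fraction of cells carrying both
    mutations, given their individual frequencies f1 and f2."""
    return max(0.0, f1 + f2 - 1.0)


def binom_pmf(k, n, p):
    """Probability of exactly k variant reads out of n at true frequency p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def spurious_shift_probability(true_freq, depth1, depth2, low=0.2, high=0.8):
    """Chance that binomial sampling noise alone makes an unchanged site
    look like a low-to-high sweep: observed frequency <= low in the first
    sample and >= high in the second. The cutoffs are illustrative
    assumptions, not values from the talk."""
    k_low = floor(low * depth1)
    k_high = ceil(high * depth2)
    p_first_low = sum(binom_pmf(k, depth1, true_freq) for k in range(k_low + 1))
    p_second_high = sum(
        binom_pmf(k, depth2, true_freq) for k in range(k_high, depth2 + 1)
    )
    return p_first_low * p_second_high


# Two mutations each sampled at 99% frequency must largely share cells.
print(min_joint_fraction(0.99, 0.99))

# Worst case for a fixed-frequency site: true frequency 0.5, 20 reads per sample.
print(spurious_shift_probability(0.5, 20, 20))
```

Multiplying the per-site probability by the number of sites scanned gives a rough expected count of false positives, which is the quantity one would want well below one before trusting a single observed difference in a genome-wide scan; raising the coverage threshold or widening the cutoffs drives that count down at the cost of discarding more samples.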
13:29:52 We can then try to choose our coverage thresholds and our frequency shift thresholds in such a way, and just discard samples that don't meet these criteria, until we arrive at a situation where we expect to be able to resolve just a single nucleotide 13:30:05 difference in a genome-wide scan that isn't just due to random chance from sampling. 13:30:11 And so, it's clear from the examples I showed you that not every sample is going to meet the criteria necessary to do something like this. So, this is going to require discarding a lot of data in the general case. 13:30:21 But fortunately for us, the structure of the human gut microbiome seems to have set itself up in such a way that in a cohort of just a few hundred hosts, if you sequence them deeply enough, 13:30:41 you can often use this approach to resolve small numbers of genetic differences in thousands of different strains in, let's say, 40 or so of the most prevalent species in this particular cohort, right, and in doing so we can start to get the raw material necessary to 13:30:47 kind of bootstrap up our estimates about these evolutionary forces. 13:30:50 So now I just want to give you a very quick bird's-eye view of what kinds of things we can learn by applying this approach to the data from the Human Microbiome Project. 13:31:00 So by comparing strains taken from different hosts in these quasi-phaseable samples, we find that most of the genetic diversity circulating in most of these populations is relatively old, in the sense that, you know, typical pairs of isolates in different 13:31:13 people are separated by thousands or tens of thousands of SNPs, right, just like we saw on that slide looking at the diversity within people. 13:31:22 Right. And as Daniel brought up, the natural thing you want to think about is, what is this in years? Right. 13:31:28 Right.
It depends what you want to assume about generation times and things like that, but let's say it ranges from, you know, a few hundred years at the very, very minimum, 13:31:34 to something like maybe thousands for typical estimates, to tens of thousands of years if you're pessimistic. 13:31:41 Most of the variation is synonymous, suggesting that purifying selection is playing an important role here in removing deleterious mutations from these populations. 13:31:52 And finally, the rates of homologous recombination appear to be sufficiently high that the correlations between SNPs at different locations in the genome tend to be decoupled from each other 13:32:03 once they're separated by more than a few kb. 13:32:15 So if you guys are interested in this last bit, Zhiru Liu, who's in the audience here today, has some really cool new results trying to quantify the rates and dynamics of these recombination events. 13:32:18 You see this by looking at, sort of, maybe the distribution of the double-mutant frequencies, conditioned on the single-mutant frequencies. 13:32:27 So, consider this a plug to go bother him if you're interested in learning more. 13:32:32 Just curious, are any of these people related to each other or living in the same households? How are they sampled? Yeah, so the HMP study is mostly college students from St. Louis and Houston, I think, if I remember correctly. 13:32:45 So yeah, they're controlled for being related to each other and things like that. So when I say unrelated hosts... 13:32:52 In fact, when I say different hosts, think of me as saying unrelated hosts. And in a second we'll talk a little bit about people who are. 13:33:05 All right. 13:33:08 Is this in Santa Barbara? Is this a beach in the Bay Area? 13:33:13 It looks very KITP-ish. Half Moon Bay. Okay.
13:33:18 Okay, so, at this sort of bird's eye view, these three features are actually pretty similar to what you observe if you go out and look at other species of environmental bacteria, nothing too special about them being in the gut microbiome at this 13:33:29 level. 13:33:31 As I mentioned before, we can also use the same approach to look at genetic differences that accumulate within hosts over this six-month sampling interval. 13:33:37 And as a first pass, it's helpful to visualize these data by actually aggregating our observations across the different species and examining the distribution of the total number of genetic differences between the two time points. 13:33:49 Right, so just to walk you through here, these blue points are showing you the observed within-host data. They're drawn from about 800 different host-species combinations, from about, say, 36 of the most prevalent species in this particular community. 13:34:04 And for comparison, this red distribution is showing what you get when you repeat this calculation in totally unrelated hosts, controlling for family structure. 13:34:12 And so, consistent with a lot of early work on this particular data set, this plot confirms that two strains from the same host over time are much more closely related to each other, on average, than strains taken from different hosts. 13:34:24 So it does make sense to treat these as relatively stable populations, for the purposes of thinking about evolution. 13:34:31 Um, but you can see by plotting the data on a log scale like this that this average is actually a very misleading quantity. Right, so the average number of within-host changes is on the order of a few hundred, but you don't really observe any hosts that 13:34:43 hit that average spot on. Right.
13:34:47 Instead, the data appear to be drawn from this broad and even multimodal distribution, which suggests they actually arise from a mixture of two different processes that are occurring. 13:34:57 Right, so in the vast majority of cases we don't observe any evidence for a host-wide sweep over the six-month interval in the HMP study, but in a small fraction of cases, about 2% of the total, we find these resident populations that differ by 10s 13:35:10 of thousands. 13:35:15 Right, right in line with the typical numbers of differences you observe between hosts. So we call these strain replacement events, which will be important going forward, 13:35:18 since you can show, using the same back-of-the-envelope calculation we mentioned before, that they can only be caused by the invasion of some pre-existing strain from the broader human population. 13:35:28 Right. So I think, I forget whether it was actually two or three who asked about whether it's telling us something about the rate of migration events, but this is showing it sort of directly. 13:35:37 2% of the time, for one of these, right, this is in samples in which they stay simple from time point to time point. Of course there are also a subset of samples that go from simple to more complicated, that are, you know, maybe like this, where the light purple 13:35:53 only gets halfway up, right, I'd say it's kind of an even breakdown between those different cases. 13:36:00 Okay, but we can also see again by plotting the data. Quick question? Yes, I was just gonna ask why is ten to the two not populated. 13:36:07 Why is there some timescale or process that does not permit that to be populated, and then Sri has a question. Yeah, okay. So let me answer that question by way of my next little click, which is, right.
13:36:25 See this one little island out here, another sort of shoulder over here, a larger fraction of these populations, about 10% of the total, with a smaller number of nucleotide differences between them, ranging from one up to a few tens. 13:36:45 And so we call these modification events, just to distinguish them kind of operationally from replacements over here on the right. But the important part is that these represent our putative evolutionary changes, right, which we can just start 13:36:47 to make out now because, you know, we're a little more certain about our strain, or sorry, our sweep detection methods. Down here, right, so maybe to answer your question, you know, what gets you far in this distance is sort of how many evolutionary changes 13:36:56 you can accumulate within six months, which may be, you know, limited, right, that's part of the question, like how many accumulate, whereas these ones get them for free because these mutations happened long ago. 13:37:12 It's just a strange bump. 13:37:14 So if you imagine, you know, if we had taken people and sampled them every 20 years, you know, this distribution might start to migrate over here towards the right more, this one would stay in that same place, and so you imagine they start to get involved a 13:37:24 little bit. For this particular study this short sampling time is kind of nice. 13:37:30 Yeah. 13:37:31 Yes, very good question. I guess two related questions to what you just said, so one, is there species-level variation between who belongs in the replacement versus the modification plate? 13:37:42 Like, if you look across the species that, yeah, when you're mapping, is there something interesting about that? 13:37:48 There is, actually, so we didn't talk about it much in this paper, but for reasons that'll become clear later in the talk, we needed to look into that a little bit more.
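The operational split described above (no detected change, a small "modification" shoulder, and a far-right "replacement" island) can be sketched in a few lines of Python. This is only an illustration: the cutoff values `MODIFICATION_MAX` and `REPLACEMENT_MIN` and the toy counts are assumptions chosen to mimic the rough fractions quoted in the talk (about 10% and about 2%), not numbers or thresholds taken from the study.

```python
from collections import Counter

# Hypothetical cutoffs for illustration only; the talk does not state them.
MODIFICATION_MAX = 100    # "one up to a few tens" of differences falls below this
REPLACEMENT_MIN = 1000    # replacements differ by thousands or more

def classify_pair(n_differences):
    """Label one host-species pair by its number of SNP differences over time."""
    if n_differences == 0:
        return "no change"
    if n_differences <= MODIFICATION_MAX:
        return "modification"
    if n_differences >= REPLACEMENT_MIN:
        return "replacement"
    return "intermediate"  # the sparsely populated gap asked about in the Q&A

def summarize(diff_counts):
    """Return the fraction of host-species pairs falling in each class."""
    labels = Counter(classify_pair(n) for n in diff_counts)
    total = len(diff_counts)
    return {label: count / total for label, count in labels.items()}

# Toy data: 100 host-species pairs, made up to resemble the quoted fractions.
toy_counts = [0] * 88 + [1, 2, 3, 5, 8, 10, 12, 4, 7, 20] + [15000, 30000]
print(summarize(toy_counts))
```

Keeping an explicit "intermediate" class makes the empty region between the two modes visible rather than forcing every pair into one of the two named categories.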
13:37:57 And yes, there does appear to be some variation in whether a species will have 13:38:04 fact I think most of the variation is tied to both of them at the same time, there's some species that are just more likely to have both replacements and modifications and some bit less. 13:38:14 In fact, our friend backdoor HIS WILL Goddess is one of the ones that tends to have a little bit less than average and it kind of breaks down at the at the file level so so back to Rudy's seem to have a little bit less for Mickey's a little bit more. 13:38:26 And I guess maybe that leads into the second part which is are there does, especially the modification events distribution does it tell you something about the relative doubling times in see to for your different species in terms of how you can estimate, 13:38:40 because I guess it's technically a product of mutation rate and doubling time in some way right like yeah well and more importantly sweet right so so if I was able to fish out isolates and I could count the mutations in them it would tell me exactly this. 13:38:59 In this case, in order for us to see it at all remember we had to look for a suite that looked like this where it went from, let's say less than 20% frequency to greater than 80% frequency in six months, so it needs to be really beneficial enough that. 13:39:12 Right. Yeah. 13:39:14 Um, I was wondering, is there. 13:39:17 Like, when you look at. 13:39:19 I have 100 snips and they're all at 99% prevalence. 13:39:24 I can think of that as I have one strain and 99% prevalence and then 1%, another strain or I could have this snip to distribute in the population. So when principle like a huge diversity of strains, but, you know, they all have the dominant form. 
13:39:39 Is that, is there a way to distinguish that isn't that a question that doesn't make sense trying to answer Yeah, if you were looking at the ones that 50% or something, then you would see whether they evolve to kind of, yeah that's it, that's a good question. 13:39:55 Yeah, that's it, that's a good question. Let me hold off on that first a few slides will see some other data that will help address this a little bit better, but two time points, it's just very hard. 13:40:06 Yeah, I was gonna ask if you implied that the standard the one is because of the limited time of six months, but based on what you know about mutation rate is that clear or is it that you probably got to this limit. 13:40:21 Within a month but that's how much local evolutionary change you can support in the first place. Yeah, so, um, yeah these are great questions I think with this particular data set, we just don't have the sample size to be able to distinguish right Actually, 13:40:33 I said six months, they're actually a sample and it's totally different times within that interval. And so you can look a little bit like is there a correlation between length of time and number of steps, not not really right so i think predominantly 13:40:45 we're observing like a 01 noise process like is there a sweep or not. And then, how many snips come up together on that so I don't think you should think of these as being 10 independent sweeps, I think I'll show you some data second to show it's like, 13:40:56 once we 10 things at once. Right, so a lot of noise and 13:41:02 you try to look at the functional functional, other, you know, annotation of the regions where those snips are anything related to feed escape from features. 13:41:14 See Paula soccer aids, you know Jensen code and policy rights or something like this yeah give you a hint of what kind of selective pressure is acted upon genomes. 13:41:27 Yeah, we tried to look at this a little bit. 13:41:35 Yeah. Next slide. 
Maybe, maybe we'll give you some days to think about two. 13:41:37 We did not come up with anything that interesting. It was sort of hard for us to distinguish right of course you get sort of fish proteins and things like that but the sort of no hypothesis of what we were looking for. 13:41:48 We weren't like super confident in so we didn't really want to hang our hats on like precise annotation with a certain small, small data set, coupled with the fact that like in order to get these numbers in the first place. 13:41:57 We had to like basket most of the page proteins and things like that. So, yeah, nothing, nothing super obvious came out of this initial collection that's not saying there's not something in there but I think one could do a smarter job of being like okay 13:42:12 okay I know these are boring jeans I don't even want to look at them, you know, here's the subset of things that are totally functional that but did not. 13:42:29 People are interesting that I love to talk more side, I've never had a great super great way of kind of course grading function at that next level of cross species which I think is what you really need that. 13:42:39 Yeah. Okay. Um, so here the replacement events, I understand that is the snip sort of on that side of the distribution there putative replacement events but I just wanted to be clear, this is a lower bound of, of how many replacement events right because, 13:42:53 in principle, you could have replacements with from the global pool from strains that are 10 steps apart and they would fall it on the left side of your this Yes, great lead into my next slide back. 13:43:06 You might wonder I said. 13:43:11 I said that any better. Sorry guys. 
13:43:14 I told you these were putative evolutionary changes, I didn't really tell you why, right, so you could argue, okay, maybe these aren't really evolutionary changes, they're just replacement events involving really closely related strains from the 13:43:25 broader population. In fact, one can find such closely related strains sometimes when one goes out and looks. So it's not clear yet, even in principle, whether it's possible to distinguish between those two kinds of events. Here, actually, 13:43:39 we found it useful to borrow a concept from physics and examine the statistical behavior of these genetic changes under time reversal symmetry. 13:43:48 Right, so it sounds kind of complicated. The idea is actually pretty simple. 13:43:52 The idea is that under a pure replacement scenario, right, no matter how far apart they are from each other, we'd expect that many of the population genetic properties of the differences separating these successive strains should be statistically invariant 13:44:05 if we reverse the direction of time, since essentially we're just drawing a set of pre-existing strains from some fixed rate. Right, so the strains themselves came in for specific time-dependent reasons, but things like, you know, how many 13:44:17 synonymous and nonsynonymous mutations are involved shouldn't really depend on the direction of time. 13:44:23 In contrast, most models of local evolution that you write down tend to single out a preferred direction of time, because most new mutations are going to be biased away from the global consensus. 13:44:36 That's sort of the idea, right, so to test this idea, 13:44:40 one of the ways to do it: 13:44:41 we went in and took the entire ensemble of mutations associated with those modification events, all the little blue dots, and asked how often we observed the sweeping variant across other hosts in our larger cohort.
13:44:53 And this is sometimes known as the prevalence of a mutation, so this would be the prevalence distribution of the mutations associated with those modifications. 13:45:00 Right, one of those hosts that had 10 SNPs from time point to time point would have 10 little entries in this. 13:45:07 So, in this case you can convince yourself that reversing the direction of time here amounts to flipping the prevalence distribution about its central axis, right, you're flipping which one took over which one. 13:45:19 And so when you do this you notice we can see this pronounced asymmetry here in the observed data that's actually significantly higher than you'd expect just by chance 13:45:26 if you randomly scramble the direction of time in individual hosts. And I think for me, this is one of the main ways we were able to convince ourselves that these modification events represent true evolutionary changes. 13:45:38 In fact, there's some additional analyses you can do, which I won't get into, based on the sharing of private marker SNPs between time points, to further convince yourself that evolution is probably the more likely. 13:45:52 Okay, yeah, just a silly sanity check, if you do this for the SNPs in the replacement events, then everything is... Yeah. 13:46:02 Finally, we're really interested in seeing how these two properties of replacement and modification play out on longer timescales, and the six-month time interval of the HMP study gets at, I think, this question a little bit. You know, ideally, I 13:46:15 would love to have an example like Rich Lenski's lines in the gut microbiome, where you have some single, you know, archived record of fecal samples from somebody over a couple decades. 13:46:25 We don't have that, but as a crude proxy we tried to get at this question in kind of a similar way.
By comparing the microbiomes of about 200 adult twins, who'd been sequenced in a previous heritability study of whether the gut microbiome is heritable. 13:46:40 And for our purposes, the idea here is that, if these two twins were colonized by similar strains in childhood, then the differences between them today, which we can actually measure, right, should reflect a record of all the different events that accumulated 13:46:51 along these two lineages in the 20 to 40 years or so since their microbiomes started to diverge. 13:46:58 So interestingly, when we run this analysis on this new cohort, as well as a much smaller sample of pediatric twins as a control for this initial assumption here, we see that the rate of external strain replacement jumps to about 90% amongst the adult 13:47:12 twins, very close to the observed unrelated host distribution. And I think, interestingly, sort of consistent with what you'd expect if you extrapolated this 2% estimate from short timescales across this longer sampling interval. 13:47:27 So I think this is really pretty interesting. So this suggests that any benefits of local adaptation due to these putative evolutionary events, fixing in these modification events I told you about before, 13:47:41 don't compound indefinitely and prevent future invasions from occurring, since we see that on a long enough timescale these replacement events typically win out over local. 13:47:49 Yeah. 13:47:54 Yeah. So this. 13:47:56 Yeah, exactly. Unfortunately we're very limited by data, people do not give you that much sequencing of kids, they do a lot of sequencing of infants to establish transmission. 13:48:05 So this is one study, I think it had maybe four or five sets of twins. 13:48:10 They range in age, there were some that were like six months, some that were five or six years, and some that were like 20 years. So, this estimate is just all of them together.
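The extrapolation mentioned above, from roughly 2% replacement per six-month interval to a 20 to 40 year twin comparison, can be checked with a small calculation. The sketch below assumes replacements occur independently at a constant rate per interval, and that a replacement along either twin's lineage is enough to make the pair look unrelated (hence `lineages=2`). Both are simplifying assumptions made here for illustration, not claims from the talk, and the function name is hypothetical.

```python
def prob_at_least_one_replacement(rate_per_interval, interval_years,
                                  total_years, lineages=1):
    """Chance of one or more replacements, assuming a constant, independent
    per-interval replacement probability along each lineage."""
    n_intervals = lineages * total_years / interval_years
    return 1.0 - (1.0 - rate_per_interval) ** n_intervals

# About 2% per six months, over the 20 to 40 years since the twins diverged.
for years in (20, 40):
    p = prob_at_least_one_replacement(0.02, 0.5, years, lineages=2)
    print(f"{years} years apart: P(replacement in either twin) = {p:.2f}")
```

Under these assumptions the probability lands between roughly 0.8 and 0.96, which is in the same range as the approximately 90% replacement rate quoted for the adult twins.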
13:48:21 Compatibility on longer time scales, if you if you take the sort of rates you're seeing for the number snips that are fixing in the individuals with a short time scales, and you've translated into how big gigantic distances you would expect on the, I 13:48:37 mean not just 10s of time. 13:48:47 10s of your time scales but thousand year time scales at least hundreds when global mixing those in the same ballpark or is it that you have much more happening locally, then you see on the far as long term differences. 13:48:53 Um, that is a good question. Um, 13:49:01 six months, times two will take you to a year and then times let's say 1000 years so 2000 times bigger than 10. Yeah, It's like vaguely consistent with the. 13:49:13 If you multiply this by 2000, you're kind of almost getting in this range. If you had to make me guess, I actually would say these are based on data I'll show you the second. 13:49:21 These are going to be sort of over estimate so the long term right because we end up see anyone see a lot is a reversion later on. So, 13:49:30 yeah, 13:49:50 it. 13:49:54 Yeah, yeah. These are all women I guess Hello. 13:49:58 Yes. Alright, so the question was, um, yeah we're thinking about things going on these longer time scales. 13:49:59 yeah, yeah, that is that is 13:50:04 Right. 13:50:05 Right. Um, you know, a natural question is why is the replacement events so much higher and adult twins right you could imagine a scenario where I think what this rules out is that this previous case was like a small subset of posts that are just like 13:50:18 extra susceptible and get replaced many many times, I don't know, they, they travel a lot, or something like that, because everybody eventually gets. 
13:50:24 But the natural question is Is this because like everybody has like every year you take antibiotics or every couple years you take antibiotics each one has a chance of generating one of these flips, or not, I think that that's the crucial question what 13:50:36 we'll get into that in a second. Yeah. 13:50:38 Yeah, I have actually a question related to this slide which you scroll back to the older slide was even even older. 13:50:47 Yeah, this one. So this red bell bell shaped curve, which is between hos you try to compare it to the overall diversity which you see if you kind of sample replicates from databases. 13:51:00 In other words, what I'm trying to understand if it's representative for species wide, average divergence which, if I read it right as about 1%. Yeah, exactly right so this is definitely in line if you just take real isolates, get the same guy that was 13:51:15 an important. 13:51:17 So, roughly speaking between course diversity is representative of overall species diversity so it's not like, for the most part sampling from some different sub population or something. 13:51:30 Yeah, for most of these Bacteroidetes anyway for many, many of those pieces that's for their subsets. 13:51:35 Little bit. 13:51:38 That's kind of an open question. 13:51:42 For many of them like Andrew Miller's done some really cool work showing that you know some of these species are sort of specific to like humans vs chimps versus forgetting the last one. 13:51:53 Anyway, but others are not right. Some of the have spores and can throw travel between. 
So, the exact extent of that is it's not totally in down but yeah I'd say for the most part like don't be too surprised by this number this is just boring 1% divergence 13:52:08 What what fraction of like if I were to start a Supra typical super Of course today, what fraction of my microbiome would actually survive that in other words like do I need to think about this result that you showed us as something that only happens 13:52:21 because of contemporary medicine or should I think about it as something that would have happened 200 years ago also. Yeah, great question. Perfect transition to the next part of it for next part of the talk. 13:52:30 All right, right so so the. 13:52:33 Yeah, all these questions come up from this to time for analysis. It's true, right, I raise more questions I think it answers. The main thing I want you to take away from it is that 13:52:43 native populations of human gut microbiota can acquire genetic differences on these human relevant timescales right didn't really know that going in now I think this data shows that it does, and that it emerges via this this mixture of these events that 13:52:52 look like string replacement and events look like, then host evolution, it's important to be able to distinguish between the really crucial questions you know now that you know these changes can occur, right, you guys are raising all the good questions. 13:53:07 One of the population genetics this process actually actually look like. 13:53:11 How important are these factors that I mentioned before selection draft recombination and driving these changes that we are serving statistically here. 13:53:18 Are they always to make the media talent question. Are they always driven by sudden environmental changes like antibiotics or could they just be rising due to continual evolution against the host environment or other members of the actually think of this 13:53:30 as being perturbation driven evolution or or just sort of continuous. 
13:53:36 What would these sweeps look like if we could zoom in and kind of look under the hood here, right, what are the typical selection strengths that are involved is always selection on novo mutations or gene transfer events or or selection previous thing 13:53:49 very, and finally to the answers to these questions matter depending on whether we're talking about events that look like screen replacement or events look like modification right there's some additional things you can learn about these questions by by 13:54:00 really picking apart this to time point data, but it's good to really answer these questions we need some some denser longitudinal. 13:54:07 Yeah, Ben. 13:54:20 Okay, 13:54:23 before you move on, that actually the thing that I thought was most interesting here was this two to three strains per species. 13:54:33 It's not one. 13:54:35 And most of them. It's not a lot. It's some like small number. Not to me. 13:54:42 My feeling is that doesn't have anything to do with evolution that's that's that's ecology, somehow, and I was wondering if you had any thoughts on that like so it's like one idea. 13:54:51 This is where my question becomes like a comment, and you can tell me what you think you know when ideas that there is this kind of consistent ecology in the gut, that we're getting SF feed out in these fecal samples, but there might be some gradient 13:55:04 of something like, how much free oxygen there is along the gut. and so there could be this. 13:55:11 You know splitting of the same niche and that ecology by between a strain that is more adapted to very low oxygen vers versus you know micro robot conditions, you have any thoughts on that or like how we would try and determine if that was the case. 13:55:26 Yeah, that's a great question. I think that the data I'm about to show you, provides us I think a little bit more evidence exactly in the direction of your hypothesis. 13:55:35 At this stage, just from the samples I showed you so far. 
Of course the competing hypothesis is just, well, you know, some kind of colonization effect like to students got in there. 13:55:43 They're basically neutrally equivalent and so they're just kind of sitting in there until something happens right so some kind of bottleneck driven effect. 13:55:56 This sort of data you can't totally distinguish between the two but the one I'll show you in a second I think really, really tells us what a strong. 13:56:02 Yes. 13:56:02 Thanks. So I completely agree. I think that some of the most interesting takeaways for me to about the selection though I'm sort of just trying to think a bit more deeply about possible false negatives, I guess, and I mean do you think you know so I guess 13:56:18 you're not really focusing or maybe, I guess you don't have the power to look at soft sweeps or things that aren't actually completely fixing, but there has been a few papers saying that that's sort of how evolution proceeds in a lot of these systems 13:56:32 that and I guess, do you think like you're missing out on parts of the selection story or do you have reason to believe that that's, you know, probably not what's happening and this is more complete. 13:56:42 Yes, and we'll see some data and the next slide that that's an example of just that effect. Another interesting observation I'll just point out is that of course, yeah, Jay and Tammy had this really beautiful study and backwards for jealous looking at 13:57:06 phenomena, doing Iceland, and they see again a lot of repetition that since they're able to look more at the gene level they can even show that they have enough recurrent mutations to say that it's positive selection just based on that, that data. Interestingly, if you sort of go in and look at their mutations. 
Most of the ones that are under the most 13:57:11 of the ones that are under the most current selection aren't involved in these kind of sweep events they catch them at actually relatively low frequencies in the sample maybe two, three out of, out of 40. 13:57:21 So, so yes i there we could be missing tons of adaptive evolution going on, sort of, at lower frequencies all we're talking about here are things that are sort of sweep through a large enough fraction the population not a complete suite, right but but 13:57:35 a large fraction. And do you think that you might also be missing out on our bronze or gene clusters that you sort of filtered out in the bioinformatics process, because you're talking about a lot of gene families that of course makes sense to exclude 13:57:50 but at the same time they might also be good target selection in this process right. Yeah, well, let me show the next slide and then we can maybe go back to that question. 13:58:01 Okay, so we need some dense longitudinal sampling right so just to give you an example here's, here's some data that we're currently wrapping up analyzing from from collaboration with like Snyder's lab, showing the abundances of different species and 13:58:13 the trajectories of individual mutations within species from a single host who was sampled 19 different times before, during and after a two week course of broad spectrum antibiotics right so this gets to get the obvious question is a DOCSIS like rather 13:58:28 than 13:58:28 Just, just to orient you here right so each one of these colored lines is showing you the frequency population frequency trajectory in a species of a different single nucleotide mutation, these three of the 36 species variable to track and host, and for 13:58:54 I've just gone ahead and shown you the overall relative abundance trajectory of each species above and gray or comparison on a large scale. Alright so this is a, you know, n equals one study right single host. 
13:58:59 But even with data from just a single host, the sort of time-course nature actually ends up telling us quite a bit about how these microbial communities respond to and recover from antibiotics at the genetic level. 13:59:10 Right, so before telling you about that I want to take a step back and just quickly remind you what we've learned about antibiotic perturbations in the gut microbiome from previous species-level comparisons. 13:59:27 There's the idea that in really severe cases antibiotics can purge our native flora and allow nasty things like C. diff. 13:59:28 Um, but actually, people are maybe a little more surprised to learn that, like, a typical oral dose tends to produce a more resilient response. And this is illustrated here with some beautiful early work by Les Dethlefsen and David Relman back from 2011. 13:59:41 And so what Les and David found, using Cipro in this case, is that ordinary oral antibiotics can drive really big changes in the species composition of a person's microbiome on timescales of days, but actually many of these communities recover much 13:59:56 of their baseline composition in the next few weeks after treatment. 14:00:02 And this is typically visualized by, you know, projecting the species abundances down onto this sort of PCA-like plot here, but in fact you can really observe this effect directly from the species abundance trajectories over time, right, big perturbation 14:00:13 during antibiotics, this part looks a lot like that part. 14:00:18 Right. Moreover, the recovery is personalized, in the sense that two different hosts tend to recover closer to their initial baseline compositions, rather than towards some generic human microbiome.
14:00:29 So observations like these tell us that at least at the species level gut microbiota can be pretty resilient to brief environmental perturbations caused by it but I guess it's all he was getting to these data rates and really natural questions about how 14:00:43 this species level of resilience is actually implemented at the stream. 14:00:48 Right, so like our species like this blue one here which decline and abundance during a robotics and back. Are they able to do so because some new strain and that species colonizers from outside the host or as the original strain able to persist in some 14:00:59 kind of spatial refuge or hidey hole and then expand again once you know conversely species like this purple one here that that maintain high been during treatment, do they do so because they already have less sensitivity to antibiotics or do they rapidly 14:01:13 evolve antibiotics. During that that that fight. 14:01:18 Sure just about the timescale yeah is that, telling about the ecology of the rebound or is that just how long antibiotics were administered, a little bit of both. 14:01:28 I'll show you some more data from our time course it speaks that a little more directly so in this case the relaxation I think occurs pretty fast ours relaxes on a bit longer scale longer than the half life of Dr cycle anyway so that that one separated 14:01:41 It's like lifting weights that that one separated out. But yeah, in general, both of these are implicated right you'd like antibiotics shift some species but then if other people depend on them through cross feeding it takes a little while to recover. 14:01:53 Okay. 14:01:54 Right. So now that we have these sort of mentioned omics tools and data we can we can find a look under the hood this kind of answer very basic questions. 
14:02:03 Okay, so we're turning to our data set here, um, you know, consistent with what you might expect these data show that antibiotics can drive rapid changes in individual species and community. 14:02:14 And actually the much higher rate than we observed in healthy hosts from it, two slides ago. 14:02:18 But, but now that we can look under the hood, we can, you know, follow these mutations directories and see that actually most of the time these changes don't really appear to be consistent with the simple extinction and recolonisation picture that we 14:02:32 often have in mind going a priori right so for just to give you an example here this is the piece find goalie population here experiences the genetic shift that could be consistent with the population bottleneck during treatment. 14:02:40 But of course, it actually many other species we observed genetic changes happening within species that are able to maintain relatively high abundance. 14:02:53 Throughout this easier, and this is the case both for the sea bacterium elegans example up at the top, as well as the fast Clark the bacterium example, the middle. 14:02:56 In fact, both of them if you look at it, they don't really change much at all and relative abundance over the same time that their genetic composition is. 14:03:06 Okay, this extinction recolonisation picture doesn't really seem to be the whole story. 14:03:12 But these need to also seem to be kind of inconsistent with the simplest bottles of antibiotic resistance evolution that you might be you might write down as we're sort of embodying by this cartoon model of a selective sweep, starting from a from a new 14:03:25 mutation. It's clear just you know if you compare the cartoon and the data by that the data look really different from the card. 
14:03:31 Right, so instead, because we're able to follow these mutation trajectories backwards in time, we're able to see that many of the sweeping variants were already present at low or intermediate frequencies 14:03:42 months before antibiotic treatment, and experienced really rapid expansions during or after treatment. 14:03:51 Okay. 14:04:05 In this case, we can actually use the slopes of these mutation trajectories to try to estimate the typical selection strengths involved. And when you... 14:04:08 Yeah? 14:04:10 No, I don't understand the logic of that, because you've got these strains, so with the strain that's there, the crucial thing could be that it picks up some other mutation, or some variant within that strain comes up, and so most of what 14:04:23 that strain was before isn't there. How do you distinguish between what's de novo and what isn't? Your populations are big enough that you have mutations coming in all the time, right, all 14:04:39 possible ones. Yeah, so I think the question is this distinction between selection on haplotypes versus selection on mutations. Technically, all I'm showing you here are mutations we identified as changing over the time course, and 14:04:53 I said those mutations are present early on at some intermediate frequency. I didn't tell you that... I actually don't know how the double negatives work here, but you could imagine that, you know... sorry, maybe let me back up just to make sure everybody's 14:05:07 on the same page. The inference here is that when you see a large number of mutations traveling together like this, it means they're all linked together on the same genome. 14:05:15 Right. So this gets to Michael's question maybe a little bit earlier, about this being a totally scrambled, diverse population versus just, like, two strains changing in relative abundance, as is the case here.
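The slope-based estimate mentioned around 14:04:05 can be sketched as follows. This is only an illustration under a simple two-strain logistic assumption, where logit(frequency) grows linearly in time with slope s; the function name and the example trajectory are hypothetical and are not taken from the study.

```python
import math

def logit(f):
    """Log-odds of a frequency f strictly between 0 and 1."""
    return math.log(f / (1.0 - f))

def selection_from_trajectory(days, freqs):
    """Least-squares slope of logit(frequency) against time, in units of 1/day.

    Assumes the variant's haplotype competes against the rest of the
    population with a constant fitness difference s, so that
    logit(f(t)) = logit(f(0)) + s * t.
    """
    y = [logit(f) for f in freqs]
    n = len(days)
    t_mean = sum(days) / n
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in zip(days, y))
    var = sum((t - t_mean) ** 2 for t in days)
    return cov / var

# Hypothetical trajectory: starts at 5% and expands at s = 0.3 per day.
s_true = 0.3
f0 = 0.05
days = list(range(10))
freqs = [1.0 / (1.0 + (1.0 - f0) / f0 * math.exp(-s_true * t)) for t in days]
s_hat = selection_from_trajectory(days, freqs)
print(f"estimated s = {s_hat:.3f} per day")
```

Because linked SNPs travel together, the same slope applies to every SNP on the haplotype, so in practice one would fit the trajectories jointly rather than one SNP at a time.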
14:05:28 Right, now I've got your question. So Daniel's asking: say you have these two strains, one sitting at 5% frequency and the other at 95%. 14:05:35 Maybe the low-frequency one acquired some mutation that took over within the strain and then rose up, and that's what's driving it. 14:05:42 Yeah, that totally could happen, right? All we're saying is that the SNPs that sweep were actually present at an intermediate frequency. To sort of put in numbers: if it's a single mutation, they will have them all the time. 14:06:12 Yeah, so I think there are sort of two interesting things here. 14:06:17 One is that we see this happen a decent fraction of the time in the minority strain. So if it's really de novo mutations, with strains at 5% and 95% frequency you'd expect it to happen more often in the majority strain than the other way around. 14:06:31 Right, we see that a little. Again, there's an ascertainment bias there, but we see a lot of these minority guys jump up, and not as many where it's sitting around at 50-50 and gets totally... right? So that's a little weird. The other 14:06:46 part... I'm going to forget what I was going to say. 14:06:52 Okay, I forget; if I remember, I'll get back to you. Oh yes, the other part, about the de novo mutations: they're occurring very fast, right, but in order to see them on these timescales they have to sweep through the population. 14:06:59 And that sweep time, it's worth pointing out, can actually be quite long in a really big population. If it's a 1% benefit mutation, then the sweep time is like one over s times log of sN, and if N is really 10 to the 14, you work that out and it works out to like weeks 14:07:18 of sweeping. So the fact that we're only looking for changes that happen on these sorts of timescales maybe necessarily forces us to look at a very weird subset. 14:07:27 That's it. Yeah. Thanks, Andrew.
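The sweep-time estimate quoted around 14:06:59, read here as roughly (1/s) log(sN), is easy to evaluate numerically. The sketch below assumes s is measured per generation; the conversion to calendar time uses a hypothetical `generations_per_day` parameter that is not given in the talk.

```python
import math

def sweep_time_generations(s, N):
    """Approximate sweep time (1/s) * ln(s * N), in generations."""
    return math.log(s * N) / s

def sweep_time_days(s, N, generations_per_day):
    """Convert the sweep time to days for an assumed generation rate."""
    return sweep_time_generations(s, N) / generations_per_day

# Values quoted in the talk: a 1% benefit in a population of 10^14 cells.
generations = sweep_time_generations(0.01, 1e14)
print(f"{generations:.0f} generations")

# Hypothetical generation rates, only to show how sensitive the calendar
# time is to this assumption.
for rate in (1, 10):
    days = sweep_time_days(0.01, 1e14, rate)
    print(f"{rate} generations/day: {days:.0f} days")
```

The logarithm makes the result only weakly dependent on N, so the 1/s prefactor and the assumed generation time dominate the answer.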
14:07:32 I guess this is related to your interpretation and Daniel's question. 14:07:37 There should be a mutation that goes from a very, very low frequency, lower than this sort of subpopulation, so it seems like one thing you could do is go back and sequence super densely before and after the antibiotic to see if you can identify 14:07:54 that one thing that you really can't find before and you can find afterwards. Yeah. And in fact, that's actually what was done in this particular data set. I didn't tell you, but they spent a lot of money on this. 14:08:12 Earlier in the data, they're very tightly clustered around this, and you don't see that many examples that are consistent with, like, a total de novo. Tami has a question. I want to say that maybe these mutations aren't adaptive mutations, right? Maybe 14:08:18 a whole lane of Illumina sequencing. So for some of these species we really can say, of all the things that are high frequency here, what are their frequencies. 14:08:27 if there's like multiple strains or substrains within a person's microbiome, because I mean we see pretty large cohorts of mutations coming up, especially in this Eubacterium here. 14:08:39 So these are some substrains that maybe are just slower growing, right, they're more like persisters. 14:08:45 Maybe they're able to reach higher abundance while the antibiotic is present. 14:08:51 And then once the antibiotic is removed, they're outcompeted by the fast growers. I'm just wondering if that's a reasonable hypothesis for this data. 14:09:00 Yes, I think that's exactly how I would interpret this too, and that's actually the point I want to make. Sorry, just to get everybody up to speed, because, no, Tami, I think you're exactly on the right track. 14:09:12 Right, so we see these really strong selection pressures; you can estimate their fitness effects, and they're pretty strong.
14:09:18 I think the interesting thing to me is that, despite these really strong selection pressures, a hundred percent per day, 30% per day, way bigger than we see in things like the Lenski lines, 14:09:27 few of these variants actually managed to take over their respective populations. And now that we can follow them forward in time, we find that many actually returned back towards their baseline levels in the next couple of months after treatment 14:09:40 as well. 14:09:41 Right. 14:09:42 Moreover, it's not like they just go back down and go extinct; they actually go back down and almost sort of stabilize at values pretty close to where they started out. 14:09:53 Yeah? 14:10:06 Yeah. So, this is pretty interesting, and I'm wondering, based also on what Tami was saying: 14:10:13 could this represent some sort of trade-off, where essentially you have strains that are resilient to the antibiotic 14:10:22 but otherwise ecologically kind of crappy? So before the antibiotic they're suppressed by some other strain that's competing with them. When you give the antibiotic, the stronger competitor dies out, but then, you know, 14:10:54 it comes back again and then this resilient strain gets suppressed again. Is that what you think is going on in a lot of these? Yeah, yeah, so I'll skip over this slide where I convince you these two examples are not special cases, that 14:11:00 we see this in a lot of different species in this particular host, and the sort of hypothesis that you and Tami are talking about here is actually...
14:11:04 Maybe this is telling us that there's quite a bit of additional ecological structure within these species, which is acting to stabilize these mutation frequencies against these brief environmental perturbations caused by antibiotics. So the 14:11:17 interpretation is: okay, maybe these things were here not because they're neutral and just drifting like that, but because the minority strain actually has an advantage while rare, and it's able to pop up to 5% frequency relative to the other one. 14:11:30 Antibiotics somehow change the environment and shift that, so the frequency goes up, and then when treatment's removed, it's able to go back down and not get eliminated, but stabilize back at its original abundance. 14:11:42 Right, so yes, I think that's definitely the sort of hypothesis for what I would think is going on here. I'd just point out that Tami and colleagues found a really nice example of this in their Bacteroides fragilis study, where they had isolates and they were actually 14:11:55 able to go in and show that this is really what's going on; here we just have it from observational data. 14:12:01 So I think together these different lines of evidence are telling us, yeah, there might be a lot of additional ecological structure within species here. 14:12:09 I think what these new examples tell us is that this behavior can happen not just for pairs of very distantly related strains, like the Eubacterium eligens example, which have like 10,000 SNPs between them and big differences in gene content, but also 14:12:28 for cases like this Phascolarctobacterium example below.
There's only seven SNPs here across the genome, and it likely diversified within the host or somewhere in its local environment, which I think raises some interesting questions about how you 14:12:36 can get this sort of stable ecological structure preserved across such a big variation in genetic distance. It's not totally clear: most models you write down could give you a lot of this kind of stuff, but not a lot of this kind of stuff. 14:12:48 If you write down a model that gives you a lot of this kind of stuff, you almost get too many species coming out for free, so it puts you in this interesting intermediate regime, I think, for us theorists to think about, and I hope to end my 14:13:04 Okay. 14:13:06 talk by telling you about one such model. 14:13:10 Right, so this is zero to 100%, and this is a log scale, I think, going from like 100% down to 10 to the minus three, so this guy's around 10 to the minus two. 14:13:21 Yeah, they're both around 10 to the minus two. 14:13:24 Yeah. 14:13:36 Yeah, so we're trying to throw those out here. In this particular data set, we actually did have a little bit of long-range linkage information, but unfortunately it's not longer than the length scale of, like, an operon or a plasmid. 14:13:42 As someone who's ignorant: would you pick up plasmids or other kinds of mobile things, or do those get thrown out in the first step of sequencing and all this stuff? 14:13:49 So, yeah, in this data set, no. There are some really nice studies out there using Hi-C chromosome conformation capture that let you look at that, and I think that's super cool. 14:13:57 So how should I interpret seven SNPs in terms of phenotype? Yeah, again, so this maybe gets to Tami's comment too.
I don't believe that any of these SNPs are actually functionally relevant for the adaptation that's going on here. I think how we 14:14:11 should think about this, the example I like to give, is sort of like Plato's cave, where you're mainly watching the shadows of something interesting going on somewhere else, and here we're almost using that to our advantage, right, in that 14:14:22 because we can catch linked SNPs, we can estimate things like the selection pressure on the whole haplotype, even though we can't observe the causative mutation itself. 14:14:32 And what it tells us then is that it's not even really an interesting question to talk about selection pressures on these individual SNPs, because they're all linked together, so it helps us sort of zero in. 14:14:44 Can I ask: you made a distinction between closely related and far related. And if it's not phenotypic... I just interpreted that as saying closely related phenotypically, but... Oh sorry, closely related genetically. 14:14:57 Okay, yeah. And there are ways we can sort of tell those apart, which we can talk about afterwards, involving private... 14:15:05 Okay. 14:15:06 So I want to switch gears; we're running out of time. All right, switch gears a little bit. 14:15:10 So far we've been talking about evolution happening within individual species, almost independent of the fact that they're together in a community; we've sort of neglected that feedback. 14:15:20 I think a really important question going forward is whether these evolutionary changes actually matter at all for things that microbial ecologists care about, for the ecological or functional structure of the community. 14:15:31 In other words, do these evolutionary changes tend to alter ecological interactions between species, as they're observed to do in some simple laboratory experiments with model microorganisms? Jonathan's talk spoke to this on Friday too.
14:15:43 Or are these evolutionary changes only marginally relevant at the species level, perhaps because in a really big community species tend to inhabit very separate ecological niches? 14:15:53 One challenge is that while some isolated examples of these kinds of interactions have been identified, their global impact is a lot more difficult to quantify, because we expect the answer to vary a lot depending on the specific mutation we're 14:16:03 looking at, the identity of the species involved, and what other species that focal species has to interact with. 14:16:11 Now, I think interestingly, this sort of metagenomic approach we've been talking about provides a way to maybe try to get at this question, at least at a statistical level, by asking whether these examples of evolution we detect with our method are 14:16:22 associated with bigger shifts in species composition between the same two time points, which we can measure in exactly the same metagenomic reads. 14:16:29 Just to give you an example of what we might expect to see here: 14:16:32 here's one particular host from our HMP cohort, in which we detected strain replacement events within two species, and we can see that a lot of other species ended up going extinct and a few new ones invaded over the same interval. 14:16:48 There's another host in which we didn't detect any genetic changes, and we also observed a much more modest shift in species abundances. 14:16:57 So these are obviously just two cherry-picked examples I picked out for you, but one of our former summer undergraduates, Layton Rosenfeld, did some really great work last summer to show that when you look across this broader HMP cohort, 14:17:08 we find some preliminary evidence that these genetic changes within species, both strain replacement events and evolutionary modifications, are statistically associated with bigger shifts in community composition between the same time points.
14:17:23 So here I'm just showing you some really crude Jensen-Shannon distances in hosts in which we've detected at least one modification or replacement. 14:17:31 As you can see, there's a ton of overlap in these distributions. Here we observe a small systematic trend towards slightly higher distances in hosts with an evolutionary modification event, and slightly higher ones in hosts with a strain replacement, 14:17:43 and we can quantify this trend by comparing it to a null model in which we randomly scramble these genetic changes across different eligible species and hosts. 14:17:53 This allows us to start to quantify the relationship between these two properties, while still allowing for the sort of complicated correlations in the ecological shifts on their own, which we don't really understand. 14:18:07 Okay, so I think, interestingly here, 14:18:10 we see these changes in community structure don't appear to be solely driven by frequency increases in the focal species, as you might expect from this kind of simple cartoon where this guy invades and that's the one that increases in frequency. 14:18:24 Here I'm showing you the measured fold change in relative abundance for each of the species in which we detected a genetic change. 14:18:30 And what I want you to notice here is that many species have small or even negative changes in relative abundance over the same time interval in which they acquire this genetic change. 14:18:40 This is maybe similar to the observations from the antibiotic time course data set, where we saw these genetic changes weren't really easily predictable just based on the species-level abundance. 14:18:52 Instead, the changes appear to be a bit more global in nature, in that the genetic changes detected within species are associated with more frequent extinction events in other species in the community, even in cases where the frequency of the 14:19:06 focal species declines.
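The comparison described around 14:17:23 to 14:17:53 can be sketched in a few lines. This is a deliberately simplified stand-in, not the analysis from the talk: it computes a Jensen-Shannon distance between two species-abundance profiles and then shuffles a per-host "genetic change detected" flag across hosts, whereas the talk describes scrambling changes across eligible species and hosts. All function names and the toy numbers are hypothetical.

```python
import math
import random

def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (ranges from 0 to 1)."""
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))

def permutation_test(distances, has_change, n_perm=10000, seed=0):
    """One-sided test: are distances larger in hosts flagged with a change?"""
    rng = random.Random(seed)

    def mean_diff(flags):
        yes = [d for d, f in zip(distances, flags) if f]
        no = [d for d, f in zip(distances, flags) if not f]
        return sum(yes) / len(yes) - sum(no) / len(no)

    observed = mean_diff(has_change)
    flags = list(has_change)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # scramble which hosts carry the change
        if mean_diff(flags) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy numbers, purely illustrative.
toy_distances = [0.10, 0.20, 0.15, 0.50, 0.60, 0.55]
toy_flags = [False, False, False, True, True, True]
observed, p_value = permutation_test(toy_distances, toy_flags, n_perm=2000)
print(observed, p_value)
```

Shuffling the flags keeps the distribution of ecological distances fixed, which is the point of the null: it does not require a model of how species composition fluctuates on its own.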
14:19:06 And so here's one example to give you something concrete to look at. Here's one particular host in which we detected a modification event in one Bacteroides stercoris population, which actually then declined a little bit in frequency, or mostly 14:19:18 stayed the same. 14:19:19 And we also saw that two other species went extinct at the same time, one from the same Bacteroidaceae family, which I'm coloring in blue here, and one from a completely different group. 14:19:30 And we're currently trying to think about ways in which it would make sense to try to quantify whether the species that go extinct are systematically more or less related to the species that acquired that change. 14:19:42 Yeah? You seem to be asking this question, maybe I missed it, about whether the evolutionary change is causing a change in ecological structure, and the implicit assumption here is that when I have a species go extinct there's some change 14:19:53 in the ecological structure. Yes, sorry, we're defining ecological structure to be the species composition. There are many ways one can define it. 14:20:13 But what could be happening is that whatever metabolic process that guy was taking care of beforehand is now taken care of by another species, and the metabolic properties of the system are invariant. Yes. Yeah, right. Okay, maybe I should use better terminology; there's sort of ecological 14:20:17 structure at multiple levels: at the species, the taxonomic composition level, and at the functional level, and maybe you would find that all of these ones have anomalously short functional distances. 14:20:31 Right, so for example if you looked at the time dependence of the number of reads mapping to specific enzymes, which people have of course done. 14:20:37 Yep, that might be stable. 14:20:39 Yeah, exactly.
Yeah, people have done that a little bit in the gut. I've not been as convinced by those data as in other systems, because they mostly find that, like, yeah, the ribosome content per cell is pretty stable over time. 14:20:51 But one could do a better job there, right, in principle, of course. Yeah. 14:20:56 Casey, I don't know, maybe I overheard this conversation, mentioned that there's some effort to measure the chemical environment at the same time as doing the sequencing. Is that stuff that's been done in the past, or is it brand new 14:21:08 and we just don't have the answers at all? Yeah, I mean, like metabolomics kinds of data sets, or just, you know, what is the pH and the oxygen concentration, just the dominant stuff. Yeah, I think the sort of problem 14:21:19 here, yeah, people are definitely doing these experiments, the problem as I see it is that if you want to make contact with human observational data you really want to do those measurements in humans, and that's really hard to do. Most of the really 14:21:30 good measurements are in mice, and I think you can learn really good things from that, but I think you're studying something slightly different. 14:21:37 Unrelated to that, but I guess I'm trying to follow your argument: are you making the argument that there may be a causal direction from extinctions to the... yes, no, or... 14:21:48 That's the question. Okay. 14:21:50 ...space that's opened up, given that the extinction has preceded... You're exactly right, so I think this is the key question here, which is that with the observational data I showed you, 14:22:15 it's really difficult to identify the causal direction of this effect. I've drawn these cartoons that maybe implied this first scenario, that a genetic change in the species changes ecological structure, but one could imagine that similar 14:22:25 correlations could also arise indirectly if you had a common environmental change.
14:22:34 And so, I've tried to think about this very carefully, and there are several things we can maybe do. I think the only way to really establish the causal direction of this effect is to set up replay experiments like Jonathan talked 14:22:45 about, where you isolate these things, you put them together, and you show this genetic change really did that. Ideally one would want to set up such experiments in systems like Casey talked about last week. 14:22:58 Obviously the downside is you have to shift from humans to in vitro environments or... 14:23:02 Yeah, 14:23:05 as a sort of follow-up, are the data sets rich enough that you see similar extinction events across hosts? And can you then trace what rises in correspondence to a particular extinction? 14:23:16 So rather than focusing on what rises, if you focus on extinction events, can you ask about the spectrum of replacements that occur following any given species going extinct? 14:23:25 Yes, that sort of speaks to what we are hoping to do, right: is it that you get a genetic change and only guys in your same family go extinct, because you're sort of more metabolically similar? Yeah, so those are questions we're sort of hoping 14:23:38 to quantify here. 14:23:40 I'm not sure about the data set. There's more data now available than what we were able to run here, so maybe there's enough of a sample size, but I'm a little... 14:23:48 yeah, a little leery there. 14:23:50 I think you could do the same thing experimentally by kind of selecting for a big library of mutations. 14:23:57 Yeah. 14:24:00 Okay.
And finally, the last little bit I want to get to: the other way one can maybe try to gauge at least the plausibility of this first scenario is to approach it theoretically, right? You can ask whether similar behaviors 14:24:10 just generically emerge in really simple models of evolving microbial communities, where we don't have to worry about these things, where we know all of the ecological interactions and environmental conditions ahead of time. 14:24:21 And so to try to answer these questions we've also been spending a lot of time developing a new theory for trying to predict how evolution influences both the ecological and evolutionary dynamics that occur in kind of one of the simplest possible models of the microbiome, 14:24:35 where cells compete for some substitutable resources that are just continuously supplied. 14:24:40 Of course this is an example of one of the resource models that were introduced last week. 14:24:46 In the evolving version of the model, we assume that individual cells take up these different resources at different genetically encoded rates, and they can then acquire mutations that alter the resource uptake rates. 14:24:59 Right, so if you squint, maybe you can take my word that this is sort of a minimal model of a host microbiome. It's neglecting really potentially important features: it has very simplified growth dynamics and neglects things like cross-feeding or spatial 14:25:11 structure in the gut, all of which are interesting and eventually, hopefully, you... 14:25:18 Now, several people in this room have done some really great work over the past five years or so showing how one can use these kinds of models to understand purely ecological processes like community assembly, particularly in scenarios involving large 14:25:30 numbers of coexisting strains, right, which you get in this model just by increasing the number of metabolites in the environment.
14:25:37 And the key lesson I take away from this work is that one can learn quite a bit about these large communities by considering completely random collections of strains, 14:25:45 where you draw these resource uptake rates from some common statistical distribution. 14:25:50 Now, to try to understand the impact of evolution, we've been playing around with some ways of trying to extend this kind of thinking, to focus on the very first steps that evolution takes in a newly assembled community. 14:26:01 You could imagine this is maybe the regime you want to be in if you want to start to make contact with experiments of the type that Casey talked about: throw together a random collection of strains and see what happens. 14:26:11 So the idea here is that one could imagine assembling random communities of varying initial complexity and then following the fates of a whole ensemble of first-step mutations that alter the resource uptake rates of these strains in different 14:26:23 ways. 14:26:24 And then by averaging over these two ensembles, you know, who's there and what happens to them, the goal is to derive something like a scaling relation, to quantify how the typical fitness benefits or the ecological impacts of these first-step 14:26:35 mutations scale as a function of, let's say, the taxonomic diversity or the metabolic overlap in the surrounding community, to get a sense of what's possible. 14:26:45 Okay, so of course there are many different ways of modeling mutations, and we don't actually have that many experimental constraints on them yet. But what I want to leave you with, I think, interestingly: some preliminary computer simulations using a really 14:26:57 simple kind of knockout mutation scheme combined with a global energy budget, similar to what was talked about earlier, reveal some pretty interesting, counterintuitive behavior that's at least qualitatively similar to some of the behavior we saw in this microbiome 14:27:12 data.
14:27:14 So the basic idea with this sort of mutation scheme is that by eliminating a pathway, a cell is able to reallocate some of its limited resources to other metabolites. 14:27:24 And this could maybe provide a fitness benefit if it happens to occur in the right conditions. 14:27:29 And so here I'm just showing you two different realizations of this model, for communities with 15 and 50 coexisting species respectively, both with 50 metabolites. 14:27:40 And what this is supposed to show us is that it's relatively easy to find sets of parameters in such a model in which a large fraction of all successful knockout mutations actually manage to coexist with their parent strain. 14:27:52 This is maybe not so surprising in this top example here: with 15 species and 50 resources, you can imagine there's a large number of metabolic niches left to exploit. 14:28:02 I think, really interestingly, we see this ongoing diversification process continue to happen even in saturated ecosystems like this bottom one, where several other species have to be driven to extinction at the same time that this mutation invades and 14:28:16 coexists with its parent, which is maybe sort of qualitatively similar to some of these stable ecological substructures within species that we saw in the antibiotic time course study. 14:28:28 Yes? Super simple model, stable community? Yep, exactly, nothing complicated or chaotic. 14:28:35 Moreover, we can also see that these changes often lead to small or even negative changes in the abundance of the focal species, which we can characterize very precisely in this model, as Daniel said, no chaos or anything, which is at least 14:28:48 sort of reminiscent of the observational data that Layton was looking at.
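A minimal sketch of the knockout scheme described around 14:27:14 follows, under assumptions that the talk does not spell out: uptake rates are drawn uniformly at random and rescaled to a fixed budget, a knockout redistributes the freed budget proportionally over the remaining pathways, and growth is linear in resource availability so that the invasion fitness of a rare mutant is the difference in uptake weighted by availability. All names here are hypothetical.

```python
import random

def random_strain(n_resources, budget, rng):
    """Random uptake-rate vector rescaled so the rates sum to `budget`."""
    raw = [rng.random() for _ in range(n_resources)]
    total = sum(raw)
    return [budget * x / total for x in raw]

def knockout(uptake, pathway):
    """Zero out one pathway and rescale the rest to keep the same budget.

    Proportional reallocation is an assumption made for this sketch.
    """
    budget = sum(uptake)
    remaining = budget - uptake[pathway]
    if remaining <= 0:
        raise ValueError("cannot knock out the only active pathway")
    scale = budget / remaining
    return [0.0 if i == pathway else r * scale for i, r in enumerate(uptake)]

def invasion_fitness(mutant, parent, availability):
    """Growth-rate advantage of a rare mutant over its parent.

    Assumes growth rate = sum over resources of uptake * availability,
    evaluated at the availabilities set by the resident community.
    """
    return sum((m - p) * h for m, p, h in zip(mutant, parent, availability))

rng = random.Random(1)
parent = random_strain(n_resources=50, budget=1.0, rng=rng)
mutant = knockout(parent, pathway=3)

# If every resource is equally available, an equal-budget knockout is neutral.
uniform = [1.0] * 50
print(invasion_fitness(mutant, parent, uniform))

# If the knocked-out resource is depleted, the reallocation pays off.
depleted = [1.0] * 50
depleted[3] = 0.1
print(invasion_fitness(mutant, parent, depleted))
```

In this sketch the sign of the invasion fitness depends only on whether the knocked-out resource is less available than the budget-weighted average of the others, which is one way to read "if it happens to occur in the right conditions".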
14:28:52 And so what we're currently working on is some analytical calculations using replica theory tricks to try to see if we can understand the mechanisms that are generating this sort of counterintuitive behavior analytically, which we think should be possible 14:29:03 now given the very simple way in which we set it up. 14:29:08 Yeah? 14:29:09 Yes? 14:29:12 As far as I remember, coexistence in this model, where you have this global budget, is fairly simple to understand, because as long as you're within this convex hull, 14:29:25 and so on. So yeah, sorry, I should have been a little more precise. So the individual strains do not have the same energy budget, but when a strain gets a mutation, 14:29:35 it reallocates stuff within its own energy budget. Yeah, thank you for bringing it up, very crucial difference. So it's not like Ned's model, where you can have infinite supersaturation. 14:29:45 In this model, you really can only have at most 50 species coexisting 14:29:51 on 50 resources. I see, so the mutant and the parent have the same budget? They have the same budget, right. So you can imagine mutations that just increase your budget; 14:29:59 those, you can show relatively easily, always outcompete their parents. But these are mutations that eliminate a pathway and redistribute your resources with the same budget. 14:30:12 Now, this is showing that a decent fraction of the time they manage to invade, even, you know, displacing other people. 14:30:30 Is this really only limited to the knockout mutations, because the ancestral strain and descendant strains become sufficiently different? 14:30:30 Did you try to do this experiment where mutations just slightly modify the coefficients, c_i, or r in your case?
Yeah, because in this case I would expect that it's unlikely to get coexistence, because there is a very strong niche overlap when the 14:30:46 changes are small. Precisely, yeah, so we chose this mutation scheme precisely to get at this question. You can show theoretically that if you have an infinitesimal change in your r vector, the ecological effects always come in at next order, 14:31:03 so you need to look at a sort of finite change. We like this knockout one because in these communities, 14:31:06 you know, everybody's using some large number of resources, and so an individual knockout is also kind of a small change at some level, but it's also big in that it really does knock out a whole pathway, and exactly where that crossover is, 14:31:19 how big you have to get, is something we still want to play around with. 14:31:25 Okay. 14:31:27 All right, so I'm running over time here, so I just want to wrap up. 14:31:31 I'm happy to talk more about this during our cookie break. 14:31:36 So to summarize, I showed you some data to convince you that native populations of human gut bacteria can evolve within healthy hosts on human-relevant timescales, through a mixture of external strain replacement and evolution 14:31:49 of resident strains. 14:31:51 I also showed you a little bit of data in the second part to maybe convince you that the ecological and evolutionary processes in these communities are inherently intertwined.
14:32:00 On the one hand, we saw data showing that the genetic responses to antibiotics provide some further support for Ben and Tami's hypothesis that there may be a lot of additional ecological structure within species in these complex communities, which 14:32:12 I think is super interesting and immediately raises questions about why these niches aren't already filled by some dedicated species. You know, this is the human gut, there's plenty of species to go around. 14:32:21 It seems like it'd be pretty easy to do that. 14:32:24 We also found some preliminary evidence that genetic turnover within species is correlated with, though not necessarily causative of, global shifts in species-level composition between the same two time points, 14:32:35 though not necessarily in the abundance of the species itself. 14:32:39 And finally, we saw that at least some of these qualitative behaviors could be recapitulated in really simple, embarrassingly simple, resource competition models evolving in the high-diversity regime, which maybe makes for some exciting opportunities to try to 14:32:51 use theory to understand the mechanisms behind this behavior. 14:32:57 Okay, so with that I just want to thank my collaborators here, particularly Nandita, my co-author on the HMP study in the beginning, Morteza Roodgar, who generated all the data for the antibiotic time course in the middle, and then of course Layton Rosenfeld, who spearheaded 14:33:09 that correlation analysis I talked to you about at the end. 14:33:13 And, yeah, happy to take more questions. 14:33:32 Where are those SNPs located on the chromosomes? Are there any biases with respect to, say, the origin of replication, which might involve some recombination processes, which tend to happen near the origin of replication?
14:33:49 Yes, we looked at this a little bit in the context of these sorts of pictures. So actually, the one example I told you about, I didn't really mention this during the talk, but this one has 10,000 SNPs scattered across the genome. 14:34:01 This one has seven SNPs scattered across the genome. 14:34:03 This one has 100 SNPs, so somewhere in between, but actually they're all in the same region, so it's exactly one of these horizontal gene transfer kinds of events that you've looked at before, where we think this strain acquired 14:34:16 some chunk that was diverged at the typical rate that you see between two different strains, but just on a subset of the genome.