Transmission Bottleneck Size Estimation from Pathogen Deep-Sequencing Data, with an Application to Human Influenza A Virus

Ashley Sobel Leonard, Daniel Weissman, Benjamin Greenbaum, Elodie Ghedin, Katia Koelle


January 19, 2017


The transmission bottleneck of an infectious disease describes the size of the pathogen population transferred from a donor to a recipient host. Accurate quantification of the bottleneck size is particularly important for rapidly evolving pathogens such as influenza virus, because narrow bottlenecks would limit the amount of viral genetic diversity that is transferred and thus have the potential to slow the rate of viral adaptation. Previous studies have estimated transmission bottleneck sizes through statistical analyses of variants identified in pathogen sequencing data. The methods used by these studies, however, did not account for variant calling thresholds or for the stochastic dynamics of the viral population within recipient hosts. Because these factors can skew bottleneck size estimates, we introduce a new method for inferring transmission bottleneck sizes that explicitly takes them into account. We compare our method, based on beta-binomial sampling, with existing methods in the literature for their ability to recover the transmission bottleneck size of a simulated dataset. This comparison demonstrates that the beta-binomial sampling method recovers the simulated bottleneck size most accurately. We then apply our method to a recently published dataset of influenza A H1N1p and H3N2 infections, for which viral deep-sequencing data from inferred donor-recipient transmission pairs are available. Our results indicate that transmission bottleneck sizes vary across transmission pairs, but that there is no significant difference between the overall bottleneck sizes inferred for H1N1p and H3N2. The mean bottleneck size for influenza virus in this study, considering all transmission pairs, was Nb = 196 (95% confidence interval 66-392) virions. While this estimate is consistent with previous bottleneck size estimates for this dataset, it is considerably higher than the bottleneck sizes estimated for influenza from other datasets.
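
To make the idea of beta-binomial sampling concrete, the following is a minimal sketch of one way such a likelihood could be written. It is not the authors' implementation. It rests on these assumptions: the number of founding virions carrying a variant is binomial in the donor frequency, the variant's frequency in the recipient after stochastic growth is beta-distributed given that number, and variants called in the donor but falling below the calling threshold in the recipient contribute the probability mass below that threshold. The function names, the 3% threshold, and the search limit `max_nb` are all hypothetical, and sequencing read-count noise is ignored.

```python
import math


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, assuming 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(binom_pmf(j, n, x) for j in range(a, n + 1))


def variant_likelihood(nu_donor, nu_recipient, nb, threshold):
    """Likelihood of one variant for bottleneck size nb (assumed model).

    Assumes 0 < nu_donor < 1 and nu_recipient < 1.
    """
    total = 0.0
    for k in range(nb + 1):
        weight = binom_pmf(k, nb, nu_donor)  # k founders carry the variant
        if nu_recipient >= threshold:
            # Called in the recipient: needs 0 < k < nb, otherwise the
            # variant would be lost or fixed rather than polymorphic.
            if 0 < k < nb:
                total += weight * beta_pdf(nu_recipient, k, nb - k)
        elif k == 0:
            total += weight  # variant not transmitted at all
        elif k < nb:
            # Transmitted, but its frequency stays under the calling threshold.
            total += weight * beta_cdf_int(threshold, k, nb - k)
    return total


def log_likelihood(variants, nb, threshold):
    """Sum of per-variant log-likelihoods; -inf if any variant is impossible."""
    total = 0.0
    for nu_d, nu_r in variants:
        lik = variant_likelihood(nu_d, nu_r, nb, threshold)
        if lik <= 0.0:
            return float("-inf")
        total += math.log(lik)
    return total


def estimate_bottleneck(variants, threshold=0.03, max_nb=200):
    """Grid search for the nb in 1..max_nb with the highest log-likelihood."""
    return max(range(1, max_nb + 1),
               key=lambda nb: log_likelihood(variants, nb, threshold))
```

Under this sketch, a transmission pair is a list of `(donor_frequency, recipient_frequency)` tuples, one per donor-identified variant, and `estimate_bottleneck(pair)` returns the maximum-likelihood bottleneck size on the searched grid. Pairs whose recipient frequencies track the donor frequencies closely push the estimate upward, whereas large frequency shifts or lost variants pull it downward.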