Genotype imputation using the Positional Burrows Wheeler Transform.

Details

Resource 1 Download: 33196638_BIB_89CBA5F9A38B.pdf (1990.96 KB)
State: Public
Version: Final published version
License: CC BY 4.0
Serval ID
serval:BIB_89CBA5F9A38B
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Genotype imputation using the Positional Burrows Wheeler Transform.
Journal
PLoS Genetics
Author(s)
Rubinacci S., Delaneau O., Marchini J.
ISSN
1553-7404 (Electronic)
ISSN-L
1553-7390
Publication state
Published
Issued date
11/2020
Peer-reviewed
Yes
Volume
16
Number
11
Pages
e1009049
Language
English
Notes
Publication types: Journal Article; Research Support, Non-U.S. Gov't
Publication Status: epublish
Abstract
Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves the imputation accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method that accuracy is optimized by using a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best-matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both of these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to use a smaller number of reference haplotypes, thus reducing computational cost.
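The haplotype-selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the IMPUTE5 implementation: it assumes binary alleles, builds the standard PBWT positional prefix ordering by a stable partition at each site, and simply treats the target haplotype as one extra row of the panel. The function names `pbwt_orders` and `select_states` and the parameter `n_neighbours` are hypothetical and chosen only for this illustration.

```python
from typing import List, Set


def pbwt_orders(haps: List[List[int]]) -> List[List[int]]:
    """For each site k, return haplotype indices sorted by their
    reversed prefix ending at site k (the PBWT positional prefix order)."""
    order = list(range(len(haps)))
    n_sites = len(haps[0]) if haps else 0
    orders = []
    for k in range(n_sites):
        # Stable partition: haplotypes carrying allele 0 at site k come
        # first, and the previous relative order is kept inside each group.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def select_states(reference: List[List[int]],
                  target: List[int],
                  n_neighbours: int = 1) -> Set[int]:
    """Collect reference haplotypes adjacent to the target in the prefix
    order at each site (illustrative assumption, not the published method)."""
    panel = reference + [target]
    t = len(reference)  # index of the target row in the combined panel
    selected: Set[int] = set()
    for order in pbwt_orders(panel):
        pos = order.index(t)
        lo = max(0, pos - n_neighbours)
        hi = min(len(order), pos + n_neighbours + 1)
        # Neighbours in the sorted order share long matches ending at
        # this site with the target; the target itself is skipped.
        selected.update(i for i in order[lo:hi] if i != t)
    return selected


reference = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
target = [0, 1, 0, 1]
states = select_states(reference, target, n_neighbours=1)
```

Because neighbouring haplotypes in the prefix order are those with the longest matches ending at the current site, taking a few neighbours per site yields a small set of candidate conditioning states. In this toy panel the reference haplotype identical to the target (index 2) is among the selected states.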
Keywords
Alleles, Computational Biology/methods, Forecasting/methods, Gene Frequency/genetics, Genome-Wide Association Study/methods, Genotype, Haplotypes/genetics, Humans, Models, Theoretical, Polymorphism, Single Nucleotide/genetics
Pubmed
Web of science
Open Access
Yes
Funding(s)
European Research Council (ERC) / 617306
Create date
16/01/2020 14:17
Last modification date
30/04/2021 6:12