XSI-a genotype compression tool for compressive genomics in large biobanks.
Details
Serval ID
serval:BIB_1BC1E292DED4
Type
Article: article from journal or magazine.
Collection
Publications
Title
XSI-a genotype compression tool for compressive genomics in large biobanks.
Journal
Bioinformatics
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication state
Published
Issued date
02/08/2022
Peer-reviewed
Yes
Volume
38
Number
15
Pages
3778-3784
Language
English
Notes
Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
Generation of genotype data has grown exponentially over the last decade. The large size of recent datasets brings a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses.
We show that xSqueezeIt (XSI) allows a file size reduction of 4-20× compared with compressed BCF, and we demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes, with 8× faster loading times, 5× faster runs-of-homozygosity computation, 30× faster dot-product computation and 280× faster allele counts.
The XSI file format specifications, API and command-line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt.
Supplementary data are available at Bioinformatics online.
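The abstract's point that computation can run directly on compressed data can be illustrated with a generic sparse encoding of a biallelic variant, a common way to store rare variants. This is a hypothetical sketch only: `SparseVariant` and its methods are illustrative names, not part of the XSI API, and the code does not describe how XSI actually encodes genotypes. It stores only the sorted indices of haplotypes that carry the alternate allele, so the allele count and the dot product of two variants are computed without rebuilding the full 0/1 haplotype vectors.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SparseVariant:
    """Hypothetical sparse encoding of one biallelic variant (not the XSI format).

    Only the sorted indices of haplotypes carrying the alternate allele are
    stored; every other haplotype is implicitly reference (0).
    """
    n_haplotypes: int
    alt_indices: Tuple[int, ...]

    @classmethod
    def from_dense(cls, haplotypes: Sequence[int]) -> "SparseVariant":
        # Compress a 0/1 haplotype vector by keeping the positions of the 1s.
        return cls(len(haplotypes), tuple(i for i, h in enumerate(haplotypes) if h))

    def to_dense(self) -> List[int]:
        # Decompress back to a 0/1 vector (not needed by the queries below).
        dense = [0] * self.n_haplotypes
        for i in self.alt_indices:
            dense[i] = 1
        return dense

    def allele_count(self) -> int:
        # Alternate allele count is the number of stored indices: no decompression.
        return len(self.alt_indices)

    def dot(self, other: "SparseVariant") -> int:
        # The dot product of two 0/1 vectors equals the size of the intersection
        # of their alt-index sets, found by a linear merge of two sorted tuples.
        if self.n_haplotypes != other.n_haplotypes:
            raise ValueError("variants must cover the same haplotypes")
        a, b = self.alt_indices, other.alt_indices
        i = j = count = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                count += 1
                i += 1
                j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return count


v1 = SparseVariant.from_dense([0, 1, 0, 0, 1, 0, 0, 0])
v2 = SparseVariant.from_dense([0, 1, 0, 0, 0, 0, 1, 0])
print(v1.allele_count())  # 2
print(v1.dot(v2))         # 1
```

The cost of both queries scales with the number of alternate alleles rather than the number of haplotypes, which is why sparse layouts pay off for rare variants in large cohorts.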
Keywords
Software, Biological Specimen Banks, Genomics, Data Compression, Genotype
Open Access
Yes
Create date
04/07/2022 13:27
Last modification date
23/01/2024 7:21