XSI-a genotype compression tool for compressive genomics in large biobanks.

Details

Resource 1 Download: 35748697_BIB_1BC1E292DED4.pdf (899.13 KB)
Status: Public
Version: Final published version
Licence: CC BY 4.0
Serval ID
serval:BIB_1BC1E292DED4
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
XSI-a genotype compression tool for compressive genomics in large biobanks.
Journal
Bioinformatics
Authors
Wertenbroek R., Rubinacci S., Xenarios I., Thoma Y., Delaneau O.
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication status
Published
Publication date
02/08/2022
Peer-reviewed
Yes
Volume
38
Issue
15
Pages
3778-3784
Language
English
Notes
Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses.
We show that xSqueezeIt (XSI) allows for a file size reduction of 4-20× compared with compressed BCF and demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes with 8× faster loading times, 5× faster run of homozygosity computation, 30× faster dot product computation and 280× faster allele counts.
The XSI file format specifications, API and command line tool are released under open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt.
Supplementary data are available at Bioinformatics online.
Keywords
Software, Biological Specimen Banks, Genomics, Data Compression, Genotype
PubMed
Web of Science
Open Access
Yes
Record created
04/07/2022 13:27
Record last modified
23/01/2024 7:21