The GIAB genomic stratifications resource for human reference genomes.

Details

Serval ID
serval:BIB_2C2A1B038DFA
Type
Article: journal or magazine article.
Collection
Publications
Institution
Title
The GIAB genomic stratifications resource for human reference genomes.
Journal
Nature communications
Authors
Dwarshuis N., Kalra D., McDaniel J., Sanio P., Alvarez Jerez P., Jadhav B., Huang W.E., Mondal R., Busby B., Olson N.D., Sedlazeck F.J., Wagner J., Majidian S., Zook J.M.
ISSN
2041-1723 (Electronic)
ISSN-L
2041-1723
Editorial status
Published
Publication date
19/10/2024
Peer-reviewed
Yes
Volume
15
Issue
1
Pages
9029
Language
English
Notes
Publication types: Journal Article
Publication Status: epublish
Abstract
Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of "stratifications," which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly used reference genomes.
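The abstract describes stratifications as BED files that partition the genome into contexts so that benchmarking metrics can be reported per context. As an illustration only, the minimal Python sketch below shows that core idea: it loads one stratification BED (0-based, half-open intervals), tests whether each truth variant's 1-based VCF-style position falls inside it, and reports recall inside versus outside the region. It is not the GIAB snakemake pipeline or any benchmarking tool; the function names and the simplified truth input `(chrom, pos, was_called)` are hypothetical, and real benchmarking matches VCFs with haplotype-aware comparison and handles variants that span region boundaries.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Per chromosome: parallel lists of merged interval starts and ends (0-based, half-open).
Regions = Dict[str, Tuple[List[int], List[int]]]


def parse_bed(lines: List[str]) -> Regions:
    """Parse BED lines into sorted, merged intervals per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    regions: Regions = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        starts: List[int] = []
        ends: List[int] = []
        for s, e in ivs:
            if ends and s <= ends[-1]:
                # Overlapping or abutting interval: extend the previous one.
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        regions[chrom] = (starts, ends)
    return regions


def in_regions(regions: Regions, chrom: str, pos1: int) -> bool:
    """True if 1-based position pos1 (as in a VCF POS column) lies in a region."""
    starts, ends = regions.get(chrom, ([], []))
    pos0 = pos1 - 1  # convert to the BED 0-based coordinate system
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def stratified_recall(
    truth: List[Tuple[str, int, bool]], regions: Regions
) -> Tuple[Optional[float], Optional[float]]:
    """Recall TP / (TP + FN) for truth variants inside vs. outside the regions.

    Each truth entry is (chrom, 1-based pos, was_called); None means no variants.
    """
    counts = {True: [0, 0], False: [0, 0]}  # inside? -> [TP, FN]
    for chrom, pos1, called in truth:
        bucket = counts[in_regions(regions, chrom, pos1)]
        bucket[0 if called else 1] += 1

    def recall(tp_fn: List[int]) -> Optional[float]:
        tp, fn = tp_fn
        return tp / (tp + fn) if tp + fn else None

    return recall(counts[True]), recall(counts[False])


if __name__ == "__main__":
    bed = ["chr1\t100\t200"]  # hypothetical "difficult" region
    truth = [("chr1", 150, True), ("chr1", 180, False),
             ("chr1", 500, True), ("chr2", 150, True)]
    print(stratified_recall(truth, parse_bed(bed)))  # (0.5, 1.0)
```

Merging overlapping intervals first keeps each lookup a single binary search; the same pattern extends to many stratifications by computing recall once per BED file.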
Keywords
Humans, Genome, Human, Software, Genomics/methods, Sequence Analysis, DNA/methods, Benchmarking, High-Throughput Nucleotide Sequencing/methods
PubMed
Open Access
Yes
Record created
28/10/2024 14:31
Record last modified
29/10/2024 7:21