Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes.

Details

Resource 1: giac006.pdf (2659.38 KB)
Status: Public
Version: Final published version
License: Not specified
Serval ID
serval:BIB_1A9AE144B7E8
Type
Article: article in a journal or magazine.
Collection
Publications
Institution
Title
Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes.
Journal
GigaScience
Authors
Feron R., Waterhouse R.M.
ISSN
2047-217X (Electronic)
ISSN-L
2047-217X
Editorial status
Published
Publication date
25/02/2022
Peer-reviewed
Yes
Volume
11
Pages
giac006
Language
English
Notes
Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that genomic data are accumulating rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data.
Here we present an automated analysis workflow that surveys genome assemblies from the United States National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets.
These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives.
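The workflow summarized above pairs BUSCO gene-content completeness with standard assembly metrics. As a hedged illustration only (not the authors' code), the Python sketch below parses BUSCO's one-line summary notation and computes contig N50, one commonly reported assembly metric. The function names and example values are hypothetical, and the regular expression assumes the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout of BUSCO short summaries.

```python
import re
from dataclasses import dataclass

# Assumed layout of BUSCO's one-line notation, e.g.
# "C:95.2%[S:94.1%,D:1.1%],F:2.0%,M:2.8%,n:1013" (values here are illustrative).
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass
class BuscoScore:
    complete: float    # C: complete (single-copy + duplicated), in percent
    single: float      # S: complete and single-copy
    duplicated: float  # D: complete and duplicated
    fragmented: float  # F: fragmented
    missing: float     # M: missing
    n_markers: int     # n: number of markers in the lineage dataset


def parse_busco_notation(line: str) -> BuscoScore:
    """Extract the C/S/D/F/M percentages and marker count from a notation string."""
    m = BUSCO_RE.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO notation string: {line!r}")
    return BuscoScore(
        complete=float(m["C"]),
        single=float(m["S"]),
        duplicated=float(m["D"]),
        fragmented=float(m["F"]),
        missing=float(m["M"]),
        n_markers=int(m["n"]),
    )


def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length (longest first)
    first reaches at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # unreachable for non-empty input; keeps the return type explicit


if __name__ == "__main__":
    score = parse_busco_notation("C:95.2%[S:94.1%,D:1.1%],F:2.0%,M:2.8%,n:1013")
    print(score.complete, score.n_markers)
    print(n50([100, 80, 60, 40, 20]))
```

In a real survey the notation string would be read from each BUSCO run's short summary file rather than typed in, and contig lengths would come from the assembly itself; pairing the two per assembly is what allows metrics such as N50 to be compared against completeness.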
Keywords
Base Sequence, Chromosome Mapping, Ecosystem, Genome, Genomics/methods, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, BUSCO assessments, arthropod genomes, biodiversity genomics, genome assembly, genome quality database, reproducible workflow
Open Access
Yes
Funding
Swiss National Science Foundation / Careers / PP00P3_170664
Swiss National Science Foundation / Careers / PP00P3_202669
Novartis Foundation / 18B116
Record created
26/02/2022 16:02
Record last modified
21/03/2023 6:47