GENCODE: producing a reference annotation for ENCODE.

Details

Resource 1 Download: BIB_F594806247A7.P001.pdf (579.43 KB)
Status: Public
Version: Author's version
Serval ID
serval:BIB_F594806247A7
Type
Article: article in a journal or magazine.
Collection
Publications
Institution
Title
GENCODE: producing a reference annotation for ENCODE.
Journal
Genome Biology
Authors
Harrow J., Denoeud F., Frankish A., Reymond A., Chen C.K., Chrast J., Lagarde J., Gilbert J.G., Storey R., Swarbreck D., Rossier C., Ucla C., Hubbard T., Antonarakis S.E., Guigo R.
ISSN
1465-6914 (electronic), 1465-6906 (linking)
Publication state
Published
Publication date
2006
Peer-reviewed
Yes
Volume
7 Suppl 1
Pages
S4.1-S4.9
Language
English
Notes
Publication types: Evaluation Studies ; Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
BACKGROUND: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. RESULTS: The GENCODE gene features are divided into eight different categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. CONCLUSION: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
Keywords
Chromosome Mapping, Computational Biology/methods, Computational Biology/standards, Expressed Sequence Tags, Genes, Genome, Human, Genomics/methods, Genomics/standards, Humans, Proteins/genetics, Pseudogenes, RNA, Messenger/analysis, Reference Standards, Sequence Analysis, DNA, Sequence Analysis, RNA
Pubmed
Web of science
Open Access
Yes
Record created
24/01/2008 16:52
Record last modified
20/08/2019 17:22