Foundational reviews
- Osterman A., Overbeek R. 2003. Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol 7: 238-251.
- Galperin M. Y., Koonin E. V. 2010. From complete genome sequence to 'complete' understanding? Trends Biotechnol 28: 398-406.
- Sorokina M., Stam M., Medigue C., Lespinet O., Vallenet D. 2014. Profiling the orphan enzymes. Biol Direct 9: 10.
- Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol. 2018;11(4):588-605. doi:10.1111/1751-7915.13284
Annotation pipelines: issues and solutions
- Eberhardt RY, Haft DH, Punta M, Martin M, O'Donovan C, Bateman A. AntiFam: a tool to help identify spurious ORFs in protein annotation. Database (Oxford). 2012 Mar 20;2012:bas003. doi: 10.1093/database/bas003. PMID: 22434837
- Klimke W., O'Donovan C., White O., Brister J. R., Clark K., Fedorov B., Mizrachi I., Pruitt K. D., Tatusova T. 2011. Solving the Problem: Genome Annotation Standards before the Data Deluge. Stand Genomic Sci 5: 168-193.
- Richardson E. J., Watson M. 2013. The automatic annotation of bacterial genomes. Brief Bioinform 14: 1-12.
- Overbeek R., Begley T., Butler R. M., Choudhuri J. V., Chuang H. Y., Cohoon M., de Crécy-Lagard V., Diaz N., Disz T. & other authors. 2005. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33: 5691-5702.
- Médigue C, Calteau A, Cruveiller S, Gachet M, Gautreau G, Josso A, Lajus A, Langlois J, Pereira H, Planel R, Roche D, Rollin J, Rouy Z, Vallenet D. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data. Brief Bioinform. 2019 Jul 19;20(4):1071-1084. doi: 10.1093/bib/bbx113. PMID: 28968784.
- Yu T, Cui H, Li JC, Luo Y, Jiang G, Zhao H. Enzyme function prediction using contrastive learning. Science. 2023 Mar 31;379(6639):1358-1363. doi: 10.1126/science.adf2465. Epub 2023 Mar 30. PMID: 36996195.
Errors in annotations
- Schnoes A. M., Brown S. D., Dodevski I., Babbitt P. C. 2009. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5: e1000605.
- Percudani R., Carnevali D., Puggioni V. 2013. Ureidoglycolate hydrolase, amidohydrolase, lyase: how errors in biological databases are incorporated in scientific papers and vice versa. Database 2013.
- de Crécy-Lagard V. 2014. Variations in metabolic pathways create challenges for automated metabolic reconstructions: Examples from the tetrahydrofolate synthesis pathway. Comput Struct Biotechnol J 10: 41-50.
- Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel). 2016 Sep 8;6(3). pii: E39. doi: 10.3390/life6030039. PMID: 27618105.
- Griesemer M, Kimbrel JA, Zhou CE, Navid A, D'haeseleer P. Combining multiple functional annotation tools increases coverage of metabolic annotation. BMC Genomics. 2018 Dec 19;19(1):948. doi: 10.1186/s12864-018-5221-9. PMID: 30567498
- Rembeza E, Engqvist M. K. M. Experimental investigation of enzyme functional annotations reveals extensive annotation error. bioRxiv 2020.12.18.423474. 2020. https://doi.org/10.1101/2020.12.18.423474
- Lobb B, Tremblay BJ, Moreno-Hagelsieb G, Doxey AC. An assessment of genome annotation coverage across the bacterial tree of life. Microb Genom. 2020 Mar;6(3):e000341. doi: 10.1099/mgen.0.000341. PMID: 32124724.
- Rembeza E, Engqvist M. K. M. Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class. PLoS Comput Biol 2021, 17 (9), e1009446. https://doi.org/10.1371/journal.pcbi.1009446 PMID: 34555022.
Protein families of unknown function
- Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD. The Pfam protein families database. Nucleic Acids Res. 2012 Jan;40(Database issue):D290-301. doi: 10.1093/nar/gkr1065. PMID: 22127870
- Goodacre NF, Gerloff DL, Uetz P. Protein domains of unknown function are essential in bacteria. mBio. 2013 Dec 31;5(1):e00744-13. doi: 10.1128/mBio.00744-13. PMID: 24381303.
- Ellens KW, Christian N, Singh C, Satagopam VP, May P, Linster CL. Confronting the catalytic dark matter encoded by sequenced genomes. Nucleic Acids Res. 2017 Nov 16;45(20):11495-11514. doi: 10.1093/nar/gkx937. PMID: 29059321.
- Ghatak S, King ZA, Sastry A, Palsson BO. The y-ome defines the 35% of Escherichia coli genes that lack experimental evidence of function. Nucleic Acids Res. 2019 Mar 18;47(5):2446-2454. doi: 10.1093/nar/gkz030. PMID: 30698741
- Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. Elife. 2022 Mar 31;11:e67667. doi: 10.7554/eLife.67667. PMID: 35356891.
- Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol. 2023 Aug 8;21(8):e3002222. doi: 10.1371/journal.pbio.3002222. PMID: 37552676.
BV-BRC Tutorials