The KEGG database contains three main components for genome/metagenome annotation:
• the collection of internally annotated gene catalogs for complete genomes (called KEGG organisms) and additional protein sequences in the KEGG GENES database,
• the knowledge base of high-level functions represented as the molecular interaction, reaction and relation networks in the KEGG PATHWAY, BRITE and MODULE databases, and
• the knowledge base of molecular-level functions associated with ortholog groups in the KO database, where most KO entries are defined in a context-dependent manner as nodes of the KEGG molecular networks.
In general, KO entries (identified by K numbers) also represent sequence similarity groups. Thus, a sequence similarity search of a query genome against KEGG GENES is a search for the most appropriate K numbers, and the assigned set of K numbers can be used to reconstruct KEGG pathway maps, BRITE hierarchies and KEGG modules, enabling interpretation of high-level functions.
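The two-step logic described above (assign K numbers from similarity hits, then map the assigned K numbers onto higher-level units) can be illustrated with a minimal Python sketch. This is not the assignment algorithm used by any KEGG tool; it simply takes the best-scoring annotated hit above a fixed threshold. All identifiers (`K90001`, `map_A`, `org1:g1`), the `min_score` cutoff, and the function names are hypothetical placeholders, not real KEGG entries or APIs.

```python
from collections import defaultdict


def assign_k_numbers(hits, gene_to_ko, min_score=60.0):
    """Assign one K number per query gene from its similarity hits.

    hits:       query gene -> list of (KEGG GENES id, similarity score)
    gene_to_ko: KEGG GENES id -> K number (genes without a K number are absent)
    min_score:  hypothetical score cutoff; real tools use more elaborate criteria
    """
    assignment = {}
    for query, candidates in hits.items():
        # Keep only hits that pass the cutoff and carry a K number.
        annotated = [
            (score, gene_to_ko[gene])
            for gene, score in candidates
            if score >= min_score and gene in gene_to_ko
        ]
        if annotated:
            # Take the K number of the highest-scoring annotated hit.
            best_score, best_ko = max(annotated, key=lambda pair: pair[0])
            assignment[query] = best_ko
    return assignment


def reconstruct(assignment, ko_to_units):
    """Group the assigned K numbers by the pathway/module they belong to."""
    found = defaultdict(set)
    for ko in assignment.values():
        for unit in ko_to_units.get(ko, ()):
            found[unit].add(ko)
    return dict(found)


# Toy reference data (illustrative only, not taken from KEGG).
gene_to_ko = {
    "org1:g1": "K90001",
    "org2:g7": "K90001",
    "org1:g2": "K90002",
    "org3:g5": "K90003",
}
ko_to_units = {
    "K90001": ["map_A"],
    "K90002": ["map_A", "map_B"],
    "K90003": ["map_B"],
}

# Toy similarity search results for four query genes.
hits = {
    "q1": [("org1:g1", 250.0), ("org3:g5", 80.0)],
    "q2": [("org1:g2", 120.0)],
    "q3": [("org3:g5", 40.0)],   # below the cutoff, stays unassigned
    "q4": [("orgX:g9", 300.0)],  # hit has no K number, stays unassigned
}

assignment = assign_k_numbers(hits, gene_to_ko)
units = reconstruct(assignment, ko_to_units)
print(assignment)
print(units)
```

In this toy case only `q1` and `q2` receive K numbers, and the reconstruction step reports which K numbers were found for each pathway-like unit. Deciding whether a pathway or module is actually complete would require its full definition, which this sketch does not model.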