Annotating Metabolic Processes

This document provides instructions and guidelines for an annotation task prepared by the National Centre for Text Mining (NaCTeM) for the BioCreative IV User Interactive Task. The task involves annotating the elements of metabolic processes described in text, namely the participating entities and the action terms that signify the processes.

This tutorial includes everything that is necessary to complete this or similar tasks in Argo.


Metabolic reactions or processes are the building blocks of metabolic pathways, which have received little attention from the biomedical NLP community compared to signalling pathways. Whilst the latter are centred on protein-protein or ligand-receptor interactions, metabolic pathways primarily consist of a series of biochemical reactions. For this task, the curators will be asked to annotate named entities and action terms relevant to metabolic processes.
We adopt the definition of a metabolic process given in the interaction types ontology of the Comparative Toxicogenomics Database (CTD), i.e., “the biochemical alteration of a molecule’s structure, excluding changes in expression, stability, folding, localization, splicing and transport”.

The task

The task involves the annotation of chemical compounds (CCs), genes or gene products (GGPs) and expressions signifying a metabolic process (triggers, also referred to as action terms). Both CCs and GGPs may play the role of reactant (the entity undergoing the alteration), product (the entity into which the reactant is changed) or modifier (the entity driving the alteration) in a metabolic process. Each metabolic process is signified by an action term, which is the span of text that best expresses the process. It can be a verb, verb nominalisation, adjective or adverb, e.g., phosphorylation, generates, acetylated. Additionally, each of these annotations is assigned a unique identifier from an external resource: ChEBI for CCs, UniProt for GGPs and CTD for action terms.
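
The annotation scheme described above can be pictured as a small data model. The following Python sketch is purely illustrative and is not Argo's internal representation: the class names, the helper functions and the example sentence are all assumptions introduced here. The identifiers are deliberately left empty, since in the task they would be looked up in ChEBI, UniProt and CTD.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntityType(Enum):
    CC = "chemical compound"
    GGP = "gene or gene product"


class Role(Enum):
    REACTANT = "reactant"  # entity undergoing the alteration
    PRODUCT = "product"    # entity into which the reactant is changed
    MODIFIER = "modifier"  # entity driving the alteration


@dataclass
class Span:
    """A stretch of text given by character offsets (end is exclusive)."""
    begin: int
    end: int
    text: str


@dataclass
class Entity:
    span: Span
    entity_type: EntityType
    # ChEBI identifier for CCs, UniProt identifier for GGPs; None until assigned.
    identifier: Optional[str] = None


@dataclass
class Participant:
    entity: Entity
    role: Role


@dataclass
class MetabolicProcess:
    trigger: Span                     # the action term
    trigger_id: Optional[str] = None  # CTD identifier; None until assigned
    participants: List[Participant] = field(default_factory=list)


def find_span(sentence: str, text: str) -> Span:
    """Hypothetical helper: span of the first occurrence of text in sentence."""
    begin = sentence.index(text)
    return Span(begin, begin + len(text), text)


def texts_with_role(process: MetabolicProcess, role: Role) -> List[str]:
    """Surface forms of all participants that play the given role."""
    return [p.entity.span.text for p in process.participants if p.role is role]


# Invented example sentence, not taken from the task corpus.
sentence = "Hexokinase catalyses the phosphorylation of glucose to glucose-6-phosphate."

process = MetabolicProcess(
    trigger=find_span(sentence, "phosphorylation"),
    participants=[
        Participant(Entity(find_span(sentence, "Hexokinase"), EntityType.GGP), Role.MODIFIER),
        Participant(Entity(find_span(sentence, "glucose"), EntityType.CC), Role.REACTANT),
        Participant(Entity(find_span(sentence, "glucose-6-phosphate"), EntityType.CC), Role.PRODUCT),
    ],
)

for role in Role:
    print(role.value, texts_with_role(process, role))
```

In this sketch a role belongs to the link between an entity and a process rather than to the entity itself, so the same compound could be the product of one process and the reactant of the next.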