Argo supports the BioC format through the BioC type system and two processing components, BioC Reader and BioC Writer. The Reader deserialises BioC collections into the BioC type system, and the Writer serialises them back to BioC.
About BioC
The BioC format is encoded in XML and consists of a collection of documents, each split into passages and optionally sentences. These elements may contain stand-off annotations with optional text-bound locations as well as n-ary relations between annotations and other relations. Virtually all elements may declare a list of key-value pairs for storing arbitrary data.
The format is actively promoted by the BioCreative Interoperability Initiative whose aim is to enhance the reusability of tools and resources.
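The structure described above can be illustrated with a short Python sketch that reads annotations from a BioC collection using only the standard library. The sample collection, the function name `read_annotations` and the use of an infon with key `type` to hold the annotation type are illustrative assumptions; this is not the code of Argo's BioC Reader. The sketch handles annotations attached directly to passages and ignores sentence-level ones.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written BioC collection, used only for illustration.
SAMPLE = """<collection>
  <source>example</source>
  <date>2013-01-01</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <infon key="type">abstract</infon>
      <offset>0</offset>
      <text>IL-2 activates STAT5.</text>
      <annotation id="T1">
        <infon key="type">Protein</infon>
        <location offset="0" length="4"/>
        <text>IL-2</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Protein</infon>
        <location offset="15" length="5"/>
        <text>STAT5</text>
      </annotation>
      <relation id="R1">
        <infon key="type">Positive_regulation</infon>
        <node refid="T1" role="Cause"/>
        <node refid="T2" role="Theme"/>
      </relation>
    </passage>
  </document>
</collection>
"""


def read_annotations(xml_text):
    """Return (document id, annotation id, type, covered text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            # Locations are document-level offsets, so shift them by the
            # passage offset before slicing the passage text.
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for ann in passage.findall("annotation"):
                # Infons are the key-value pairs attached to an element.
                infons = {i.get("key"): i.text for i in ann.findall("infon")}
                for loc in ann.findall("location"):
                    start = int(loc.get("offset")) - passage_offset
                    end = start + int(loc.get("length"))
                    rows.append(
                        (doc_id, ann.get("id"), infons.get("type"),
                         passage_text[start:end])
                    )
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

Relations could be read in the same way by iterating over `relation` elements and following the `refid` attribute of each `node` to the annotation or relation it points to.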
Resources
The following files are BioC-encoded corpora used in the BioNLP Shared Task series.
Corpus | Training set | Development set | Entities | Events | Equivalent entities | Modifications | Coreferences |
GE’11 | 908 | 259 | Yes | Yes | Yes | Yes | No |
EPI | 600 | 200 | Yes | Yes | Yes | Yes | No |
ID | 152 | 46 | Yes | Yes | Yes | Yes | No |
GE’13 | 222 | 249 | Yes | Yes | Yes | Yes | Yes |
CG | 300 | 100 | Yes | Yes | Yes | Yes | No |
PC | 260 | 90 | Yes | Yes | Yes | Yes | No |
We also provide the BioC-encoded version of NaCTeM’s Metabolites corpus.
Workflows
Two of our BioC-compliant modules are realised as workflows in Argo. They are named BioC Event Extraction and BioC Metabolic processes. Each of them includes the BioC Reader and Writer components that allow users to upload their BioC files for processing as well as retrieve the results in the same format.
Before running the workflows, please consult the tutorials page on how to perform the following in Argo:
- set up component parameters,
- upload and download documents,
- run workflows, and
- track the progress of processing workflows.
Follow the steps below for running either of the workflows.
Step 1. Sign in
If you have not done so yet, create an account in Argo and sign in to it. Although it is possible to use Argo without registering, any user-created data (workflows and documents) will be automatically deleted at the end of a visit.
Tip: Registering an account in Argo requires a valid email address. A verification email is sent immediately after the account is created. Please check your spam/junk folders if you do not receive it within a few minutes.
Step 2. Upload BioC documents
Upload the BioC XML files that you wish to be processed by the workflow.
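Before uploading, it may help to confirm locally that a file is well-formed XML and has the basic shape of a BioC collection. The Python sketch below is one possible sanity check under that assumption. The function name `check_bioc` and the specific checks are hypothetical; it does not validate against the BioC DTD and does not reproduce whatever checks Argo itself performs.

```python
import xml.etree.ElementTree as ET


def check_bioc(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "collection":
        problems.append(f"root element is <{root.tag}>, expected <collection>")

    documents = root.findall("document")
    if not documents:
        problems.append("no <document> elements found")

    for index, document in enumerate(documents, start=1):
        if document.find("id") is None:
            problems.append(f"document {index} has no <id>")
        if not document.findall("passage"):
            problems.append(f"document {index} has no <passage>")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_bioc.py my_corpus.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_bioc(handle.read()) or ["basic checks passed"]:
            print(problem)
```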
Step 3. Create copies of the public workflow
Each of the workflows is publicly available but read-only. To change its settings (for example, to specify an input BioC file), you have to make your own copy: open the public workflow for editing and save it as a copy. It is advisable to give the copy a distinctive name.
Step 4. Change the workflow’s settings
Configure the BioC Reader and BioC Writer components of the workflow by specifying the input and output files, respectively. If you are running the BioC Event Extraction workflow, also configure the EventMine component by choosing the appropriate task-specific model (e.g., Cancer Genetics 2013).
Step 5. Run the workflow
Run the workflow. On large BioC collections, processing might take a while to complete. Once the status of a process is shown as Finished, the output BioC file should be ready for download.
Web services
We also developed BioC-compliant web services for recognising concepts in the Comparative Toxicogenomics Database (CTD). They are accessible at the following locations:
- Chemicals: http://nactem.ac.uk/CTDWebService/ctd/chem
- Genes: http://nactem.ac.uk/CTDWebService/ctd/gene
- Diseases: http://nactem.ac.uk/CTDWebService/ctd/disease
- Action terms: http://nactem.ac.uk/CTDWebService/ctd/action_term
They can be tested using the facility provided by the organisers of the BioCreative IV CTD track.
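A call to one of the services listed above might look like the following Python sketch. The endpoint URLs are taken from the list, but everything else is an assumption not stated on this page: that the services accept a BioC XML collection as the body of an HTTP POST request, that `text/xml` is an acceptable content type, and that the response is the annotated BioC XML. The function names `build_request` and `annotate` are hypothetical.

```python
import urllib.request

BASE_URL = "http://nactem.ac.uk/CTDWebService/ctd"

# Concept categories mapped to the endpoint paths listed above.
ENDPOINTS = {
    "chemicals": "chem",
    "genes": "gene",
    "diseases": "disease",
    "action terms": "action_term",
}


def build_request(concept, bioc_xml):
    """Prepare (but do not send) a request for the given concept category.

    Assumption: the service expects the BioC XML as a POST body.
    """
    url = f"{BASE_URL}/{ENDPOINTS[concept]}"
    return urllib.request.Request(
        url,
        data=bioc_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed
        method="POST",
    )


def annotate(concept, bioc_xml, timeout=60):
    """Send the request and return the response body as text.

    Assumption: the response is a UTF-8 encoded BioC collection.
    """
    request = build_request(concept, bioc_xml)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires network access and a running service.
    with open("input.xml", encoding="utf-8") as handle:
        print(annotate("genes", handle.read()))
```

Separating `build_request` from `annotate` keeps the request easy to inspect without contacting the server, which is useful while the exact interface is being confirmed.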