Curation of COPD Phenotypes: Using Argo

This document is part of Curation of COPD Phenotypes.

Introduction

The curation activities will be carried out using the Web-based workbench Argo. Developed as a generic text mining (TM) framework, Argo was not tailored for any specific application or use case. It caters to three user types: (1) technical users who develop customised text mining solutions in the form of automatic workflows; (2) information content providers who wish to exploit text mining techniques in order to serve up semantically enriched data; and (3) domain experts who help build knowledge resources by manually (or semi-automatically) curating information.

As solution providers for the task at hand, NaCTeM has built text mining workflows specifically for the semi-automatic annotation of COPD phenotypes. PhenomeNet, a knowledge base that consolidates phenotypic information covering various model organisms, is an example of an information content provider whose services could benefit from the results of text mining. This exercise for BioCreative V’s User Interactive Task, however, is focussed on the use of Argo by domain experts for biocuration. Thus, only instructions relevant to such users are included below.

Getting started

We provide below a brief description of how one gains access to Argo’s functionalities, and an overview of its main interface.

Registration and signing in

For this exercise, you need an Argo account. Although a Guest account is readily available, it is heavily restricted, and data created with it are kept only until the end of a visit. It is therefore not sufficient for the multi-stage COPD phenotype curation task.

  1. Start the workbench by clicking on the “Launch” button on the Argo landing page.
  2. On the new page that appears, click on “Guest” at the upper right-hand corner and choose “Register”.
  3. Supply the required details, making sure that the email address specified is a valid one.
  4. Verify your email address using the link provided in the verification message which you should receive immediately.

    Tip: Please check your email spam/junk folders if you do not receive the verification email within a few minutes.

  5. Once your email address has been verified, you should be able to sign in to Argo.

The main interface

Argo’s main interface consists of three panels. Please do take note of them, although for the curation task at hand, you will be mostly working only with the Documents panel.

Working with documents

Argo’s Documents panel (or file browser, if you like) has features typical of any file manager. It can hold files and directories alike, with buttons for the following: displaying an updated listing of a directory’s content (“Refresh”), creating new directories (“Create”), uploading documents from your local machine (“Upload”), downloading documents onto your local machine (“Download”), deleting unwanted documents (“Delete”), sharing documents with other Argo users (“Share”), and manipulating annotations contained in certain documents (“Edit”).

We recommend that you try out the exercises below to familiarise yourself with the file browser’s features.

Navigation

By default, each Argo user is provided with a “My Documents” directory. Under this, you should find a “COPD-practice” subdirectory that we have pre-uploaded for you, containing several files with the *.xmi extension.

  • File selection: Clicking once on a file or directory name selects it. Multiple selection is also supported: if you select a file, hold down the Shift key and click another file further down the list, all files in between will also be selected. If you select a file and hold down the Ctrl key while clicking on other files, only the particular files that you clicked on will be selected.
  • Pagination: The file browser can display at most 20 files/folders at a time. The “COPD-practice” folder contains a hundred files. Observe how Argo splits the full file listing into several pages, which you can navigate through by using the left/right arrow buttons at the bottom.
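The pagination arithmetic can be sketched as follows. This is only an illustration of the numbers quoted above (at most 20 entries per page, 100 files in the folder); the function names are hypothetical and do not reflect how Argo itself is implemented.

```python
import math


def page_count(n_items: int, page_size: int = 20) -> int:
    """Number of pages needed to list n_items with page_size entries per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(n_items / page_size)


def page_slice(items, page: int, page_size: int = 20):
    """Entries shown on a given page (pages are numbered from 1)."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


# Hypothetical file names standing in for the contents of "COPD-practice".
files = [f"doc{i:03d}.xmi" for i in range(100)]
print(page_count(len(files)))      # pages needed for 100 files
print(page_slice(files, 5)[-1])    # last entry on the final page
```

With 100 files and 20 entries per page, the listing is split into five pages.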

Common operations

  • Folder creation: Select your “My Documents” folder and create your own subdirectory under that by clicking on the “Create” button. Name the new folder as you like.
  • File upload: Upload any document, regardless of file type or extension, from your local machine into the new folder. To do this, first select the newly created folder and then click on the “Upload” button. Note that currently, only one file at a time can be uploaded.
  • Refresh listing: Click on the “Refresh” button if at any point it seems that the file listing is not displaying the files you have just uploaded.
  • File download: Choose any file and click on the “Download” button. This should save a copy of that file onto your local machine. As with file uploading, the system currently supports downloading of only one file at a time.
  • Deletion: Try deleting the file(s) you uploaded. Note that for this operation, you can delete several files at a time through multiple file selection, as described previously.

Initiating annotation tasks

The “Edit” button is provided specifically for opening files containing annotations. While the file browser can hold any type of file, this button works only with files in the XML Metadata Interchange (XMI) format, the encoding that is natively used by Argo. Files in your “COPD-practice” folder are examples of such files.

  • Launching the Editor: Select any number of XMI files under your “COPD-practice” folder and click on the “Edit” button. This will open a new browser window showing the interface of the Manual Annotation Editor (or simply the Editor), Argo’s curation tool.

    Tip: Please ensure your browser allows for opening pop-up windows.

  • Newly spawned processes: Congratulations! You have just started a new annotation task. Every click on the “Edit” button is considered by Argo as a new annotation task (or process). Access your other browser window that has the main Argo interface and open your Processes panel. Observe that a new process has been added (with progress at 66%); this was automatically added by Argo when you clicked on the “Edit” button to work on annotations. We shall revisit the correspondence between the Editor and processes in the succeeding section.

Launching the Editor in the manner described above, i.e., through file selection within your Documents panel, works only with XMI files. However, this does not mean that Argo can handle only this format. In fact, Argo supports many other formats (e.g., plain text, tab-separated values, BioC), but annotating files in these formats would require designing some workflows, which is outside the scope of this tutorial.

The Manual Annotation Editor

The Editor is Argo’s annotation tool. It opens in a new browser window when you follow the steps in the previous exercise, and it has five user-interactive areas.

Screencast: Quick start

We have prepared a screencast that demonstrates the Editor’s most essential annotation functionalities. These are all, however, presented in more detail in the written instructions below.

Filtering

  1. Select any document from the Editor’s data set navigator panel.
  2. Observe the different annotation types (and the corresponding colours) shown in the label selector panel. Try unticking any of the boxes and notice how the document viewer is accordingly updated. Upon unticking a certain label’s box, annotations of the same colour should disappear; they should however re-appear upon ticking the box again.

Selecting and inspecting

  1. Click on the annotations tree tab of the right-hand side panel.
  2. In the document viewer, click once on any of the highlighted text spans to select an annotation. Notice how a flashing yellow border appears around the currently selected annotation.
  3. Also observe how the annotations tree switches its focus to the entry that corresponds to the currently selected annotation. This entry is similarly displayed in yellow.
  4. Expand the specific tree in focus. This should show the features of the currently selected annotation, including its character offsets (specific location within the document in terms of number of characters).
  5. Select a different annotation but this time using the annotation tree (rather than the document viewer). You will notice that the document viewer will switch its focus to the corresponding text span annotation, shown with a flashing yellow border.

Either the document viewer or the annotation tree can be used to select annotations, since a selection made in one is always mirrored in the other.

Manipulating

The following operations can be done on any selected annotation.

  1. Duplicating: If you find a text span annotation that you think is correct (in terms of both boundaries and type/label), and you think all instances of this text span should be similarly annotated, click on the “Annotate similar” button to automatically generate duplicate annotations.
  2. Deleting: If an annotation is completely wrong, you can delete it by clicking on the “Delete” button.
  3. Moving: If you wish to adjust the boundaries of a text span annotation, e.g., in cases where tokens were being incorrectly included or missed, click on the “Move” button and then highlight (by dragging your mouse over) the correct text span.
  4. Relabelling: If you think an annotation’s type/label was incorrectly assigned (e.g., a drug name was labelled as a gene), click on the “Change label” button. This will bring up a pop-up window with a list of labels to choose from. This list is initially long as it includes all annotation types defined in Argo. However, by typing “uk.ac.nactem.uima.phenotypes” in the text field at the top of the window, the choices will be narrowed down to only the types of interest to our curation task.
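The idea behind “Annotate similar” can be sketched as follows. This is a minimal illustration that assumes exact, case-sensitive string matching; the source does not state how Argo actually finds the other instances, and the function name and label used here are hypothetical.

```python
import re


def annotate_similar(document: str, begin: int, end: int, label: str):
    """Return (begin, end, label) triples for every exact occurrence
    of the text covered by the given span, including the original one."""
    text = document[begin:end]
    if not text:
        return []
    # re.escape ensures the covered text is matched literally.
    return [(m.start(), m.end(), label)
            for m in re.finditer(re.escape(text), document)]


doc = "COPD patients with severe COPD"
# The first "COPD" (offsets 0 to 4) is the annotation judged to be correct.
for span in annotate_similar(doc, 0, 4, "MedicalCondition"):
    print(span)
```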

Creating

Argo supports the creation of both text span and complex annotations.

Text span annotations

Perhaps the simplest form of textually grounded annotations, text span annotations require only two pieces of information: their locations within text (often in terms of character offsets) and their semantic label.
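As a rough sketch, such an annotation can be modelled as a pair of character offsets plus a label. The class below is illustrative only: the end offset is assumed to be exclusive, the example sentence is made up, and the label name is hypothetical rather than one of the types defined in Argo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    begin: int  # offset of the first covered character
    end: int    # offset just past the last covered character (assumed exclusive)
    label: str  # semantic label, i.e. the annotation type

    def covered_text(self, document: str) -> str:
        """The piece of the document that this annotation highlights."""
        return document[self.begin:self.end]


document = "Theophylline is used in the treatment of COPD."
start = document.index("COPD")
span = TextSpan(start, start + len("COPD"), "MedicalCondition")  # hypothetical label
print(span, span.covered_text(document))
```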

  1. Highlight the span of text that you wish to annotate using a click-drag-release motion with your mouse.

    Tip: If the text span that you wish to annotate consists of only a single word, you can simply double-click anywhere within the word to highlight it.

  2. A pop-up window will be displayed, asking you to choose which annotation type/label should be applied to the text span. Just as in the case of relabelling (described above), the initial list will be long, but you can easily restrict the choices by typing “uk.ac.nactem.uima.phenotypes” in the text field at the top. This will display only labels with that prefix. Note that you do not have to type this prefix each time you create a new annotation, as the Editor keeps it, at least until you close the Editor window.
  3. Click on the label that you wish to assign to the text span.
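The effect of typing the prefix can be pictured as a simple filter over the full list of type names. In the sketch below, only “uk.ac.nactem.uima.phenotypes.Relation” is taken from this tutorial; the other type names and the function name are hypothetical.

```python
def filter_labels(all_labels, prefix: str):
    """Keep only the type names that start with the typed prefix."""
    return [name for name in all_labels if name.startswith(prefix)]


all_labels = [
    "org.example.Token",                      # hypothetical unrelated type
    "uk.ac.nactem.uima.phenotypes.Drug",      # hypothetical type name
    "uk.ac.nactem.uima.phenotypes.Relation",  # named in this tutorial
    "org.example.Sentence",                   # hypothetical unrelated type
]

for name in filter_labels(all_labels, "uk.ac.nactem.uima.phenotypes"):
    print(name)
```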

Complex annotations (Relations)

Some annotations are structured and build upon other annotations, which makes them more complex. One example is an annotation capturing the relationship between COPD and the drug theophylline. To simplify the task, we shall assume that all text span annotations that could possibly be included in complex annotations have been pre-annotated. That is, all basic COPD-relevant concepts (e.g., medical conditions, signs or symptoms, drugs, genes/proteins) will already have been captured using text span annotations before the annotation of COPD relations.

  1. Click on the “Create” button above the Editor’s document viewer. This will prompt you to select an annotation type. This time, select Relation (uk.ac.nactem.uima.phenotypes.Relation). Observe how a new Relation entry was added to the annotations tree.
  2. Expand the new Relation entry. You will find that it has two (currently null-valued) features called mention1 and mention2.
  3. Click once on mention1, and notice how a new “Add” button has appeared above the document viewer. Clicking on this button brings up a small pop-up window prompting you to choose between creating a new feature structure or using an existing one. Select the latter option (“Use existing feature structure”).
  4. Another pop-up window will appear, this time displaying the existing annotations you can choose from. Select your desired text span annotation and then click on the “OK” button. Take note of how the previously null-valued mention1 feature now points to the text span annotation that you just selected.
  5. Repeat steps 3 and 4 to fill in the value of mention2.

    Tip: If you wish to change the annotation assigned to a complex annotation’s feature (e.g., you think that an already-filled mention1 slot should point to some other text span annotation), do the following: (1) click on the feature’s name, e.g., mention1, and then click on the “Clear” button that you should find above the document viewer, to dissociate the unwanted text span from the relation annotation; (2) do the replacement by selecting the presumably more correct text span in the same manner as described above.
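The structure built in the steps above can be sketched as a relation object whose two features point at existing text spans. This is a simplified model for illustration, not Argo’s internal representation: the class layout, the example sentence and the span labels are assumptions, and only the feature names mention1 and mention2 come from this tutorial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextSpan:
    """A pre-existing text span annotation (offsets plus a label)."""
    begin: int
    end: int
    label: str


@dataclass
class Relation:
    """Illustrative stand-in for uk.ac.nactem.uima.phenotypes.Relation."""
    mention1: Optional[TextSpan] = None  # null-valued until filled in
    mention2: Optional[TextSpan] = None

    def is_complete(self) -> bool:
        return self.mention1 is not None and self.mention2 is not None


doc = "Theophylline is used in the treatment of COPD."
drug = TextSpan(0, 12, "Drug")                    # hypothetical label
condition = TextSpan(41, 45, "MedicalCondition")  # hypothetical label

relation = Relation()          # step 1: both features start out empty
relation.mention1 = condition  # steps 3 and 4: use an existing annotation
relation.mention2 = drug       # step 5: same again for mention2

# The tip's "Clear", then re-select sequence:
relation.mention1 = None
relation.mention1 = condition
print(relation.is_complete())
```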

Ontology-linking

The Editor allows you to link annotations to external resources such as ontologies, by means of semi-automatic identifier assignment. For the task of curating COPD phenotypes, this feature is currently available for annotations corresponding to names of drugs (ChEBI), proteins (UniProt), medical conditions (UMLS) and signs or symptoms (UMLS). Note that for UMLS, you will be asked to log in, which means you should have already registered for a licence before this exercise. It’s easy to apply for one.

  1. Select an annotation that you would like to link to an ontology (or whose pre-assigned identifier you would like to change). Locate it within the annotation tree. Note that the IDs automatically assigned by text mining will have the concept’s preferred name to give you an indication of whether the correct ID has been linked.
  2. Expand the annotation tree in focus. You should find an “id” field with a “Choose” button to its right.
  3. Click on the “Choose” button. A pop-up window should appear, displaying a ranked list of matches for the currently selected annotation. Clicking on any item in this list displays the external resource’s stored information about it, helping the curator to disambiguate amongst multiple choices.
  4. Once you’ve found the (matching) item whose identifier you would like to assign to the annotation, click on the “OK” button to close the pop-up window. The value of the “id” field of the annotation should now be updated.

    Tip 1: If none of the suggested choices corresponds to the correct ID, please feel free to use the relevant resource’s own search functionality, either within the same pop-up window in Argo, or by opening UMLS/ChEBI/UniProt in a new browser window. You should be able to paste a value into the “id” field.

    Tip 2: If you want to assign more than one ID to a text span annotation, you will have to do it manually, as the “Choose” button allows you to assign only one ID per text span. To do so, use the relevant resource’s own search functionality to find the IDs you want to assign, and concatenate the multiple IDs together, delimited by semicolons, e.g., “C00001:name1:name1;C00002;C00003”. Paste this value into the “id” field.
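If you prefer to build the multi-ID value outside the Editor, the concatenation from Tip 2 can be sketched as below. The helper name is hypothetical, each entry is treated as an opaque string, and the placeholder IDs are those from the example in the tip.

```python
def join_ids(ids):
    """Concatenate identifiers into one semicolon-delimited value,
    skipping blank entries and trimming stray whitespace."""
    cleaned = [identifier.strip() for identifier in ids if identifier.strip()]
    return ";".join(cleaned)


value = join_ids(["C00001:name1:name1", "C00002", "C00003"])
print(value)  # paste this value into the "id" field
```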

Writing changes to disk

The Editor has no “Save” button: all changes you make are automatically kept in memory, although they are not yet written to disk. Recall that launching the Editor makes Argo spawn a new process in the Processes panel, and that its progress will appear as 66%.

  1. If at any point during your annotation work you feel ready to write your annotations to disk, click on the “Finish Editing” button at the upper right-hand corner of the Editor.
  2. You will be asked to confirm this action. Upon doing so, the browser window will close.
  3. Revisit your Processes panel and observe how the progress of the corresponding process now incrementally increases to 100%, followed by the status eventually changing to “Finished”. When this happens, it means that the writing of your annotations to disk has been completed.

    Tip: If you clicked on “Finish Editing” without having gone through all of the documents in your selected data set (e.g., you had to stop halfway through a document but wanted to make sure your changes so far had been written to disk), you will still be able to pick up from where you left off by opening the remaining documents for annotation from the Documents panel in the same manner as before, i.e., using the “Edit” button.

Pausing and resuming

If you are not yet ready to write your annotations to disk, but for whatever reason the annotation task was paused (e.g., your internet connection was lost, you closed your browser, or you simply wanted to take a break), Argo allows you to resume what you were doing.

  1. Unless you clicked on the “Finish Editing” button (as described above), deleted processes yourself, or the Argo server crashed, the process corresponding to your annotation task will continue running and should appear in your Processes panel. Access the main Argo interface and open your Processes panel. Click on the process of interest to select it.
  2. On the right-hand side panel, you should find a “Launch Manual Annotation Editor” button. Clicking on this button will open the Editor in a new browser window. You should be able to verify that the annotations you made previously are still there.

    Tip: As Argo does not know exactly where you left off, we recommend that you take note of the document that you last looked at before closing the Editor window.