Author Archives: Rafal Rak

Argo at OpenAIRE-COAR Conference

Sophia Ananiadou, director of NaCTeM, will give a talk about Argo at the OpenAIRE-COAR conference, to be held from 21 to 23 May 2014 at the Acropolis Museum in Athens, Greece.

The conference will explore how open access infrastructures are being implemented in practice around the world and will consider how repositories intersect with other library and research services, as well as with services that enhance them, such as text mining.

OpenAIRE is a complex scholarly communication infrastructure developed to support open access and to measure the impact of funding in Europe. It is funded by the European Commission and involves 33 European countries.

Argo at BioCreative IV

Argo will be demonstrated during several sessions at the BioCreative IV workshop, held in Washington, DC, from 7 to 9 October 2013. The demonstrations will cover workflows supporting the BioC format, curation capabilities, and the recognition of chemicals and bioprocesses. The relevant reports can be found in the workshop proceedings:

  • Rafal Rak, Riza Batista-Navarro, Andrew Rowley, Makoto Miwa, Jacob Carter and Sophia Ananiadou, NaCTeM’s BioC Modules and Resources for BioCreative IV
  • Rafal Rak, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou, Customisable Curation Workflows in Argo
  • Riza Theresa Batista-Navarro, Rafal Rak and Sophia Ananiadou, Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser

Argo in BioCreative IV User Interactive Task

Argo is participating in a biocuration exercise as part of the BioCreative IV User Interactive Task. The BioCreative organisers aim to promote web-based text-mining platforms that support the research community in performing resource curation activities. Argo is one of nine systems taking part in the challenge this year.

The task, set up by NaCTeM and approved by the BioCreative organisers, involves the annotation of metabolic processes. Several biocurators will quantitatively and qualitatively evaluate Argo by setting up workflows and performing manual annotation of PubMed abstracts supported by automatic processing available in Argo. The feedback received from the curators will guide the future development of Argo to better accommodate manual curation requirements.

We have prepared a tutorial with detailed information, annotation guidelines and instructions for using Argo in this task.

New Features in Manual Annotation Editor

The built-in Manual Annotation Editor component has undergone a series of changes. The span-of-text annotation editor is now faster and more flexible. We have also improved navigation and extended the range of actions that can be performed on annotations.

The features include:

  • Fast rendering of span-of-text annotations regardless of the size of documents.
  • Manually creating span-of-text annotations and adjusting their boundaries.
  • Colour coding of span-of-text annotations.
  • Full support for nested and intersecting span-of-text annotations.
  • Creating complex annotations that consist of attributes (of primitive types such as integers, strings and booleans) and references to other annotations.
  • Structural consistency of annotations is ensured by type systems (annotation schemata).
  • Easy navigation between annotated documents.
  • Annotation can be paused and resumed in a later session.
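The last few features (attributes, references to other annotations, schema-enforced consistency, and nested or intersecting spans) can be illustrated with a small sketch. It is written in Python, and the type names `Chemical`, `Process` and `Relation`, their features, and the validation rules are hypothetical examples; this is not Argo's or UIMA's actual type-system API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Ref:
    """Feature declaration meaning 'a reference to an annotation of this type'."""
    target_type: str

# Hypothetical type system (annotation schema): type name -> {feature: expected type}.
TYPE_SYSTEM = {
    "Chemical": {"name": str, "confidence": float},
    "Process": {"label": str, "negated": bool},
    "Relation": {"agent": Ref("Chemical"), "theme": Ref("Process")},
}

@dataclass
class Annotation:
    type_name: str
    begin: int  # span start offset (inclusive)
    end: int    # span end offset (exclusive)
    features: dict = field(default_factory=dict)

def validate(ann: Annotation) -> list[str]:
    """Return the schema violations of `ann` (an empty list if it conforms)."""
    schema = TYPE_SYSTEM.get(ann.type_name)
    if schema is None:
        return [f"unknown type {ann.type_name!r}"]
    errors = []
    if not 0 <= ann.begin <= ann.end:
        errors.append("invalid span")
    for name, value in ann.features.items():
        expected = schema.get(name)
        if expected is None:
            errors.append(f"undeclared feature {name!r}")
        elif isinstance(expected, Ref):
            # References must point at an annotation of the declared type.
            if not (isinstance(value, Annotation) and value.type_name == expected.target_type):
                errors.append(f"{name!r} must reference a {expected.target_type}")
        elif type(value) is not expected:
            # Exact type check, so e.g. an int is rejected where a float is declared.
            errors.append(f"{name!r} must be {expected.__name__}")
    return errors

def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if two spans share at least one character (nested or intersecting)."""
    return a.begin < b.end and b.begin < a.end

if __name__ == "__main__":
    chem = Annotation("Chemical", 10, 17, {"name": "glucose", "confidence": 0.9})
    proc = Annotation("Process", 18, 30, {"label": "glycolysis", "negated": False})
    rel = Annotation("Relation", 10, 30, {"agent": chem, "theme": proc})
    print(validate(rel))                                          # []
    print(validate(Annotation("Relation", 10, 30, {"agent": proc})))
    print(overlaps(chem, rel))                                    # True: chem is nested in rel
```

Keeping the schema as plain data means the editor (or any other component) can check an annotation without hard-coding its structure, which is the role the type system plays in the feature list above.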

Creating Annotations with SPARQL

One of the biggest challenges in developing UIMA workflows is the incompatibility of components that support different type systems yet could exchange conceptually similar annotation structures. For instance, the output Sentence type of a sentence detector may be incompatible with the input Sentence type of a named entity recogniser only because these two seemingly identical types were defined in two different type systems. A less trivial source of incompatibility arises when two conceptually equivalent types are structurally different; for instance, coreference can be encoded either as a chain (a linked list) or as an array.

We have developed the SPARQL Annotation Editor, a processing component that allows a developer to manipulate annotations (and thus convert types) using SPARQL queries. Relying on a widely adopted query language makes this solution more approachable and encourages ad-hoc conversions that would otherwise have to be implemented programmatically.
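To make the two kinds of incompatibility concrete, here is a minimal plain-Python sketch over annotations encoded as (subject, predicate, object) triples. The prefixes `ts1:`, `ts2:` and `ex:`, the chain vocabulary (`ex:first`, `ex:next`, `ex:mention`) and the use of an `rdf:Seq` as the array form are all hypothetical; this is not the SPARQL Annotation Editor's actual RDF mapping. The first function mirrors a single SPARQL 1.1 `DELETE/INSERT` update; the second performs the ordered list walk that a SPARQL property path such as `ex:first/ex:next*/ex:mention` would match, although a path alone does not guarantee mention order.

```python
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"

def rename_type(graph: set[Triple], old: str, new: str) -> set[Triple]:
    """Retype every annotation of type `old` as `new`.

    Mirrors this SPARQL 1.1 Update (hypothetical prefixes):
        DELETE { ?a rdf:type ts1:Sentence }
        INSERT { ?a rdf:type ts2:Sentence }
        WHERE  { ?a rdf:type ts1:Sentence }
    """
    matched = {t for t in graph if t[1] == RDF_TYPE and t[2] == old}
    return (graph - matched) | {(s, RDF_TYPE, new) for (s, _, _) in matched}

def chain_to_seq(graph: set[Triple], chain: str) -> set[Triple]:
    """Replace a linked-list coreference chain with an rdf:Seq of its mentions.

    Hypothetical chain encoding: chain --ex:first--> link,
    link --ex:mention--> mention, link --ex:next--> next link.
    """
    # Assumes ex:first, ex:next and ex:mention are single-valued.
    prop = {(s, p): o for (s, p, o) in graph}
    links, mentions = [], []
    link = prop.get((chain, "ex:first"))
    while link is not None and link not in links:  # stop at the end or on a cycle
        links.append(link)
        mentions.append(prop[(link, "ex:mention")])
        link = prop.get((link, "ex:next"))
    # Drop the list structure: the head pointer and every triple about a link node.
    dropped = {t for t in graph if t[0] in links or t[:2] == (chain, "ex:first")}
    added = {(chain, RDF_TYPE, "rdf:Seq")}
    added |= {(chain, f"rdf:_{i}", m) for i, m in enumerate(mentions, start=1)}
    return (graph - dropped) | added
```

For example, a graph containing `("s1", "rdf:type", "ts1:Sentence")` and a two-link chain `c1 → l1 → l2` becomes one where `s1` has type `ts2:Sentence` and `c1` carries `rdf:_1` and `rdf:_2` pointing at the two mentions in order.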

Type system alignment using SPARQL will be presented at the 7th Linguistic Annotation Workshop & Interoperability with Discourse, which will take place in Sofia, Bulgaria, on 8 August. An online tutorial will follow shortly.

The details are covered in the following paper which will appear in the workshop proceedings:

Rak, R. and Ananiadou, S. (To appear). Making UIMA Truly Interoperable with SPARQL. In: Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse

Argo at ACL 2013

We are pleased to announce that Argo will be demonstrated at the ACL 2013 conference to be held in Sofia, Bulgaria, in August. The demonstration is scheduled for the 5th of August, 19:45-21:00. Please consult the daily program for details.

The details of the demonstration will appear in the conference proceedings:

Rak, R., Rowley, A., Carter, J. and Ananiadou, S. (To appear). Development and Analysis of NLP Pipelines in Argo. In: Proceedings of the System Demonstration Session at The 51st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics