Converting types

Interoperability problem

The interoperability of components in Argo is ensured by using common annotation types. For instance, a named entity annotator component may require Sentence annotations as input, which can be delivered by a sentence detector component. The two components are interoperable because they share a common definition of Sentence; that is, the Sentence type comes from the same type system. However, it may sometimes be necessary to combine components that produce or expect annotation types coming from different type systems. To continue with the example, the sentence detector may produce annotations of the type org.example.Sentence, whereas the named entity annotator may expect annotations of the type com.example.Sequence. In this situation the two components are not interoperable.

Argo features two components that can be placed between non-interoperable components to transcribe (map) annotations from one type to another: the Type Mapper and the SPARQL Annotation Editor. The Type Mapper is very simple to use, but it has its limitations. The SPARQL Annotation Editor, on the other hand, allows for constructing very expressive mappings, but requires familiarity with SPARQL, a graph query language.

Type Mapper

The Type Mapper component takes a user-defined mapping as a parameter. The mapping definition has a simple syntax, which in its most basic form looks like the following:

com.example.Person -> org.example.NamedEntity

In the above example, for each annotation of the type com.example.Person the component will create a corresponding annotation of the type org.example.NamedEntity.

The transcription automatically copies all features that the two types have in common. For instance, if both com.example.Person and org.example.NamedEntity extend uima.tcas.Annotation, the features begin, end, and sofa (refer to the feature list of uima.tcas.Annotation for an explanation) will be copied from the source annotation to the new one.
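
The copying of common features can be pictured with the following minimal Python sketch. It is not Argo's implementation: annotations are modelled as plain dictionaries, the TYPE_FEATURES table and the transcribe function are hypothetical, and the confidence and category features are assumed purely for illustration.

```python
# Hypothetical feature declarations; in Argo these come from UIMA type systems.
BASE_FEATURES = ("sofa", "begin", "end")  # inherited from uima.tcas.Annotation
TYPE_FEATURES = {
    "com.example.Person": BASE_FEATURES + ("confidence",),
    "org.example.NamedEntity": BASE_FEATURES + ("category",),
}


def transcribe(source, source_type, target_type):
    """Create a target annotation holding only the features both types declare."""
    target_features = TYPE_FEATURES[target_type]
    common = [f for f in TYPE_FEATURES[source_type] if f in target_features]
    target = {"type": target_type}
    for feature in common:
        target[feature] = source[feature]
    return target


person = {
    "type": "com.example.Person",
    "sofa": "_InitialView",
    "begin": 12,
    "end": 22,
    "confidence": 0.9,
}
entity = transcribe(person, "com.example.Person", "org.example.NamedEntity")
```

In this sketch the confidence feature is not carried over, because the assumed NamedEntity type does not declare a feature of that name.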

A mapping definition may include a condition. For example, the following mapping will transcribe only those com.example.Person annotations whose begin offset is greater than 100:

com.example.Person where begin > 100 -> org.example.NamedEntity
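
The effect of the where clause can be sketched in Python as a filter applied before transcription. This is an illustration only: the map_where function, the dictionary representation of annotations, and the com.example.Place type are assumptions, not part of Argo.

```python
def map_where(annotations, source_type, target_type, condition):
    """Transcribe only the source_type annotations that satisfy the condition."""
    mapped = []
    for ann in annotations:
        if ann["type"] == source_type and condition(ann):
            mapped.append(
                {"type": target_type, "begin": ann["begin"], "end": ann["end"]}
            )
    return mapped


annotations = [
    {"type": "com.example.Person", "begin": 40, "end": 51},
    {"type": "com.example.Person", "begin": 100, "end": 108},
    {"type": "com.example.Person", "begin": 230, "end": 242},
    {"type": "com.example.Place", "begin": 300, "end": 306},
]

# Equivalent in spirit to: com.example.Person where begin > 100 -> org.example.NamedEntity
entities = map_where(
    annotations,
    "com.example.Person",
    "org.example.NamedEntity",
    lambda ann: ann["begin"] > 100,
)
```

Only the Person annotation beginning at offset 230 is transcribed here; the one beginning at exactly 100 fails the strict comparison, and the Place annotation is of a different type.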

More advanced functionality of the Type Mapper includes feature paths and the custom transcription of individual features, as in the following example:

# Comments are preceded with the "#" character.
# 1. Transcribe Persons beginning after character 100.
# 2. Fill in the category feature of NamedEntities with the string "Person".
# 3. Fill in the metaData/confidence feature path that begins in the NamedEntity
#    with the value of the confidence feature of Person.
com.example.Person where begin > 100 -> org.example.NamedEntity,
 "Person" -> category,
 confidence -> metaData/confidence;
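
A rough Python sketch of what the three statements above amount to is given below. It assumes that annotations are nested dictionaries and that intermediate structures along a feature path are created on demand; whether the Type Mapper itself creates them is not stated here, and the set_path and apply_mapping helpers are hypothetical.

```python
def set_path(annotation, path, value):
    """Assign value at a slash-separated feature path, creating nested dicts as needed."""
    *parents, last = path.split("/")
    node = annotation
    for name in parents:
        node = node.setdefault(name, {})
    node[last] = value


def apply_mapping(person):
    """Mimic the three statements of the example mapping for one Person annotation."""
    # 1. Transcribe only Persons whose begin offset is greater than 100.
    if not person["begin"] > 100:
        return None
    entity = {
        "type": "org.example.NamedEntity",
        "begin": person["begin"],
        "end": person["end"],
    }
    # 2. Fill in the category feature with the string literal "Person".
    set_path(entity, "category", "Person")
    # 3. Copy the Person's confidence into the metaData/confidence feature path.
    set_path(entity, "metaData/confidence", person["confidence"])
    return entity


late = {"type": "com.example.Person", "begin": 230, "end": 242, "confidence": 0.87}
early = {"type": "com.example.Person", "begin": 40, "end": 51, "confidence": 0.95}
mapped_late = apply_mapping(late)
mapped_early = apply_mapping(early)
```

Here mapped_early is None because the condition fails, while mapped_late carries the literal category and the nested confidence value.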

Within a single mapping definition, statements are separated by commas, whereas multiple mapping definitions are separated by semicolons.
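
The role of the two separators can be illustrated with a deliberately naive Python splitter. It is only a sketch, not the Type Mapper's parser: the parse_mappings function is hypothetical, the second definition (com.example.Place) is invented for the example, and the code assumes that "#", "," and ";" never occur inside string literals.

```python
def parse_mappings(text):
    """Split mapping text into definitions (by ';') and statements (by ',')."""
    # Drop everything from '#' to the end of each line (comments).
    lines = [line.split("#", 1)[0] for line in text.splitlines()]
    body = " ".join(lines)
    definitions = []
    for definition in body.split(";"):
        statements = [s.strip() for s in definition.split(",") if s.strip()]
        if statements:
            definitions.append(statements)
    return definitions


example = """
# Two mapping definitions; the second one is hypothetical.
com.example.Person where begin > 100 -> org.example.NamedEntity,
 "Person" -> category,
 confidence -> metaData/confidence;
com.example.Place -> org.example.NamedEntity
"""

definitions = parse_mappings(example)
```

The first definition yields three statements and the second yields one.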

SPARQL Annotation Editor

Coming soon…

In the meantime, see the following paper:

Rak, R. and Ananiadou, S. (2013). Making UIMA Truly Interoperable with SPARQL. In: Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse