Developing Components

Introduction

This guide describes how to create Argo components using Maven archetypes, and how to package these components for installation into the Argo platform.

Argo components are simply Apache UIMA components; however, certain rules (specified in this document) must be followed for them to be fully compliant with the Argo platform.

The projects produced by the Maven archetypes make use of the Apache uimaFIT library which, amongst other things, allows component metadata to be included with a component’s source code and then extracted into an automatically generated UIMA XML metadata descriptor file.

Creating a new project

Creating a new project using Eclipse

  1. From the top menu, navigate to File > New > Other… and a new dialog window will appear.
    [Figure: New Project]

  2. Open up the Maven item in the list, select Maven Project and then press Next.
    [Figure: Select a Wizard]

  3. On the next page, simply press Next.
    [Figure: New Maven Project]

  4. From the list of archetypes, select either argo-reader-archetype (for creating an Argo Reader) or argo-analysis-engine-archetype (for creating an Argo Analysis Engine).  If these archetypes do not appear in the list, they must first be added (see below).

    [Figure: Select AE archetype]

    To add the Maven archetypes to Eclipse:

    • Press the Add Archetype… button.

    • To add the Argo Reader archetype, enter the following values and press OK:

        Archetype Group Id:     uk.ac.nactem.argo
        Archetype Artifact Id:  argo-reader-archetype
        Archetype Version:      1.2

      [Figure: Add Reader archetype]

    • To add the Argo Analysis Engine archetype, enter the following values and press OK:

        Archetype Group Id:     uk.ac.nactem.argo
        Archetype Artifact Id:  argo-analysis-engine-archetype
        Archetype Version:      1.2

      [Figure: Add AE archetype]

  5. Once the Maven archetype has been selected, a number of property values are required before the sample project can be generated.

    Both of the Argo archetypes require four basic properties:

      Group Id:     The Maven group id of the new component (e.g. uk.ac.nactem.argo.components)
      Artifact Id:  The Maven artifact id of the new component (e.g. magic-analysis-engine, document-reader)
      Version:      The Maven version of the new component (e.g. 0.0.1-SNAPSHOT, 1.0)
      Package:      The Java package to contain the new component’s main class (e.g. uk.ac.nactem.argo.components.magic)

    The Argo Reader archetype also requires the property:

      readerClassName:  The name of the new reader’s main Java class (e.g. DocumentReader)

    [Figure: Complete Reader archetype values]

    The Argo Analysis Engine archetype also requires the property:

      analysisEngineClassName:  The name of the new analysis engine’s main Java class (e.g. MagicAnalysisEngine)

    [Figure: Complete AE archetype values]

  6. Press the Finish button and the new project, containing either a sample Argo reader or analysis engine, will be visible within the Eclipse workspace.

    [Figure: Reader project structure]

    [Figure: AE project structure]

Creating a new project from the command line

Apache Maven must be installed before projects can be created from the command line.  Please see the previous section, ‘Creating a new project using Eclipse’, for an explanation of the archetype parameters (e.g. readerClassName) used in the following commands.

  • To create a new project containing a sample Argo reader:

    mvn archetype:generate \
      -DarchetypeGroupId=uk.ac.nactem.argo \
      -DarchetypeArtifactId=argo-reader-archetype \
      -DarchetypeVersion=1.2 \
      -DgroupId=<Group Id> \
      -DartifactId=<Artifact Id> \
      -Dversion=<Version> \
      -Dpackage=<Package> \
      -DreaderClassName=<Reader Class Name>

  • To create a new project containing a sample Argo analysis engine:

    mvn archetype:generate \
      -DarchetypeGroupId=uk.ac.nactem.argo \
      -DarchetypeArtifactId=argo-analysis-engine-archetype \
      -DarchetypeVersion=1.2 \
      -DgroupId=<Group Id> \
      -DartifactId=<Artifact Id> \
      -Dversion=<Version> \
      -Dpackage=<Package> \
      -DanalysisEngineClassName=<Analysis Engine Class Name>
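
As a worked illustration, the analysis engine command can be filled in with the example property values given earlier in this guide (uk.ac.nactem.argo.components, magic-analysis-engine, 0.0.1-SNAPSHOT, uk.ac.nactem.argo.components.magic and MagicAnalysisEngine). The sketch below is only one way to wrap the command: the generate_engine function and the DRY_RUN switch are hypothetical names introduced here, not part of the archetypes. By default it only prints the command so that it can be checked first.

```shell
#!/bin/sh
# Sketch: the analysis engine archetype command with this guide's example
# values filled in. DRY_RUN is a hypothetical switch for this example:
# 1 (the default) prints the command, 0 actually runs Maven.
DRY_RUN="${DRY_RUN:-1}"

generate_engine() {
  # Hold the command as positional parameters so it can be printed or run.
  set -- mvn archetype:generate \
    -DarchetypeGroupId=uk.ac.nactem.argo \
    -DarchetypeArtifactId=argo-analysis-engine-archetype \
    -DarchetypeVersion=1.2 \
    -DgroupId=uk.ac.nactem.argo.components \
    -DartifactId=magic-analysis-engine \
    -Dversion=0.0.1-SNAPSHOT \
    -Dpackage=uk.ac.nactem.argo.components.magic \
    -DanalysisEngineClassName=MagicAnalysisEngine
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

generate_engine
```

Running the script with DRY_RUN=0 should generate the project in a new magic-analysis-engine folder beneath the current directory, since Maven names the folder after the artifact id.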

Using Type Systems within a component

When using a type system within either an Argo reader or analysis engine:

  • Add the type system’s Maven artifact to the main dependencies list within the component’s POM file.
    [Figure: Type System dependencies]

  • Add the type system’s Maven artifact to the dependencies list within the pear profile in the component’s POM file, this time setting its scope to provided.  This prevents the type system from being included in the component’s PEAR file; the type system will be installed into Argo independently from its own PEAR file.  If a type system is also included inside a component’s PEAR file, workflows containing this component will most likely fail during execution.
    [Figure: Type Systems Pear Dependencies]

  • It may be necessary to add the path to the type system’s descriptor to the component’s META-INF/org.apache.uima.fit/types.txt file, which the uimaFIT mechanism uses to detect type systems automatically.  This is only required if the type system artifact doesn’t already include this file.  (The version of the U-Compare type system on GitHub does include it; however, for demonstration purposes, the type system is redeclared within the components generated from the Maven archetype.)

    [Figure: Types.txt]
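
The types.txt step could be scripted roughly as follows. This is a sketch under assumptions rather than the archetype’s own tooling: it assumes the file lives under src/main/resources in a standard Maven layout, and add_type_descriptor is a hypothetical helper name. Each line of types.txt is a descriptor location that uimaFIT scans, typically written with a classpath*: prefix.

```shell
#!/bin/sh
# Sketch: register a type system descriptor location in uimaFIT's types.txt.
# Usage: add_type_descriptor <project dir> <descriptor location>
add_type_descriptor() {
  project_dir=$1
  location=$2
  # Assumed location of types.txt in a standard Maven project layout.
  types_dir="$project_dir/src/main/resources/META-INF/org.apache.uima.fit"
  types_file="$types_dir/types.txt"

  mkdir -p "$types_dir"
  touch "$types_file"
  # Append the location only if that exact line is not already present.
  if ! grep -qxF -- "$location" "$types_file"; then
    printf '%s\n' "$location" >> "$types_file"
  fi
}
```

For example, add_type_descriptor . 'classpath*:desc/types/MyTypeSystem.xml' would register a descriptor at that (hypothetical) path; calling it again with the same location leaves the file unchanged.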

Building PEAR files for Argo installation

Argo components (and type systems) are installed by Argo administrators from PEAR files.

To produce an Argo-compatible PEAR file from a project generated using the provided Maven archetypes, run the Maven install goal with the supplied pear profile.

Building a PEAR file in Eclipse

Create an Eclipse launch configuration that uses Maven to produce the PEAR file.  This only needs to be done once per project in an Eclipse workspace.

  • Right-click on the project in Package Explorer and navigate to Run As > Maven build…

    [Figure: Navigate to launch]

  • A dialog window entitled ‘Edit Configuration’ will appear.  In this window, change the settings:

      Name:      A distinguishable identifier (e.g. document-reader [PEAR])
      Goals:     install
      Profiles:  pear

    [Figure: Maven build...]

Building a PEAR file from the command line

  • Navigate to the project folder – this will contain the Maven POM file.

cd /path/to/project

  • Use Maven to produce the PEAR file.

mvn install -P pear
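
Once the build has finished, the PEAR file can be located with a small helper such as the one below. This is a sketch that assumes the build writes the PEAR somewhere beneath Maven’s default target directory and that the file carries the usual .pear extension; find_pear_files is a hypothetical name, and the archetypes may place the file elsewhere.

```shell
#!/bin/sh
# Sketch: list PEAR files produced by the build.
# Usage: find_pear_files [build output dir]   (defaults to ./target)
find_pear_files() {
  build_dir=${1:-target}
  if [ ! -d "$build_dir" ]; then
    echo "No build directory at $build_dir; has the build been run?" >&2
    return 1
  fi
  # Assumption: the packaged component has the .pear extension.
  find "$build_dir" -type f -name '*.pear'
}
```

Calling find_pear_files from the project folder prints the path of each PEAR file found, or reports an error if the build directory does not exist yet.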

Using components within uimaFIT pipelines

Argo components created from the provided Maven archetypes and developed using the guidelines in this document should be fully compatible with uimaFIT pipelines, without any changes being required.

Renaming or moving the main component class

The main class of a component may be renamed or moved to another Java package after a project has been generated from one of the Maven archetypes; however, a corresponding change is required in the Maven POM file.

The new fully qualified name of the main component Java class (e.g. uk.ac.nactem.argo.components.ChangedName) must be given as the value of the uima.component.class property inside the pear profile within the Maven POM file.

[Figure: component class]
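
After renaming or moving the class, it can be worth checking what the POM currently declares. The sketch below prints the value of the uima.component.class property; it assumes the property is written as a single-line XML element, which is how Maven properties are usually declared, and component_class is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the value(s) of uima.component.class found in a POM file.
# Usage: component_class [path to pom.xml]   (defaults to ./pom.xml)
component_class() {
  pom=${1:-pom.xml}
  # Assumes the opening and closing tags sit on the same line.
  sed -n 's:.*<uima\.component\.class>\(.*\)</uima\.component\.class>.*:\1:p' "$pom"
}
```

If the printed name does not match the new fully qualified class name, the property still needs to be updated by hand.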

Accessing the Argo file system from a component

At present, only components running on the same machine as the Argo server installation have access to files stored within Argo.

For distributed workflows, Argo will automatically run reader components on the Argo server (giving them access to the file system), but if a regular analysis engine requires file access, its UIMA multipleDeploymentAllowed property must be set to false.  This ensures that the component is not distributed and is executed on the Argo server.

So, to allow an analysis engine access to the Argo file system, the uimaFIT @OperationalProperties annotation must be added to the component’s class with its multipleDeploymentAllowed attribute set to false, i.e. @OperationalProperties(multipleDeploymentAllowed = false).

[Figure: consumer]

Note about Argo consumers

Argo traditionally supports three types of components (readers, analysis engines and consumers), which are essentially the three types of components initially supported by the Apache UIMA framework.

Apache UIMA recommends that consumers no longer be developed; an analysis engine can perform exactly the same role.  Apache uimaFIT, a UIMA-based library used by the Maven archetypes for Argo, doesn’t support consumers at all; for example, they cannot be used within a uimaFIT pipeline.

The recommendation, when developing Argo components, is to follow the approach of UIMA and uimaFIT and consider consumers to be deprecated – this is why there is no Maven archetype to create consumer components.