Interface CollectionProcessingManager

  • All Known Implementing Classes:
    CPMImpl

    public interface CollectionProcessingManager
    A CollectionProcessingManager (CPM) manages the application of an AnalysisEngine to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results are then delivered to one or more CasConsumers.

    The CPM is configured with an Analysis Engine and CAS Consumers by calling its setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods. Collection processing is then initiated by calling the process(CollectionReader) or process(CollectionReader,int) methods.

    The process methods take a CollectionReader object as an argument. The Collection Reader retrieves each artifact from the collection as a CAS object.

    Listeners can register with the CPM by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during the processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.

    A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.

    Note that a CPM supports processing only one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous job is still running will result in a UIMA_IllegalStateException. To process multiple collections simultaneously, instantiate and configure multiple instances of the CPM.

    A CollectionProcessingManager instance can be obtained by calling UIMAFramework.newCollectionProcessingManager().
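
    The "one collection at a time" rule described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: SingleJobManager is a hypothetical class written for this example, it is not part of UIMA, and it throws the standard IllegalStateException rather than UIMA_IllegalStateException. It shows a process call that returns immediately, runs the job on another thread, and rejects a second job until the first has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy stand-in for the "one collection at a time" rule; not UIMA code. */
final class SingleJobManager {
    private final AtomicBoolean processing = new AtomicBoolean(false);

    /** Starts the job on a background thread and returns immediately. */
    Thread process(Runnable job) {
        // compareAndSet makes the "is a job already running?" check atomic.
        if (!processing.compareAndSet(false, true)) {
            throw new IllegalStateException("already processing a collection");
        }
        Thread worker = new Thread(() -> {
            try {
                job.run();
            } finally {
                processing.set(false); // free the manager for the next job
            }
        });
        worker.start();
        return worker;
    }

    boolean isProcessing() {
        return processing.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SingleJobManager manager = new SingleJobManager();
        CountDownLatch release = new CountDownLatch(1);

        // First job blocks until released, so the manager stays busy.
        Thread first = manager.process(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            manager.process(() -> { });
        } catch (IllegalStateException e) {
            System.out.println("second job rejected: " + e.getMessage());
        }

        release.countDown();
        first.join();
        System.out.println("processing after join: " + manager.isProcessing());
    }
}
```

    In this model, as in the description above, running two jobs at once means creating two manager instances rather than sharing one.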

    • Method Detail

      • getAnalysisEngine

        AnalysisEngine getAnalysisEngine()
        Gets the AnalysisEngine that is assigned to this CPM.
        Returns:
        the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
      • getCasConsumers

        CasConsumer[] getCasConsumers()
        Gets the CasConsumers assigned to this CPM.
        Returns:
        an array of CasConsumers
      • removeCasConsumer

        void removeCasConsumer(CasConsumer aCasConsumer)
        Removes a CasConsumer from this CPM.
        Parameters:
        aCasConsumer - the CasConsumer to remove
        Throws:
        UIMA_IllegalStateException - if this CPM is currently processing
      • isSerialProcessingRequired

        boolean isSerialProcessingRequired()
        Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
        Returns:
        true if and only if serial processing is required
      • setSerialProcessingRequired

        void setSerialProcessingRequired(boolean aRequired)
        Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false. Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
        Parameters:
        aRequired - true if and only if serial processing is required
        Throws:
        UIMA_IllegalStateException - if this CPM is currently processing
      • isPauseOnException

        boolean isPauseOnException()
        Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
        Returns:
        true if and only if this CPM will pause on exception
      • setPauseOnException

        void setPauseOnException(boolean aPause)
        Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused it can be resumed by calling the resume(boolean) method.
        Parameters:
        aPause - true if and only if this CPM should pause on exception
        Throws:
        UIMA_IllegalStateException - if this CPM is currently processing
      • addStatusCallbackListener

        void addStatusCallbackListener(StatusCallbackListener aListener)
        Registers a listener to receive status callbacks.
        Parameters:
        aListener - the listener to add
      • removeStatusCallbackListener

        void removeStatusCallbackListener(StatusCallbackListener aListener)
        Unregisters a status callback listener.
        Parameters:
        aListener - the listener to remove
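
        The register/unregister pattern behind the two listener methods above can be sketched with a stand-alone model. Everything here is an assumption made for illustration: StatusBroadcaster, its nested Listener interface, and the entityProcessed callback are hypothetical names, not the UIMA StatusCallbackListener API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Toy listener registry; the names here are illustrative, not UIMA's. */
final class StatusBroadcaster {
    /** Hypothetical callback type standing in for a status listener. */
    interface Listener {
        void entityProcessed(String entityId);
    }

    // CopyOnWriteArrayList lets listeners be added or removed while a
    // notification loop is iterating, without ConcurrentModificationException.
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    void removeListener(Listener listener) {
        listeners.remove(listener);
    }

    /** Notifies every currently registered listener, in registration order. */
    void fireEntityProcessed(String entityId) {
        for (Listener listener : listeners) {
            listener.entityProcessed(entityId);
        }
    }

    public static void main(String[] args) {
        StatusBroadcaster broadcaster = new StatusBroadcaster();
        Listener printer = id -> System.out.println("processed " + id);

        broadcaster.addListener(printer);
        broadcaster.fireEntityProcessed("doc-1");

        broadcaster.removeListener(printer);
        broadcaster.fireEntityProcessed("doc-2"); // no listeners left; prints nothing
    }
}
```

        A copy-on-write list is one reasonable choice when callbacks fire from a processing thread while other threads register or unregister listeners; a real implementation may use a different strategy.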
      • process

        void process(CollectionReader aCollectionReader)
              throws ResourceInitializationException
        Initiates processing of a collection. The CollectionReader initializes the CAS with documents from the collection. This method starts the processing in another thread and returns immediately. Status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.

        A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.

        Parameters:
        aCollectionReader - the CollectionReader from which to obtain the Entities to be processed
        Throws:
        ResourceInitializationException - if an error occurs during initialization
        UIMA_IllegalStateException - if this CPM is currently processing
      • isProcessing

        boolean isProcessing()
        Determines whether this CPM is currently processing. This means that a processing request has been submitted and has not yet completed or been stopped by a call to stop(). If processing is paused, this method will still return true.
        Returns:
        true if and only if this CPM is currently processing.
      • isPaused

        boolean isPaused()
        Determines whether this CPM's processing is currently paused.
        Returns:
        true if and only if this CPM's processing is currently paused.
      • resume

        void resume(boolean aRetryFailed)
        Resumes processing that has been paused.
        Parameters:
        aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
        Throws:
        UIMA_IllegalStateException - if processing is not currently paused
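
        The difference between retrying and skipping a failed entity on resume can be shown with a small single-threaded model. This is a sketch only: PausableRun is a hypothetical class for this example, not UIMA code, it always pauses when a step throws (as if pause-on-exception were enabled), and it throws the standard IllegalStateException when resumed while not paused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Toy, single-threaded model of pause-on-exception and resume; not UIMA code. */
final class PausableRun {
    private final List<String> entities;
    private final Consumer<String> step;
    private int next = 0;           // index of the entity to process next
    private boolean paused = false;

    PausableRun(List<String> entities, Consumer<String> step) {
        this.entities = entities;
        this.step = step;
    }

    /** Processes entities in order until the end or until a step throws. */
    void run() {
        while (next < entities.size()) {
            try {
                step.accept(entities.get(next));
            } catch (RuntimeException e) {
                paused = true; // leave 'next' pointing at the failed entity
                return;
            }
            next++;
        }
    }

    boolean isPaused() {
        return paused;
    }

    /** Continues after a pause, either retrying or skipping the failed entity. */
    void resume(boolean retryFailed) {
        if (!paused) {
            throw new IllegalStateException("not paused");
        }
        paused = false;
        if (!retryFailed) {
            next++; // skip the entity that failed
        }
        run();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int[] failuresLeft = {1}; // "b" fails once, then succeeds on retry

        PausableRun run = new PausableRun(Arrays.asList("a", "b", "c"), entity -> {
            if (entity.equals("b") && failuresLeft[0]-- > 0) {
                throw new RuntimeException("transient failure");
            }
            seen.add(entity);
        });

        run.run();
        System.out.println(seen + " paused=" + run.isPaused()); // [a] paused=true

        run.resume(true); // retry "b", then continue with "c"
        System.out.println(seen + " paused=" + run.isPaused()); // [a, b, c] paused=false
    }
}
```

        Calling resume(false) instead would leave "b" unprocessed and continue with "c".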
      • resume

        void resume()
        Resumes processing that has been paused.
        Throws:
        UIMA_IllegalStateException - if processing is not currently paused
      • getPerformanceReport

        ProcessTrace getPerformanceReport()
        Gets a performance report for the processing that is currently occurring or has just completed.
        Returns:
        an object containing performance statistics
      • getProgress

        Progress[] getProgress()
        Gets a progress report for the processing that is currently occurring or has just completed.
        Returns:
        an array of Progress objects, each of which represents the progress in a different set of units (for example number of entities or bytes)