ABLE 2.0.0 07/02/2003 10:25:01

com.ibm.able.beans
Class AbleAbstractImport

java.lang.Object
  |
  +--com.ibm.able.AbleObject
        |
        +--com.ibm.able.beans.AbleAbstractImport
All Implemented Interfaces:
AbleBean, AbleDataBufferManager, AbleDataSource, AbleEventListener, AbleEventListenerManager, AbleEventQueueManager, AbleEventQueueProcessor, AblePropertyChangeManager, AbleSerializable, java.io.Serializable
Direct Known Subclasses:
AbleDBImport, AbleImport

public abstract class AbleAbstractImport
extends AbleObject
implements AbleDataSource, java.io.Serializable

This abstract class provides the common interfaces used by Able import beans to read from data sources.

An Import bean's primary function is to read data from a data source and, when processed, parse each record into the outputBuffer array. AbleAbstractImport can load all records into memory or, optionally, cache a fixed number of them at a time. The number of records cached is specified by bufferSize, and this object handles the caching. Each time all records in the data source have been processed, it sends an end-of-file event and increments numEpochs.

An Import uses an AbleImportData object to handle the I/O. Meta-data must be provided in order for an AbleImportData to create field variables. When the data source is first opened, it is scanned to determine the number of records. On this first pass it also computes min/mean/max values for continuous fields, creates symbol-to-index mappings for categorical fields, and creates number-to-index mappings for discrete fields. To force additional data sources within the same agent to use the same definition, set computeStatistics to false.
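The first-pass scan described above can be sketched as follows. This is a minimal illustration, not the ABLE implementation: FirstPassStats, ContinuousStats, scanContinuous and valueToIndex are hypothetical names, and assigning indices in order of first appearance is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the first-pass scan; these are illustrative names,
// not ABLE classes or methods.
final class FirstPassStats {

    /** Running min/mean/max for one continuous field. */
    static final class ContinuousStats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** Accumulate statistics for a continuous column in a single pass. */
    static ContinuousStats scanContinuous(double[] column) {
        ContinuousStats stats = new ContinuousStats();
        for (double value : column) {
            stats.add(value);
        }
        return stats;
    }

    /**
     * Map each distinct value to an index in order of first appearance (an assumption).
     * Works for categorical symbols (String) and discrete numbers (Double) alike.
     */
    static <T> Map<T, Integer> valueToIndex(List<T> column) {
        Map<T, Integer> mapping = new LinkedHashMap<>();
        for (T value : column) {
            // mapping.size() is evaluated first, so a new value gets the next free index
            mapping.putIfAbsent(value, mapping.size());
        }
        return mapping;
    }
}
```

For example, the categorical column red, blue, red, green maps to red=0, blue=1, green=2.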

An AbleAbstractImport can be used to generate an AbleFilter bean which will translate the data in the manner specified in the meta-data definition file. Field usage can be initialized from a data definition file (a *.dfn file) for text import beans. It can also be specified interactively on the customizer's data panel for import objects such as database imports whose metadata does not include field usage.

When an Import is processed, it populates the outputBuffer array with elements from the data source. If the data consists solely of continuous fields, a double array is used; otherwise, a String array is populated. Records may be processed sequentially from the data source, or in random sequence. When buffering is used, the records are randomly retrieved from within each buffer. After all records in the buffer have been processed, the next buffer of records is retrieved.
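The buffered, randomized record order described above can be sketched as follows. The names bufferSize, randomIndices and bufferRecordIndex mirror the fields documented on this page, but BufferedOrder and its methods are hypothetical, and the use of a Fisher-Yates shuffle within each block is an assumption about how the random order is produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of buffered, randomized record order; not the ABLE code.
final class BufferedOrder {

    /** Effective block size: 0 means "load everything", and oversize values are clamped. */
    static int effectiveBufferSize(int bufferSize, int numRecords) {
        if (bufferSize <= 0 || bufferSize > numRecords) {
            return numRecords;
        }
        return bufferSize;
    }

    /** Absolute record indices visited during one epoch. */
    static List<Integer> epochOrder(int numRecords, int bufferSize, boolean randomize, Random rng) {
        int block = effectiveBufferSize(bufferSize, numRecords);
        List<Integer> order = new ArrayList<>(numRecords);
        for (int start = 0; start < numRecords; start += block) {
            int len = Math.min(block, numRecords - start);   // the last block may be short
            int[] randomIndices = new int[len];
            for (int i = 0; i < len; i++) {
                randomIndices[i] = i;
            }
            if (randomize) {
                // Fisher-Yates shuffle of indices within this block only
                for (int i = len - 1; i > 0; i--) {
                    int j = rng.nextInt(i + 1);
                    int tmp = randomIndices[i];
                    randomIndices[i] = randomIndices[j];
                    randomIndices[j] = tmp;
                }
            }
            for (int bufferRecordIndex = 0; bufferRecordIndex < len; bufferRecordIndex++) {
                // absolute index = block start + (possibly shuffled) offset within the block
                order.add(start + randomIndices[bufferRecordIndex]);
            }
        }
        return order;
    }
}
```

Because shuffling happens only within a block, records never move between buffers, so only one block needs to be held in memory at a time.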

See Also:
Serialized Form

Field Summary
protected  int bufferRecordIndex
          Current record in the buffer file being processed.
protected  int bufferSize
          The maximum number of records to read in a block from this data source.
protected  boolean computeStatistics
          A boolean indicating that metadata is to be opened and field statistics are to be computed when the data source is opened.
protected  boolean cycleRelative
          A flag indicating the cycleSize is relative to the file size, i.e., a multiplier.
protected  double cycleSize
          When cycleRelative is false, cycleSize is the raw number of records to process in a cycle.
static java.lang.String defaultName
          Value assigned to name by default.
protected  boolean eof
          When the last record in the file has been processed, eof is true.
protected  java.util.Vector fieldList
          A Vector of AbleField objects describing the data source.
protected  AbleImportData importData
          The AbleImportData object referenced by this import.
protected  long numEpochs
          The number of times this data source has processed all records it contains.
protected  java.util.Vector numericData
          A Vector of double arrays containing records from the database table.
protected  int numFieldsPerRecord
          The number of fields in a record from a data source.
protected  long numRecords
          The total number of records in this data source.
 double[] outNum
          A double array used in calculating the output buffer.
 java.lang.Object[] outSym
          A String array used in calculating the output buffer.
protected  int[] randomIndices
          An array of indices used when records are randomly accessed.
protected  boolean randomizeData
          Determines whether to output records from the data source in random or sequential order.
protected  long recordIndex
          Current record in the entire data file being processed.
protected  long recordsRead
          The number of records read from the start of the data source.
protected  java.util.Vector textData
          A Vector of String arrays containing records from the database table.
 
Fields inherited from class com.ibm.able.AbleObject
changed, chgSupport, comment, dataFlowEnabled, destBufferConnections, eventQueue, fileName, inputBuffer, listeners, logger, name, outputBuffer, parent, propertyConnectionMgr, sourceBufferConnections, state, stateChgSupport, trace
 
Constructor Summary
AbleAbstractImport()
          Construct a default AbleAbstractImport object.
AbleAbstractImport(java.lang.String name)
          Construct an AbleAbstractImport object with specified name.
 
Method Summary
 void close()
          Close the data source, disable data flow, and set its state to Uninitiated.
static java.lang.String Copyright()
          Determine the copyright of this class.
 void endOfFile()
          Notify any listeners that we are at the end of the file, and increment the epoch count numEpochs.
 boolean eof()
          Return whether the data source is at end of file.
protected  java.util.Vector getAgentFieldList()
          Get the default fieldList for this object's container agent.
 int getBufferSize()
          Return the buffer size.
 boolean getComputeStatistics()
          Return the value of the computeStatistics setting.
 long getCurrentRecordIndex()
          Get the index, within the entire data file, of the last record processed.
 double getCycleSize()
          Return the raw cycle size setting.
 java.lang.String getCycleSizeAsString()
          Return the raw cycle size formatted appropriately for the cycleRelative flag.
 java.util.Vector getFieldList()
          Return a Vector of AbleField objects defining each field in the data source.
 java.util.Vector getFieldList(java.lang.String usageType)
          Return a Vector of AbleField objects with the specified usage.
 void getNextRecordBlock()
          Read the next bufferSize records from the data source.
 int getNormalizedRecordSize()
          Return the size of the record after categorical and discrete fields are expanded.
 int getNumberOfOutputFields()
          Return the number of fields per record in the data source.
 long getNumEpochs()
          Retrieve the number of passes over the data, or epochs.
 long getNumRecords()
          Return the number of records in the data source.
 long getRecordsRead()
          Return the current count of records read from the beginning of the data source.
 long getStepsPerCycle()
          Calculate and return the number of steps in a cycle from the raw cycle size, using the cycleRelative flag.
 void init()
          Open the data source.
 boolean isAllNumericData()
          Return true if all fields are "continuous", and false if any are "discrete" or "categorical" (i.e. symbols).
 boolean isCycleRelative()
          Return whether the raw cycle size is to be interpreted as a factor of the number of records in the data source, or as an absolute number of records.
 boolean isRandomizeData()
          Return whether records are processed in random sequence or not.
 boolean isReady()
          Indicate whether the importData is ready to provide data.
 void open()
          Open the data source if it is ready.
 void process()
          Process gets the next record from the data source, and places its contents in the outputBuffer.
 void processAbleEvent(AbleEvent e)
          Process an AbleEvent sent by another Able bean.
 void processTimerEvent()
          Process a timer expiration event synchronously; that is, on the same thread as the caller.
 void quitAll()
          Close an open data source.
 void reset()
          Set processing options to default values, and re-initialize (reopen) the data source.
 void setBufferSize(int size)
          Set the buffer size, which determines whether to load the entire data source (=0) or just pieces of it (>0) into memory.
 void setComputeStatistics(boolean computeStatistics)
          Set the value of the computeStatistics flag.
 void setCycleSize(double cycleSize, boolean relative)
          Set the cycle size and definition for its use.
protected  void setDefaults()
          Set processing options to default values.
 void setFieldList(java.util.Vector fieldList)
          Set a Vector of AbleField objects defining each field in the data source.
 void setRandomizeData(boolean state)
          Set the randomize flag so records are processed in random sequence.
 
Methods inherited from class com.ibm.able.AbleObject
addAbleEventListener, addDestBufferConnection, addPropertyChangeListener, addPropertyConnection, addSourceBufferConnection, addStateChangeListener, dataChanged, firePropertyChange, flushAbleEventQueue, getAbleEventListeners, getAbleEventProcessingEnabled, getAbleEventQueueSize, getComment, getDestBufferConnections, getFileName, getInputBuffer, getInputBuffer, getInputBufferAsStringArray, getInputBufferContents, getLogger, getName, getOutputBuffer, getOutputBuffer, getOutputBufferAsStringArray, getOutputBufferContents, getParent, getPropertyConnectionManager, getSleepTime, getSourceBufferConnections, getState, getTraceLogger, handleAbleEvent, hasInputBuffer, hasOutputBuffer, init, isAbleEventPostingEnabled, isAbleEventProcessingEnabled, isChanged, isConnectable, isDataFlowEnabled, isTimerEventProcessingEnabled, notifyAbleEventListeners, process, processBufferConnections, processNoEventProcessingEnabledSituation, quitEnabledEventProcessing, removeAbleEventListener, removeAllAbleEventListeners, removeAllBufferConnections, removeAllConnections, removeAllPropertyConnections, removeDestBufferConnection, removePropertyChangeListener, removePropertyConnection, removeSourceBufferConnection, removeStateChangeListener, restartEnabledEventProcessing, restoreFromFile, restoreFromFile, restoreFromSerializedFile, restoreFromStream, resumeAll, resumeEnabledEventProcessing, saveToFile, saveToFile, setAbleEventProcessingEnabled, setChanged, setComment, setDataFlowEnabled, setFileName, setInputBuffer, setInputBuffer, setLogger, setName, setOutputBuffer, setOutputBuffer, setParent, setSleepTime, setState, setTimerEventProcessingEnabled, setTraceLogger, sourceConnectionsOK, startEnabledEventProcessing, suspendAll, suspendEnabledEventProcessing
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.ibm.able.AbleBean
getComment, getLogger, getName, getParent, getState, getTraceLogger, init, isChanged, process, removeAllConnections, resumeAll, setChanged, setComment, setLogger, setName, setParent, setState, setTraceLogger, suspendAll
 
Methods inherited from interface com.ibm.able.AbleDataBufferManager
addDestBufferConnection, addSourceBufferConnection, getDestBufferConnections, getInputBuffer, getInputBuffer, getInputBufferAsStringArray, getInputBufferContents, getOutputBuffer, getOutputBuffer, getOutputBufferAsStringArray, getOutputBufferContents, getSourceBufferConnections, hasInputBuffer, hasOutputBuffer, isConnectable, isDataFlowEnabled, processBufferConnections, removeAllBufferConnections, removeDestBufferConnection, removeSourceBufferConnection, setDataFlowEnabled, setInputBuffer, setInputBuffer, setOutputBuffer, setOutputBuffer
 
Methods inherited from interface com.ibm.able.AbleEventListener
handleAbleEvent
 
Methods inherited from interface com.ibm.able.AbleEventListenerManager
addAbleEventListener, dataChanged, getAbleEventListeners, notifyAbleEventListeners, removeAbleEventListener
 
Methods inherited from interface com.ibm.able.AbleEventQueueManager
flushAbleEventQueue, getAbleEventProcessingEnabled, getAbleEventQueueSize, getSleepTime, isAbleEventPostingEnabled, isAbleEventProcessingEnabled, isTimerEventProcessingEnabled, quitEnabledEventProcessing, restartEnabledEventProcessing, resumeEnabledEventProcessing, setAbleEventProcessingEnabled, setSleepTime, setTimerEventProcessingEnabled, startEnabledEventProcessing, suspendEnabledEventProcessing
 
Methods inherited from interface com.ibm.able.AbleEventQueueProcessor
processNoEventProcessingEnabledSituation
 
Methods inherited from interface com.ibm.able.AblePropertyChangeManager
addPropertyChangeListener, addPropertyConnection, getPropertyConnectionManager, removeAllPropertyConnections, removePropertyChangeListener, removePropertyConnection
 
Methods inherited from interface com.ibm.able.AbleSerializable
getFileName, restoreFromFile, restoreFromFile, saveToFile, saveToFile, setFileName
 

Field Detail

defaultName

public static final java.lang.String defaultName
Value assigned to name by default.

numFieldsPerRecord

protected int numFieldsPerRecord
The number of fields in a record from a data source.

bufferSize

protected int bufferSize
The maximum number of records to read in a block from this data source. A value of 0 means all records should be read. If bufferSize turns out to be larger than the number of records in the data source, it is reset to the number of records in the data source.

numRecords

protected long numRecords
The total number of records in this data source.

recordsRead

protected long recordsRead
The number of records read from the start of the data source.

numEpochs

protected long numEpochs
The number of times this data source has processed all records it contains.

importData

protected AbleImportData importData
The AbleImportData object referenced by this import.

randomizeData

protected boolean randomizeData
Determines whether to output records from the data source in random or sequential order.

randomIndices

protected int[] randomIndices
An array of indices used when records are randomly accessed.

computeStatistics

protected boolean computeStatistics
A boolean indicating that metadata is to be opened and field statistics are to be computed when the data source is opened.

fieldList

protected java.util.Vector fieldList
A Vector of AbleField objects describing the data source.

textData

protected transient java.util.Vector textData
A Vector of String arrays containing records from the database table. The Vector contains all records if bufferSize is 0.

numericData

protected transient java.util.Vector numericData
A Vector of double arrays containing records from the database table. This vector is populated only if all fields in the data source are continuous.

recordIndex

protected long recordIndex
Current record in the entire data file being processed. If randomize is on, this is the index to the random array.

bufferRecordIndex

protected int bufferRecordIndex
Current record in the buffer file being processed. If randomize is on, this is the index to the random array.

eof

protected boolean eof
When the last record in the file has been processed, eof is true.

cycleSize

protected double cycleSize
When cycleRelative is false, cycleSize is the raw number of records to process in a cycle. When true, the raw cycleSize is multiplied by the number of records in the data source to obtain the number of records to process.

cycleRelative

protected boolean cycleRelative
A flag indicating the cycleSize is relative to the file size, i.e., a multiplier. When false, it indicates cycleSize is an absolute number of records.

outNum

public transient double[] outNum
A double array used in calculating the output buffer.

outSym

public transient java.lang.Object[] outSym
A String array used in calculating the output buffer.
Constructor Detail

AbleAbstractImport

public AbleAbstractImport()
                   throws AbleException
Construct a default AbleAbstractImport object.

AbleAbstractImport

public AbleAbstractImport(java.lang.String name)
                   throws AbleException
Construct an AbleAbstractImport object with specified name.
Parameters:
name - A String containing the name used to identify this bean.
Method Detail

init

public void init()
          throws AbleException
Open the data source. This calls open.
Specified by:
init in interface AbleBean
Overrides:
init in class AbleObject
See Also:
open()

open

public void open()
          throws AbleException
Open the data source if it is ready. Calculate the number of records, the number of fields per record, set the state to Active, and enable data flow. Leave the number of records specified by bufferSize in the textData and numericData Vectors, with the cursor positioned before the first record. Objects extending AbleAbstractImport should:
  1. override open
  2. construct and set values on its importData object
  3. call super.open(), this method, which opens the importData object
After an import bean is opened, its computeStatistics flag is set to false, its state is changed to Active, and dataFlow is set to enabled. Called by init.
See Also:
init()
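The three-step subclassing contract for open can be sketched with stand-in classes. SketchImportData, SketchAbstractImport and SketchTextImport are hypothetical and do not exist in ABLE; the real methods throw AbleException, and the string states only imitate the documented Uninitiated and Active states.

```java
// Hypothetical stand-ins that mirror the documented open() contract;
// none of these classes exist in ABLE.
class SketchImportData {
    private final String source;
    private boolean open;

    SketchImportData(String source) {
        this.source = source;
    }

    void open() {
        open = true;    // a real importData would open the file or connection here
    }

    boolean isOpen() {
        return open;
    }

    String getSource() {
        return source;
    }
}

abstract class SketchAbstractImport {
    protected SketchImportData importData;
    protected boolean computeStatistics = true;
    protected String state = "Uninitiated";
    protected boolean dataFlowEnabled = false;

    /** Base-class open(): opens importData, then updates flags and state. */
    void open() {
        if (importData == null) {
            throw new IllegalStateException("subclass must set importData before calling super.open()");
        }
        importData.open();
        computeStatistics = false;
        state = "Active";
        dataFlowEnabled = true;
    }
}

class SketchTextImport extends SketchAbstractImport {
    private final String fileName;

    SketchTextImport(String fileName) {
        this.fileName = fileName;
    }

    @Override
    void open() {                                    // step 1: override open
        importData = new SketchImportData(fileName); // step 2: construct and configure importData
        super.open();                                // step 3: base class opens importData
    }
}
```

Keeping the state and flag updates in the base class means each subclass only has to supply a configured importData.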

getAgentFieldList

protected java.util.Vector getAgentFieldList()
                                      throws AbleException
Get the default fieldList for this object's container agent.

The result is the fieldList from the active data source if there is one, or the first opened data source otherwise. It will be an empty Vector if the container has no open data sources. If this object is not in a container, return the object's current fieldList.


process

public void process()
             throws AbleException
Process gets the next record from the data source, and places its contents in the outputBuffer. When the end of the data source is reached, it sends an endOfFile event, increments the number of epochs, and restarts the data source at its first record. If processing is successful, the outputBuffer will be populated, recordIndex incremented, and a dataChange event distributed.
Specified by:
process in interface AbleBean
Overrides:
process in class AbleObject
Following copied from interface: com.ibm.able.AbleBean
Throws:
AbleException - If an error occurs.
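The process cycle can be modeled with an in-memory list of records. SketchProcessLoop is a hypothetical class, not ABLE code. This page does not say whether the end-of-file event fires immediately after the last record or on the following call; the sketch assumes it fires right after the last record is placed in the output buffer, which matches the description of the eof field.

```java
import java.util.List;

// Hypothetical in-memory model of the documented process() cycle; not an ABLE class.
final class SketchProcessLoop {
    private final List<String[]> records;
    private int cursor = 0;                 // position of the next record to read
    private long currentRecordIndex = -1;   // -1 means no record has been read yet
    private long numEpochs = 0;
    private boolean eof = false;
    private String[] outputBuffer;

    SketchProcessLoop(List<String[]> records) {
        this.records = records;
    }

    /** Place the next record in the output buffer, rewinding after the last one. */
    void process() {
        if (records.isEmpty()) {
            throw new IllegalStateException("data source is empty");
        }
        eof = false;
        currentRecordIndex = cursor;
        outputBuffer = records.get(cursor);
        cursor++;
        // The real bean would distribute a dataChanged event here.
        if (cursor == records.size()) {
            eof = true;
            endOfFile();
            cursor = 0;                     // restart at the first record
        }
    }

    /** Count a completed pass; the real bean also notifies its listeners. */
    private void endOfFile() {
        numEpochs++;
    }

    boolean eof() { return eof; }
    long getNumEpochs() { return numEpochs; }
    long getCurrentRecordIndex() { return currentRecordIndex; }
    String[] getOutputBuffer() { return outputBuffer; }
}
```

With three records, the third call to process completes the first epoch, and the fourth call returns the first record again.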

processTimerEvent

public void processTimerEvent()
                       throws AbleException
Process a timer expiration event synchronously; that is, on the same thread as the caller.

This method is called by our AbleEventQueue whenever the following conditions are all true:

This method calls process to populate the output buffer with the next data record.

Specified by:
processTimerEvent in interface AbleEventQueueProcessor
Overrides:
processTimerEvent in class AbleObject
See Also:
AbleObject.setSleepTime(long), AbleObject.setTimerEventProcessingEnabled(boolean), AbleObject.startEnabledEventProcessing()

reset

public void reset()
           throws AbleException
Set processing options to default values, and re-initialize (reopen) the data source. Meta-data is recreated if computeStatistics is set to true.
Specified by:
reset in interface AbleBean
Overrides:
reset in class AbleObject
Following copied from interface: com.ibm.able.AbleBean
Throws:
AbleException - If an error occurs.

setDefaults

protected void setDefaults()
                    throws AbleException
Set processing options to default values: timer processing, event posting, and event processing are all disabled. Also set the recordsRead and numEpochs counters to 0.

quitAll

public void quitAll()
             throws AbleException
Close an open data source.
Specified by:
quitAll in interface AbleBean
Overrides:
quitAll in class AbleObject
See Also:
close()

close

public void close()
           throws AbleException
Close the data source, disable data flow, and set its state to Uninitiated.

getNumberOfOutputFields

public int getNumberOfOutputFields()
Return the number of fields per record in the data source.
Specified by:
getNumberOfOutputFields in interface AbleDataSource

getNumRecords

public long getNumRecords()
Return the number of records in the data source.
Specified by:
getNumRecords in interface AbleDataSource

getNextRecordBlock

public void getNextRecordBlock()
                        throws AbleException
Read the next bufferSize records from the data source. Convert to numeric if the data source is all numeric. Regenerate the random indices if randomize is on.

setBufferSize

public void setBufferSize(int size)
                   throws AbleException
Set the buffer size, which determines whether to load the entire data source (=0) or just pieces of it (>0) into memory.

getBufferSize

public int getBufferSize()
Return the buffer size.

getRecordsRead

public long getRecordsRead()
Return the current count of records read from the beginning of the data source.

getCurrentRecordIndex

public long getCurrentRecordIndex()
Get the index, within the entire data file, of the last record processed. Regardless of the randomize setting, this is the absolute index into the data source for the current record. A value of -1 indicates no record has been read.

processAbleEvent

public void processAbleEvent(AbleEvent e)
                      throws AbleException
Process an AbleEvent sent by another Able bean. Extending imports can override this method with desired operations.
Specified by:
processAbleEvent in interface AbleEventQueueProcessor
Overrides:
processAbleEvent in class AbleObject
Following copied from interface: com.ibm.able.AbleEventQueueProcessor
Parameters:
theAbleEvent - The event to process.
Throws:
AbleException - If an error occurs.

getNumEpochs

public long getNumEpochs()
Retrieve the number of passes over the data, or epochs.
Specified by:
getNumEpochs in interface AbleDataSource

getFieldList

public java.util.Vector getFieldList()
Return a Vector of AbleField objects defining each field in the data source.
Specified by:
getFieldList in interface AbleDataSource

setFieldList

public void setFieldList(java.util.Vector fieldList)
Set a Vector of AbleField objects defining each field in the data source.
Specified by:
setFieldList in interface AbleDataSource
Parameters:
fieldList - A Vector of fields. The Vector may be empty, but not null.

getFieldList

public java.util.Vector getFieldList(java.lang.String usageType)
                              throws AbleException
Return a Vector of AbleField objects with the specified usage. Usage types are listed in AbleData.UsageType(String).
Parameters:
usageType - A String denoting usage type.

getNormalizedRecordSize

public int getNormalizedRecordSize()
                            throws AbleException
Return the size of the record after categorical and discrete fields are expanded. It accumulates the normalized size for each field in the fieldList.

A categorical field, for example, is encoded in 1-of-N format, in which one element of a boolean vector is set to indicate the value present. The expanded field is therefore as long as the number of unique categorical values.
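The size calculation above can be sketched with a simplified field model. NormalizedSize, Usage, Field and oneOfN are hypothetical and do not reproduce the AbleField API. The sketch assumes a continuous field contributes one slot and that discrete fields expand to one slot per distinct value, like categorical fields.

```java
import java.util.List;

// Hypothetical field model; these are illustrative names, not the AbleField API.
final class NormalizedSize {
    enum Usage { CONTINUOUS, CATEGORICAL, DISCRETE }

    static final class Field {
        final Usage usage;
        final int distinctValues;   // only meaningful for CATEGORICAL and DISCRETE

        Field(Usage usage, int distinctValues) {
            this.usage = usage;
            this.distinctValues = distinctValues;
        }
    }

    /** Width of one field after expansion. */
    static int normalizedSize(Field field) {
        switch (field.usage) {
            case CONTINUOUS:
                return 1;                       // a single numeric slot
            default:
                return field.distinctValues;    // 1-of-N: one slot per distinct value
        }
    }

    /** Sum of the expanded widths of every field in the list. */
    static int normalizedRecordSize(List<Field> fields) {
        int total = 0;
        for (Field field : fields) {
            total += normalizedSize(field);
        }
        return total;
    }

    /** 1-of-N encoding of the value whose index is valueIndex. */
    static double[] oneOfN(int valueIndex, int n) {
        double[] encoded = new double[n];   // all elements start at 0.0
        encoded[valueIndex] = 1.0;
        return encoded;
    }
}
```

Under these assumptions, a record with one continuous field, a categorical field with 3 values and a discrete field with 4 values has a normalized size of 1 + 3 + 4 = 8.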


isAllNumericData

public boolean isAllNumericData()
Return true if all fields are "continuous", and false if any are "discrete" or "categorical" (i.e. symbols).
Specified by:
isAllNumericData in interface AbleDataSource

eof

public boolean eof()
Return whether the data source is at end of file.

isReady

public boolean isReady()
                throws AbleException
Indicate whether the importData is ready to provide data. This means the fieldList is constructed and the data source is open.
Specified by:
isReady in interface AbleDataSource

endOfFile

public void endOfFile()
Notify any listeners that we are at the end of the file, and increment the epoch count numEpochs.

setRandomizeData

public void setRandomizeData(boolean state)
Set the randomize flag so records are processed in random sequence.

isRandomizeData

public boolean isRandomizeData()
Return whether records are processed in random sequence or not.

getCycleSize

public double getCycleSize()
Return the raw cycle size setting.

getCycleSizeAsString

public java.lang.String getCycleSizeAsString()
Return the raw cycle size formatted appropriately for the cycleRelative flag.

isCycleRelative

public boolean isCycleRelative()
Return whether the raw cycle size is to be interpreted as a factor of the number of records in the data source, or as an absolute number of records.

getStepsPerCycle

public long getStepsPerCycle()
Calculate and return the number of steps in a cycle from the raw cycle size, using the cycleRelative flag.
Specified by:
getStepsPerCycle in interface AbleDataSource

setCycleSize

public void setCycleSize(double cycleSize,
                         boolean relative)
Set the cycle size and definition for its use.
Parameters:
cycleSize - A double value.
relative - A boolean indicating how to interpret the cycleSize. If true, the number of steps in a cycle is the cycleSize multiplied by the number of records in the data source. If false, the number is the absolute number of records to process in a cycle.
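The relationship between cycleSize, relative and the number of steps in a cycle can be sketched as a small helper. CycleSizeSketch and stepsPerCycle are hypothetical names that echo getStepsPerCycle() only in spirit. Rounding to the nearest whole step is an assumption, because this page does not say how fractional results are handled.

```java
// Hypothetical helper; not the ABLE implementation of getStepsPerCycle().
final class CycleSizeSketch {
    /**
     * Steps in one cycle: cycleSize * numRecords when relative,
     * otherwise cycleSize itself. Rounding to the nearest step is an assumption.
     */
    static long stepsPerCycle(double cycleSize, boolean relative, long numRecords) {
        double raw = relative ? cycleSize * numRecords : cycleSize;
        return Math.round(raw);
    }
}
```

With 150 records in the data source, setCycleSize(2.0, true) corresponds to 300 steps per cycle, while setCycleSize(500, false) corresponds to 500 steps regardless of the data source size.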

getComputeStatistics

public boolean getComputeStatistics()
Return the value of the computeStatistics setting.

setComputeStatistics

public void setComputeStatistics(boolean computeStatistics)
Set the value of the computeStatistics flag.

Copyright

public static java.lang.String Copyright()
Determine the copyright of this class.
Returns:
A String containing this class's copyright statement.



(C) Copyright IBM Corporation 1999, 2003