Class DatabaseLoader

java.lang.Object
weka.core.converters.AbstractLoader
weka.core.converters.DatabaseLoader
All Implemented Interfaces:
Serializable, BatchConverter, DatabaseConverter, IncrementalConverter, Loader, OptionHandler, RevisionHandler

public class DatabaseLoader extends AbstractLoader implements BatchConverter, IncrementalConverter, DatabaseConverter, OptionHandler
Reads Instances from a Database. Can read a database in batch or incremental mode.
In incremental mode MySQL and HSQLDB are supported.
For all other DBMSs a pseudo-incremental mode is used:
In pseudo-incremental mode the instances are read into main memory all at once and then provided to the user incrementally.
For incremental loading the rows in the database table have to be ordered uniquely.
The reason is that only a single row is fetched at a time, by extending the user query with a LIMIT clause.
If this extension is impossible, instances will be loaded pseudo-incrementally. To ensure that every row is fetched exactly once, the rows have to be ordered.
Therefore a (primary) key is necessary. This approach was chosen instead of using JDBC driver facilities because the latter differ between drivers.
If you use the DatabaseSaver and save instances with an automatically generated primary key (its name is defined in DatabaseUtils), this primary key will be used for ordering but will not be part of the output. The user-defined SQL query to extract the instances should not contain LIMIT and ORDER BY clauses (see -Q option).
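The idea of fetching one row per query can be pictured with a small standalone sketch. It is not the code DatabaseLoader runs: the class and method names are hypothetical, the key column ID is made up, and the MySQL-style "LIMIT offset, count" form is an assumption, so the SQL that is actually generated may differ per DBMS.

```java
// Sketch only: illustrates extending a user query so that it returns one row.
// Assumes MySQL's "LIMIT <offset>, <count>" syntax; not taken from Weka itself.
final class LimitQuerySketch {

    /**
     * Extends a plain user query so that it selects exactly one row.
     *
     * @param userQuery  query without ORDER BY or LIMIT clauses
     * @param keyColumns comma separated key columns that order the rows uniquely
     * @param offset     zero-based index of the row to fetch
     * @return the extended query
     */
    static String singleRowQuery(String userQuery, String keyColumns, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        // The ORDER BY on the key makes the row at a given offset well defined.
        return userQuery.trim() + " ORDER BY " + keyColumns + " LIMIT " + offset + ", 1";
    }

    public static void main(String[] args) {
        // Each iteration builds the query for the next row of the ordered result.
        for (int row = 0; row < 3; row++) {
            System.out.println(singleRowQuery("SELECT * FROM Results0", "ID", row));
        }
    }
}
```

Because the sketch appends its own ORDER BY and LIMIT clauses, a user query that already contains them would produce invalid SQL, which is consistent with the restriction stated for the -Q option.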
In addition, for incremental loading, you can define in the DatabaseUtils file how many distinct values a nominal attribute is allowed to have. If this number is exceeded, the column will become a string attribute.
In batch mode no string attributes will be created.
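The nominal-versus-string rule for incremental loading described above can be illustrated with a standalone sketch. The class, the enum and the maxDistinct parameter are hypothetical stand-ins; the name of the corresponding setting in the DatabaseUtils file is not given here, and the real loader may count distinct values differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: decides the attribute kind of a column from its distinct values.
final class NominalThresholdSketch {

    enum AttributeKind { NOMINAL, STRING }

    /**
     * Returns STRING if the column has more distinct values than allowed,
     * NOMINAL otherwise.
     *
     * @param columnValues all values observed in the column
     * @param maxDistinct  assumed limit on distinct values for a nominal attribute
     */
    static AttributeKind classify(List<String> columnValues, int maxDistinct) {
        Set<String> distinct = new HashSet<>(columnValues);
        return distinct.size() > maxDistinct ? AttributeKind.STRING : AttributeKind.NOMINAL;
    }

    public static void main(String[] args) {
        List<String> colours = Arrays.asList("red", "green", "red", "blue");
        // Three distinct values: nominal with a limit of 3, string with a limit of 2.
        System.out.println(classify(colours, 3));
        System.out.println(classify(colours, 2));
    }
}
```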

Valid options are:

 -url <JDBC URL>
  The JDBC URL to connect to.
  (default: from DatabaseUtils.props file)
 -user <name>
  The user to connect with to the database.
  (default: none)
 -password <password>
  The password to connect with to the database.
  (default: none)
 -Q <query>
  SQL query of the form
   SELECT <list of columns>|* FROM <table> [WHERE]
  to execute.
  (default: Select * From Results0)
 -P <list of column names>
  List of column names uniquely defining a DB row
  (separated by ', ').
  Used for incremental loading.
  If not specified, the key will be determined automatically,
  if possible with the used JDBC driver.
  The auto ID column created by the DatabaseSaver won't be loaded.
 -I
  Sets incremental loading
Version:
$Revision: 11199 $
Author:
Stefan Mutter (mutter@cs.waikato.ac.nz)
  • Constructor Details

    • DatabaseLoader

      public DatabaseLoader() throws Exception
      Constructor
      Throws:
      Exception - if initialization fails
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this Loader
      Returns:
      a description of the Loader suitable for displaying in the explorer/experimenter gui
    • reset

      public void reset() throws Exception
      Resets the Loader ready to read a new data set
      Specified by:
      reset in interface Loader
      Overrides:
      reset in class AbstractLoader
      Throws:
      Exception - if an error occurs while disconnecting from the database
    • resetStructure

      public void resetStructure()
      Resets the structure of instances
    • setQuery

      public void setQuery(String q)
      Sets the query to execute against the database
      Parameters:
      q - the query to execute
    • getQuery

      public String getQuery()
      Gets the query to execute against the database
      Returns:
      the query
    • queryTipText

      public String queryTipText()
      Returns the tip text for this property
      Returns:
      the tip text
    • setKeys

      public void setKeys(String keys)
      Sets the key columns of a database table
      Parameters:
      keys - a String containing the key columns in a comma separated list.
    • getKeys

      public String getKeys()
      Gets the names of the key columns
      Returns:
      the names of the key columns
    • keysTipText

      public String keysTipText()
      Returns the tip text for this property
      Returns:
      the tip text
    • setUrl

      public void setUrl(String url)
      Sets the database URL
      Specified by:
      setUrl in interface DatabaseConverter
      Parameters:
      url - string with the database URL
    • getUrl

      public String getUrl()
      Gets the URL
      Specified by:
      getUrl in interface DatabaseConverter
      Returns:
      the URL
    • urlTipText

      public String urlTipText()
      Returns the tip text for this property
      Returns:
      the tip text
    • setUser

      public void setUser(String user)
      Sets the database user
      Specified by:
      setUser in interface DatabaseConverter
      Parameters:
      user - the database user name
    • getUser

      public String getUser()
      Gets the user name
      Specified by:
      getUser in interface DatabaseConverter
      Returns:
      name of database user
    • userTipText

      public String userTipText()
      Returns the tip text for this property
      Returns:
      the tip text
    • setPassword

      public void setPassword(String password)
      Sets user password for the database
      Specified by:
      setPassword in interface DatabaseConverter
      Parameters:
      password - the password
    • getPassword

      public String getPassword()
      Returns the database password
      Returns:
      the database password
    • passwordTipText

      public String passwordTipText()
      Returns the tip text for this property
      Returns:
      the tip text
    • setSource

      public void setSource(String url, String userName, String password)
      Sets the database URL, user name and password
      Parameters:
      url - the database URL
      userName - the user name
      password - the password
    • setSource

      public void setSource(String url)
      Sets the database URL
      Parameters:
      url - the database URL
    • setSource

      public void setSource() throws Exception
      Sets the database URL using the DatabaseUtils file
      Throws:
      Exception - if something goes wrong
    • connectToDatabase

      public void connectToDatabase()
      Opens a connection to the database
    • getStructure

      public Instances getStructure() throws IOException
      Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
      Specified by:
      getStructure in interface Loader
      Specified by:
      getStructure in class AbstractLoader
      Returns:
      the structure of the data set as an empty set of Instances
      Throws:
      IOException - if an error occurs
    • getDataSet

      public Instances getDataSet() throws IOException
      Returns the full data set in batch mode (header and all instances at once).
      Specified by:
      getDataSet in interface Loader
      Specified by:
      getDataSet in class AbstractLoader
      Returns:
      the full data set as a set of Instances
      Throws:
      IOException - if there is no source or parsing fails
    • getNextInstance

      public Instance getNextInstance(Instances structure) throws IOException
      Reads the data set incrementally: gets the next instance in the data set, or returns null if there are no more instances to get. If the structure hasn't yet been determined by a call to getStructure, this method does so before returning the next instance in the data set.
      Specified by:
      getNextInstance in interface Loader
      Specified by:
      getNextInstance in class AbstractLoader
      Parameters:
      structure - the dataset header information, will get updated in case of string or relational attributes
      Returns:
      the next instance in the data set as an Instance object or null if there are no more instances to be read
      Throws:
      IOException - if there is an error during parsing
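      The null-terminated contract of getNextInstance, combined with the pseudo-incremental mode from the class description, can be pictured with a standalone sketch. The class below is a hypothetical stand-in that uses plain Java collections instead of Weka types: it buffers all rows up front and then hands them out one at a time, returning null once they are used up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: buffers every row in memory, then serves them one by one.
final class PseudoIncrementalSketch<T> {

    private final List<T> rows;
    private int next = 0;

    /** Copies all rows into memory, as a batch read would. */
    PseudoIncrementalSketch(List<T> allRows) {
        this.rows = new ArrayList<>(allRows);
    }

    /** Returns the next buffered row, or null if there are no more rows. */
    T getNext() {
        return next < rows.size() ? rows.get(next++) : null;
    }

    public static void main(String[] args) {
        PseudoIncrementalSketch<String> source =
            new PseudoIncrementalSketch<>(Arrays.asList("row 1", "row 2", "row 3"));
        String row;
        // The caller loops until null signals the end of the data.
        while ((row = source.getNext()) != null) {
            System.out.println(row);
        }
    }
}
```

      A caller of the real loader would follow the same loop shape, passing the structure obtained from getStructure to each getNextInstance call.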
    • getOptions

      public String[] getOptions()
      Gets the current option settings
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      the current option settings
    • listOptions

      public Enumeration listOptions()
      Lists the available options
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Sets the options. Valid options are:

       -url <JDBC URL>
        The JDBC URL to connect to.
        (default: from DatabaseUtils.props file)
       -user <name>
        The user to connect with to the database.
        (default: none)
       -password <password>
        The password to connect with to the database.
        (default: none)
       -Q <query>
        SQL query of the form
         SELECT <list of columns>|* FROM <table> [WHERE]
        to execute.
        (default: Select * From Results0)
       -P <list of column names>
        List of column names uniquely defining a DB row
        (separated by ', ').
        Used for incremental loading.
        If not specified, the key will be determined automatically,
        if possible with the used JDBC driver.
        The auto ID column created by the DatabaseSaver won't be loaded.
       -I
        Sets incremental loading
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the options
      Throws:
      Exception - if options cannot be set
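      As an illustration of the option format listed above, the following standalone sketch assembles a String[] of the kind that setOptions accepts. The helper class, the JDBC URL, the credentials, the table name iris and the key column id are all hypothetical; only the flag names and the ', ' separator for -P come from the option list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds an option array using the flags documented for setOptions.
final class LoaderOptionsSketch {

    static String[] buildOptions(String url, String user, String password,
                                 String query, List<String> keyColumns,
                                 boolean incremental) {
        List<String> opts = new ArrayList<>();
        opts.add("-url");
        opts.add(url);
        if (user != null) {
            opts.add("-user");
            opts.add(user);
        }
        if (password != null) {
            opts.add("-password");
            opts.add(password);
        }
        opts.add("-Q");
        opts.add(query);
        if (!keyColumns.isEmpty()) {
            // Key columns are passed as one argument, separated by ", ".
            opts.add("-P");
            opts.add(String.join(", ", keyColumns));
        }
        if (incremental) {
            opts.add("-I");
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(
            "jdbc:mysql://localhost:3306/weka_test", // hypothetical URL
            "weka", "secret",                        // hypothetical credentials
            "SELECT * FROM iris",                    // hypothetical table
            Arrays.asList("id"), true);
        System.out.println(String.join(" ", options));
    }
}
```

      The resulting array could then be handed to setOptions on a DatabaseLoader instance, or the same tokens given on the command line to the main method.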
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] options)
      Main method.
      Parameters:
      options - the options