Package adql.parser

Class QueryFixer

java.lang.Object
adql.parser.QueryFixer

public class QueryFixer extends Object
Tool able to fix some common errors in ADQL queries.

See fix(String) for more details.

Since:
2.0
  • Field Details

    • grammarParser

      protected final ADQLGrammar grammarParser
      The ADQL grammar parser used internally.
    • mapRegexUnicodeConfusable

      protected final Map<String,String> mapRegexUnicodeConfusable
      The most common Unicode confusable characters and their ASCII/UTF-8 alternatives.

      The keys of this map are the ASCII characters, while the values are regular expressions matching all of their possible Unicode alternatives.

      Note: All of them have been listed using Unicode Utilities: Confusables.

    • REGEX_DASH

      protected final String REGEX_DASH
      Regular expression matching all Unicode alternatives for -.
    • REGEX_UNDERSCORE

      protected final String REGEX_UNDERSCORE
      Regular expression matching all Unicode alternatives for _.
    • REGEX_QUOTE

      protected final String REGEX_QUOTE
      Regular expression matching all Unicode alternatives for '.
    • REGEX_DOUBLE_QUOTE

      protected final String REGEX_DOUBLE_QUOTE
      Regular expression matching all Unicode alternatives for ".
    • REGEX_STOP

      protected final String REGEX_STOP
      Regular expression matching all Unicode alternatives for . (full stop).
    • REGEX_PLUS

      protected final String REGEX_PLUS
      Regular expression matching all Unicode alternatives for +.
    • REGEX_SPACE

      protected final String REGEX_SPACE
      Regular expression matching all Unicode alternatives for the space character.
    • REGEX_LESS_THAN

      protected final String REGEX_LESS_THAN
      Regular expression matching all Unicode alternatives for <.
    • REGEX_GREATER_THAN

      protected final String REGEX_GREATER_THAN
      Regular expression matching all Unicode alternatives for >.
    • REGEX_EQUAL

      protected final String REGEX_EQUAL
      Regular expression matching all Unicode alternatives for =.
  • Constructor Details

  • Method Details

    • fix

      public String fix(String adqlQuery) throws ParseException
      Try fixing tokens/terms of the given ADQL query.

      This function does not try to fix syntactic or semantic errors. It only tries to fix the most common issues in ADQL queries, such as:

      • some Unicode characters confusable with ASCII characters (such as a space or a dash); this function replaces them with their ASCII alternative,
      • any of the following, which this function double quotes:
        • non-regular ADQL identifiers (e.g. _RAJ2000),
        • ADQL function names used as identifiers (e.g. distance),
        • and SQL reserved keywords (e.g. public).

      Note: This function does not use any instance variable of this parser (especially the InputStream or Reader provided at initialisation or ReInit).

      Parameters:
      adqlQuery - The input ADQL query to fix.
      Returns:
      The suggested correction of the given ADQL query.
      Throws:
      ParseException - If any unrecognised character is encountered, or if anything else prevented the tokenization of some characters/words/terms.
    • replaceUnicodeConfusables

      protected String replaceUnicodeConfusables(String adqlQuery)
      Replace all Unicode characters that can be confused with ASCII/UTF-8 characters (e.g. different spaces, dashes, ...) with their ASCII version.
      Parameters:
      adqlQuery - The ADQL query string in which Unicode confusable characters must be replaced.
      Returns:
      The same query without the most common Unicode confusable characters.
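
      The replacement described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name ConfusablesSketch, the method replaceConfusables and the small map contents are assumptions made for illustration. Only the overall shape (a map whose keys are ASCII characters and whose values are regular expressions of Unicode look-alikes) follows the description of mapRegexUnicodeConfusable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;

// Illustrative sketch only; NOT the actual code of adql.parser.QueryFixer.
class ConfusablesSketch {

    // Hypothetical, heavily reduced map: ASCII replacement -> regex of Unicode look-alikes.
    static final Map<String, String> CONFUSABLES = new LinkedHashMap<>();
    static {
        // A few hyphens, dashes and the minus sign.
        CONFUSABLES.put("-", "[\u2010\u2011\u2012\u2013\u2014\u2212]");
        // Curly single quotes.
        CONFUSABLES.put("'", "[\u2018\u2019]");
        // Curly double quotes.
        CONFUSABLES.put("\"", "[\u201C\u201D]");
        // No-break, en, em and thin spaces.
        CONFUSABLES.put(" ", "[\u00A0\u2002\u2003\u2009]");
    }

    // Replaces every match of each regular expression with the ASCII key of its entry.
    static String replaceConfusables(String query) {
        String result = query;
        for (Map.Entry<String, String> entry : CONFUSABLES.entrySet()) {
            result = result.replaceAll(entry.getValue(),
                                       Matcher.quoteReplacement(entry.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The input contains a no-break space and curly single quotes.
        String input = "SELECT ra\u00A0FROM t WHERE name = \u2018M31\u2019";
        System.out.println(replaceConfusables(input));
        // prints: SELECT ra FROM t WHERE name = 'M31'
    }
}
```

      Keeping the ASCII character as the key and the look-alikes as a single character class means each entry needs only one pass over the query.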
    • mustEscape

      protected boolean mustEscape(Token token, Token nextToken)
      Tell whether the given token must be double quoted.

      This function considers all of the following as terms to double quote:

      • SQL reserved keywords,
      • unrecognised regular identifiers (i.e. neither a delimited nor a valid ADQL regular identifier),
      • and ADQL function names without a parameter list.
      Parameters:
      token - The token to analyze.
      nextToken - The following token (useful to detect the start of a function's parameter list).
      Returns:
      true if the given token must be double quoted, false to keep it as provided.
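
      The decision rules listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: it works on plain strings instead of Token objects, the class MustEscapeSketch and the helper quote are hypothetical, and the keyword and function lists are tiny samples. It also assumes it is only called on identifier-like terms, and that a regular identifier starts with a letter followed by letters, digits or underscores.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the actual code of adql.parser.QueryFixer.
class MustEscapeSketch {

    // Hypothetical, tiny samples; the real lists are much longer.
    static final Set<String> RESERVED = Set.of("PUBLIC", "DATE", "YEAR");
    static final Set<String> FUNCTIONS = Set.of("DISTANCE", "POINT", "CIRCLE");

    // Assumption: a regular identifier is a letter followed by letters, digits or underscores.
    static final Pattern REGULAR_ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Tells whether an identifier-like term should be wrapped in double quotes.
    static boolean mustEscape(String term, String nextTerm) {
        if (term.startsWith("\"")) {
            return false; // already a delimited identifier
        }
        String upper = term.toUpperCase(Locale.ROOT);
        if (RESERVED.contains(upper)) {
            return true; // SQL reserved keyword
        }
        if (FUNCTIONS.contains(upper)) {
            // A function name is kept as is only when a parameter list follows.
            return !"(".equals(nextTerm);
        }
        return !REGULAR_ID.matcher(term).matches(); // e.g. _RAJ2000
    }

    // Wraps a term in double quotes, doubling any embedded double quote.
    static String quote(String term) {
        return "\"" + term.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(mustEscape("distance", "(")); // prints: false
        System.out.println(mustEscape("distance", ",")); // prints: true
        System.out.println(quote("_RAJ2000"));           // prints: "_RAJ2000"
    }
}
```

      Looking at the next term is what separates a function call such as distance(...) from a column that happens to be named distance.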