Class CharacterUtils


  • public class CharacterUtils
    extends java.lang.Object
    Collection of utilities for character handling. Contains utilities for semi-automatically creating lexer rules.
    • Constructor Summary

      Constructors 
      Constructor Description
      CharacterUtils()
      Constructor for CharacterUtils.
    • Method Summary

      Modifier and Type Method Description
      static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getDigitRange()
      Generate an ArrayList of CharRanges for what Java considers to be a digit.
      static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getLetterRange()
      Generate an ArrayList of CharRanges for what Java considers to be a letter.
      static void main​(java.lang.String[] args)  
      static void printAntlrLexRule​(java.lang.String name, java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)  
      static void printJavaCCLexRule​(java.lang.String name, java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)  
      static java.lang.String toHexString​(char c)
      Create a hex representation of the UTF-16 encoding of a Java char.
      static java.lang.String toUnicodeChar​(char c)
      Create a hex representation of the UTF-16 encoding of a Java char.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CharacterUtils

        public CharacterUtils()
        Constructor for CharacterUtils.
    • Method Detail

      • toUnicodeChar

        public static java.lang.String toUnicodeChar​(char c)
        Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by Java when reading source code.
        Parameters:
        c - The char to be encoded.
        Returns:
        String Hex representation of character. For example, the result of encoding 'A' would be "\u0041".
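        The escape form described above can be sketched as follows. This is only an illustration, not the UIMA implementation: the class and method names are hypothetical, and the choice of uppercase hex digits is an assumption.

```java
// Hypothetical stand-in for illustration; not the UIMA implementation.
final class UnicodeEscapeSketch {

    // Format a char as a Java source-code Unicode escape:
    // a backslash, the letter u, then four hex digits of the UTF-16 code unit.
    // Uppercase hex digits are an assumption of this sketch.
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        // Print the escape for an ASCII letter.
        System.out.println(toUnicodeEscape('A'));
    }
}
```

        The four-digit zero padding matters because the Java compiler only accepts escapes with exactly four hex digits after the u.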
      • toHexString

        public static java.lang.String toHexString​(char c)
        Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by the JavaCC lexer.
        Parameters:
        c - The char to be encoded.
        Returns:
        String Hex representation of character. For example, the result of encoding 'A' would be "0x0041".
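        A minimal sketch of the "0x0041" form, assuming four zero-padded uppercase hex digits as in the example above. The class and method names are hypothetical and this is not the UIMA implementation.

```java
// Hypothetical stand-in for illustration; not the UIMA implementation.
final class HexStringSketch {

    // Format a char as "0x" followed by the four hex digits of its UTF-16 code unit.
    // Uppercase digits and fixed width are assumptions based on the documented example.
    static String toHex(char c) {
        return String.format("0x%04X", (int) c);
    }

    public static void main(String[] args) {
        // Print the hex form of an ASCII letter.
        System.out.println(toHex('A'));
    }
}
```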
      • getLetterRange

        public static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getLetterRange()
        Generate an ArrayList of CharRanges for what Java considers to be a letter. The result is intended as input to Unicode-agnostic lexers such as ANTLR.
        Returns:
        ArrayList A list of character ranges.
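        One plausible way to build such a list is to scan every char value and merge consecutive matches into inclusive ranges. The sketch below assumes that approach and uses Character.isLetter as the definition of "what Java considers to be a letter". The Range class is a hypothetical stand-in for CharRange, whose actual fields are not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for illustration; not the UIMA implementation.
final class CharRangeSketch {

    // Assumed shape of a character range: inclusive start and end.
    static final class Range {
        final char start;
        final char end;

        Range(char start, char end) {
            this.start = start;
            this.end = end;
        }
    }

    // Scan all 16-bit char values and merge consecutive matches into ranges.
    static List<Range> collectRanges(IntPredicate matches) {
        List<Range> ranges = new ArrayList<>();
        int runStart = -1; // -1 means no run is currently open
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (matches.test(c)) {
                if (runStart < 0) {
                    runStart = c;
                }
            } else if (runStart >= 0) {
                ranges.add(new Range((char) runStart, (char) (c - 1)));
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            // Close a run that extends to the last char value.
            ranges.add(new Range((char) runStart, Character.MAX_VALUE));
        }
        return ranges;
    }

    static List<Range> letterRanges() {
        return collectRanges(c -> Character.isLetter((char) c));
    }

    public static void main(String[] args) {
        System.out.println("letter ranges: " + letterRanges().size());
    }
}
```

        The loop counter is an int rather than a char so that it can pass Character.MAX_VALUE and terminate instead of wrapping around to zero.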
      • getDigitRange

        public static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getDigitRange()
        Generate an ArrayList of CharRanges for what Java considers to be a digit. The result is intended as input to Unicode-agnostic lexers such as ANTLR.
        Returns:
        ArrayList A list of character ranges.
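        Java's notion of a digit is wider than ASCII '0' to '9': Character.isDigit also accepts decimal digits from other scripts, which is why the result is a list of ranges rather than a single one. The sketch below illustrates this under the assumption that Character.isDigit is the test being used. The names are hypothetical, and each range is represented as a plain {first, last} pair instead of the real CharRange type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the UIMA implementation.
final class DigitRangeSketch {

    // Each entry is {first, last}, both inclusive.
    static List<int[]> digitRanges() {
        List<int[]> ranges = new ArrayList<>();
        int[] open = null; // the range currently being extended, if any
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isDigit((char) c)) {
                if (open == null) {
                    open = new int[] {c, c};
                    ranges.add(open);
                } else {
                    open[1] = c; // extend the open range by one char
                }
            } else {
                open = null; // a non-digit closes the current range
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print every range as a pair of hex code units.
        for (int[] r : digitRanges()) {
            System.out.printf("0x%04X..0x%04X%n", r[0], r[1]);
        }
    }
}
```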
      • printAntlrLexRule

        public static void printAntlrLexRule​(java.lang.String name,
                                             java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)
      • printJavaCCLexRule

        public static void printJavaCCLexRule​(java.lang.String name,
                                              java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)
      • main

        public static void main​(java.lang.String[] args)