Class ClassicTokenizerImpl

java.lang.Object
org.apache.lucene.analysis.classic.ClassicTokenizerImpl

class ClassicTokenizerImpl extends Object
This class implements the classic Lucene StandardTokenizer as it existed up to and including Lucene 3.0.
  • Field Summary

    Fields
    Modifier and Type              Field and Description
    static final int               ACRONYM
    static final int               ACRONYM_DEP
    static final int               ALPHANUM
    static final int               APOSTROPHE
    static final int               CJ
    static final int               COMPANY
    static final int               EMAIL
    static final int               HOST
    static final int               NUM
    static final String[]          TOKEN_TYPES
    private long                   yychar
                                   Number of characters up to the start of the matched text.
    private int                    yycolumn
                                   Number of characters from the last newline up to the start of the matched text.
    static final int               YYEOF
                                   This character denotes the end of file.
    static final int               YYINITIAL
                                   Lexical states. YYINITIAL is the initial lexical state.
    private int                    yyline
                                   Number of newlines encountered up to the start of the matched text.
    private static final int[]     ZZ_ACTION
                                   Translates DFA states to action switch labels.
    private static final String    ZZ_ACTION_PACKED_0
    private static final int[]     ZZ_ATTRIBUTE
                                   ZZ_ATTRIBUTE[aState] contains the attributes of state aState.
    private static final String    ZZ_ATTRIBUTE_PACKED_0
    private static final int       ZZ_BUFFERSIZE
                                   Initial size of the lookahead buffer.
    private static final int[]     ZZ_CMAP_BLOCKS
                                   Second-level tables for translating characters to character classes.
    private static final String    ZZ_CMAP_BLOCKS_PACKED_0
    private static final int[]     ZZ_CMAP_TOP
                                   Top-level table for translating characters to character classes.
    private static final String    ZZ_CMAP_TOP_PACKED_0
    private static final String[]  ZZ_ERROR_MSG
                                   Error messages for ZZ_UNKNOWN_ERROR, ZZ_NO_MATCH, and ZZ_PUSHBACK_2BIG, respectively.
    private static final int[]     ZZ_LEXSTATE
                                   ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l; ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l at the beginning of a line. l is of the form l = 2*k, where k is a non-negative integer.
    private static final int       ZZ_NO_MATCH
                                   Error code for "could not match input".
    private static final int       ZZ_PUSHBACK_2BIG
                                   Error code for "pushback value was too large".
    private static final int[]     ZZ_ROWMAP
                                   Translates a state to a row index in the transition table.
    private static final String    ZZ_ROWMAP_PACKED_0
    private static final int[]     ZZ_TRANS
                                   The transition table of the DFA.
    private static final String    ZZ_TRANS_PACKED_0
    private static final int       ZZ_UNKNOWN_ERROR
                                   Error code for "Unknown internal scanner error".
    private boolean                zzAtBOL
                                   Whether the scanner is currently at the beginning of a line.
    private boolean                zzAtEOF
                                   Whether the scanner is at the end of file.
    private char[]                 zzBuffer
                                   This buffer contains the current text to be matched and is the source of the yytext() string.
    private int                    zzCurrentPos
                                   Current text position in the buffer.
    private int                    zzEndRead
                                   Marks the last character in the buffer that has been read from input.
    private boolean                zzEOFDone
                                   Whether the user-EOF-code has already been executed.
    private int                    zzFinalHighSurrogate
                                   The number of occupied positions in zzBuffer beyond zzEndRead.
    private int                    zzLexicalState
                                   Current lexical state.
    private int                    zzMarkedPos
                                   Text position at the last accepting state.
    private Reader                 zzReader
                                   Input device.
    private int                    zzStartRead
                                   Marks the beginning of the yytext() string in the buffer.
    private int                    zzState
                                   Current state of the DFA.
  • Constructor Summary

    Constructors
    Constructor and Description
    ClassicTokenizerImpl(Reader in)
      Creates a new scanner.
  • Method Summary

    Modifier and Type     Method and Description
    int                   getNextToken()
                          Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
    final void            getText(CharTermAttribute t)
                          Fills CharTermAttribute with the current token text.
    final void            setBufferSize(int numChars)
    final boolean         yyatEOF()
                          Returns whether the scanner has reached the end of the reader it reads from.
    final void            yybegin(int newState)
                          Enters a new lexical state.
    final int             yychar()
    final char            yycharat(int position)
                          Returns the character at the given position from the matched text.
    final void            yyclose()
                          Closes the input reader.
    final int             yylength()
                          How many characters were matched.
    void                  yypushback(int number)
                          Pushes the specified number of characters back into the input stream.
    final void            yyreset(Reader reader)
                          Resets the scanner to read from a new input stream.
    private final void    yyResetPosition()
                          Resets the input position.
    final int             yystate()
                          Returns the current lexical state.
    final String          yytext()
                          Returns the text matched by the current regular expression.
    private static int    zzCMap(int input)
                          Translates raw input code points to DFA table rows.
    private boolean       zzRefill()
                          Refills the input buffer.
    private static void   zzScanError(int errorCode)
                          Reports an error that occurred while scanning.
    private static int[]  zzUnpackAction()
    private static int    zzUnpackAction(String packed, int offset, int[] result)
    private static int[]  zzUnpackAttribute()
    private static int    zzUnpackAttribute(String packed, int offset, int[] result)
    private static int[]  zzUnpackcmap_blocks()
    private static int    zzUnpackcmap_blocks(String packed, int offset, int[] result)
    private static int[]  zzUnpackcmap_top()
    private static int    zzUnpackcmap_top(String packed, int offset, int[] result)
    private static int[]  zzUnpackRowMap()
    private static int    zzUnpackRowMap(String packed, int offset, int[] result)
    private static int[]  zzUnpackTrans()
    private static int    zzUnpackTrans(String packed, int offset, int[] result)

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • YYEOF

      public static final int YYEOF
      This character denotes the end of file.
      See Also:
      Constant Field Values
    • ZZ_BUFFERSIZE

      private static final int ZZ_BUFFERSIZE
      Initial size of the lookahead buffer.
      See Also:
      Constant Field Values
    • YYINITIAL

      public static final int YYINITIAL
      Lexical states. YYINITIAL is the initial lexical state.
      See Also:
      Constant Field Values
    • ZZ_LEXSTATE

      private static final int[] ZZ_LEXSTATE
      ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l; ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l at the beginning of a line. l is of the form l = 2*k, where k is a non-negative integer.
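The indexing rule for ZZ_LEXSTATE can be sketched as a small stand-alone helper. Everything here is hypothetical: the table values and the names LexStateDemo, STATE_A, and STATE_B are invented, and the sketch assumes the scanner consults the odd slot only when it is at the beginning of a line (as tracked by zzAtBOL).

```java
// Hypothetical sketch: picking a DFA start state from a ZZ_LEXSTATE-style table.
// The table values below are invented; they are not ClassicTokenizerImpl's.
final class LexStateDemo {
  // Two slots per lexical state l = 2*k:
  // [l] = start state normally, [l + 1] = start state at the beginning of a line.
  static final int[] LEXSTATE = {0, 0, 3, 4};

  static final int STATE_A = 0; // hypothetical; plays the role of YYINITIAL
  static final int STATE_B = 2; // hypothetical second lexical state

  static int startState(int lexicalState, boolean atBeginningOfLine) {
    if ((lexicalState & 1) != 0) {
      throw new IllegalArgumentException("lexical states are even: " + lexicalState);
    }
    return atBeginningOfLine ? LEXSTATE[lexicalState + 1] : LEXSTATE[lexicalState];
  }

  public static void main(String[] args) {
    System.out.println(startState(STATE_B, false)); // prints 3
    System.out.println(startState(STATE_B, true));  // prints 4
  }
}
```

Because every lexical state constant is even, the pair (l, l+1) never overlaps the pair of the next state.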
    • ZZ_CMAP_TOP

      private static final int[] ZZ_CMAP_TOP
      Top-level table for translating characters to character classes.
    • ZZ_CMAP_TOP_PACKED_0

      private static final String ZZ_CMAP_TOP_PACKED_0
      See Also:
      Constant Field Values
    • ZZ_CMAP_BLOCKS

      private static final int[] ZZ_CMAP_BLOCKS
      Second-level tables for translating characters to character classes.
    • ZZ_CMAP_BLOCKS_PACKED_0

      private static final String ZZ_CMAP_BLOCKS_PACKED_0
      See Also:
      Constant Field Values
    • ZZ_ACTION

      private static final int[] ZZ_ACTION
      Translates DFA states to action switch labels.
    • ZZ_ACTION_PACKED_0

      private static final String ZZ_ACTION_PACKED_0
      See Also:
      Constant Field Values
    • ZZ_ROWMAP

      private static final int[] ZZ_ROWMAP
      Translates a state to a row index in the transition table.
    • ZZ_ROWMAP_PACKED_0

      private static final String ZZ_ROWMAP_PACKED_0
      See Also:
      Constant Field Values
    • ZZ_TRANS

      private static final int[] ZZ_TRANS
      The transition table of the DFA.
    • ZZ_TRANS_PACKED_0

      private static final String ZZ_TRANS_PACKED_0
      See Also:
      Constant Field Values
    • ZZ_UNKNOWN_ERROR

      private static final int ZZ_UNKNOWN_ERROR
      Error code for "Unknown internal scanner error".
      See Also:
      Constant Field Values
    • ZZ_NO_MATCH

      private static final int ZZ_NO_MATCH
      Error code for "could not match input".
      See Also:
      Constant Field Values
    • ZZ_PUSHBACK_2BIG

      private static final int ZZ_PUSHBACK_2BIG
      Error code for "pushback value was too large".
      See Also:
      Constant Field Values
    • ZZ_ERROR_MSG

      private static final String[] ZZ_ERROR_MSG
      Error messages for ZZ_UNKNOWN_ERROR, ZZ_NO_MATCH, and ZZ_PUSHBACK_2BIG, respectively.
    • ZZ_ATTRIBUTE

      private static final int[] ZZ_ATTRIBUTE
      ZZ_ATTRIBUTE[aState] contains the attributes of state aState.
    • ZZ_ATTRIBUTE_PACKED_0

      private static final String ZZ_ATTRIBUTE_PACKED_0
      See Also:
      Constant Field Values
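The tables above work together during a scan. The toy sketch below assumes the usual JFlex pattern: ZZ_ROWMAP gives the row offset of a state, that offset plus the character class indexes ZZ_TRANS (with -1 meaning "no transition"), and ZZ_ATTRIBUTE marks accepting states. The three-state DFA, the character classes, the bit-0 "accepting" flag, and the action numbers are all invented; the generated tables are far larger. Scanning continues past accepting states and falls back to the last one seen, which is the position zzMarkedPos records.

```java
// Toy sketch of longest-match scanning with ZZ_ROWMAP / ZZ_TRANS / ZZ_ATTRIBUTE-style tables.
// ASSUMPTIONS: the DFA, the character classes, the "bit 0 = accepting" flag, and the
// action numbers are all invented for illustration.
final class DfaDemo {
  static final int[] ROWMAP = {0, 3, 6}; // state -> offset of its row in TRANS
  static final int[] TRANS = {
    -1, 1, 2,  // state 0 (start): other, letter, digit
    -1, 1, 1,  // state 1: a word that may continue with letters or digits
    -1, -1, 2  // state 2: a number that continues with digits only
  };
  static final int[] ATTRIBUTE = {0, 1, 1}; // bit 0 set = accepting state
  static final int[] ACTION = {0, 1, 2};    // 1 = word rule, 2 = number rule

  // Character classes: 0 = other, 1 = letter, 2 = digit.
  static int charClass(char c) {
    if (Character.isLetter(c)) return 1;
    if (Character.isDigit(c)) return 2;
    return 0;
  }

  /** Returns {matchLength, action}; an action of 0 means nothing matched at start. */
  static int[] longestMatch(String input, int start) {
    int state = 0;
    int markedPos = start; // end of the last accepting match (like zzMarkedPos)
    int action = 0;
    for (int pos = start; pos < input.length(); pos++) {
      int next = TRANS[ROWMAP[state] + charClass(input.charAt(pos))];
      if (next == -1) break; // no transition: stop and fall back to the last accept
      state = next;
      if ((ATTRIBUTE[state] & 1) == 1) {
        markedPos = pos + 1;
        action = ACTION[state];
      }
    }
    return new int[] {markedPos - start, action};
  }
}
```

For example, longestMatch("abc12 x", 0) returns {5, 1}: the match stops at the space because state 1 has no transition for class 0.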
    • zzReader

      private Reader zzReader
      Input device.
    • zzState

      private int zzState
      Current state of the DFA.
    • zzLexicalState

      private int zzLexicalState
      Current lexical state.
    • zzBuffer

      private char[] zzBuffer
      This buffer contains the current text to be matched and is the source of the yytext() string.
    • zzMarkedPos

      private int zzMarkedPos
      Text position at the last accepting state.
    • zzCurrentPos

      private int zzCurrentPos
      Current text position in the buffer.
    • zzStartRead

      private int zzStartRead
      Marks the beginning of the yytext() string in the buffer.
    • zzEndRead

      private int zzEndRead
      Marks the last character in the buffer that has been read from input.
    • zzAtEOF

      private boolean zzAtEOF
      Whether the scanner is at the end of file.
      See Also:
      yyatEOF()
    • zzFinalHighSurrogate

      private int zzFinalHighSurrogate
      The number of occupied positions in zzBuffer beyond zzEndRead.

      When a lead/high surrogate has been read from the input stream into the final zzBuffer position, this will have a value of 1; otherwise, it will have a value of 0.
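A sketch of the rule described above, assuming the check runs right after a read that may have filled the buffer to its last position. finalHighSurrogate is a hypothetical helper name; holding the high surrogate back keeps a surrogate pair from being split across two refills.

```java
// Hypothetical sketch of the zzFinalHighSurrogate rule.
// ASSUMPTION: the check runs right after a read that may have filled the buffer.
final class SurrogateDemo {
  /** 1 if a high surrogate sits in the buffer's final position, otherwise 0. */
  static int finalHighSurrogate(char[] buffer, int endRead) {
    boolean filledToEnd = endRead == buffer.length;
    return filledToEnd && endRead > 0 && Character.isHighSurrogate(buffer[endRead - 1]) ? 1 : 0;
  }

  public static void main(String[] args) {
    String text = "a\uD83D\uDE00"; // 'a' followed by U+1F600, a surrogate pair
    char[] buffer = new char[2];
    text.getChars(0, 2, buffer, 0); // room only for 'a' and the high surrogate
    int held = finalHighSurrogate(buffer, buffer.length);
    int visibleEnd = buffer.length - held; // the scanner does not look past this yet
    System.out.println(held + " " + visibleEnd); // prints 1 1
  }
}
```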

    • yyline

      private int yyline
      Number of newlines encountered up to the start of the matched text.
    • yycolumn

      private int yycolumn
      Number of characters from the last newline up to the start of the matched text.
    • yychar

      private long yychar
      Number of characters up to the start of the matched text.
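How yyline, yycolumn, and yychar relate to one another can be shown with a hypothetical helper that recomputes them from scratch for a match starting at a given offset. The generated scanner updates its counters incrementally instead; this sketch also treats only '\n' as a line terminator, which is a simplification.

```java
// Hypothetical helper relating yyline, yycolumn, and yychar for a match start.
// SIMPLIFICATION: only '\n' is treated as a line terminator.
final class PositionDemo {
  /** Returns {line, column, charOffset}, all zero-based, for a match starting at matchStart. */
  static long[] positionOf(String text, int matchStart) {
    long line = 0;   // newlines seen before matchStart (yyline)
    long column = 0; // characters since the last newline (yycolumn)
    for (int i = 0; i < matchStart; i++) {
      if (text.charAt(i) == '\n') {
        line++;
        column = 0;
      } else {
        column++;
      }
    }
    return new long[] {line, column, matchStart}; // yychar: characters before the match
  }
}
```

For "ab\ncd ef" and a match starting at "ef" (offset 6), the helper yields line 1, column 3, and character offset 6.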
    • zzAtBOL

      private boolean zzAtBOL
      Whether the scanner is currently at the beginning of a line.
    • zzEOFDone

      private boolean zzEOFDone
      Whether the user-EOF-code has already been executed.
    • ALPHANUM

      public static final int ALPHANUM
      See Also:
      Constant Field Values
    • APOSTROPHE

      public static final int APOSTROPHE
      See Also:
      Constant Field Values
    • ACRONYM

      public static final int ACRONYM
      See Also:
      Constant Field Values
    • COMPANY

      public static final int COMPANY
      See Also:
      Constant Field Values
    • EMAIL

      public static final int EMAIL
      See Also:
      Constant Field Values
    • HOST

      public static final int HOST
      See Also:
      Constant Field Values
    • NUM

      public static final int NUM
      See Also:
      Constant Field Values
    • CJ

      public static final int CJ
      See Also:
      Constant Field Values
    • ACRONYM_DEP

      public static final int ACRONYM_DEP
      See Also:
      Constant Field Values
    • TOKEN_TYPES

      public static final String[] TOKEN_TYPES
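The int constants above are token-type codes, and TOKEN_TYPES holds their string names. The sketch below assumes TOKEN_TYPES[code] is the name for code, with codes 0 through 8 in the order the constants are listed here. The numeric values and the angle-bracket names are assumptions modeled on Lucene's ClassicTokenizer, so check the Constant Field Values page before relying on them.

```java
// Sketch of mapping a token-type code to its name via a TOKEN_TYPES-style array.
// ASSUMPTION: the values and names mirror Lucene's ClassicTokenizer as understood here.
final class TokenTypesDemo {
  static final int ALPHANUM = 0;
  static final int APOSTROPHE = 1;
  static final int ACRONYM = 2;
  static final int COMPANY = 3;
  static final int EMAIL = 4;
  static final int HOST = 5;
  static final int NUM = 6;
  static final int CJ = 7;
  static final int ACRONYM_DEP = 8;

  // Index i holds the name of the type whose code is i.
  static final String[] TOKEN_TYPES = {
    "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>", "<EMAIL>",
    "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
  };

  static String typeName(int tokenType) {
    return TOKEN_TYPES[tokenType];
  }

  public static void main(String[] args) {
    System.out.println(typeName(EMAIL)); // prints <EMAIL>
  }
}
```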
  • Constructor Details

    • ClassicTokenizerImpl

      ClassicTokenizerImpl(Reader in)
      Creates a new scanner.
      Parameters:
      in - the java.io.Reader to read input from.
  • Method Details

    • zzUnpackcmap_top

      private static int[] zzUnpackcmap_top()
    • zzUnpackcmap_top

      private static int zzUnpackcmap_top(String packed, int offset, int[] result)
    • zzUnpackcmap_blocks

      private static int[] zzUnpackcmap_blocks()
    • zzUnpackcmap_blocks

      private static int zzUnpackcmap_blocks(String packed, int offset, int[] result)
    • zzUnpackAction

      private static int[] zzUnpackAction()
    • zzUnpackAction

      private static int zzUnpackAction(String packed, int offset, int[] result)
    • zzUnpackRowMap

      private static int[] zzUnpackRowMap()
    • zzUnpackRowMap

      private static int zzUnpackRowMap(String packed, int offset, int[] result)
    • zzUnpackTrans

      private static int[] zzUnpackTrans()
    • zzUnpackTrans

      private static int zzUnpackTrans(String packed, int offset, int[] result)
    • zzUnpackAttribute

      private static int[] zzUnpackAttribute()
    • zzUnpackAttribute

      private static int zzUnpackAttribute(String packed, int offset, int[] result)
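The zzUnpack* methods decode the *_PACKED_0 strings into the int[] tables. A minimal sketch, assuming the run-length scheme JFlex typically uses: the packed string is a sequence of (count, value) char pairs, and the three-argument method returns the next free index in result, which presumably lets the no-argument overload chain several packed chunks into one array. The real zzUnpackTrans and zzUnpackRowMap may decode their values slightly differently; this shows only the plain scheme.

```java
// Sketch of run-length decoding in the style of the zzUnpack* helpers.
// ASSUMPTION: packed data is a sequence of (count, value) char pairs.
final class UnpackDemo {
  /** Writes each value `count` times into result, starting at offset; returns the next free index. */
  static int unpack(String packed, int offset, int[] result) {
    int i = 0;      // read position in the packed string
    int j = offset; // write position in the result array
    while (i < packed.length()) {
      int count = packed.charAt(i++);
      int value = packed.charAt(i++);
      for (int k = 0; k < count; k++) {
        result[j++] = value;
      }
    }
    return j;
  }

  public static void main(String[] args) {
    int[] table = new int[5];
    int end = unpack("\3\7\2\1", 0, table); // three 7s, then two 1s
    System.out.println(end + " " + java.util.Arrays.toString(table)); // prints 5 [7, 7, 7, 1, 1]
  }
}
```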
    • yychar

      public final int yychar()
    • getText

      public final void getText(CharTermAttribute t)
      Fills CharTermAttribute with the current token text.
    • setBufferSize

      public final void setBufferSize(int numChars)
    • zzCMap

      private static int zzCMap(int input)
      Translates raw input code points to DFA table rows.
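A two-level lookup like the one ZZ_CMAP_TOP and ZZ_CMAP_BLOCKS describe can be sketched as follows. The split into a high part (codePoint >> 8) and a 256-entry block is an assumption about the layout, and the block contents are invented: ASCII letters map to class 1, the block starting at U+3000 maps to class 2, and everything else shares one all-zero block. Sharing identical blocks is what keeps such a table small across the full Unicode range.

```java
import java.util.Arrays;

// Sketch of a two-level character-class lookup (a ZZ_CMAP_TOP / ZZ_CMAP_BLOCKS analogue).
// ASSUMPTIONS: 256-entry blocks indexed by the low 8 bits; all table contents are invented.
final class CMapDemo {
  static final int BLOCK = 256;
  // TOP[codePoint >> 8] holds the start offset of that code point's block in BLOCKS.
  static final int[] TOP = new int[0x110000 >> 8];
  // Three blocks: 0 = U+0000..U+00FF, 1 = shared all-zero block, 2 = all class 2.
  static final int[] BLOCKS = new int[3 * BLOCK];

  static {
    for (int c = 'a'; c <= 'z'; c++) BLOCKS[c] = 1; // ASCII letters -> class 1
    for (int c = 'A'; c <= 'Z'; c++) BLOCKS[c] = 1;
    Arrays.fill(BLOCKS, 2 * BLOCK, 3 * BLOCK, 2);   // block 2: every entry is class 2
    Arrays.fill(TOP, BLOCK);                        // by default, share the all-zero block 1
    TOP[0x00] = 0;                                  // U+0000..U+00FF use block 0
    TOP[0x30] = 2 * BLOCK;                          // U+3000..U+30FF use block 2
  }

  static int charClass(int codePoint) {
    int offsetInBlock = codePoint & 0xFF;
    return BLOCKS[TOP[codePoint >> 8] + offsetInBlock];
  }

  public static void main(String[] args) {
    System.out.println(charClass('q'));    // prints 1
    System.out.println(charClass(0x3042)); // prints 2
    System.out.println(charClass(0x4E00)); // prints 0
  }
}
```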
    • zzRefill

      private boolean zzRefill() throws IOException
      Refills the input buffer.
      Returns:
      false iff there was new input.
      Throws:
      IOException - if any I/O-Error occurs
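The refill step can be sketched as: slide the still-needed text to the front of the buffer, grow the buffer if it is full, then read more. RefillDemo and its fields are hypothetical stand-ins for zzBuffer, zzStartRead, and zzEndRead; a real scanner also shifts zzCurrentPos and zzMarkedPos and handles surrogates (see zzFinalHighSurrogate), which this sketch omits. The return value follows the documented contract: false means new input was read.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical sketch of a zzRefill-style buffer refill (compact, grow, read).
// Field names are stand-ins; a real scanner also shifts its current/marked positions.
final class RefillDemo {
  char[] buffer = new char[4]; // deliberately tiny so the example exercises growth
  int startRead;               // start of text that must be kept (the current match)
  int endRead;                 // end (exclusive) of valid data in the buffer
  private final Reader reader;

  RefillDemo(Reader reader) {
    this.reader = reader;
  }

  /** Returns false if new input was read, true at end of input. */
  boolean refill() throws IOException {
    // 1. Drop consumed text by sliding [startRead, endRead) to the front of the buffer.
    if (startRead > 0) {
      System.arraycopy(buffer, startRead, buffer, 0, endRead - startRead);
      endRead -= startRead;
      startRead = 0;
    }
    // 2. If kept text fills the buffer, double it so a long token can keep growing.
    if (endRead == buffer.length) {
      buffer = Arrays.copyOf(buffer, buffer.length * 2);
    }
    // 3. Read into the free tail. At least one char is requested, so read returns -1 or > 0.
    int n = reader.read(buffer, endRead, buffer.length - endRead);
    if (n > 0) {
      endRead += n;
      return false;
    }
    return true;
  }
}
```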
    • yyclose

      public final void yyclose() throws IOException
      Closes the input reader.
      Throws:
      IOException - if the reader could not be closed.
    • yyreset

      public final void yyreset(Reader reader)
      Resets the scanner to read from a new input stream.

      Does not close the old reader.

      All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

      The internal scan buffer is resized down to its initial length if it has grown.

      Parameters:
      reader - The new input stream.
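A sketch of the reset contract described above, using hypothetical stand-ins for the scanner's fields, an assumed initial buffer size of 16, and an assumed YYINITIAL value of 0. Note what it does not do: it never closes the previous reader, so the caller remains responsible for that (for example, by calling yyclose() before resetting).

```java
import java.io.Reader;

// Hypothetical sketch of the yyreset contract; fields are stand-ins for the scanner's own.
final class ResetDemo {
  static final int INITIAL_BUFFER_SIZE = 16; // assumed stand-in for ZZ_BUFFERSIZE
  static final int YYINITIAL = 0;            // assumed value of the initial lexical state

  Reader reader;
  char[] buffer = new char[INITIAL_BUFFER_SIZE];
  int startRead, currentPos, markedPos, endRead;
  boolean atEOF;
  boolean atBOL = true;
  boolean eofDone;
  int lexicalState = YYINITIAL;

  void yyreset(Reader newReader) {
    reader = newReader;                               // the old reader is not closed
    eofDone = false;
    atEOF = false;
    atBOL = true;
    startRead = currentPos = markedPos = endRead = 0; // buffered text is discarded
    lexicalState = YYINITIAL;
    if (buffer.length > INITIAL_BUFFER_SIZE) {
      buffer = new char[INITIAL_BUFFER_SIZE];         // shrink a grown buffer
    }
  }
}
```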
    • yyResetPosition

      private final void yyResetPosition()
      Resets the input position.
    • yyatEOF

      public final boolean yyatEOF()
      Returns whether the scanner has reached the end of the reader it reads from.
      Returns:
      whether the scanner has reached EOF.
    • yystate

      public final int yystate()
      Returns the current lexical state.
      Returns:
      the current lexical state.
    • yybegin

      public final void yybegin(int newState)
      Enters a new lexical state.
      Parameters:
      newState - the new lexical state
    • yytext

      public final String yytext()
      Returns the text matched by the current regular expression.
      Returns:
      the matched text.
    • yycharat

      public final char yycharat(int position)
      Returns the character at the given position from the matched text.

      It is equivalent to yytext().charAt(position), but faster.

      Parameters:
      position - the position of the character to fetch. A value from 0 to yylength()-1.
      Returns:
      the character at position.
    • yylength

      public final int yylength()
      How many characters were matched.
      Returns:
      the length of the matched text region.
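The accessors above, together with getText, all read the same region of zzBuffer. A sketch with hypothetical stand-in fields, assuming the match occupies zzBuffer[zzStartRead, zzMarkedPos): yylength() is the width of that region, yytext() copies it into a new String, yycharat(i) indexes the buffer directly (which is why it is cheaper than yytext().charAt(i)), and a getText-style method appends the region without an intermediate String. A StringBuilder stands in for Lucene's CharTermAttribute, whose copyBuffer(char[], int, int) serves the same purpose.

```java
// Hypothetical sketch: yylength, yytext, yycharat, and a getText-style copy
// all read the match region buffer[startRead, markedPos).
final class MatchTextDemo {
  private final char[] buffer;
  private final int startRead; // stand-in for zzStartRead
  private final int markedPos; // stand-in for zzMarkedPos (end of match, exclusive)

  MatchTextDemo(char[] buffer, int startRead, int markedPos) {
    this.buffer = buffer;
    this.startRead = startRead;
    this.markedPos = markedPos;
  }

  int yylength() {
    return markedPos - startRead;
  }

  String yytext() {
    return new String(buffer, startRead, yylength()); // allocates a new String each call
  }

  char yycharat(int position) {
    return buffer[startRead + position]; // no String allocation
  }

  // getText analogue: StringBuilder stands in for CharTermAttribute.
  void getText(StringBuilder term) {
    term.setLength(0);
    term.append(buffer, startRead, yylength());
  }
}
```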
    • zzScanError

      private static void zzScanError(int errorCode)
      Reports an error that occurred while scanning.

      In a well-formed scanner (one that uses yypushback(int) correctly, if at all, and has a match-all fallback rule), this method is only called for conditions that "Can't Possibly Happen".

      If this method is called, something is seriously wrong (for example, a JFlex bug that produced a faulty scanner).

      Ordinary syntax- or scanner-level error handling should be done in error fallback rules.

      Parameters:
      errorCode - the code of the error message to display.
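A sketch of the reporting logic, assuming the error codes 0, 1, and 2 index ZZ_ERROR_MSG in the order given in its description, and that an out-of-range code falls back to the unknown-error message. The numeric codes are assumptions. Throwing java.lang.Error rather than an IOException reflects the point above: this path signals a broken scanner, not bad input.

```java
// Sketch of zzScanError-style reporting.
// ASSUMPTION: codes 0, 1, 2 index the messages in the order documented for ZZ_ERROR_MSG.
final class ScanErrorDemo {
  static final int ZZ_UNKNOWN_ERROR = 0;
  static final int ZZ_NO_MATCH = 1;
  static final int ZZ_PUSHBACK_2BIG = 2;

  static final String[] ZZ_ERROR_MSG = {
    "Unknown internal scanner error",
    "could not match input",
    "pushback value was too large"
  };

  static void zzScanError(int errorCode) {
    String message = (errorCode >= 0 && errorCode < ZZ_ERROR_MSG.length)
        ? ZZ_ERROR_MSG[errorCode]
        : ZZ_ERROR_MSG[ZZ_UNKNOWN_ERROR]; // unknown codes fall back to the generic message
    // java.lang.Error, not IOException: this signals a broken scanner, not bad input.
    throw new Error(message);
  }
}
```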
    • yypushback

      public void yypushback(int number)
      Pushes the specified number of characters back into the input stream.

      They will be read again by the next call of the scanning method.

      Parameters:
      number - the number of characters to be read again. This number must not be greater than yylength().
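Pushback only moves the end of the current match backwards, so the characters are still in the buffer and the next scan simply starts earlier. A sketch with hypothetical fields, assuming zzMarkedPos marks where the next scan resumes; the generated method presumably reports ZZ_PUSHBACK_2BIG through zzScanError instead of throwing IllegalArgumentException.

```java
// Hypothetical sketch of yypushback: shrink the current match from its end so the
// last `number` characters are scanned again by the next call.
final class PushbackDemo {
  private final char[] buffer;
  private final int startRead; // stand-in for zzStartRead
  int markedPos;               // stand-in for zzMarkedPos; the next scan resumes here

  PushbackDemo(char[] buffer, int startRead, int markedPos) {
    this.buffer = buffer;
    this.startRead = startRead;
    this.markedPos = markedPos;
  }

  int yylength() {
    return markedPos - startRead;
  }

  String yytext() {
    return new String(buffer, startRead, yylength());
  }

  void yypushback(int number) {
    if (number > yylength()) {
      // The generated scanner would report ZZ_PUSHBACK_2BIG via zzScanError here.
      throw new IllegalArgumentException("pushback value was too large");
    }
    markedPos -= number; // characters stay in the buffer; only the match end moves
  }
}
```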
    • getNextToken

      public int getNextToken() throws IOException
      Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
      Returns:
      the next token.
      Throws:
      IOException - if any I/O-Error occurs.
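Callers typically drive the scanner with a loop that calls getNextToken() until it returns YYEOF. Since ClassicTokenizerImpl is package-private and needs Lucene on the classpath, the sketch below uses a hypothetical TokenSource interface and a toy whitespace scanner instead; YYEOF = -1 is an assumed value, and the toy returns type 0 for every token.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the usual getNextToken() driver loop.
final class ScanLoopDemo {
  static final int YYEOF = -1; // assumed end-of-input marker value

  // Hypothetical minimal view of a JFlex-style scanner (not a Lucene interface).
  interface TokenSource {
    int getNextToken() throws IOException;
    String yytext();
  }

  // Toy scanner: each run of non-space characters is one token of type 0.
  static final class WhitespaceScanner implements TokenSource {
    private final String input;
    private int pos;
    private String text = "";

    WhitespaceScanner(String input) {
      this.input = input;
    }

    @Override
    public int getNextToken() {
      while (pos < input.length() && input.charAt(pos) == ' ') pos++;
      if (pos == input.length()) return YYEOF;
      int start = pos;
      while (pos < input.length() && input.charAt(pos) != ' ') pos++;
      text = input.substring(start, pos);
      return 0;
    }

    @Override
    public String yytext() {
      return text;
    }
  }

  // Call getNextToken until it reports YYEOF, recording "type:text" for each token.
  static List<String> drain(TokenSource scanner) throws IOException {
    List<String> tokens = new ArrayList<>();
    for (int type = scanner.getNextToken(); type != YYEOF; type = scanner.getNextToken()) {
      tokens.add(type + ":" + scanner.yytext());
    }
    return tokens;
  }
}
```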