Class GraphTokenStreamFiniteStrings

java.lang.Object
org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings

public final class GraphTokenStreamFiniteStrings extends Object
Consumes a TokenStream and creates an Automaton where the transition labels are terms from the TermToBytesRefAttribute. This class also provides helpers to explore the different paths of the Automaton.
  • Field Details

    • MAX_RECURSION_LEVEL

      private static final int MAX_RECURSION_LEVEL
      Maximum level of recursion allowed in recursive operations.
    • tokens

      private AttributeSource[] tokens
    • det

      private final Automaton det
    • transition

      private final Transition transition
  • Constructor Details

  • Method Details

    • hasSidePath

      public boolean hasSidePath(int state)
      Returns whether the provided state is the start of multiple side paths of different lengths (e.g., "new york" and "ny").
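      The "new york" / "ny" case can be illustrated with a small stand-alone model. This is only a sketch, not the class's implementation: it assumes a state starts side paths when its outgoing transitions do not all lead to the same destination state. The names SidePathSketch, Edge, and newYorkGraph are hypothetical and use no Lucene types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SidePathSketch {
    /** A labeled transition in a toy token graph (hypothetical type, not a Lucene class). */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /** Toy graph for "new york city" with the synonym "ny" spanning states 0 to 2. */
    static List<Edge> newYorkGraph() {
        return Arrays.asList(
            new Edge(0, 1, "new"),
            new Edge(1, 2, "york"),
            new Edge(0, 2, "ny"),
            new Edge(2, 3, "city"));
    }

    /**
     * Assumed rule: a state has a side path when its outgoing
     * transitions reach more than one distinct destination state.
     */
    static boolean hasSidePath(List<Edge> edges, int state) {
        Set<Integer> destinations = new HashSet<>();
        for (Edge e : edges) {
            if (e.from == state) {
                destinations.add(e.to);
            }
        }
        return destinations.size() > 1;
    }

    public static void main(String[] args) {
        List<Edge> graph = newYorkGraph();
        for (int state = 0; state < 3; state++) {
            System.out.println("state " + state + ": " + hasSidePath(graph, state));
        }
    }
}
```

      In this model only state 0 reports a side path, because "new" leads to state 1 while "ny" jumps directly to state 2.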
    • getTerms

      public List<AttributeSource> getTerms(int state)
      Returns the list of tokens that start at the provided state.
    • getTerms

      public Term[] getTerms(String field, int state)
      Returns the list of terms that start at the provided state.
    • getFiniteStrings

      public Iterator<TokenStream> getFiniteStrings() throws IOException
      Get all finite strings from the automaton.
      Throws:
      IOException
    • getFiniteStrings

      public Iterator<TokenStream> getFiniteStrings(int startState, int endState)
      Get all finite strings that start at startState and end at endState.
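      What "all finite strings between two states" means can be sketched on a toy graph. The example below is an assumption-laden model rather than the class's code: it walks an acyclic graph depth-first and returns each path as a space-joined string, whereas the real method returns an Iterator<TokenStream>. FiniteStringsSketch, Edge, and newYorkGraph are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FiniteStringsSketch {
    /** A labeled transition in a toy token graph (hypothetical type, not a Lucene class). */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /** Toy graph for "new york city" with the synonym "ny" spanning states 0 to 2. */
    static List<Edge> newYorkGraph() {
        return Arrays.asList(
            new Edge(0, 1, "new"),
            new Edge(1, 2, "york"),
            new Edge(0, 2, "ny"),
            new Edge(2, 3, "city"));
    }

    /** Collects every path from startState to endState; assumes the graph has no cycles. */
    static List<String> finiteStrings(List<Edge> edges, int startState, int endState) {
        List<String> out = new ArrayList<>();
        collect(edges, startState, endState, new ArrayList<>(), out);
        return out;
    }

    private static void collect(List<Edge> edges, int state, int endState,
                                List<String> path, List<String> out) {
        if (state == endState) {
            // Reached the target state: emit the terms gathered along this path.
            out.add(String.join(" ", path));
            return;
        }
        for (Edge e : edges) {
            if (e.from == state) {
                path.add(e.term);
                collect(edges, e.to, endState, path, out);
                path.remove(path.size() - 1); // backtrack before trying the next edge
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(finiteStrings(newYorkGraph(), 0, 3));
    }
}
```

      Restricting the walk to states 0 and 2 in this model yields just the two alternatives "new york" and "ny".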
    • articulationPoints

      public int[] articulationPoints()
      Returns the articulation points (cut vertices) of the graph, that is, the states whose removal would disconnect it. See https://en.wikipedia.org/wiki/Biconnected_component
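      A textbook depth-first search for cut vertices can be sketched as follows. This is a generic version of the algorithm under stated assumptions, not this class's code: the graph is a plain edge list treated as undirected, state 0 is the root, and the result is sorted in ascending order. The helper's parameters (depth, low, parent, visited, points) are named after the private articulationPointsRecurse signature for readability only; ArticulationPointsSketch itself is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

class ArticulationPointsSketch {
    /** Returns the cut vertices of the graph, treating each edge {from, to} as undirected. */
    static int[] articulationPoints(int numStates, int[][] edges) {
        if (numStates == 0) {
            return new int[0];
        }
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < numStates; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        int[] depth = new int[numStates];
        int[] low = new int[numStates];
        int[] parent = new int[numStates];
        Arrays.fill(parent, -1);
        BitSet visited = new BitSet(numStates);
        List<Integer> points = new ArrayList<>();
        recurse(adj, 0, 0, depth, low, parent, visited, points);
        Collections.sort(points);
        int[] result = new int[points.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = points.get(i);
        }
        return result;
    }

    private static void recurse(List<List<Integer>> adj, int state, int d, int[] depth,
                                int[] low, int[] parent, BitSet visited, List<Integer> points) {
        visited.set(state);
        depth[state] = d;
        low[state] = d;
        int childCount = 0;
        boolean isArticulation = false;
        for (int next : adj.get(state)) {
            if (!visited.get(next)) {
                parent[next] = state;
                recurse(adj, next, d + 1, depth, low, parent, visited, points);
                childCount++;
                // The subtree under next cannot reach above state without passing through it.
                if (low[next] >= depth[state]) {
                    isArticulation = true;
                }
                low[state] = Math.min(low[state], low[next]);
            } else if (next != parent[state]) {
                // Back edge to an already visited state.
                low[state] = Math.min(low[state], depth[next]);
            }
        }
        boolean isRoot = parent[state] == -1;
        if ((!isRoot && isArticulation) || (isRoot && childCount > 1)) {
            points.add(state);
        }
    }

    public static void main(String[] args) {
        // "new york" / "ny" between states 0 and 2, followed by two single-token steps.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {2, 3}, {3, 4}};
        System.out.println(Arrays.toString(articulationPoints(5, edges)));
    }
}
```

      In the sample graph, states 2 and 3 are cut vertices: every path through the graph must pass through them, while state 1 can be bypassed via the "ny" edge.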
    • build

      private Automaton build(TokenStream in) throws IOException
      Build an automaton from the provided TokenStream.
      Throws:
      IOException
    • articulationPointsRecurse

      private static void articulationPointsRecurse(Automaton a, int state, int d, int[] depth, int[] low, int[] parent, BitSet visited, List<Integer> points)