Class BayesianScoreEstimator

java.lang.Object
org.apache.lucene.search.BayesianScoreEstimator

public class BayesianScoreEstimator extends Object
Estimates BayesianScoreQuery parameters (alpha, beta, base rate) from corpus statistics via pseudo-query sampling.

The estimation algorithm:

  1. Reservoir-sample terms from the target field's indexed vocabulary
  2. Partition the sampled terms into pseudo-queries
  3. Run each pseudo-query via BM25 and collect the score distribution
  4. Estimate: beta = median(scores), alpha = 1 / std(scores)
  5. Estimate base rate: mean fraction of documents scoring above the 95th percentile
WARNING: This API is experimental and might change in incompatible ways in the next release.
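
Steps 4 and 5 above can be illustrated with plain Java, independent of Lucene. This is a minimal sketch, not the class's actual implementation. The class ScoreStatsSketch and all of its methods are hypothetical names. It also makes several assumptions the description leaves open: scores from all pseudo-queries are pooled for the median and standard deviation, the standard deviation is the population form, the 95th percentile is computed per pseudo-query with the nearest-rank rule, the fraction is taken relative to the total document count, and alpha falls back to 1.0 when the scores have zero spread.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating steps 4 and 5; not part of Lucene. */
final class ScoreStatsSketch {

  /** Median of a non-empty array (the input is copied, not mutated). */
  static double median(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  /** Population standard deviation of a non-empty array. */
  static double std(double[] values) {
    double mean = Arrays.stream(values).average().orElse(0.0);
    double sumSq = 0.0;
    for (double v : values) {
      sumSq += (v - mean) * (v - mean);
    }
    return Math.sqrt(sumSq / values.length);
  }

  /** Nearest-rank percentile of a non-empty array, with p in (0, 100]. */
  static double percentile(double[] values, double p) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Fraction of numDocs documents whose score is strictly above threshold. */
  static double fractionAbove(double[] scores, double threshold, int numDocs) {
    long above = Arrays.stream(scores).filter(s -> s > threshold).count();
    return (double) above / numDocs;
  }

  /** Returns {alpha, beta, baseRate} under the assumptions stated in the lead-in. */
  static double[] estimate(double[][] scoresPerQuery, int numDocs) {
    double[] pooled = Arrays.stream(scoresPerQuery).flatMapToDouble(Arrays::stream).toArray();
    if (pooled.length == 0 || numDocs <= 0) {
      throw new IllegalArgumentException("need at least one score and one document");
    }
    double beta = median(pooled);
    double sd = std(pooled);
    double alpha = sd > 0.0 ? 1.0 / sd : 1.0; // assumed fallback for zero spread

    double fractionSum = 0.0;
    for (double[] scores : scoresPerQuery) {
      if (scores.length > 0) { // a query with no hits contributes a fraction of 0
        fractionSum += fractionAbove(scores, percentile(scores, 95.0), numDocs);
      }
    }
    double baseRate = fractionSum / scoresPerQuery.length;
    return new double[] {alpha, beta, baseRate};
  }

  public static void main(String[] args) {
    double[][] scores = {{1.0, 2.0, 3.0, 4.0, 5.0}, {0.5, 1.5, 2.5}};
    System.out.println(Arrays.toString(estimate(scores, 1000)));
  }
}
```

With few hits per pseudo-query, the nearest-rank 95th percentile equals the maximum score, so the estimated base rate is 0. A real estimator would need enough hits per query, or a different percentile rule, for step 5 to be informative.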
  • Method Details

    • estimate

      public static BayesianScoreEstimator.Parameters estimate(IndexSearcher searcher, String field, int nSamples, int tokensPerQuery, long seed) throws IOException
      Estimates BayesianScoreQuery parameters from the given index.
      Parameters:
      searcher - the index searcher to sample from
      field - the indexed text field to create pseudo-queries for
      nSamples - number of pseudo-queries to sample (the two-argument overload uses 50)
      tokensPerQuery - number of indexed terms per pseudo-query (the two-argument overload uses 5)
      seed - random seed for reproducible sampling
      Returns:
      estimated alpha, beta, and base rate
      Throws:
      IOException - if an I/O error occurs reading the index
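
The roles of nSamples, tokensPerQuery, and seed can be sketched without an index. The example below is illustrative only. PseudoQuerySamplerSketch and its methods are hypothetical names, and it assumes that the reservoir holds nSamples * tokensPerQuery terms, that terms are grouped in sampling order, and that java.util.Random is the random source. None of these details is confirmed by the documentation above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper illustrating steps 1 and 2; not part of Lucene. */
final class PseudoQuerySamplerSketch {

  /** Algorithm R: uniformly samples up to k items from a stream of unknown length. */
  static List<String> reservoirSample(Iterator<String> terms, int k, long seed) {
    Random random = new Random(seed); // a fixed seed makes the sample reproducible
    List<String> reservoir = new ArrayList<>(k);
    int seen = 0;
    while (terms.hasNext()) {
      String term = terms.next();
      seen++;
      if (reservoir.size() < k) {
        reservoir.add(term); // fill the reservoir with the first k terms
      } else {
        int j = random.nextInt(seen); // uniform in [0, seen)
        if (j < k) {
          reservoir.set(j, term); // replace a slot with probability k / seen
        }
      }
    }
    return reservoir;
  }

  /** Splits the sampled terms into consecutive groups of tokensPerQuery terms. */
  static List<List<String>> partition(List<String> sampled, int tokensPerQuery) {
    List<List<String>> queries = new ArrayList<>();
    for (int start = 0; start < sampled.size(); start += tokensPerQuery) {
      int end = Math.min(start + tokensPerQuery, sampled.size());
      queries.add(new ArrayList<>(sampled.subList(start, end)));
    }
    return queries;
  }

  /** Builds pseudo-queries; assumes the reservoir size is nSamples * tokensPerQuery. */
  static List<List<String>> pseudoQueries(
      Iterator<String> terms, int nSamples, int tokensPerQuery, long seed) {
    return partition(reservoirSample(terms, nSamples * tokensPerQuery, seed), tokensPerQuery);
  }
}
```

If the vocabulary holds fewer than nSamples * tokensPerQuery terms, this sketch produces fewer pseudo-queries, and the last one may be shorter than tokensPerQuery.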
    • estimate

      public static BayesianScoreEstimator.Parameters estimate(IndexSearcher searcher, String field) throws IOException
      Estimates parameters with default settings (50 samples, 5 tokens per query, seed 42).
      Parameters:
      searcher - the index searcher to sample from
      field - the indexed text field to create pseudo-queries for
      Returns:
      estimated alpha, beta, and base rate
      Throws:
      IOException - if an I/O error occurs reading the index