codec.base module

This module contains base classes/interfaces for “codec” objects.

Classes

class whoosh.codec.base.Codec
class whoosh.codec.base.PerDocumentWriter
class whoosh.codec.base.FieldWriter
class whoosh.codec.base.TermsReader
class whoosh.codec.base.VectorReader
class whoosh.codec.base.LengthsReader
class whoosh.codec.base.MultiLengths(lengths, offset=0)
class whoosh.codec.base.StoredFieldsReader
class whoosh.codec.base.Segment

Do not instantiate this object directly. It is used by the Index object to hold information about a segment. A list of objects of this class is pickled as part of the TOC file.

The TOC file stores a minimal amount of information, mostly a list of Segment objects. The segments are the actual inverted ("reverse") indexes. Having multiple segments allows quick incremental indexing: new documents are written to a new segment, and the index overlays that segment on the previous ones for reading and searching. “Optimizing” the index merges the contents of the existing segments into a single segment, dropping any deleted documents along the way.
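
The overlay-and-merge idea described above can be illustrated with a small toy model. This is a sketch only, not Whoosh's implementation: `ToySegment`, `ToyIndex`, and all of their attributes are hypothetical names invented for this example, and real segments live on disk rather than in Python dictionaries.

```python
class ToySegment:
    """Hypothetical stand-in for a segment: a tiny in-memory inverted index."""

    def __init__(self, docs):
        # Position in the list is the segment-local document number.
        self.docs = list(docs)
        self.deleted = set()
        self.postings = {}
        for docnum, text in enumerate(self.docs):
            for term in set(text.split()):
                self.postings.setdefault(term, []).append(docnum)


class ToyIndex:
    """Hypothetical index that overlays several segments."""

    def __init__(self):
        self.segments = []

    def add_documents(self, docs):
        # Incremental indexing: new documents go into a brand-new segment;
        # existing segments are left untouched.
        self.segments.append(ToySegment(docs))

    def search(self, term):
        # Read across all segments, shifting each segment-local document
        # number by the size of the segments that precede it.
        results = []
        offset = 0
        for seg in self.segments:
            for docnum in seg.postings.get(term, []):
                if docnum not in seg.deleted:
                    results.append(offset + docnum)
            offset += len(seg.docs)
        return results

    def optimize(self):
        # Merge every segment into one, dropping deleted documents.
        live = [
            text
            for seg in self.segments
            for docnum, text in enumerate(seg.docs)
            if docnum not in seg.deleted
        ]
        self.segments = [ToySegment(live)]


ix = ToyIndex()
ix.add_documents(["alpha beta", "beta gamma"])
ix.add_documents(["gamma delta"])      # second segment, overlaid on the first
hits_before = ix.search("gamma")       # [1, 2]
ix.segments[0].deleted.add(1)          # mark a document as deleted
hits_after_delete = ix.search("gamma") # [2]
ix.optimize()
hits_after_optimize = ix.search("gamma")  # [1]
```

Note that document numbers change after the merge in this toy model, because the deleted document no longer occupies a slot.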

create_file(storage, ext, **kwargs)

Convenience method to create a new file in the given storage named with this segment’s ID and the given extension. Any keyword arguments are passed to the storage’s create_file method.

delete_document(docnum, delete=True)

Marks the given document number as deleted. The document is not physically removed until the index is optimized.

Parameters:
  • docnum – The document number to delete.
  • delete – If False, this undeletes a deleted document.

deleted_count()

Returns: the total number of deleted documents in this segment.

doc_count()

Returns: the number of (undeleted) documents in this segment.

doc_count_all()

Returns: the total number of documents, deleted or undeleted, in this segment.

has_deletions()

Returns: True if any documents in this segment are deleted.

is_deleted(docnum)

Returns: True if the given document number is deleted.
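
How the deletion-related methods relate to one another can be sketched with a toy class. This is an illustration under assumptions, not the library's code: `DeletionTrackingSegment` and its private attributes are hypothetical, and only the method names and documented behavior are taken from the descriptions above.

```python
class DeletionTrackingSegment:
    """Hypothetical stand-in that mimics the documented deletion bookkeeping."""

    def __init__(self, doc_count_all):
        self._doc_count_all = doc_count_all  # total documents ever written
        self._deleted = set()                # document numbers marked deleted

    def delete_document(self, docnum, delete=True):
        # Deleting only marks the document; nothing is physically removed.
        if delete:
            self._deleted.add(docnum)
        else:
            # delete=False undeletes a previously deleted document.
            self._deleted.discard(docnum)

    def is_deleted(self, docnum):
        return docnum in self._deleted

    def has_deletions(self):
        return bool(self._deleted)

    def deleted_count(self):
        return len(self._deleted)

    def doc_count_all(self):
        # Deleted and undeleted documents together.
        return self._doc_count_all

    def doc_count(self):
        # Undeleted documents only.
        return self._doc_count_all - len(self._deleted)


seg = DeletionTrackingSegment(doc_count_all=5)
seg.delete_document(2)
counts = (seg.doc_count(), seg.deleted_count(), seg.doc_count_all())  # (4, 1, 5)
seg.delete_document(2, delete=False)  # undelete
```

In this sketch the invariant `doc_count() + deleted_count() == doc_count_all()` always holds, which matches the way the three counts are described above.
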
open_file(storage, ext, **kwargs)

Convenience method to open a file in the given storage named with this segment’s ID and the given extension. Any keyword arguments are passed to the storage’s open_file method.
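
The create_file and open_file convenience methods can be pictured as thin wrappers that build a file name and delegate to the storage object. The sketch below assumes a hypothetical in-memory storage (`ToyStorage`) and a hypothetical naming scheme (segment ID followed directly by the extension); the file names Whoosh actually produces may differ.

```python
import io


class _MemoryWriter(io.BytesIO):
    """Hypothetical writable file that saves its bytes into a dict on close."""

    def __init__(self, files, name):
        super().__init__()
        self._files = files
        self._name = name

    def close(self):
        if not self.closed:
            self._files[self._name] = self.getvalue()
        super().close()


class ToyStorage:
    """Hypothetical in-memory storage keyed by file name."""

    def __init__(self):
        self.files = {}

    def create_file(self, name, **kwargs):
        return _MemoryWriter(self.files, name)

    def open_file(self, name, **kwargs):
        return io.BytesIO(self.files[name])


class ToySegment:
    """Hypothetical segment exposing the two convenience methods."""

    def __init__(self, segid):
        self.segid = segid

    def make_filename(self, ext):
        # Assumed naming scheme for this example only.
        return f"{self.segid}{ext}"

    def create_file(self, storage, ext, **kwargs):
        # Keyword arguments are passed through to the storage unchanged.
        return storage.create_file(self.make_filename(ext), **kwargs)

    def open_file(self, storage, ext, **kwargs):
        return storage.open_file(self.make_filename(ext), **kwargs)


storage = ToyStorage()
segment = ToySegment("seg0001")

with segment.create_file(storage, ".trm") as f:
    f.write(b"term data")

data = segment.open_file(storage, ".trm").read()
```

The point of the wrapper is that callers only need to know the extension; the segment supplies its own ID, so files belonging to different segments never collide within one storage.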
