Package nltk_lite :: Module probability :: Class FreqDist

Class FreqDist

object --+
         |
        FreqDist

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

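>>> from nltk_lite import tokenize
>>> from nltk_lite.probability import FreqDist
>>> sent = 'the cat sat on the mat'   # illustrative input, not from the original
>>> fdist = FreqDist()
>>> for word in tokenize.whitespace(sent):
...     fdist.inc(word)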
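
The counting loop above can also be pictured without the library. The following is a minimal sketch, assuming a plain dict as a stand-in for FreqDist and str.split() as a stand-in for tokenize.whitespace(); the sentence in `sent` is a made-up example, and none of this is the class's actual implementation.

```python
# Stand-in for FreqDist: a plain dict mapping each sample to its count.
# `sent` is a hypothetical example sentence; str.split() stands in for
# tokenize.whitespace().
sent = "the cat sat on the mat"

counts = {}
for word in sent.split():
    # Same idea as fdist.inc(word): add 1 to the count for this sample.
    counts[word] = counts.get(word, 0) + 1

print(counts["the"])  # prints 2
```

Each pass through the loop is one "experiment", and the word seen is its outcome.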
Instance Methods

__init__(self)
    Construct a new, empty FreqDist.

inc(self, sample, count=1)
    Increment this FreqDist's count for the given sample.
    Return type: None

N(self)
    Returns: The total number of sample outcomes that have been recorded by this FreqDist.
    Return type: int

B(self)
    Returns: The total number of sample values (or bins) that have counts greater than zero.
    Return type: int

samples(self)
    Returns: A list of all samples that have been recorded as outcomes by this frequency distribution.
    Return type: list

Nr(self, r, bins=None)
    Returns: The number of samples with count r.
    Return type: int

_cache_Nr_values(self)

count(self, sample)
    Return the count of a given sample.
    Return type: int

freq(self, sample)
    Return the frequency of a given sample.
    Return type: float

max(self)
    Return the sample with the greatest number of outcomes in this frequency distribution.
    Return type: any or None

sorted_samples(self)
    Return the samples sorted in decreasing order of frequency.
    Return type: sequence of any

__repr__(self)
    Returns: A string representation of this FreqDist.
    Return type: string

__str__(self)
    Returns: A string representation of this FreqDist.
    Return type: string

__contains__(self, sample)
    Returns: True if the given sample occurs one or more times in this frequency distribution.
    Return type: boolean

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

Properties

Inherited from object: __class__

Method Details

__init__(self)
(Constructor)

Construct a new, empty FreqDist. In particular, the count for every sample is zero.

Overrides: object.__init__

inc(self, sample, count=1)

Increment this FreqDist's count for the given sample.

Parameters:
  • sample (any) - The sample whose count should be incremented.
  • count (int) - The amount to increment the sample's count by.
Returns: None
Raises:
  • NotImplementedError - If sample is not a supported sample type.

N(self)

Returns: int
The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().

B(self)

Returns: int
The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N().
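
The difference between N() and B() is easiest to see on a small example. This is a sketch only: it uses Python's collections.Counter as a stand-in for FreqDist, and the names `total_outcomes` and `nonzero_bins` are illustrative, not part of the documented API.

```python
from collections import Counter

# collections.Counter is a stand-in here; it is not the FreqDist class.
# Five recorded outcomes spread over three distinct samples.
counts = Counter(["a", "b", "a", "c", "a"])

# Analogue of N(): every recorded outcome is counted.
total_outcomes = sum(counts.values())

# Analogue of B(): only distinct samples with a count above zero.
nonzero_bins = sum(1 for c in counts.values() if c > 0)

print(total_outcomes, nonzero_bins)  # prints 5 3
```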

samples(self)

Returns: list
A list of all samples that have been recorded as outcomes by this frequency distribution. Use count() to determine the count for each sample.

Nr(self, r, bins=None)

Parameters:
  • r (int) - A sample count.
  • bins (int) - The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0).
Returns: int
The number of samples with count r.
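
The role of bins in Nr(0) can be sketched as follows. The helper `nr` below is hypothetical and works on a collections.Counter rather than a FreqDist; it only mirrors the rule stated above (Nr(0) is bins minus the number of samples with nonzero counts, and bins defaults to that number).

```python
from collections import Counter

def nr(counts, r, bins=None):
    """Number of samples whose count is exactly r (hypothetical helper)."""
    # Analogue of B(): distinct samples with a count above zero.
    nonzero = sum(1 for c in counts.values() if c > 0)
    if r == 0:
        # Unseen samples: possible outcomes minus observed ones.
        if bins is None:
            bins = nonzero
        return bins - nonzero
    return sum(1 for c in counts.values() if c == r)

counts = Counter(["a", "b", "a", "c", "a"])  # a: 3, b: 1, c: 1

print(nr(counts, 1))           # prints 2
print(nr(counts, 0))           # prints 0
print(nr(counts, 0, bins=10))  # prints 7
```

Passing bins is what lets a caller say how many outcomes were possible but never observed; without it, the count of unseen samples is always zero.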

count(self, sample)

Return the count of a given sample. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Counts are non-negative integers.

Parameters:
  • sample (any) - The sample whose count should be returned.
Returns: int
The count of a given sample.

freq(self, sample)

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].

Parameters:
  • sample (any) - The sample whose frequency should be returned.
Returns: float
The frequency of a given sample.
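
As a worked example of the definition above (count divided by total outcomes), here is a minimal sketch built on collections.Counter. The function `relative_freq` is hypothetical, and returning 0.0 for an empty distribution is a choice made for this sketch; the documentation above does not say what freq() does in that case.

```python
from collections import Counter

counts = Counter(["a", "b", "a", "c", "a"])  # a: 3, b: 1, c: 1
total = sum(counts.values())                 # analogue of N()

def relative_freq(sample):
    """Count of `sample` divided by the total number of outcomes."""
    if total == 0:
        # Assumption of this sketch: avoid dividing by zero.
        return 0.0
    # Counter returns 0 for samples it has never seen.
    return counts[sample] / total

print(relative_freq("a"))  # 3 of the 5 outcomes
print(relative_freq("z"))  # never observed
```

Summed over all recorded samples, these values add up to 1, which is why each one falls in the range [0, 1].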

max(self)

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.

Returns: any or None
The sample with the maximum number of outcomes in this frequency distribution.
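
The two edge cases described above (ties and an empty distribution) can be illustrated with a short sketch. `most_frequent` is a hypothetical helper over a plain dict of counts, not the method's actual source.

```python
def most_frequent(counts):
    """Return a sample with the highest count, or None if nothing was recorded."""
    best_sample, best_count = None, 0
    for sample, count in counts.items():
        # Strictly greater: on a tie, whichever sample was seen first wins
        # in this sketch. Callers should not rely on that choice.
        if count > best_count:
            best_sample, best_count = sample, count
    return best_sample

print(most_frequent({"a": 3, "b": 1}))  # prints a
print(most_frequent({}))                # prints None
```

Because the tie-breaking rule is documented as undefined, code that needs a deterministic winner should apply its own secondary ordering.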

sorted_samples(self)

Return the samples sorted in decreasing order of frequency. Samples with the same count are ordered arbitrarily, and samples with a count of zero are omitted. This method is O(N^2) in the worst case, where N is the number of samples, but completes in less time on average.

Returns: sequence of any
The set of samples in sorted order.
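
The ordering rule can be sketched with Python's built-in sort. Note that this is a different technique from the one documented above: sorted() runs in O(N log N) and breaks ties by input order, whereas the method is described as O(N^2) in the worst case with arbitrary tie order. `samples_by_count` is a hypothetical helper over a plain dict.

```python
def samples_by_count(counts):
    """Samples with nonzero counts, highest count first."""
    # Drop samples whose count is zero, as the method description requires.
    nonzero = [s for s, c in counts.items() if c > 0]
    # Sort by count, largest first.
    return sorted(nonzero, key=lambda s: counts[s], reverse=True)

print(samples_by_count({"a": 1, "b": 3, "c": 0, "d": 2}))  # prints ['b', 'd', 'a']
```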

__repr__(self)
(Representation operator)

repr(x)

Returns: string
A string representation of this FreqDist.
Overrides: object.__repr__

__str__(self)
(Informal representation operator)

str(x)

Returns: string
A string representation of this FreqDist.
Overrides: object.__str__

__contains__(self, sample)
(In operator)

Parameters:
  • sample (any) - The sample to search for.
Returns: boolean
True if the given sample occurs one or more times in this frequency distribution.
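
To show how the `in` operator ties back to counts, here is a minimal sketch of a dict-backed class. `TinyCounts` is hypothetical and far smaller than FreqDist; it only illustrates that membership means "count greater than zero", under the assumption that a sample incremented by zero still has a count of zero.

```python
class TinyCounts:
    """Hypothetical, minimal stand-in used only to illustrate `in`."""

    def __init__(self):
        self._counts = {}

    def inc(self, sample, count=1):
        # Add `count` to the stored count for `sample`.
        self._counts[sample] = self._counts.get(sample, 0) + count

    def __contains__(self, sample):
        # True only when the sample has been recorded at least once.
        return self._counts.get(sample, 0) > 0

tiny = TinyCounts()
tiny.inc("cat")
tiny.inc("dog", 0)  # touched, but its count stays at zero

print("cat" in tiny)   # prints True
print("dog" in tiny)   # prints False
print("bird" in tiny)  # prints False
```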