random.transactions {arules} | R Documentation |
Simulates a random transactions object using different methods.
random.transactions(nItems, nTrans, method = "independent", ..., verbose = FALSE)
nItems | an integer. Number of items. |
nTrans | an integer. Number of transactions. |
method | name of the simulation method used (default: "independent", where all items occur independently). |
... | further arguments used for the specific simulation method (see details). |
verbose | report progress. |
The function generates a transaction database with nTrans transactions over nItems items.
Currently two simulation methods are implemented:
"independent" (see Hahsler et al., 2006)
All items are treated as independent. Each transaction is the result of nItems independent Bernoulli trials, one for each item, with success probabilities given by the numeric vector iProb of length nItems (default: 0.01 for each item).
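The independent Bernoulli trials described above can be illustrated in base R. This is only a sketch of the idea, not the arules implementation: the small sizes and the iProb values are chosen for illustration, and the result is a plain logical matrix rather than a transactions object.

```r
# Sketch only: base-R illustration of independent Bernoulli trials per item.
# This is not the code used inside arules.
set.seed(42)                               # fixed seed for reproducibility

nItems <- 5                                # illustrative sizes (assumptions)
nTrans <- 1000
iProb  <- c(0.5, 0.2, 0.1, 0.05, 0.01)     # one success probability per item

# One Bernoulli trial per (item, transaction) cell. rbinom() recycles iProb
# along the draws, and matrix() fills column by column, so row i of `draws`
# always uses iProb[i].
draws <- matrix(rbinom(nItems * nTrans, size = 1, prob = iProb),
                nrow = nItems, ncol = nTrans)

# Transpose to a transactions-by-items logical incidence matrix.
incidence <- t(draws == 1)
colnames(incidence) <- paste0("item", seq_len(nItems))

dim(incidence)        # nTrans rows, nItems columns
colMeans(incidence)   # empirical item frequencies, which should be near iProb
```

A logical matrix of this shape is the kind of input that can be coerced with as(incidence, "transactions") once arules is loaded.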
"agrawal" (see Agrawal and Srikant, 1994)
This method creates transactions with correlated items. It uses the following additional parameters:
lTrans | average length of transactions. |
nPats | number of patterns (potential maximal frequent itemsets) used. |
lPats | average length of patterns. |
corr | correlation between consecutive patterns. |
cmean | mean of the corruption level (normal distr.). |
cvar | variance of the corruption level. |
The simulation is a two-stage process. First, a set of nPats patterns (potential maximal frequent itemsets) is generated. The length of the patterns is Poisson distributed with mean lPats, and consecutive patterns share some items, controlled by the correlation parameter corr. For later use, a pattern weight is generated for each pattern by drawing from an exponential distribution with a mean of 1, and a corruption level is chosen from a normal distribution with mean cmean and variance cvar.
The patterns are created using the following function:
random.patterns(nItems, nPats = 2000, method = "agrawal", lPats = 4, corr = 0.5, cmean = 0.5, cvar = 0.1, iWeight = NULL, verbose = FALSE)
The function returns the patterns as an itemsets object, which can be supplied to random.transactions as the argument patterns. If no argument patterns is supplied, the default values given above are used.
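The per-pattern quantities drawn in the first stage can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind random.patterns: it uses the default values from the signature above, normalises the weights to sum to 1 (an assumption), and does not attempt the item sharing between consecutive patterns controlled by corr.

```r
# Sketch only: draws the per-pattern lengths, weights and corruption levels.
# It does not build the item content of the patterns.
set.seed(1)

nPats <- 2000    # defaults taken from the random.patterns signature
lPats <- 4
cmean <- 0.5
cvar  <- 0.1

# Pattern lengths: Poisson with mean lPats. A plain Poisson draw can be 0;
# how the real function handles that case is not covered here.
pat_len <- rpois(nPats, lambda = lPats)

# Pattern weights: exponential with mean 1 (rate = 1).
pat_weight <- rexp(nPats, rate = 1)
pat_weight <- pat_weight / sum(pat_weight)   # normalisation is an assumption

# Corruption levels: normal with mean cmean and variance cvar.
# rnorm() expects a standard deviation, hence sqrt(cvar).
corruption <- rnorm(nPats, mean = cmean, sd = sqrt(cvar))

summary(pat_len)
summary(corruption)
```

Note that rnorm() is parameterised by the standard deviation, so a variance such as cvar has to be converted with sqrt() before use.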
In the second step, the transactions are generated using the patterns. The length of the transactions follows a Poisson distribution with mean lTrans (the average transaction length). For each transaction, patterns are randomly chosen according to the pattern weights until the transaction length is reached. For each chosen pattern, the associated corruption level is used to drop some items before the pattern is added to the transaction.
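The second stage can be sketched in base R on a toy example. Everything here is hypothetical and for illustration only: the three patterns, their weights and corruption levels, the helper make_transaction, and the rule that each item of a chosen pattern is dropped independently with probability equal to the corruption level. The arules implementation may differ in these details.

```r
# Sketch only: builds transactions from weighted, corrupted patterns.
set.seed(7)

# Hypothetical toy inputs standing in for the output of the first stage.
patterns   <- list(c(1L, 2L, 3L), c(3L, 4L), c(5L, 6L, 7L, 8L))
weights    <- c(0.5, 0.3, 0.2)   # pattern weights, summing to 1
corruption <- c(0.1, 0.5, 0.3)   # one corruption level per pattern
lTrans     <- 5                  # assumed mean transaction length

all_items <- unique(unlist(patterns))

make_transaction <- function() {
  # Poisson-distributed target length, capped at the number of distinct
  # items so that the loop below can always finish.
  target <- min(rpois(1, lambda = lTrans), length(all_items))
  items  <- integer(0)
  while (length(items) < target) {
    k    <- sample(length(patterns), 1, prob = weights)    # weighted choice
    # Assumed drop rule: each item is dropped with probability corruption[k].
    keep <- runif(length(patterns[[k]])) >= corruption[k]
    items <- union(items, patterns[[k]][keep])
  }
  sort(items)
}

trans_list <- replicate(10, make_transaction(), simplify = FALSE)
lengths(trans_list)   # transaction sizes; the last pattern added may overshoot
```

Because whole (corrupted) patterns are added at once, a transaction in this sketch can end up slightly longer than its target length.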
Returns an object of class transactions.
Michael Hahsler, Kurt Hornik, and Thomas Reutterer (2006). Implications of probabilistic data modeling for mining association rules. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nuernberger, and W. Gaul, editors, From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization, pages 598–605. Springer-Verlag.
Rakesh Agrawal and Ramakrishnan Srikant (1994). Fast algorithms for mining association rules in large databases. In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487–499, Santiago, Chile.
library(arules)

## generate 1000 random transactions for 200 items with
## a success probability decreasing from 0.2 to 0.0001
## using the method described in Hahsler et al. (2006).
trans <- random.transactions(nItems = 200, nTrans = 1000,
  iProb = seq(0.2, 0.0001, length.out = 200))

## display random data set
image(trans)

## use the method by Agrawal and Srikant (1994) to simulate transactions
## which contain correlated items. This should create data similar to
## T10I4D100K (but with only 1000 transactions)
patterns <- random.patterns(nItems = 1000)
summary(patterns)

trans2 <- random.transactions(nItems = 1000, nTrans = 1000,
  method = "agrawal", patterns = patterns)
image(trans2)

## plot data with items ordered by item frequency
image(trans2[, order(itemFrequency(trans2), decreasing = TRUE)])