Vector-comparison {S4Vectors} | R Documentation
Compare, order, tabulate vector-like objects
Description
Generic functions and methods for comparing, ordering, and tabulating vector-like objects.
Usage
## Element-wise (aka "parallel") comparison of 2 Vector objects
## ------------------------------------------------------------
pcompare(x, y)
## S4 method for signature 'Vector,Vector'
e1 == e2
## S4 method for signature 'Vector,ANY'
e1 == e2
## S4 method for signature 'ANY,Vector'
e1 == e2
## S4 method for signature 'Vector,Vector'
e1 <= e2
## S4 method for signature 'Vector,ANY'
e1 <= e2
## S4 method for signature 'ANY,Vector'
e1 <= e2
## S4 method for signature 'Vector,Vector'
e1 != e2
## S4 method for signature 'Vector,ANY'
e1 != e2
## S4 method for signature 'ANY,Vector'
e1 != e2
## S4 method for signature 'Vector,Vector'
e1 >= e2
## S4 method for signature 'Vector,ANY'
e1 >= e2
## S4 method for signature 'ANY,Vector'
e1 >= e2
## S4 method for signature 'Vector,Vector'
e1 < e2
## S4 method for signature 'Vector,ANY'
e1 < e2
## S4 method for signature 'ANY,Vector'
e1 < e2
## S4 method for signature 'Vector,Vector'
e1 > e2
## S4 method for signature 'Vector,ANY'
e1 > e2
## S4 method for signature 'ANY,Vector'
e1 > e2
## sameAsPreviousROW()
## -------------------
sameAsPreviousROW(x)
## match()
## -------
## S4 method for signature 'Vector,Vector'
match(x, table, nomatch = NA_integer_,
    incomparables = NULL, ...)
## selfmatch()
## -----------
selfmatch(x, ...)
## duplicated() & unique()
## -----------------------
## S4 method for signature 'Vector'
duplicated(x, incomparables=FALSE, ...)
## S4 method for signature 'Vector'
unique(x, incomparables=FALSE, ...)
## %in%
## ----
## S4 method for signature 'Vector,Vector'
x %in% table
## S4 method for signature 'Vector,ANY'
x %in% table
## S4 method for signature 'ANY,Vector'
x %in% table
## findMatches() & countMatches()
## ------------------------------
findMatches(x, table, select=c("all", "first", "last"), ...)
countMatches(x, table, ...)
## sort()
## ------
## S4 method for signature 'Vector'
sort(x, decreasing=FALSE, na.last=NA, by)
## rank()
## ------
## S4 method for signature 'Vector'
rank(x, na.last = TRUE, ties.method = c("average",
    "first", "last", "random", "max", "min"), by)
## xtfrm()
## -------
## S4 method for signature 'Vector'
xtfrm(x)
## table()
## -------
## S4 method for signature 'Vector'
table(...)
Arguments
x, y, e1, e2, table |
Vector-like objects. |
nomatch |
See |
incomparables |
The The See The |
select |
Only |
ties.method |
See |
decreasing, na.last |
See |
by |
A formula referencing the metadata columns by which to sort,
e.g., |
... |
A Vector object for Otherwise, extra arguments supported by specific methods. In particular:
|
Details
Doing pcompare(x, y) on 2 vector-like objects x and y of length 1 must return an integer less than, equal to, or greater than zero if the single element in x is considered to be respectively less than, equal to, or greater than the single element in y.
If x or y has a length != 1, then they are typically expected to have the same length so pcompare(x, y) can operate element-wise. In that case it returns an integer vector of the same length as x and y where the i-th element is the result of comparing x[i] and y[i]. If x and y don't have the same length and are not zero-length vectors, then the shorter is first recycled to the length of the longer. If one of them is a zero-length vector then pcompare(x, y) returns a zero-length integer vector.
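The contract described here can be illustrated with a minimal base-R sketch for plain integer or numeric vectors. The helper name pcompare_sketch is hypothetical and is not part of S4Vectors; it only mimics the documented recycling and zero-length rules, not the package's actual implementation.

```r
## Hypothetical helper (not part of S4Vectors): mimics the documented
## pcompare() contract for plain integer or numeric vectors.
pcompare_sketch <- function(x, y) {
    ## A zero-length operand yields a zero-length integer vector.
    if (length(x) == 0L || length(y) == 0L)
        return(integer(0))
    ## Recycle the shorter operand to the length of the longer one.
    n <- max(length(x), length(y))
    x <- rep_len(x, n)
    y <- rep_len(y, n)
    ## Logical arithmetic gives integer codes: -1L, 0L, or 1L.
    (x > y) - (x < y)
}

pcompare_sketch(c(1L, 5L, 3L), c(2L, 5L, 1L))  # -1  0  1
pcompare_sketch(1:4, 2L)                       # -1  0  1  1
pcompare_sketch(1:4, integer(0))               # integer(0)
```

Only the sign of each result matters for the contract, so a real method is free to return any negative, zero, or positive integer.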
selfmatch(x, ...) is equivalent to match(x, x, ...). This is actually how the default ANY method is implemented. However note that the default selfmatch(x, ...) for Vector x will typically be more efficient than match(x, x, ...), and can be made even more so if a specific selfmatch method is implemented for a given subclass.
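As a quick illustration of this equivalence, base R's match(y, y) maps each element to the position of its first occurrence. The vector y is the one used in the Examples section; the comment about selfmatch() restates the documented equivalence rather than a verified run.

```r
y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)

## Each element is mapped to the index of its first occurrence in y.
first_pos <- match(y, y)
first_pos  # 1 2 3 4 4 6 7 4 3

## Per the documented equivalence, selfmatch(y) from S4Vectors is
## expected to return the same integer vector.

## Elements that are not their own first occurrence are duplicates.
which(first_pos != seq_along(y))  # 5 8 9
```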
findMatches is an enhanced version of match which, by default (i.e. if select="all"), returns all the matches in a Hits object.
countMatches returns an integer vector of the length of x containing the number of matches in table for each element in x.
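To make the difference with match concrete, here is a base-R sketch of what "all the matches" and "number of matches" mean for plain integer vectors. The names tbl, all_hits, and n_hits are illustrative only; the real findMatches returns a Hits object rather than a matrix, and this is not how S4Vectors computes its result.

```r
tbl <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
x <- c(unique(tbl), 999L)

## match() reports only the first match for each element of x.
match(x, tbl)  # 1 2 3 4 6 7 NA

## All (x index, tbl index) pairs where the elements are equal.
all_hits <- which(outer(x, tbl, "=="), arr.ind = TRUE)
all_hits <- all_hits[order(all_hits[, 1], all_hits[, 2]), , drop = FALSE]

## Number of matches in tbl for each element of x.
n_hits <- tabulate(all_hits[, 1], nbins = length(x))
n_hits  # 1 1 2 3 1 1 0
```

The outer() call is quadratic in memory, so this is only suitable for small illustrative inputs.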
Value
For pcompare: see Details section above.
For sameAsPreviousROW: a logical vector of the same length as x, indicating whether each entry of x is equal to the previous entry. The first entry is always FALSE for a non-zero-length x.
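For a plain atomic vector, the return value of sameAsPreviousROW can be sketched in base R as follows. The helper same_as_previous is hypothetical and only mirrors the documented return value; it is not the S4Vectors implementation.

```r
## Hypothetical helper (not part of S4Vectors).
same_as_previous <- function(x) {
    n <- length(x)
    if (n == 0L)
        return(logical(0))
    ## Compare each entry (from the 2nd on) with the entry before it;
    ## the first entry has no predecessor, so it is FALSE.
    c(FALSE, x[-1L] == x[-n])
}

same_as_previous(c(1L, 1L, 2L, 2L, 2L, 3L))  # FALSE TRUE FALSE TRUE TRUE FALSE
same_as_previous(7L)                         # FALSE
```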
For match and selfmatch: an integer vector of the same length as x.
For duplicated, unique, and %in%: see ?BiocGenerics::duplicated, ?BiocGenerics::unique, and ?`%in%`.
For findMatches: a Hits object by default (i.e. if select="all").
For countMatches: an integer vector of the length of x containing the number of matches in table for each element in x.
For sort: see ?BiocGenerics::sort.
For xtfrm: see ?base::xtfrm.
For table: a 1D array of integer values promoted to the "table" class. See ?BiocGenerics::table for more information.
Note
The following notes are for developers who want to implement comparing, ordering, and tabulating methods for their own Vector subclass.
Subclass comparison methods can be split into various categories. The first category must be implemented for each subclass, as these do not have sensible defaults for arbitrary Vector objects:
- The S4Vectors package provides no order method for Vector objects. So calling order on a Vector derivative for which no specific order method is defined will use base::order, which calls xtfrm, which in turn calls order, which calls xtfrm, and so on. This infinite recursion of S4 dispatch eventually results in an error about reaching the stack limit.
  To avoid this behavior, a specialized order method needs to be implemented for specific Vector subclasses (e.g. for Hits and IntegerRanges objects).
- sameAsPreviousROW is implemented by default on top of the == method, so it will work out-of-the-box on Vector objects for which == works as expected. However, == is implemented by default on top of pcompare, which itself has a default implementation that relies on sameAsPreviousROW! This again leads to infinite recursion and an error about the stack limit.
  To avoid this behavior, a specialized sameAsPreviousROW method must be implemented for specific Vector subclasses.
The second category contains methods that have default implementations provided for all Vector objects and their derivatives. These methods rely on the first category to provide sensible default behaviour without further work from the developer. However, it is often the case that greater efficiency can be achieved for a specific data structure by writing a subclass-specific version of these methods.
- The pcompare method for Vector objects is implemented on top of order and sameAsPreviousROW, and so will work out-of-the-box on Vector derivatives for which order and sameAsPreviousROW work as expected.
- The xtfrm method for Vector objects is also implemented on top of order and sameAsPreviousROW, and so will also work out-of-the-box on Vector derivatives for which order and sameAsPreviousROW work as expected.
- selfmatch is itself implemented on top of xtfrm (indirectly, via grouping) so it will work out-of-the-box on Vector objects for which xtfrm works as expected.
- The match method for Vector objects is implemented on top of selfmatch, so works out-of-the-box on Vector objects for which selfmatch works as expected.
(A careful reader may notice that xtfrm and order could be swapped between categories to achieve the same effect. Similarly, sameAsPreviousROW and pcompare could also be swapped. The exact categorization of these methods is left to the discretion of the developer, though this is mostly academic if both choices are specialized.)
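The dependency chain described for the second category can be sketched in base R for plain vectors: integer keys are derived from an ordering plus a "same as previous" test, and an element-wise comparison is then derived from those keys. All function names below are hypothetical, and the code is a conceptual sketch under those assumptions rather than the package's actual default methods.

```r
## Hypothetical: integer keys built from order() and a
## "same as previous" test on the sorted values.
xtfrm_sketch <- function(v) {
    if (length(v) == 0L)
        return(integer(0))
    o <- order(v)
    sorted <- v[o]
    same_as_prev <- c(FALSE, sorted[-1L] == sorted[-length(sorted)])
    ## Each run of equal values in sorted order shares one key.
    keys <- integer(length(v))
    keys[o] <- cumsum(!same_as_prev)
    keys
}

## Hypothetical: element-wise comparison of equal-length x and y,
## obtained by ranking their concatenation.
pcompare_from_ranks <- function(x, y) {
    stopifnot(length(x) == length(y))
    keys <- xtfrm_sketch(c(x, y))
    n <- length(x)
    keys[seq_len(n)] - keys[n + seq_len(n)]
}

xtfrm_sketch(c("b", "a", "c", "a"))                      # 2 1 3 1
sign(pcompare_from_ranks(c(3L, 7L, 5L), c(4L, 7L, 1L)))  # -1 0 1
```

The sketch shows why a subclass that supplies working order and sameAsPreviousROW methods gets the rest of the chain without extra work: everything downstream only needs a sorted arrangement and a way to detect adjacent equal elements.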
The third category also contains methods that have default implementations, but unlike the second category, these defaults are straightforward and generally do not require any specialization for efficiency purposes.
- The 6 traditional binary comparison operators are: ==, !=, <=, >=, <, and >. The S4Vectors package provides the following methods for these operators:
    setMethod("==", c("Vector", "Vector"),
        function(e1, e2) { pcompare(e1, e2) == 0L }
    )
    setMethod("<=", c("Vector", "Vector"),
        function(e1, e2) { pcompare(e1, e2) <= 0L }
    )
    setMethod("!=", c("Vector", "Vector"),
        function(e1, e2) { !(e1 == e2) }
    )
    setMethod(">=", c("Vector", "Vector"),
        function(e1, e2) { e2 <= e1 }
    )
    setMethod("<", c("Vector", "Vector"),
        function(e1, e2) { !(e2 <= e1) }
    )
    setMethod(">", c("Vector", "Vector"),
        function(e1, e2) { !(e1 <= e2) }
    )
  With these definitions, the 6 binary operators work out-of-the-box on Vector objects for which pcompare works as expected. If pcompare is not implemented, then it's enough to implement == and <= methods to have the 4 remaining operators (!=, >=, <, and >) work out-of-the-box.
- The duplicated, unique, and %in% methods for Vector objects are implemented on top of selfmatch, duplicated, and match, respectively, so they work out-of-the-box on Vector objects for which selfmatch, duplicated, and match work as expected.
- The default findMatches and countMatches methods are also implemented on top of match and selfmatch, so they work out-of-the-box on Vector objects for which match and selfmatch work as expected.
- The sort method for Vector objects is implemented on top of order, so it works out-of-the-box on Vector objects for which order works as expected.
- The table method for Vector objects is implemented on top of selfmatch, order, and as.character, so it works out-of-the-box on a Vector object for which those three functions work as expected.
Author(s)
Hervé Pagès, with contributions from Aaron Lun
See Also
- The Vector class.
- Hits-comparison for comparing and ordering hits.
- Vector-setops for set operations on vector-like objects.
- Vector-merge for merging vector-like objects.
- IntegerRanges-comparison in the IRanges package for comparing and ordering ranges.
- == and %in% in the base package, and BiocGenerics::match, BiocGenerics::duplicated, BiocGenerics::unique, BiocGenerics::order, BiocGenerics::sort, BiocGenerics::rank in the BiocGenerics package for general information about the comparison/ordering operators and functions.
- The Hits class.
- BiocGenerics::table in the BiocGenerics package.
Examples
## ---------------------------------------------------------------------
## A. SIMPLE EXAMPLES
## ---------------------------------------------------------------------
y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
selfmatch(y)
x <- c(unique(y), 999L)
findMatches(x, y)
countMatches(x, y)
## See ?`IntegerRanges-comparison` for more examples (on IntegerRanges
## objects). You might need to load the IRanges package first.
## ---------------------------------------------------------------------
## B. FOR DEVELOPERS: HOW TO IMPLEMENT THE BINARY COMPARISON OPERATORS
## FOR YOUR Vector SUBCLASS
## ---------------------------------------------------------------------
## The answer is: don't implement them. Just implement pcompare() and the
## binary comparison operators will work out-of-the-box. Here is an
## example:
## (1) Implement a simple Vector subclass.
setClass("Raw", contains="Vector", representation(data="raw"))
setMethod("length", "Raw", function(x) length(x@data))
setMethod("[", "Raw",
    function(x, i, j, ..., drop) { x@data <- x@data[i]; x }
)
x <- new("Raw", data=charToRaw("AB.x0a-BAA+C"))
stopifnot(identical(length(x), 12L))
stopifnot(identical(x[7:3], new("Raw", data=charToRaw("-a0x."))))
## (2) Implement a "pcompare" method for Raw objects.
setMethod("pcompare", c("Raw", "Raw"),
    function(x, y) { as.integer(x@data) - as.integer(y@data) }
)
stopifnot(identical(which(x == x[1]), c(1L, 9L, 10L)))
stopifnot(identical(x[x < x[5]], new("Raw", data=charToRaw(".-+"))))