ddply {plyr}    R Documentation

Split data frame, apply function, and return results in a data frame.

Description

For each subset of a data frame, apply a function, then combine the results into a data frame.
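The split-apply-combine pattern that ddply automates can be sketched in base R. This is only an illustration of the idea, not plyr's implementation, and the data frame df is hypothetical.

```r
# Base-R sketch of split-apply-combine (illustration only, not plyr's code).
# Hypothetical input: a grouping column g and a value column x.
df <- data.frame(
  g = c("a", "b", "a", "b", "c"),
  x = c(1, 2, 3, 4, 5)
)

# Split: one data frame per value of g
pieces <- split(df, df$g)

# Apply: reduce each piece to a one-row data frame
results <- lapply(pieces, function(piece) {
  data.frame(g = piece$g[1], total = sum(piece$x))
})

# Combine: stack the per-piece results into a single data frame
combined <- do.call(rbind, results)
rownames(combined) <- NULL
combined
```

With plyr loaded, ddply(df, .(g), summarise, total = sum(x)) should give an equivalent two-column result in a single call.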

Usage

  ddply(.data, .variables, .fun = NULL, ...,
    .progress = "none", .drop = TRUE, .parallel = FALSE)

Arguments

.data

data frame to be processed

.variables

variables to split the data frame by, given as quoted variables (e.g. .(year)), a formula, or a character vector

.fun

function to apply to each piece; each piece is the subset of .data for one combination of the .variables values

...

other arguments passed on to .fun

.progress

name of the progress bar to use; see create_progress_bar

.drop

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, the default)?

.parallel

if TRUE, apply the function in parallel, using the parallel backend provided by foreach
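The effect of .drop can be illustrated with base R's split(), whose own drop argument has the same meaning. Note that split() defaults to drop = FALSE, the opposite of ddply's default. This sketch assumes factor splitting variables, so that level combinations with no rows can exist; the data frame is hypothetical.

```r
# Hypothetical data: the combination sex = "m", group = "y" never occurs.
df <- data.frame(
  sex   = factor(c("f", "f", "m"), levels = c("f", "m")),
  group = factor(c("x", "y", "x"), levels = c("x", "y")),
  score = c(10, 20, 30)
)

# drop = FALSE keeps every level combination, including empty ones
kept <- split(df, list(df$sex, df$group), drop = FALSE)

# drop = TRUE keeps only combinations that appear in the data
dropped <- split(df, list(df$sex, df$group), drop = TRUE)

names(kept)           # includes "m.y"
names(dropped)        # "m.y" is gone
sapply(kept, nrow)    # the "m.y" piece has zero rows
```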

Value

A data frame, as described in the Output section.

Input

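This function splits a data frame into subsets, one for each combination of values of the variables given in .variables (see .drop for how combinations absent from the data are handled).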

Output

The most unambiguous behaviour is achieved when .fun returns a data frame; in that case the pieces are combined with rbind.fill. If .fun returns an atomic vector of fixed length, the vectors are combined with rbind and converted to a data frame. Any other return value results in an error.

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
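The combination rules above can be mimicked by a small helper. combine_pieces() is hypothetical and not part of plyr: it imitates rbind.fill by filling missing columns with NA, stacks fixed-length atomic vectors as rows, returns data.frame() when there are no results, and errors otherwise. Discarding NULL results first is an assumption of this sketch.

```r
# Hypothetical helper (not part of plyr) mimicking the combine rules.
combine_pieces <- function(results) {
  # Assumption: NULL results are skipped; nothing left means no results
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())

  if (all(vapply(results, is.data.frame, logical(1)))) {
    # Data frames: add any missing columns as NA (like rbind.fill), then stack
    cols <- unique(unlist(lapply(results, names)))
    filled <- lapply(results, function(d) {
      for (col in setdiff(cols, names(d))) d[[col]] <- rep(NA, nrow(d))
      d[cols]
    })
    out <- do.call(rbind, filled)
    rownames(out) <- NULL
    return(out)
  }

  lens <- vapply(results, length, integer(1))
  if (all(vapply(results, is.atomic, logical(1))) && length(unique(lens)) == 1) {
    # Fixed-length atomic vectors: one matrix row each, then a data frame
    return(as.data.frame(do.call(rbind, results)))
  }

  stop("each result must be a data frame or an atomic vector of fixed length")
}

# Example calls
r_df  <- combine_pieces(list(data.frame(a = 1, b = "x"), data.frame(a = 2)))
r_vec <- combine_pieces(list(c(min = 1, max = 3), c(min = 2, max = 5)))
r_nil <- combine_pieces(list())
```

Vectors of different lengths, such as list(1:2, 1:3), fall through to the error branch, matching the rule that only fixed-length vectors are accepted.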

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

See Also

Other data frame input: daply, dlply

Other data frame output: adply, ldply

Examples

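library(plyr)  # provides ddply(), summarise() and the baseball data

# Number of rows (batting records) for each year
ddply(baseball, .(year), "nrow")
# Apply several functions to each league's subset
ddply(baseball, .(lg), c("nrow", "ncol"))

# Mean runs batted in per year, then plot the trend
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
with(rbi, plot(year, mean_rbi, type = "l"))

# Add each player's career year (1 = first season) with transform
base2 <- ddply(baseball, .(id), transform,
  career_year = year - min(year) + 1
)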
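The two ddply examples above differ in shape: summarise returns one row per group, while transform keeps every input row and adds a column. A base-R sketch of the same two computations on a small hypothetical stand-in for baseball (columns id, year, rbi) makes the difference easy to check; aggregate() and ave() are swapped in here and are not what ddply uses internally.

```r
# Hypothetical stand-in for baseball: three players, two seasons
players <- data.frame(
  id   = c("a", "a", "b", "b", "c"),
  year = c(2000, 2001, 2000, 2001, 2001),
  rbi  = c(10, 20, NA, 40, 30)
)

# summarise-style: one row per year; the formula method drops NA rows,
# which here has the same effect as mean(rbi, na.rm = TRUE)
per_year <- aggregate(rbi ~ year, data = players, FUN = mean)
names(per_year)[2] <- "mean_rbi"

# transform-style: every row kept, career year computed within each id
players$career_year <- with(players, year - ave(year, id, FUN = min) + 1)

per_year
players
```

One difference to keep in mind: ave() preserves the input row order, whereas ddply returns rows grouped by the splitting variable.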

[Package plyr version 1.7.1]