CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data

Maintainer: Ulrike Groemping
Contact: groemping at bht-berlin.de
Version: 2014-03-21

This task view collects information on R packages for experimental design and the analysis of data from experiments. Please feel free to suggest enhancements, and please send information on new packages or major package updates if you think they belong here. Contact details are given on my web page.

Experimental design is applied in many areas, and methods have been tailored to the needs of various fields. This task view starts out with a section on the most general packages, continues with specific sections on agricultural and industrial experimentation, computer experiments, and experimentation in the clinical trials context, and closes with a section on various experimental design packages that have been developed for other specific purposes. Of course, the division into fields is not always clear-cut, and some packages from the more specialized sections can also be applied in general contexts.
You may also notice that my own experience is mainly in industrial experimentation (in a broad sense), which may explain a somewhat biased view.

Experimental designs for general purposes

There are a few packages for creating and analyzing experimental designs for general purposes. First of all, the standard (generalized) linear model functions in the base R package stats are of course very important for analyzing data from designed experiments (especially functions lm(), aov() and the methods and functions for the resulting linear model objects). These are concisely explained in Kuhnert and Venables (2005, p. 109 ff.); Vikneswaran (2005) points out specific usages for experimental design (using function contrasts(), multiple comparison functions and some convenience functions like model.tables(), replications() and plot.design()). Lalanne (2009) provides an R companion to the well-known book by Montgomery (2005); so far, it covers only the first few chapters and (understandably!) does not keep pace with the fast development of R's experimental design facilities. GAD handles general balanced analysis of variance models with fixed and/or random effects as well as nested effects (the latter can only be random); its authors cite Underwood (1997) as the basis for this work. The package is quite valuable, as many users have difficulties using the R packages for random or mixed effects. granova offers some interesting non-standard graphical representations of results from simply structured experiments (one-way and two-way layouts, paired data).
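The following sketch illustrates some of the base R functions mentioned above on the npk data set that ships with R (package datasets): an N, P, K factorial experiment laid out in 6 blocks. It only shows typical usage; the main-effects model with blocks is a simplifying choice made here (in npk, the three-factor interaction is confounded with blocks).

```r
# Analysis of a blocked factorial experiment with base R (package stats).
# npk: 24 plots in 6 blocks, 2-level factors N, P, K, response yield.
data(npk)

# Number of replications per level of each factor;
# a named vector is returned when the design is balanced
reps <- replications(~ block + N + P + K, data = npk)
print(reps)

# Main-effects model with blocks as a nuisance factor
fit <- aov(yield ~ block + N + P + K, data = npk)
print(summary(fit))

# Tables of marginal means for each term of the model
print(model.tables(fit, type = "means"))
```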

Experimental designs for agricultural and plant breeding experiments

agricolae offers extensive functionality for experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice designs, factorial designs, randomized complete block designs, completely randomized designs, (Graeco-)Latin square designs, balanced incomplete block designs and alpha designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric tests, as well as some quite specialized facilities for specific types of experiments.
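agricolae's design functions generate such plans directly. Purely to illustrate what the randomization of a randomized complete block design (RCBD) involves, here is a base R sketch: every block contains each treatment exactly once, in an independently randomized order. randomize_rcbd() is a hypothetical helper written for this illustration, not part of agricolae or any other package.

```r
# Hypothetical helper: randomize an RCBD in base R.
# Each block receives every treatment once, in a random order.
randomize_rcbd <- function(treatments, n_blocks, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  plan <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
    data.frame(block = b,
               plot = seq_along(treatments),   # plot position within block
               treatment = sample(treatments)) # random permutation per block
  }))
  plan$block <- factor(plan$block)
  plan
}

plan <- randomize_rcbd(c("A", "B", "C", "D"), n_blocks = 3, seed = 1)
print(plan)
```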

Experimental designs for industrial experiments

Several further packages specifically handle designs for industrial experiments, which are often highly fractionated, intentionally confounded and leave few degrees of freedom for error estimation.

Fractional factorial 2-level designs are particularly important in industrial experimentation.
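To illustrate the intentional confounding that such designs involve, here is a minimal base R sketch of a 2^(4-1) fractional factorial in coded units, assuming the common generator D = ABC (defining relation I = ABCD, resolution IV). Dedicated packages choose generators automatically; this hand-built version only shows the principle.

```r
# Full 2^3 factorial in coded units (-1/+1) for the basic factors A, B, C
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Generator D = ABC: 4 factors in 8 runs (half of the full 2^4 design)
design <- transform(base, D = A * B * C)
print(design)

# Aliasing check: the AB interaction column equals the CD column,
# so these two effects cannot be separated in this design
print(all(design$A * design$B == design$C * design$D))
```

All four main-effect columns remain mutually orthogonal, but two-factor interactions are aliased in pairs (AB = CD, AC = BD, AD = BC); this is the price of halving the run size.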

Apart from tools for planning and analysing factorial designs, R also offers support for response surface optimization for quantitative factors (cf. e.g. Myers and Montgomery 1995):
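As a small illustration of the response surface approach, the base R sketch below builds a rotatable central composite design for two coded factors, fits a second-order model with lm() and locates the stationary point. The response is a made-up, noise-free quadratic chosen so that the true optimum (0.5, -0.25) is known in advance; real experimental data would of course contain noise.

```r
# Central composite design for two coded factors:
alpha <- sqrt(2)  # axial distance that makes a 2-factor CCD rotatable
ccd <- rbind(
  expand.grid(x1 = c(-1, 1), x2 = c(-1, 1)),                            # factorial points
  data.frame(x1 = c(-alpha, alpha, 0, 0), x2 = c(0, 0, -alpha, alpha)),  # axial points
  data.frame(x1 = rep(0, 3), x2 = rep(0, 3))                            # center points
)

# Hypothetical noise-free response with its maximum at (0.5, -0.25)
ccd$y <- 50 - 2 * (ccd$x1 - 0.5)^2 - 3 * (ccd$x2 + 0.25)^2

# Second-order (quadratic) response surface model
fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = ccd)
cf <- coef(fit)

# Canonical form y = b0 + x'b + x'Bx; stationary point x_s = -B^{-1} b / 2
b <- c(cf[["x1"]], cf[["x2"]])
B <- matrix(c(cf[["I(x1^2)"]], cf[["x1:x2"]] / 2,
              cf[["x1:x2"]] / 2, cf[["I(x2^2)"]]), nrow = 2)
x_s <- -0.5 * solve(B, b)
print(x_s)

# Both eigenvalues negative: the stationary point is a maximum
print(eigen(B)$values)
```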

In some industries, mixtures of ingredients are important; these require special designs, because the proportions of the ingredients must add up to a fixed total (usually 1). Mixture designs are handled by packages AlgDesign (function gen.mixture(), lattice designs), qualityTools (function mixDesign(), lattice designs and simplex centroid designs), and mixexp (several small functions for simplex centroid, simplex lattice and extreme vertices designs, as well as for plotting).
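For illustration, a {q, m} simplex lattice design consists of all mixtures whose q component proportions take values in {0, 1/m, ..., 1} and sum to 1, which gives choose(q + m - 1, m) runs. The base R sketch below enumerates these points; simplex_lattice() is a hypothetical helper and does not reproduce the interface of AlgDesign, qualityTools or mixexp.

```r
# Hypothetical helper: enumerate a {q, m} simplex lattice design.
simplex_lattice <- function(q, m) {
  # All q-tuples of integers 0..m; keep those summing to m, then scale to proportions
  grid <- expand.grid(rep(list(0:m), q))
  pts <- grid[rowSums(grid) == m, , drop = FALSE] / m
  names(pts) <- paste0("x", seq_len(q))
  rownames(pts) <- NULL
  pts
}

# {3, 2} lattice: the three pure components plus the three 50:50 blends
lat <- simplex_lattice(3, 2)
print(lat)
```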

Occasionally, supersaturated designs (designs in which the number of factors exceeds the number of runs minus one) can be useful. The two small packages mkssd and mxkssd provide fixed-level and mixed-level k-circulant supersaturated designs.

Experimental designs for computer experiments

Computer experiments with quantitative factors require special types of experimental designs: it is often possible to include many different levels of the factors, and replication will usually not be beneficial, since computer models are typically deterministic. Also, the experimental region is often too large to assume that a linear or quadratic model adequately represents the phenomenon under investigation. Consequently, it is desirable to fill the experimental space with points as well as possible (space-filling designs), in such a way that each run provides additional information even if some factors turn out to be irrelevant. The lhs package provides Latin hypercube designs for this purpose. Furthermore, the package provides ways to analyse such computer experiments with emphasis on what follow-up experiments to conduct. Another package with a similar orientation is DiceDesign, which adds further ways to construct space-filling designs and some measures for assessing the quality of designs for computer experiments. The package DiceKriging provides the kriging methodology, which is often used for creating metamodels from computer experiments; the package DiceEval creates and evaluates metamodels (kriging models among others); and the package DiceView provides facilities for viewing sections of multidimensional metamodels.
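The idea behind a Latin hypercube can be sketched in a few lines of base R: the unit interval of each factor is cut into n equal strata, and each stratum is used exactly once per factor, so every one-dimensional projection is evenly covered. random_lhs() below is a hypothetical helper producing a plain random Latin hypercube with no further space-filling optimization; the lhs package offers this (e.g. randomLHS()) together with optimized variants.

```r
# Hypothetical helper: random Latin hypercube with n runs and k factors on (0, 1).
random_lhs <- function(n, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # For each factor: a random permutation of the strata 1..n,
  # then a uniform jitter that places each point inside its stratum
  sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)
}

X <- random_lhs(10, 3, seed = 42)
print(round(X, 3))
```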

Package tgp is another package dedicated to planning and analysing computer experiments, with an emphasis on Bayesian methods. The package can, for example, be used with various kinds of (surrogate) models for sequential optimization, e.g. with an expected improvement criterion for optimizing a noisy black-box target function. Packages plgp and dynaTree extend the functionality offered by tgp with particle learning facilities and learning of dynamic regression trees, respectively.
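For a Gaussian predictive distribution with mean mu and standard deviation s at a candidate point, and current best observed value f_min (minimization), the expected improvement criterion has the well-known closed form EI = (f_min - mu) Phi(z) + s phi(z) with z = (f_min - mu) / s, where Phi and phi are the standard normal distribution and density functions. The base R sketch below evaluates this formula; it is an illustration of the criterion only, not tgp's implementation, which may compute improvement statistics differently.

```r
# Closed-form expected improvement for minimization under a Gaussian predictive
# distribution; vectorized over candidate points.
expected_improvement <- function(mu, sd, f_min) {
  imp <- f_min - mu                    # improvement of the predictive mean
  z <- ifelse(sd > 0, imp / sd, 0)     # standardized improvement
  ei <- imp * pnorm(z) + sd * dnorm(z)
  # With no predictive uncertainty, EI reduces to max(improvement, 0)
  ifelse(sd > 0, ei, pmax(imp, 0))
}

# Two candidates: one predicted slightly better, one more uncertain
print(expected_improvement(mu = c(0.2, 0.5), sd = c(0.1, 0.4), f_min = 0.3))
```

The second term rewards predictive uncertainty, which is how the criterion balances exploiting promising regions against exploring poorly known ones.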

Package BatchExperiments is also designed for computer experiments, in this case specifically for experiments with algorithms to be run under different scenarios. The package is described in a technical report by Bischl et al. (2012).

Experimental designs for clinical trials

This task view only covers specific design of experiments packages; there may be some grey areas. Please also consult the ClinicalTrials task view.

Experimental designs for special purposes

Various further packages handle special situations in experimental design:

Key references for packages in this task view

CRAN packages:

Related links: