CRAN Task View: Analysis of Ecological and Environmental Data

Maintainer: Gavin Simpson
Contact: ucfagls at gmail.com
Version: 2014-03-07

Introduction

This Task View contains information about using R to analyse ecological and environmental data.

The base version of R ships with a wide range of functions for use within the field of environmetrics. This functionality is complemented by a plethora of packages available via CRAN, which provide specialist methods such as ordination & cluster analysis techniques. A brief overview of the available packages is provided in this Task View, grouped by topic or type of analysis. As a testament to the popularity of R for the analysis of environmental and ecological data, a special volume of the Journal of Statistical Software was produced in 2007.

Those useRs interested in environmetrics should consult the Spatial view. Complementary information is also available in the Multivariate, Phylogenetics, Cluster, and SpatioTemporal task views.

If you have any comments or suggestions for additions or improvements, then please contact the maintainer.

A list of available packages and functions is presented below, grouped by analysis type.

General packages

These packages are general, having wide applicability to the environmetrics field.

Modelling species responses and other data

Analysing species response curves or modelling other data often involves fitting standard statistical models to ecological data. These include simple (multiple) regression, Generalised Linear Models (GLM), extended regression (e.g. Generalised Least Squares [GLS]), Generalised Additive Models (GAM), and mixed effects models, amongst others.
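As a minimal sketch of one of the simplest of these models, the following fits a Poisson GLM with a quadratic term to simulated abundance data using glm() from base R. The data, the variable names (ph, abund) and the parameter values are invented for illustration and do not come from any package listed in this view.

```r
## Simulated example: species abundance along a pH gradient.
## All names and parameter values here are hypothetical.
set.seed(42)
ph <- runif(200, min = 4, max = 9)

## True response: unimodal on the log scale, with its optimum at pH 6.5
mu <- exp(3 - 0.8 * (ph - 6.5)^2)
abund <- rpois(200, lambda = mu)

## Poisson GLM (log link) with a quadratic term in pH
fit <- glm(abund ~ ph + I(ph^2), family = poisson)

## For log(mu) = b0 + b1 * x + b2 * x^2 with b2 < 0,
## the estimated optimum lies at -b1 / (2 * b2)
b <- coef(fit)
optimum <- unname(-b["ph"] / (2 * b["I(ph^2)"]))
optimum
```

The quadratic term on the log scale gives a symmetric, bell-shaped response curve; a GAM would be a natural next step when that symmetry is not plausible.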

Tree-based models

Tree-based models are being increasingly used in ecology, particularly for their ability to fit flexible models to complex data sets and the simple, intuitive output of the tree structure. Ensemble methods such as bagging, boosting and random forests are advocated for improving predictions from tree-based models and to provide information on uncertainty in regression models or classifiers.

Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book, are implemented in

Multivariate trees are available in

Ensemble techniques for trees:

Graphical tools for the visualisation of trees are available in package maptree.

Packages mda and earth implement Multivariate Adaptive Regression Splines (MARS), a technique which provides a more flexible, tree-based approach to regression than the piecewise constant functions used in regression trees.

Ordination

R and add-on packages provide a wide range of ordination methods, many of which are specialised techniques particularly suited to the analysis of species data. The two main packages are ade4 and vegan. ade4 derives from the traditions of the French school of Analyse des Données and is based on the use of the duality diagram. vegan follows the approach of Mark Hill, Cajo ter Braak and others, though the implementation owes more to that presented in Legendre & Legendre (1998) Numerical Ecology, 2nd English Edition, Elsevier. Where the two packages provide duplicate functionality, the user should choose whichever framework best suits their background.

Dissimilarity coefficients

Much ecological analysis proceeds from a matrix of dissimilarities between samples. A large amount of effort has been expended formulating a wide range of dissimilarity coefficients suitable for ecological data. A selection of the more useful coefficients is available in R and various contributed packages.

Standard functions that produce square, symmetric matrices of pair-wise dissimilarities include:

Function distance() in package analogue can be used to calculate dissimilarity between samples of one matrix and those of a second matrix. The same function can be used to produce pair-wise dissimilarity matrices, though the other functions listed above are faster. distance() can also be used to generate matrices based on Gower's coefficient for mixed data (mixtures of binary, ordinal/nominal and continuous variables). Function daisy() in package cluster provides a faster implementation of Gower's coefficient for mixed-mode data than distance() if a standard dissimilarity matrix is required. Function gowdis() in package FD also computes Gower's coefficient and implements extensions to ordinal variables.
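To make the idea of a pair-wise dissimilarity matrix concrete, here is a small sketch that uses only base R. It computes Euclidean distances with dist() and a Bray-Curtis dissimilarity with a hand-written helper. The helper bray_curtis() and the toy community matrix are assumptions made for illustration; they are not functions or data from the packages named above, which should be preferred in real analyses.

```r
## Toy community matrix: 3 samples (rows) by 4 species (columns).
comm <- matrix(c(10, 0, 5, 1,
                  8, 2, 5, 0,
                  0, 9, 1, 6),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

## Euclidean distances between samples via base R's dist()
d_euc <- dist(comm, method = "euclidean")

## Hypothetical helper: Bray-Curtis dissimilarity,
## sum(|x_i - x_j|) / sum(x_i + x_j), for every pair of rows
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)  # keep only the lower triangle, as dist() does
}

d_bc <- bray_curtis(comm)
d_bc
```

Both objects are of class "dist", so either can be passed directly to functions that expect a dissimilarity matrix, such as hclust().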

Cluster analysis

Cluster analysis aims to identify groups of samples within multivariate data sets. A large range of approaches to this problem has been suggested, but the main techniques are hierarchical cluster analysis, partitioning methods, such as k-means, and finite mixture models or model-based clustering. In the machine learning literature, cluster analysis is an unsupervised learning problem.

The Cluster task view provides a more detailed discussion of available cluster analysis methods and appropriate R functions and packages.
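The first two families of methods can be sketched with functions from base R alone. The example below is illustrative only: the data are simulated, and the choices of average linkage and two clusters are assumptions made to keep the example short.

```r
## Two well-separated simulated groups in two dimensions (illustrative data).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))

## Hierarchical clustering: average linkage on Euclidean distances,
## then cut the dendrogram into two groups
hc <- hclust(dist(x), method = "average")
grp_hc <- cutree(hc, k = 2)

## Partitioning: k-means with two centres and several random starts
km <- kmeans(x, centers = 2, nstart = 10)

## Cross-tabulate the two solutions; cluster labels are arbitrary,
## so agreement shows up as one non-zero cell per row
table(hierarchical = grp_hc, kmeans = km$cluster)
```

With groups this clearly separated, both methods recover the same partition; on real ecological data the two approaches can disagree, and the choice of dissimilarity and linkage matters.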

Hierarchical cluster analysis:

Partitioning methods:

Mixture models and model-based cluster analysis:

Ecological theory

There is a growing number of packages and books that focus on the use of R for theoretical ecological models.

Population dynamics

Estimating animal abundance and related parameters

This section concerns estimation of population parameters (population size, density, survival probability, site occupancy etc.) by methods that allow for incomplete detection. Many of these methods use data on marked animals, variously called 'capture-recapture', 'mark-recapture' or 'capture-mark-recapture' data.
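The simplest estimator of this kind is the two-sample Lincoln-Petersen estimator, shown here as a worked example in base R together with Chapman's small-sample modification. The counts are made up for illustration, and the variable names are assumptions; real analyses would normally use the richer models provided by dedicated packages.

```r
## Two-sample capture-recapture; the counts are invented for illustration.
n_marked   <- 50  # animals marked and released in the first sample
n_caught   <- 60  # animals caught in the second sample
n_recaught <- 12  # marked animals among those caught in the second sample

## Lincoln-Petersen estimate of population size:
## the marked fraction in the second sample estimates n_marked / N
N_lp <- n_marked * n_caught / n_recaught

## Chapman's modification, which is less biased in small samples
N_chapman <- (n_marked + 1) * (n_caught + 1) / (n_recaught + 1) - 1

c(lincoln_petersen = N_lp, chapman = N_chapman)
```

Both estimators assume a closed population and equal catchability of marked and unmarked animals between the two samples.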

Packages mra, secr and DSpat can also be used to simulate data from their respective models.

See also the SpatioTemporal task view for analysis of animal tracking data under Moving objects, trajectories.

Modelling population growth rates:

Environmental time series

Additionally, a fuller description of available packages for time series analysis can be found in the TimeSeries task view.

Spatial data analysis

See the Spatial CRAN Task View for an overview of spatial analysis in R.

Extreme values

ismev provides functions for models for extreme value statistics and is support software for Coles (2001) An Introduction to Statistical Modelling of Extreme Values, Springer, New York. Other packages for extreme value theory include:

Phylogenetics and evolution

Packages specifically tailored for the analysis of phylogenetic and evolutionary data include:

The Phylogenetics task view provides more detailed coverage of the subject area and related functions within R.

UseRs may also be interested in Paradis (2006) Analysis of Phylogenetics and Evolution with R, Springer, New York, a book in the new UseR series from Springer.

Soil science

Several packages are now available that implement R functions for widely-used methods and approaches in pedology.

Hydrology and Oceanography

A growing number of packages are available that implement methods specifically related to the fields of hydrology and oceanography. Also see the Extreme Value and the Climatology sections for related packages.

Climatology

Several packages are available that relate to the field of climatology.

Palaeoecology and stratigraphic data

Several packages now provide specialist functionality for the import, analysis, and plotting of palaeoecological data.

Other packages

Several other relevant contributed packages for R are available that do not fit under nice headings.

CRAN packages:

Related links: