|Maintainer:||Julia Piaskowski, Adam Sparks, Janet Williams|
|Contact:||julia.piask at gmail.com|
|Contributions:||Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.|
|Citation:||Julia Piaskowski, Adam Sparks, Janet Williams (2022). CRAN Task View: Agricultural Science. Version 2022-09-21. URL https://CRAN.R-project.org/view=Agriculture.|
|Installation:||The packages from this task view can be installed automatically using the ctv package. For example, |
Agriculture encompasses a broad range of disciplines. Many base R and contributed packages are useful to agricultural researchers, so this is not an exhaustive list of every package relevant to agricultural research. This CRAN task view is intended to cover the major packages that, in most cases, have been developed to support agricultural research and analytical needs.
Note that some of these packages are on CRAN and others are on GitHub, Bioconductor, or R-Forge.
If you think that a package is missing from this list, please let us know through issues or pull requests in the GitHub repository, or via e-mail.
USDA databases: Data from the United States Department of Agriculture’s National Agricultural Statistics Service ‘Quick Stats’ web API can be accessed with rnassqs or with tidyUSDA, which also offers some mapping capabilities. The USDA’s Cropland Data Layer API can be accessed with CropScapeR and cdlTools, the latter providing utility functions for processing CDL data. rusda provides an interface to the four databases of the USDA-ARS Systematic Mycology and Microbiology Laboratory (SMML): Fungus-Host Distributions, Specimens, Literature and Nomenclature. The USDA’s Agricultural Resource Management Survey (ARMS) data API can be accessed with rarms. The USDA’s Livestock Mandatory Reporting data API can be accessed with usdampr. The packages FAOSTAT and faobulk can be used to access data from the FAOSTAT Database of the United Nations Food and Agriculture Organization (FAO).
Most USDA-NRCS soils-related databases and APIs can be accessed with soilDB.
FedData provides access to geospatial data from the United States Soil Survey Geographic (SSURGO) database, the Global Historical Climatology Network (GHCN), the Daymet gridded estimates of daily weather parameters for North America, the International Tree Ring Data Bank, and the National Land Cover Database. SSURGO data can also be accessed and processed with XPolaris.
NASA soil moisture active-passive (SMAP) data can be accessed and processed with smapr.
PGRdup provides functions to aid the identification of probable/possible duplicates in plant genetic resources collections.
Many of the agriculture-focused packages listed in this guide also include data sets to illustrate their functionality (e.g. agricolae, AgroTech, BGLR).
agridat consists of a very large collection of agricultural data sets and example analyses; the package contains a vignette detailing additional data sets and extensive resources to support agricultural analysis.
agriTutorial provides a collection of agricultural data sets and analysis with particular attention to crop experiments.
The soybean nested association mapping population data set can be accessed via SoyNAM.
simplePhenotypes can be used for simulating pleiotropic, linked and epistatic phenotypes.
USGS county data on fertilizer sales can be accessed with ggfertilizer.
Annual agriculture production data from the Peruvian Integrated System of Agricultural Statistics (SIEA) covering 2004 to 2014 can be accessed with cropdatape.
The packages nlraa and AgroReg provide linear and nonlinear regression functions specifically for agricultural applications. biotools can conduct a wide array of multivariate analyses for agronomists including genetic covariance, optimal plot size, tests for spatial dependence, and tests for seed lot heterogeneity.
agriCensData is a flexible package for working with censored data (e.g. time to flowering, instrumentation values below the detection limit, disease scoring).
grapesAgri1 houses a collection of shiny apps, GRAPES (General R-shiny based Analysis Platform Empowered by Statistics), that work as a graphical user interface for uploading and analysing data files. Linear models, ANOVA for CRD and 2-way RCBD designs, correlation analysis, exploratory data analysis and other common hypothesis tests are supported.
ALUES implements methodology developed by the FAO and the International Rice Research Institute for evaluating land suitability for the production of different crops.
AGPRIS (AGricultural PRoductivity in Space) provides functions for different spatial analyses implemented in INLA and other spatial approaches. The package KenSyn has example data sets and analytical code supporting the book De L’analyse des Réseaux Expérimentaux à la Méta-analyse (French) or From Experimental Network to Meta-analysis (English).
AgroTech provides functions for making chemical application calculations and example data sets.
The task views for Econometrics, (Empirical) Finance, and TimeSeries provide information on packages and tools relevant to agriculture economics.
The Hydrology task view has many resources for accessing and processing weather and climate data.
Data sources: Data from the Copernicus data set of agrometeorological indicators can be downloaded and extracted using ag5Tools. Climate crop zones in Brazil can be accessed and calculated with cropZoning using data sets from TerraClimate that are calibrated to weather stations run by the National Meteorological Institute of Brazil. acdcR (AgroClimatic Data by County) provides functions to calculate United States county-level variables in agricultural production or agroclimatic and weather analyses.
Data preparation: meteor provides a set of functions for weather and climate data manipulation to support crop and crop disease modeling. cropgrowdays can be used for calculating growing degree days, cumulative rainfall, number of stress days, mean radiation, evapotranspiration and other variables. agroclim contains functions to compute agroclimatic indices that are useful for zoning areas based on climatic variables and for evaluating the importance of temperature and precipitation for individual crops or for agricultural land in general.
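As an illustration of the kind of calculation these packages automate, growing degree days can be sketched in base R. This is a minimal sketch that assumes the simple averaging method and a hypothetical base temperature of 10 °C; the function name `growing_degree_days` is invented for this example and is not the API of cropgrowdays or any other package listed here.

```r
# Daily growing degree days by the simple averaging method (an assumed variant).
# tmax, tmin: daily maximum and minimum air temperature in degrees Celsius.
# t_base: hypothetical crop base temperature; real values are crop-specific.
growing_degree_days <- function(tmax, tmin, t_base = 10) {
  stopifnot(length(tmax) == length(tmin))
  # Days whose mean temperature is below the base contribute zero.
  pmax((tmax + tmin) / 2 - t_base, 0)
}

tmax <- c(25, 18, 12)
tmin <- c(15, 8, 4)
gdd <- growing_degree_days(tmax, tmin)
cumsum(gdd)  # accumulated thermal time over the three days
```

Other variants cap the daily maximum or use hourly data, so results from a given package may differ from this sketch.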
The frost package contains a compilation of empirical methods used by farmers and agronomic engineers to predict the minimum temperature to detect a frost event.
LWFBrook90R provides an implementation of the soil vegetation atmosphere transport (SVAT) model LWF-BROOK90 to calculate daily evaporation (transpiration, interception, and soil evaporation) and soil water fluxes, along with soil water contents and soil water tension of a soil profile covered with vegetation.
The task view for ExperimentalDesign provides additional information on experimental design for a wide variety of research problems.
agricolae provides extensive resources for the planning and analysis of planned field experiments. Designs constructed by agricolae can be visualised with agricolaeplotr. Agricultural field trial layouts can also be visualised with desplot.
The package DiGGer was developed for rectangular field trials; its purpose is to help users determine the optimal experimental design based on the treatment structure and number of replicates.
CropDetectR can be used to identify crop rows from image data.
FWRGB can process plant images for downstream machine learning models to predict fresh biomass.
pliman provides tools for image manipulation to quantify plant leaf area, disease severity, number of disease lesions, and obtain statistics of image objects such as grains, pods, pollen, leaves, and more.
General analysis: The package agricolae contains functions for analyzing many common designs in agriculture trials such as split plot, lattice and Latin square, plus additional functions such as AMMI and AUDPC calculations. The proprietary software asreml provides an R version of their mixed model software for field trial analysis (note this is not open source and requires an annual license). CRAN also contains an add-on package, asremlPlus, that provides several accessory functions to asreml. INLA provides tools for Bayesian inference of latent Gaussian models, and it contains functions for modelling spatial variation, such as in field experiments or across farm locations. The gosset package provides the toolkit for a workflow to analyse experimental agriculture data, from data synthesis to model selection and visualisation. AgroR has general functions and a shiny app for analysis of common designs in agriculture: CRD, RCBD and Latin square.
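The AUDPC (area under the disease progress curve) mentioned above is commonly computed with the trapezoidal rule. The base R sketch below shows that arithmetic under this assumption; the function name `audpc_trapezoid` and the example values are hypothetical, and the code does not reproduce the interface of agricolae.

```r
# Area under the disease progress curve by the trapezoidal rule.
# time: assessment times (e.g. days after sowing); severity: disease score at each time.
audpc_trapezoid <- function(time, severity) {
  stopifnot(length(time) == length(severity), length(time) >= 2)
  ord <- order(time)            # make sure assessments are in time order
  time <- time[ord]
  severity <- severity[ord]
  # Sum of (interval width) x (mean severity at the two interval ends).
  sum(diff(time) * (head(severity, -1) + tail(severity, -1)) / 2)
}

days <- c(0, 7, 14)             # hypothetical assessment days
severity <- c(10, 20, 40)       # hypothetical percent severity
audpc_value <- audpc_trapezoid(days, severity)
audpc_value
```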
Spatial analysis: statgenSTA has functions for single trial analysis with and without spatial components. SpATS can be used to adjust for field spatial variation using P-splines. A localised method of spatial adjustment for unreplicated trials, moving grid adjustment, is implemented with mvngGrAd.
Trials utilizing an incomplete block design can be analysed using ispd.
The Tracking task view has many resources for working with tracked animal data and studying animal movement.
The package usdampr provides access to the USDA’s Livestock Mandatory Reporting API.
Many of the genetic packages described in the breeding section of this task view can also be applied to animals.
See the R package repository Bioconductor for bioinformatic tools to support the processing of high-throughput genomic data.
General plant breeding: st4gi and variability provide several common utility functions for genetic improvement of crops. Also, please see the subsection on “genotype-by-environment interactions” in this task view for packages integrating environmental and genomic data in an analytical framework. gpbStat provides functions for common plant breeding analyses including line-by-tester analysis (Arunachalam 1974) and diallel analysis (Griffing 1956). plantbreeding provides many convenience functions for working with populations and designs common in plant breeding including diallels, line testers, augmented trials, the Carolina design, and more.
heritability implements marker-based estimation of heritability when observations on genetically identical replicates are available.
Breeding simulations: AlphaSimR is an implementation of the AlphaSim algorithm in R, providing functions for stochastic modelling of processes common to breeding programs such as selection and crossing. MoBPS has a suite of functions for simulating genetic gain and economic costs in a plant breeding program. isqg provides functions for high performance quantitative genetic simulations using a bitset-based algorithm.
There are several packages focused on linkage disequilibrium on Bioconductor.
There are two notable and long-standing packages for quantitative trait loci (QTL) analysis: (1) onemap, providing MapMaker/EXP-like performance and additional tools; and (2) qtl, providing standard QTL mapping functionality and accessory functions for simulating crosses. BatchMap is a fork of onemap for fast computation of high density linkage maps. ASMap can conduct fast linkage mapping with the algorithm ‘MSTmap’. MapRtools is a multi-purpose linkage mapping package for teaching and research.
For polyploids, the packages mappoly and polymapR can be used for linkage mapping and the packages qtlpoly and polyqtlR can be used for QTL estimation. diaQTL is for QTL and haplotype analysis of diallel populations (diploid and autotetraploid).
statgenMPP can conduct QTL mapping in multi-parent populations.
Linkage maps can be visualized with LinkageMapView.
There are many GWAS packages on Bioconductor.
GWAS can be conducted using a stepwise mixed linear model for multilocus data with mlmm.gwas or MultLocMixMod (use library(mlmm) to load the package in R). The package statgenGWAS can fit GWAS models using the EMMAX algorithm. GAPIT3 is a wrapper for several GWAS algorithms including the original GAPIT, FarmCPU and BLINK.
GWAS models for a very large number of SNPs and/or observations can be estimated with rMVP and megaLMM. Functions for conducting GWAS in autotetraploids are provided by GWASpoly, and these functions also work in diploid species. Variable selection for ultra-large dimensional GWAS data sets can be done with bravo, which implements the Bayesian algorithm SVEN, selection of variables with embedded screening.
StageWise provides functions to conduct a 2-stage GWAS when the phenotypic data are from multiple field trials.
For polyploids, polyBreedR provides convenience functions to facilitate the use of genome-wide markers for breeding autotetraploid species, and its functionality also extends to diploids.
General genomic selection packages: breedR is a general purpose package for performing quantitative genetic analyses. Genome feature mixed linear models using frequentist and Bayesian approaches can be implemented with qgg. The package STGS implements several genomic selection models for single traits. BWGS, “Breed Wheat Genomic Selection”, provides a pipeline of functions for conducting genomic selection in hexaploid wheat.
GBLUP: Packages supporting genetic prediction using mixed models augmented with pedigree or genetic marker data include sommer, rrBLUP, BGLR, lme4GS (this package has special installation instructions), lme4qtl, pedigreemm, qgtools, cpgen, QTLRel, and the licensed software asreml. Many of these packages have built-in functionality for data preparation steps including data imputation and calculation of the relationship matrices.
GSelection implements genomic selection integrating additive and non-additive models.
pedmod provides linear modelling functions integrating kinship for categorical traits.
coxme can fit Cox proportional hazards models containing both fixed and random effects with a kinship matrix.
GSMX, multivariate genomic selection, estimates trait heritability and handles overfitting through cross validation.
TSDFGS can estimate the optimal training population size and composition for genomic selection.
Multiple environments and traits: BGGE conducts genomic prediction for continuous variables, focused on genotype-by-environment genomic selection models following the methods of Jarquín 2014. The package BMTME builds genomic selection prediction models that can be expanded to multiple traits and environments using Bayesian models developed by Montesinos-López (2016, 2018a, 2018b).
Kinship and relatedness: AGHmatrix provides extensive options for calculating pedigree and genomic relationships (additive and dominance). The pedigree package provides functionality for ordering pedigrees, calculating and inverting the pedigree relationship matrix and other related tasks. statgenIBD can calculate IBD probabilities for biparental, three-way and four-way crosses. kinship2 provides functions for manipulating and visualising pedigree-based kinship data.
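To make the idea of an additive genomic relationship matrix concrete, the base R sketch below computes one common estimator (VanRaden 2008, method 1) from a marker matrix coded 0/1/2. The function name `vanraden_G` and the toy genotypes are hypothetical; this is an illustration of the arithmetic rather than the interface of AGHmatrix or any other package named above, and it assumes no missing genotypes and no monomorphic-only marker set.

```r
# Additive genomic relationship matrix, VanRaden (2008) method 1.
# M: individuals x markers matrix of allele counts (0, 1 or 2), no missing values.
vanraden_G <- function(M) {
  p <- colMeans(M) / 2                  # allele frequency at each marker
  Z <- sweep(M, 2, 2 * p)               # centre each marker column by 2p
  tcrossprod(Z) / (2 * sum(p * (1 - p)))  # ZZ' scaled by expected heterozygosity
}

# Hypothetical genotypes: 3 individuals, 4 markers.
M <- matrix(c(0, 1, 2, 1,
              1, 1, 0, 2,
              2, 0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), NULL))
G <- vanraden_G(M)
round(G, 3)
```

Because allele frequencies are estimated from the same individuals, each row of the resulting matrix sums to zero; using base-population frequencies instead would change the values.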
The apsimx package has functions to read, inspect, edit and run files for APSIM “Next Generation” (.apsimx) and APSIM “Classic” (.apsim) files. rapsimng works with next generation APSIM files.
DSSAT provides a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM) documented by Jones (2003). This package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files.
meteor provides a set of functions for weather and climate data manipulation to support crop and crop disease modelling.
Crop Water Usage: cropDemand can be used to estimate crop water demand in Brazilian production regions using the TerraClimate data set. Evapotranspiration can estimate potential and actual evapotranspiration using 21 different models.
metrica has many convenience functions for comparing model predictions with ground truth data.
For packages supporting sensory studies, see the Psychometrics task view.
NutrienTrackeR provides convenience functions for calculating nutrient content (macronutrients and micronutrients) of foods using food composition data from several reference databases, including: ‘USDA’ (United States), ‘CIQUAL’ (France), ‘BEDCA’ (Spain) and ‘CNF’ (Canada).
statgenGxE implements several analytical approaches for addressing genotype-by-environment interactions.
EnvRtype can be used for assembling climate data, data set preparation and environmental classification or envirotyping.
A wide variety of stability analysis statistics can be calculated via agrostab including coefficient of homeostaticity, specific adaptive ability, weighted homeostaticity index, superiority measure, regression on environmental index, Tai’s stability parameters, stability variance, ecovalence and other stability parameters.
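Of the statistics listed above, ecovalence is among the simplest to write out: for each genotype it is the sum over environments of the squared genotype-by-environment interaction effects. The base R sketch below assumes a complete genotype-by-environment table of means; the function name `ecovalence` and the yield values are hypothetical, and the code is not the interface of agrostab.

```r
# Wricke's ecovalence from a complete genotype x environment matrix of means.
# y: matrix with genotypes in rows and environments in columns.
ecovalence <- function(y) {
  gen_means <- rowMeans(y)
  env_means <- colMeans(y)
  grand_mean <- mean(y)
  # Interaction effect: y_ij - genotype mean - environment mean + grand mean.
  interaction <- y - outer(gen_means, env_means, "+") + grand_mean
  rowSums(interaction^2)
}

# Hypothetical yields for two genotypes in two environments.
y <- matrix(c(4, 6,
              5, 9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("G1", "G2"), c("E1", "E2")))
w <- ecovalence(y)
w
```

A smaller value indicates a genotype that contributes less to the genotype-by-environment interaction, which is usually read as greater stability.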
The Epidemiology task view lists relevant packages for modelling plant diseases.
Epidemiology Simulation: Stochastic disease modelling of plant pathogens incorporating spatial and genetic information can be done with landsepi. The package ascotraceR can simulate an Ascochyta blight infection in a chickpea field following the model developed by Diggle (2022).
epiphy is a toolbox for analyzing plant disease epidemics. It provides a common framework for plant disease intensity data recorded over time and/or space.
epifitter provides functions for analysis and visualization of plant disease progress curve data.
Plant Pathogen Genetics: hagis has functions for analysis of plant pathogen pathotype survey data. Functions provided calculate distribution of susceptibilities, distribution of complexities with statistics, pathotype frequency distribution, as well as diversity indices for pathotypes. Evolution of resistance genes under pesticide pressure can be simulated under different numbers of pests, modes of pest reproduction, resistance loci, number of pesticides and other facets with resevol. Populations with mixed clonal/sexual reproductive strategies can be analyzed with poppr, which has population genetic analysis tools for hierarchical analysis of partially clonal populations.
See the task view for Psychometrics for general sociology packages.
Spatial: The Spatial and SpatioTemporal CRAN task views provide extensive resources in spatial statistics. mpspline2 fits a mass-preserving spline to soil attributes to make continuous down-profile estimates of attributes measured over discrete, often discontinuous depth intervals.
sharpshootR contains a compendium of utility functions supporting soil survey work including data management, summary, visualisations and conversions.
For soil pedology, aqp provides a general toolkit for soil scientists: specialized data structures, soil profile summary, visualisation, color conversion, and more. SoilTaxonomy provides functions for parsing soil taxonomic terms. pedometrics has many utility functions for common analyses of soil data.
Soil water: Soil water retention curves can be calculated by the soilwater package using the Van Genuchten (1980) method for soil water retention and the Mualem (1976) method for hydraulic conductivity. Estimation and prediction of parameters of soil hydraulic property models can be accomplished with spsh.
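For reference, the Van Genuchten (1980) retention function can be written in a few lines of base R. This is a minimal sketch that assumes the usual Mualem constraint m = 1 - 1/n; the function name `van_genuchten_theta` and the parameter values are hypothetical and are not taken from soilwater or spsh.

```r
# Volumetric water content as a function of suction head (Van Genuchten 1980).
# h: suction head (positive, e.g. cm); theta_r, theta_s: residual and saturated
# water content; alpha (1/cm) and n (> 1): shape parameters.
van_genuchten_theta <- function(h, theta_r, theta_s, alpha, n) {
  stopifnot(n > 1, all(h >= 0))
  m <- 1 - 1 / n                       # Mualem constraint (assumed here)
  theta_r + (theta_s - theta_r) / (1 + (alpha * h)^n)^m
}

# Hypothetical parameters, roughly loam-like; not from any measured soil.
h <- c(0, 10, 100, 1000, 15000)
theta <- van_genuchten_theta(h, theta_r = 0.05, theta_s = 0.45,
                             alpha = 0.02, n = 1.5)
round(theta, 3)
```

At zero suction the function returns the saturated water content, and it decreases towards the residual water content as suction increases.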
SoilR models soil organic matter decomposition in terrestrial ecosystems with linear and nonlinear models. sorcering can be used to model soil organic carbon and soil organic nitrogen and to calculate N mineralisation rates.
Soil texture triangles can be graphed using soiltexture; this package can also classify and transform soil texture data.
QI can be used to calculate potassium intensity and exchangeability.
Soil Fertility Testing: soiltestcorr has functions for conducting correlation analysis between soil test values and crop yield data. SoilTesting provides functions for calculating soil mineral concentrations from analytical lab results. fertplan provides fertilizer recommendations based on soil test results (note this package is optimized for horticultural crop production in Italy).
Remote Sensing: Agriculture image features from spectral data can be extracted with agrifeature. It has functions to calculate the gray level co-occurrence matrix (GLCM), RGB-based vegetative index (RGB VI) and normalized difference vegetation index (NDVI). Experimental units (e.g. plots) can be obtained from spectral images using rPAex. The mapsRinteractive package provides functions for working with soil point data in raster format.
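The NDVI mentioned above is the ratio (NIR - Red) / (NIR + Red), computed per pixel. A minimal base R sketch follows; the function name `ndvi` and the small reflectance matrices are hypothetical stand-ins for real image bands, and the code is not the interface of agrifeature.

```r
# Normalized difference vegetation index, computed element-wise.
# nir, red: reflectance in the near-infrared and red bands (same dimensions).
ndvi <- function(nir, red) {
  stopifnot(identical(dim(nir), dim(red)))
  (nir - red) / (nir + red)   # pixels where nir + red is 0 give NaN
}

# Hypothetical 2 x 2 "images" of reflectance values (filled column by column).
nir <- matrix(c(0.5, 0.6, 0.4, 0.3), nrow = 2)
red <- matrix(c(0.1, 0.2, 0.4, 0.1), nrow = 2)
ndvi_values <- ndvi(nir, red)
ndvi_values
```

Values range from -1 to 1, with dense green vegetation typically towards the upper end of the range.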
The suitability of specific soils for crop production can be analyzed using soilassessment, including soil fertility classes, soil erosion models and soil salinity classification. Suitability requirements are for crops grouped into cereal crops, nuts, legumes, fruits, vegetables, industrial crops, and root crops.
For ecological studies and analytical applications, the Environmetrics task view provides a list of existing R resources in this topic.
PROSPER is a package for simulating weed population dynamics at the individual and population level under a range of conditions including herbicide resistance and herbicide pressure.
|Core:||AGHmatrix, agricolae, agridat, apsimx, aqp, cdlTools, drc, DSSAT, FedData, inti, meteor, nlraa, qtl, sommer, tidyUSDA.|
|Regular:||acdcR, ag5Tools, AGPRIS, agricolaeplotr, agrifeature, agriTutorial, agroclim, AgroR, AgroReg, agrostab, AgroTech, AlphaSimR, ALUES, ascotraceR, ASMap, asremlPlus, bayesammi, BGGE, BGLR, biotools, BMTME, bravo, BWGS, ClimMobTools, coxme, cropdatape, cropDemand, CropDetectR, cropgrowdays, CropScapeR, cropZoning, desplot, DMMF, eemdTDNN, epifitter, Evapotranspiration, FAOSTAT, febr, FWRGB, gge, gosset, gpbStat, grapesAgri1, GSelection, GSMX, hagis, heritability, hnp, IBCF.MTME, ispd, isqg, KenSyn, kinship2, landsepi, LinkageMapView, lmDiallel, LW1949, LWFBrook90R, mappoly, mapsRinteractive, metrica, mlmm.gwas, mpspline2, mvngGrAd, NutrienTrackeR, onemap, pedigree, pedigreemm, pedmod, pedometrics, PGRdup, pliman, polymapR, polyqtlR, poppr, PROSPER, qgg, qgtools, QI, qtlpoly, QTLRel, rapsimng, rarms, Recocrop, resevol, rfieldclimate, rMVP, rnassqs, rPAex, Rquefts, rrBLUP, rusda, Rwofost, selection.index, sharpshootR, smapr, soilassessment, soilDB, SoilR, SoilTaxonomy, soiltestcorr, SoilTesting, soiltexture, soilwater, sorcering, SoyNAM, SpATS, spFW, spsh, statgenGWAS, statgenGxE, statgenHTP, statgenIBD, statgenMPP, statgenSTA, STGS, stlELM, TSDFGS, usdampr, variability, WCM, ZeBook.|