This vignette focuses on how to create in-text tables with the inTextSummaryTable
package.
This vignette assumes that the data.frame(s) needed to create the tables are already available.
If you have doubts about the data format, please see the section “data format” of the introductory vignette.
We will use the example data available in the clinUtils
package. Let’s load the packages and the data, and get started!
library(inTextSummaryTable)
library(pander)
library(tools) # toTitleCase
library(clinUtils)
# load example data
data(dataADaMCDISCP01)
dataAll <- dataADaMCDISCP01
labelVars <- attr(dataAll, "labelVars")
The getSummaryStatisticsTable
function creates an in-text table of summary statistics for the variable(s) of interest.
The demographic data (ADSL
dataset) is used as an example for the summary statistics table.
dataSL <- dataAll$ADSL
Variable(s) to summarize in the table are specified via the var
parameter.
A different set of statistics is reported depending on the type of the variable: categorical or continuous.
See the section ‘Base statistics’ of the documentation for more details on the statistics included by default for each type, via:
? `inTextSummaryTable-stats`
For a discrete/categorical variable, the in-text table can display the counts/percentages of the number of subjects or records for each category of the variable.
If no variable is specified (via the var
parameter), the counts are displayed for the entire dataset.
getSummaryStatisticsTable(data = dataSL)
Statistic | StatisticValue |
---|---|
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
Please note that this is equivalent to setting var = 'all'.
If a variable is specified (via the var
parameter), the counts are displayed for each category.
getSummaryStatisticsTable(data = dataSL, var = "SEX")
Variable group | StatisticValue |
---|---|
Statistic | |
F | |
statN | 5 |
statm | 5 |
statPercTotalN | 7 |
statPercN | 71.43 |
M | |
statN | 2 |
statm | 2 |
statPercTotalN | 7 |
statPercN | 28.57 |
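The percentages in the table above are consistent with statPercN being statN expressed as a percentage of statPercTotalN. The following base R check reproduces that arithmetic from the counts shown above; the object names simply mirror the row labels of the table and are not calls to the package.

```r
# Counts taken from the table above (SEX: 5 F, 2 M, 7 subjects in total)
statN <- c(F = 5, M = 2)
statPercTotalN <- 7

# Percentage of subjects per category, rounded to 2 decimals as displayed
statPercN <- round(statN / statPercTotalN * 100, 2)
print(statPercN)
```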
The categories of the variable are sorted alphabetically by default. To sort the categories in a specific order, format the variable as a factor whose levels are listed in the desired order.
# specify manually the order of the categories
dataSL$SEX <- factor(dataSL$SEX, levels = c("M", "F"))
getSummaryStatisticsTable(data = dataSL, var = "SEX")
Variable group | StatisticValue |
---|---|
Statistic | |
M | |
statN | 2 |
statm | 2 |
statPercTotalN | 7 |
statPercN | 28.57 |
F | |
statN | 5 |
statm | 5 |
statPercTotalN | 7 |
statPercN | 71.43 |
# order categories based on a numeric variable
dataSL$SEXN <- ifelse(dataSL$SEX == "M", 2, 1)
dataSL$SEX <- reorder(dataSL$SEX, dataSL$SEXN)
getSummaryStatisticsTable(data = dataSL, var = "SEX")
Variable group | StatisticValue |
---|---|
Statistic | |
F | |
statN | 5 |
statm | 5 |
statPercTotalN | 7 |
statPercN | 71.43 |
M | |
statN | 2 |
statm | 2 |
statPercTotalN | 7 |
statPercN | 28.57 |
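The two ordering approaches above rely only on how base R handles factor levels. As a minimal sketch on a hypothetical toy vector (not the ADSL data), the levels obtained in each case can be inspected directly:

```r
# Hypothetical toy vector: 5 females, 2 males
sex <- c("F", "M", "F", "F", "M", "F", "F")

# Default: levels of a factor built from a character vector are sorted alphabetically
levelsDefault <- levels(factor(sex))

# Manual order: levels are taken as specified
sexManual <- factor(sex, levels = c("M", "F"))
levelsManual <- levels(sexManual)

# Order based on a numeric variable: reorder() sorts the levels
# by the (mean) value of the numeric variable within each level
sexN <- ifelse(sex == "M", 2, 1)
sexReordered <- reorder(sexManual, sexN)
levelsReordered <- levels(sexReordered)

print(list(default = levelsDefault, manual = levelsManual, reordered = levelsReordered))
```

Because 'F' is mapped to 1 and 'M' to 2, reorder() places 'F' first, which matches the order of the rows in the last table above.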
By default, the table only includes the categories present in the input data, to ensure a compact table for CSR export.
dataSLExample <- dataSL
# 'SEX' formatted as character with only male
dataSLExample$SEX <- "M" # only male
getSummaryStatisticsTable(data = dataSLExample, var = "SEX")
Variable group | StatisticValue |
---|---|
Statistic | |
M | |
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
If extra categories should be represented in the table, the categorical variable should be formatted as a factor, whose levels contain all categories to be displayed in the table.
Furthermore, the varInclude0 parameter should be set to TRUE (or to the specific variable, in case multiple variables are specified) to indicate that categories with 0 counts should be included.
# 'SEX' formatted as factor, to include also female in the table
# (even if not available in the data)
dataSLExample$SEX <- factor("M", levels = c("F", "M"))
getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = TRUE)
Variable group | StatisticValue |
---|---|
Statistic | |
F | |
statN | 0 |
statm | 0 |
statPercTotalN | 7 |
statPercN | 0 |
M | |
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
# or:
getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = "SEX")
Variable group | StatisticValue |
---|---|
Statistic | |
F | |
statN | 0 |
statm | 0 |
statPercTotalN | 7 |
statPercN | 0 |
M | |
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
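The reason a factor is needed here can be seen with base R alone: a character vector carries no information about categories that are absent from the data, whereas a factor keeps them in its levels. The sketch below uses the base table() function on a hypothetical vector as an analogy; it does not show how the package computes the counts internally.

```r
# Hypothetical data: 7 subjects, all male
sexChar <- rep("M", 7)

# Character vector: only the observed category is counted
countsChar <- table(sexChar)

# Factor with both levels: the empty category 'F' is kept with a count of 0
sexFactor <- factor(sexChar, levels = c("F", "M"))
countsFactor <- table(sexFactor)

print(countsChar)
print(countsFactor)
```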
A specific type of categorical variable is a ‘flag variable’, which indicates whether a record fulfills a specific criterion.
In the data, such a variable typically takes the value ‘Y’ for the records fulfilling the criterion.
The name of such a variable typically ends with ‘FL’ in a CDISC-compliant ADaM or SDTM dataset.
For example, the subject-level dataset contains the following flag variables:
labelVars[grep("FL$", colnames(dataSL), value = TRUE)]
## SAFFL ITTFL EFFFL COMP8FL
## "Safety Population Flag" "Intent-to-Treat Population Flag" "Efficacy Population Flag" "Completers of Week 8 Population Flag"
## COMP16FL COMP24FL DISCONFL DSRAEFL
## "Completers of Week 16 Population Flag" "Completers of Week 24 Population Flag" "Did the Subject Discontinue the Study?" "Discontinued due to AE?"
## DTHFL
## "Subject Died?"
# has the subject discontinued from the study?
dataSL$DISCONFL
## [1] "" "" "Y" "Y" "Y" "Y" "Y"
If a flag variable is specified via the var parameter, the counts for each category are reported:
getSummaryStatisticsTable(
  data = dataSL,
  var = "SAFFL"
)
Variable group | StatisticValue |
---|---|
Statistic | |
Y | |
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
However, the interest is often to report only the counts for the records fulfilling the criterion (records with ‘Y’). This is the case if the variable is also specified via the varFlag parameter.
getSummaryStatisticsTable(
  data = dataSL,
  var = "SAFFL",
  varFlag = "SAFFL"
)
Statistic | StatisticValue |
---|---|
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
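The idea behind a flag variable can be sketched in base R: select the columns whose name ends with ‘FL’, then count only the records equal to ‘Y’. The column names below are hypothetical, the flag values are those printed above for DISCONFL, and using all records as the denominator of the percentage is an assumption of this sketch rather than a statement about the package output.

```r
# Hypothetical column names: only names ending with 'FL' are flag variables
colNames <- c("USUBJID", "SAFFL", "FLAGTYPE", "AGE")
flagVars <- grep("FL$", colNames, value = TRUE)

# Values of DISCONFL as printed above
disconfl <- c("", "", "Y", "Y", "Y", "Y", "Y")

# A plain categorical summary counts every category, including the empty one
countsAll <- table(disconfl)

# A flag summary keeps only the records fulfilling the criterion
nFlagged <- sum(disconfl == "Y")
percFlagged <- round(nFlagged / length(disconfl) * 100, 2)

print(flagVars)
print(countsAll)
print(c(n = nFlagged, perc = percFlagged))
```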
To include the total counts across categories, the varTotalInclude
parameter should be set to TRUE
(or to the specific variable).
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  varTotalInclude = TRUE
)
Variable group | StatisticValue |
---|---|
Statistic | |
Total | |
statN | 7 |
statm | 7 |
statPercTotalN | 7 |
statPercN | 100 |
F | |
statN | 5 |
statm | 5 |
statPercTotalN | 7 |
statPercN | 71.43 |
M | |
statN | 2 |
statm | 2 |
statPercTotalN | 7 |
statPercN | 28.57 |
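The ‘Total’ row is the count across all categories. As a rough base R analogy on a hypothetical toy vector, addmargins() appends such a total to a table of counts (as a last element named ‘Sum’, whereas the table above shows the total first):

```r
# Hypothetical toy vector: 5 females, 2 males
sex <- c("F", "M", "F", "F", "M", "F", "F")

# Counts per category, with the total across categories appended as 'Sum'
countsWithTotal <- addmargins(table(sex))
print(countsWithTotal)
```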
For a continuous variable, the in-text table displays standard distribution statistics of the variable.
Please note that missing records (NA) for the variable are filtered out, so the count statistics (number of subjects, number of records, percentage) are based only on the non-missing records.
For a continuous variable, the function checks whether different values are available for the same subject (within the same row/column variables), and returns an error if this is the case.
getSummaryStatisticsTable(data = dataSL, var = "AGE")
Statistic | StatisticValue |
---|---|
statN | 7 |
statm | 7 |
statMean | 74.29 |
statSD | 9.827 |
statSE | 3.714 |
statMedian | 75 |
statMin | 57 |
statMax | 89 |
statPercTotalN | 7 |
statPercN | 100 |
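The statistics in this table are standard quantities that can be reproduced with base R. The sketch below uses hypothetical values (not the AGE column) to show the effect of filtering missing records, and it checks that the standard error reported above is consistent with SD / sqrt(n). The last part illustrates, in a simplified form that is only an assumption about the logic, how conflicting values for the same subject could be detected.

```r
# Hypothetical continuous variable with one missing record
age <- c(57, 63, NA, 75, 81, 89)

# Missing records are removed before computing any statistic
ageObs <- age[!is.na(age)]
nObs <- length(ageObs)
summaryStats <- c(
  n = nObs,
  Mean = mean(ageObs),
  SD = sd(ageObs),
  SE = sd(ageObs) / sqrt(nObs),
  Median = median(ageObs),
  Min = min(ageObs),
  Max = max(ageObs)
)
print(summaryStats)

# Consistency check with the table above: statSD = 9.827 and statN = 7
seFromTable <- round(9.827 / sqrt(7), 3)
print(seFromTable)

# Simplified detection of subjects with several different values (hypothetical data)
subjectId <- c("01", "01", "02")
value <- c(60, 61, 70)
nDistinct <- tapply(value, subjectId, function(x) length(unique(x)))
subjectsWithConflict <- names(nDistinct)[nDistinct > 1]
print(subjectsWithConflict)
```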
The table can contain a mix of categorical and continuous variables.
getSummaryStatisticsTable(
  data = dataSL,
  var = c("AGE", "SEX")
)
Variable | StatisticValue |
---|---|
Variable group | |
Statistic | |
AGE | |
statN | 7 |
statm | 7 |
statMean | 74.29 |
statSD | 9.827 |
statSE | 3.714 |
statMedian | 75 |
statMin | 57 |
statMax | 89 |
statPercTotalN | 7 |
statPercN | 100 |
SEX | |
F | |
statN | 5 |
statm | 5 |
statPercTotalN | 7 |
statPercN | 71.43 |
M | |
statN | 2 |
statm | 2 |
statPercTotalN | 7 |
statPercN | 28.57 |
Statistics of interest and their format are specified via the stats
parameter.
If a single statistic expression is specified, the ‘Statistic’ column doesn’t appear in the table.
If multiple statistics are specified, these are included as separate rows.
A standard set of statistics can be requested via specific tags passed to the stats
parameter.
The list of available statistics is mentioned in the section ‘Formatted statistics’ in:
? `inTextSummaryTable-stats`
Please see below examples of commonly used statistics.
# count: n, '%' and m
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "count"
)
Variable group | StatisticValue |
---|---|
Statistic | |
F | |
n | 5 |
% | 71.4 |
m | 5 |
M | |
n | 2 |
% | 28.6 |
m | 2 |
# n (%)
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "n (%)"
)
Variable group | n (%) |
---|---|
F | 5 (71.4) |
M | 2 (28.6) |
# n/N (%)
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "n/N (%)"
)
Variable group | n/N (%) |
---|---|
F | 5/7 (71.4) |
M | 2/7 (28.6) |
## continuous variable
# all summary stats
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "summary"
)
Statistic | StatisticValue |
---|---|
n | 7 |
Mean | 74.3 |
SD | 9.8 |
SE | 3.71 |
Median | 75.0 |
Min | 57 |
Max | 89 |
% | 100 |
m | 7 |
# median (range)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "median (range)"
)
Median (range) |
---|
75.0 (57,89) |
# median and (range) in a different line:
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "median\n(range)"
)
Median |
---|
75.0 |
# mean (se)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "mean (se)"
)
Mean (SE) |
---|
74.3 (3.71) |
# mean (sd)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "mean (sd)"
)
Mean (SD) |
---|
74.3 (9.8) |
To change the formatting of the statistics, the stats
parameter should contain a language object (e.g. expression
or call
) based on the default set of base statistics.
See the documentation in section ‘Base statistics’ for more details on the base statistics included by default, via:
? `inTextSummaryTable-stats`
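To give an idea of what a language object over base statistics looks like, the base R sketch below builds an expression that refers to statN and statPercN by name and evaluates it against a list of hypothetical values. How the package evaluates such objects internally is not described here, so this is only an analogy; baseStats and statExpr are illustrative names.

```r
# Hypothetical values of two base statistics for one category
baseStats <- list(statN = 5, statPercN = 71.43)

# Language object combining the base statistics into an 'n (%)' string
statExpr <- expression(paste0(statN, " (", round(statPercN, 1), ")"))

# Evaluate the expression with the base statistics as variables
formatted <- eval(statExpr[[1]], envir = baseStats)
print(formatted)
```

The expression is not evaluated when it is created, only when the values of the base statistics are supplied, which is what allows the same formatting rule to be applied to each row of a table.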
For example, the following count table is restricted to the number of subjects per category:
getSummaryStatisticsTable(
  data = dataSL,
  var = c("RACE", "SEX"),
  stats = list(N = expression(statN))
)