The PEDALFAST (PEDiatric vALidation oF vAriableS in TBI) project was a prospective cohort study conducted at multiple American College of Surgeons freestanding level I Pediatric Trauma Centers. The cohort consists of patients under 18 years of age who were admitted to the intensive care unit (ICU) with an acute traumatic brain injury (TBI) diagnosis and Glasgow Coma Scale (GCS) score not exceeding 12 or a neurosurgical procedure (intracranial pressure [ICP] monitor, external ventricular drain [EVD], craniotomy, or craniectomy) within the first 24 hours of admission.
This data set has been used in several publications.
Funded by NICHD grant number R03HD094912, we retroactively mapped the data collected by the PEDALFAST project to the Federal Interagency Traumatic Brain Injury Research (FITBIR) data standard. The R data package pedalfast.data provides the data submitted to FITBIR both as raw files and as ready-to-use R data sets.
The PEDALFAST study data were collected and managed using REDCap electronic data capture tools hosted at the University of Colorado Denver (Harris et al. 2009). REDCap (Research Electronic Data Capture) is a secure, web-based application designed to support data capture for research studies, providing 1) an intuitive interface for validated data entry; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for importing data from external sources.
This vignette documents the provided data set and other utilities of this package.
The pedalfast.data package provides the following data objects:
data(package = "pedalfast.data")$results[, c("Item", "Title")]
## Item Title
## [1,] "pedalfast" "PEDALFAST Data"
## [2,] "pedalfast_metadata" "PEDALFAST Metadata"
Each of these objects will be described in detail in the following sections.
The data collected during the PEDALFAST study are provided as two data.frames, so the end user may opt into another paradigm such as data.table or the tidyverse. The examples in this vignette use base R methods only.
Reproduction of the examples in this vignette will require the following namespaces.
library(pedalfast.data)
Load the provided data sets into the active session via data() as follows.
data(pedalfast, package = "pedalfast.data")
data(pedalfast_metadata, package = "pedalfast.data")
str(pedalfast, max.level = 0)
## 'data.frame': 388 obs. of 103 variables:
str(pedalfast_metadata, max.level = 0)
## 'data.frame': 103 obs. of 3 variables:
The pedalfast data frame has one row per subject, reporting the data collected for that subject, and one column per variable. The pedalfast_metadata data frame is a selection of columns from the data dictionary provided by a REDCap export of the project. In the following you will find examples of specific utilities provided in this package to make formatting the data easier.
Let’s look at the first three columns of pedalfast, and the first three rows of pedalfast_metadata.
head(pedalfast[, 1:3])
## studyid age female
## 1 102 1179 0
## 2 103 90 0
## 3 110 1164 1
## 4 112 1413 1
## 5 114 233 0
## 6 116 5791 0
pedalfast_metadata[1:3, ]
## variable description values
## 1 studyid PEDALFAST Patient ID <NA>
## 2 age Age, in days, at time of admission <NA>
## 3 female Is the patient female? 0, no | 1, yes
The first column of pedalfast is the studyid, and the first row of pedalfast_metadata is the documentation for the studyid. Similarly, the second column of pedalfast and the second row of pedalfast_metadata are for the age of the patient. The first notable change is in the third row of pedalfast_metadata, where the indicator for female is documented, including the mapping from integer to English: 0, no | 1, yes
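The values strings in the metadata can be turned into factor levels and labels. The following is a minimal sketch in base R, assuming the "code, label | code, label" layout shown above. The helper parse_values is hypothetical and is not a function provided by pedalfast.data; the toy meta data frame simply retypes the three metadata rows printed above.

```r
# Toy copy of the three metadata rows printed above.
meta <- data.frame(
  variable    = c("studyid", "age", "female"),
  description = c("PEDALFAST Patient ID",
                  "Age, in days, at time of admission",
                  "Is the patient female?"),
  values      = c(NA, NA, "0, no | 1, yes"),
  stringsAsFactors = FALSE
)

# parse_values() is a hypothetical helper, not part of pedalfast.data.
# It splits "code, label | code, label" into integer codes and labels.
parse_values <- function(x) {
  parts <- strsplit(x, " | ", fixed = TRUE)[[1]]
  list(codes  = as.integer(sub(",.*$", "", parts)),   # text before the first comma
       labels = trimws(sub("^[^,]*,", "", parts)))    # text after the first comma
}

map <- parse_values(meta$values[meta$variable == "female"])

# The female column for the first six subjects shown above.
female <- c(0L, 0L, 1L, 1L, 0L, 0L)
female_f <- factor(female, levels = map$codes, labels = map$labels)
table(female_f)
```

Only the first comma separates the code from the label, so labels that themselves contain commas stay intact. Entries that are not code/label pairs, such as a formula, would need separate handling.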
The rest of this section of the vignette provides details on each of the variables in the data set and provides some examples for data use.
The PEDALFAST data was collected at multiple sites. The study id provided is a patient specific random number between 100 and 999 with no mapping to the sites. That is, you should not be able to determine which site provided a specific row of data.
knitr::kable(subset(pedalfast_metadata, variable == "studyid"))
variable | description | values |
---|---|---|
studyid | PEDALFAST Patient ID | NA |
str(pedalfast$studyid)
## int [1:388] 102 103 110 112 114 116 120 122 123 124 ...
Age of the patient is reported in days.
knitr::kable(subset(pedalfast_metadata, variable == "age"))
row | variable | description | values |
---|---|---|---|
2 | age | Age, in days, at time of admission | NA |
summary(pedalfast$age) # in days
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 679.5 2508.5 2699.3 4635.5 6501.0
summary(pedalfast$age / 365.25) # in years
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.860 6.868 7.390 12.691 17.799
The PEDALFAST data has been submitted to the Federal Interagency Traumatic Brain Injury Research (FITBIR) Informatics System. As part of that submission, the age of the patient was to be reported as the floor of the patient's age in years, with the exception of those under one year of age. For those under one year of age, the reported value was to be the age in years truncated to three decimal places. For example, a patient more than one month but less than two months old would have a reported age of 0.083 (1/12), and an 8-month-old would have a reported age of 0.666 (8/12). Note that the decimal is truncated, not rounded. If you require the same rounding scheme, this package provides the function round_age to apply it. The function returns age as a character by default; a numeric value is returned when type = "numeric" is specified.
fitbir_ages <-
  data.frame(age  = pedalfast$age / 365.25,
             char = round_age(pedalfast$age / 365.25),
             num  = round_age(pedalfast$age / 365.25, type = "numeric"))
plot(fitbir_ages$age, fitbir_ages$num, xlab = "Age (years)", ylab = "FITBIR Age (Years)")
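To make the reporting rule concrete, here is a minimal base R sketch of one reading of the description above: ages under one year are reduced to completed months, divided by 12, and truncated to three decimals; all other ages are floored. The function fitbir_age_sketch is a hypothetical stand-in for illustration only. It is not the package's round_age, which may differ in details such as input checks and the character return type.

```r
# fitbir_age_sketch() is a hypothetical illustration, not round_age().
fitbir_age_sketch <- function(age_years) {
  months    <- floor(age_years * 12)               # completed months of age
  under_one <- trunc(months / 12 * 1000) / 1000    # truncate, do not round
  ifelse(age_years < 1, under_one, floor(age_years))
}

# Illustrative ages in years: just over 1 month, just over 8 months,
# 1.5 years, and the oldest age in the summary above.
ages <- c(0.1, 0.7, 1.5, 17.799)
fitbir_age_sketch(ages)
```

For these inputs the sketch gives 0.083, 0.666, 1, and 17, matching the examples in the text. Ages that fall exactly on a month boundary can be sensitive to floating-point representation, so starting from age in days may be safer in practice.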
The variable female is an indicator for sex/gender. The category of female/male was assigned by the attending physicians or reported by the patient/caregivers. This variable was not determined by sex chromosome genotyping. The intent was to report sex, but gender, the socially constructed identity of sex, might be the more appropriate term.
knitr::kable(subset(pedalfast_metadata, variable == "female"))
row | variable | description | values |
---|---|---|---|
3 | female | Is the patient female? | 0, no \| 1, yes |
sum(pedalfast$female)
## [1] 149
mean(pedalfast$female)
## [1] 0.3840206
Three variables relate to the injury. The source of information for the injury (sourceinj) and the injury mechanism (injurymech) are both categorical variables with known values and are presented as character vectors in the pedalfast data.frame. The time from injury to admission (injurytoadmit) is reported in days, if the date of injury was known.
inj_vars <- c("sourceinj", "injurytoadmit", "injurymech")
knitr::kable(subset(pedalfast_metadata, variable %in% inj_vars))
row | variable | description | values |
---|---|---|---|
4 | sourceinj | Source of Injury Information | NA |
5 | injurytoadmit | Days from injury, if known, to admission. | NA |
6 | injurymech | Injury mechanism | 1, traffic \| 2, fall \| 3, known or suspected abuse \| 4, self-harm \| 9, other |
summary(pedalfast[, inj_vars])
## sourceinj injurytoadmit injurymech
## Length:388 Min. : 0.000 Length:388
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 1.415
## 3rd Qu.: 0.000
## Max. :366.000
## NA's :41
The injurymech is a character vector by default so the end user may build a factor as needed.
table(pedalfast$injurymech, useNA = "always")
##
## Fall Known or suspected abuse Other
## 72 91 77
## Self-harm Traffic <NA>
## 6 142 0
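Building a factor from the character vector lets the user choose the level order. The sketch below uses a short illustrative vector (not the study data) and orders the levels as listed in the metadata. The spellings are taken from the table output above.

```r
# Level order follows the metadata: traffic, fall, abuse, self-harm, other.
mech_levels <- c("Traffic", "Fall", "Known or suspected abuse",
                 "Self-harm", "Other")

# Illustrative values only; in practice use pedalfast$injurymech.
injurymech <- c("Fall", "Traffic", "Other", "Traffic", "Self-harm")

mech_f <- factor(injurymech, levels = mech_levels)
table(mech_f)

# Choose a different reference level for modeling if desired.
mech_fall_ref <- relevel(mech_f, ref = "Fall")
levels(mech_fall_ref)
```

Any value not listed in mech_levels would become NA, so it is worth checking table(mech_f, useNA = "always") after the conversion.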
Several variables were collected in both the emergency department (ED) and the intensive care unit (ICU). The following are the notes for the variables collected in the ED.
The Glasgow Coma Scale (GCS) was assessed in the ED, the ICU, or both. Several GCS variables are noted here with the suffix ‘ed’; the corresponding variables from the ICU are reported later with the suffix ‘icu’.
knitr::kable(subset(pedalfast_metadata, grepl("^gcs.*ed$", variable)))
row | variable | description | values |
---|---|---|---|
7 | gcsyned | Was a GCS obtained in the ED? | 0, no \| 1, yes |
8 | gcseyeed | ED GCS Eye | 4, spontaneous \| 3, to speech \| 2, to pain only \| 1, no response |
9 | gcsverbaled | ED GCS Verbal | 5, oriented, appropriate or coos and babbles \| 4, confused or irritable cries \| 3, inappropriate words or cries to pain \| 2, incomprehensible sounds or moans to pain \| 1, no response |
10 | gcsmotored | ED GCS Motor | 6, obeys commands \| 5, localizes pain or withdraws to touch \| 4, withdraws from painful stimuli \| 3, abnormal flexion to pain \| 2, abnormal extension to pain \| 1, no response/flaccid |
11 | gcsed | ED GCS Total | [gcseyeed]+[gcsverbaled]+[gcsmotored] |
12 | gcsetted | Was the patient intubated at the time of their ED GCS assessment? | 0, no \| 1, yes |
13 | gcsseded | Was the patient sedated at the time of their ED GCS assessment? | 0, no \| 1, yes |
14 | gcspared | Was the patient chemically paralyzed at the time of their ED GCS assessment? | 0, no \| 1, yes |
15 | gcseyeobed | Were the patient’s eyes obscured by injury, swelling, or bandage at the time of their ED GCS assessment? | 0, no \| 1, yes |
summary(pedalfast[, grep("^gcs.*ed$", names(pedalfast))])
## gcsyned gcseyeed gcsverbaled gcsmotored
## Min. :0.0000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.0000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :1.0000 Median :1.000 Median :1.000 Median :4.000
## Mean :0.9835 Mean :1.677 Mean :1.595 Mean :3.326
## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:5.000
## Max. :1.0000 Max. :4.000 Max. :5.000 Max. :6.000
## NA's :24 NA's :20 NA's :20 NA's :20
## gcsed gcsetted gcsseded gcspared
## Min. : 3.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 6.000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean : 6.598 Mean :0.7371 Mean :0.6158 Mean :0.1355
## 3rd Qu.: 9.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :15.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :20 NA's :19 NA's :21 NA's :19
## gcseyeobed
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.04905
## 3rd Qu.:0.00000
## Max. :1.00000
## NA's :21
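The metadata documents the ED GCS total as the sum of the eye, verbal, and motor components. The sketch below shows that arithmetic on a small made-up data frame that reuses the column names. It makes no claim about whether every stored gcsed value in the study data equals the component sum.

```r
# Made-up component scores for four hypothetical patients.
gcs <- data.frame(gcseyeed    = c(1L, 4L, NA, 2L),
                  gcsverbaled = c(1L, 5L, 2L, 1L),
                  gcsmotored  = c(1L, 6L, 4L, 5L))

# Total as documented: [gcseyeed] + [gcsverbaled] + [gcsmotored].
# A missing component makes the total missing as well.
gcs$total <- with(gcs, gcseyeed + gcsverbaled + gcsmotored)
gcs

# Non-missing totals must fall between 3 (all minimum) and 15 (all maximum).
stopifnot(all(gcs$total >= 3 & gcs$total <= 15, na.rm = TRUE))
```

The 3 to 15 range agrees with the minimum and maximum of gcsed in the summary above.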
The eye, verbal, and motor GCS components can be used either as numeric values (as reported in the pedalfast data.frame) or as categorical variables. The pedalfast.data package provides the functions gcs_as_integer and gcs_as_factor for quickly converting between the numeric values and a factor.
While GCS is a common assessment, the specific language used may vary. By providing these functions we are able to report the exact language used on the assessment.
Lower numeric values of GCS correspond to lower neurological functioning. To illustrate this, consider mapping the integer values 1 through 6 to the labels for the GCS scales:
knitr::kable(
  data.frame(integers = 1:6,
             eye      = gcs_as_factor(1:6, scale = "eye"),
             motor    = gcs_as_factor(1:6, scale = "motor"),
             verbal   = gcs_as_factor(1:6, scale = "verbal"))
)
integers | eye | motor | verbal |
---|---|---|---|
1 | No response | No response/flaccid | No response |
2 | To pain only | Abnormal extension to pain | Incomprehensible sounds or moans to pain |
3 | To speech | Abnormal flexion to pain | Inappropriate words or cries to pain |
4 | Spontaneous | Withdraws from painful stimuli | Confused or irritable cries |
5 | NA | Localizes pain or withdraws to touch | Oriented, appropriate or coos and babbles |
6 | NA | Obeys commands | NA |
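The same kind of mapping can be sketched with base R's factor, which may help when reasoning about the NA entries in the table. This is an illustration using the eye labels from the table above, not the implementation of gcs_as_factor.

```r
# Eye labels in increasing order of score, copied from the table above.
eye_labels <- c("No response", "To pain only", "To speech", "Spontaneous")

# Integers outside 1:4 have no matching level and become NA.
eye_f <- factor(1:6, levels = seq_along(eye_labels), labels = eye_labels)
as.character(eye_f)

# With levels in score order, the integer codes recover the original score.
as.integer(eye_f)
```

The eye scale has four levels, the verbal scale five, and the motor scale six, which is why the eye and verbal columns show NA for the higher integers.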
By default, the mapping of the integer values to factor levels will map the integer value of 1 to level 1. The argument highest_first will reverse the order of the levels. This option has been provided to help set a logical reference level for modeling. For example, say we want to estimate hospital length of stay by the motor GCS score.
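# As written, motor_f1 and motor_f2 are built from the ED eye score
# (gcseyeed with scale = "eye"), which is what the output below reflects.
gcs_example_data <-
  data.frame(los       = pedalfast$hosplos,
             motor_int = pedalfast$gcsmotored,
             motor_f1  = gcs_as_factor(pedalfast$gcseyeed, scale = "eye"),
             motor_f2  = gcs_as_factor(pedalfast$gcseyeed, scale = "eye", highest_first = TRUE))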
head(gcs_example_data)
## los motor_int motor_f1 motor_f2
## 1 22 4 No response No response
## 2 24 2 No response No response
## 3 9 4 To pain only To pain only
## 4 6 5 Spontaneous Spontaneous
## 5 40 6 No response No response
## 6 36 5 No response No response
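The level order matters because, under R's default treatment contrasts, the first factor level is the reference level in a model. The following base R sketch, which does not call gcs_as_factor, applies both orderings to the six motor_int values printed above, using the motor labels from the earlier table.

```r
# Motor labels in increasing order of score, copied from the earlier table.
motor_labels <- c("No response/flaccid",
                  "Abnormal extension to pain",
                  "Abnormal flexion to pain",
                  "Withdraws from painful stimuli",
                  "Localizes pain or withdraws to touch",
                  "Obeys commands")

motor_int <- c(4L, 2L, 4L, 5L, 6L, 5L)  # motor_int column shown above

# Lowest score first versus highest score first.
low_first  <- factor(motor_int, levels = 1:6, labels = motor_labels)
high_first <- factor(motor_int, levels = 6:1, labels = rev(motor_labels))

levels(low_first)[1]    # reference level with lowest first
levels(high_first)[1]   # reference level with highest first

# The labels attached to each observation are the same either way.
identical(as.character(low_first), as.character(high_first))
```

Only the ordering of the levels differs, so head() prints identical labels for both versions while summary() lists the levels in opposite orders.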
Just looking at the summary of the example data set shows that the order of the factor levels is different.
summary(gcs_example_data)
## los motor_int motor_f1 motor_f2
## Min. : 0.00 Min. :1.000 No response :262 Spontaneous : 63
## 1st Qu.: 4.00 1st Qu.:1.000