In this vignette we will explore the functionalities of generateSequenceCohortSet().
The CohortSymmetry package is designed to work with data mapped to the OMOP CDM, so the first step is to create a reference to the data using the CDMConnector package. We will use the Eunomia dataset for the subsequent examples.
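The connection code itself is not shown in this section. What follows is a minimal sketch of one way to create the reference, not necessarily the setup used to produce the outputs in this vignette. It assumes that the duckdb, CDMConnector, CohortSymmetry and dplyr packages are installed, that the Eunomia data can be downloaded to (or is already cached in) the folder named by the EUNOMIA_DATA_FOLDER environment variable, and that the camelCase function names of recent CDMConnector releases are available (older releases exposed eunomia_dir() and cdm_from_con() instead).

```r
library(CDMConnector)
library(CohortSymmetry)
library(dplyr)

# Assumption: use a temporary folder for the Eunomia files if none is configured.
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") {
  Sys.setenv(EUNOMIA_DATA_FOLDER = tempdir())
}

# Connect to a DuckDB copy of the Eunomia example database.
db <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())

# Build the cdm reference; "main" is assumed to be the schema that holds
# both the OMOP tables and the cohort tables we will write later.
cdm <- CDMConnector::cdmFromCon(
  con = db,
  cdmSchema = "main",
  writeSchema = "main"
)
```

The resulting cdm object is what the cohort-building calls in the rest of the vignette expect as their cdm argument.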
The CohortSymmetry package requires that the cdm object contains two cohort tables: the index cohort and the marker cohort. There are many ways to create these cohorts, and the right one depends on what the index cohort and marker cohort represent. Here, we use the DrugUtilisation package to generate two drug cohorts in the cdm object. For illustrative purposes, we will carry out sequence symmetry analysis (SSA) on aspirin (index_cohort) against acetaminophen (marker_cohort).
library(DrugUtilisation)
library(dplyr) # provides the %>% pipe used below
# Index cohort: all drug exposures to the ingredient aspirin
aspirin_code <- CodelistGenerator::getDrugIngredientCodes(
  cdm = cdm,
  name = "aspirin"
)
cdm <- DrugUtilisation::generateDrugUtilisationCohortSet(
  cdm = cdm,
  name = "aspirin",
  conceptSet = aspirin_code
)
# Marker cohort: all drug exposures to the ingredient acetaminophen
acetaminophen_code <- CodelistGenerator::getDrugIngredientCodes(
  cdm = cdm,
  name = "acetaminophen"
)
cdm <- DrugUtilisation::generateDrugUtilisationCohortSet(
  cdm = cdm,
  name = "acetaminophen",
  conceptSet = acetaminophen_code
)
# Inspect the index cohort
cdm$aspirin %>%
  dplyr::glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpA5cYib\file3c807582296c.duckdb]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <int> 81, 163, 677, 733, 1036, 1088, 1333, 1353, 1450, …
#> $ cohort_start_date <date> 1963-09-11, 1979-06-04, 1977-09-11, 1948-07-29, …
#> $ cohort_end_date <date> 1963-12-10, 1979-06-25, 1977-10-16, 1948-08-26, …
cdm$acetaminophen %>%
  dplyr::glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpA5cYib\file3c807582296c.duckdb]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <int> 1, 78, 141, 300, 334, 334, 334, 384, 384, 549, 67…
#> $ cohort_start_date <date> 1971-01-04, 1968-11-25, 1987-10-31, 2007-08-21, …
#> $ cohort_end_date <date> 1971-01-18, 1968-12-16, 1987-11-14, 2007-08-28, …
To initiate the calculations, the two cohort tables need to be intersected using generateSequenceCohortSet(). This process outputs all the individuals who appear in both tables, subject to different parameters. Each parameter corresponds to a specific requirement. The parameters for this function include cohortDateRange, daysPriorObservation, washoutWindow, indexMarkerGap and combinationWindow. Let’s go through examples to see how each parameter works.
Let’s study the simplest case where no requirements are imposed. The figure below shows an example of an analysis containing six different participants.
Note that only the first event/episode (for both the index and the marker) is included in the analysis. As there are no restriction criteria and all the individuals have an episode in both the index and the marker cohort, all the subjects are included in the analysis. We can get a sequence cohort without imposing any particular requirement like so:
cdm <- CohortSymmetry::generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "acetaminophen",
  name = "intersect",
  cohortDateRange = as.Date(c(NA, NA)), # default
  daysPriorObservation = 0, # default
  washoutWindow = 0, # default
  indexMarkerGap = NULL, # default
  combinationWindow = c(0, Inf)
)
cdm$intersect %>%
  dplyr::glimpse()
#> Rows: ??
#> Columns: 6
#> Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpA5cYib\file3c807582296c.duckdb]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id <int> 6, 16, 42, 35, 40, 53, 49, 11, 32, 43, 12, 17, 63…
#> $ cohort_start_date <date> 1965-06-23, 1972-04-10, 1914-07-09, 1960-06-20, …
#> $ cohort_end_date <date> 1969-12-20, 1974-06-11, 1937-09-07, 1993-04-28, …
#> $ index_date <date> 1965-06-23, 1972-04-10, 1914-07-09, 1993-04-28, …
#> $ marker_date <date> 1969-12-20, 1974-06-11, 1937-09-07, 1960-06-20, …
Note that the generated table has the format of an OMOP CDM cohort, but it also includes two additional columns: index_date and marker_date, which are the cohort_start_date of the index and marker episode respectively. The cohort_start_date and the cohort_end_date are defined as:
cohort_start_date: earliest cohort_start_date between the index and the marker events.
cohort_end_date: latest cohort_start_date between the index and the marker events.
The cohort_definition_id in the output is associated with the cohort_definition_id of the index table (indexId) and the cohort_definition_id of the marker table (markerId). To see the correspondence, one could do the following:
attr(cdm$intersect, "cohort_set")
#> # Source: table<main.intersect_set> [1 x 10]
#> # Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpA5cYib\file3c807582296c.duckdb]
#> cohort_definition_id cohort_name index_id index_name marker_id marker_name
#> <int> <chr> <int> <chr> <int> <chr>
#> 1 1 index_1191_asp… 1 1191_aspi… 1 161_acetam…
#> # ℹ 4 more variables: days_prior_observation <dbl>, washout_window <dbl>,
#> # index_marker_gap <chr>, combination_window <chr>
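To make the first-event rule and the definitions of cohort_start_date and cohort_end_date more concrete, here is a toy sketch in base R. It is only an illustration of the logic described above, not the package's implementation: the data and the helper first_event() are hypothetical, and the real function works on database tables and applies further parameters.

```r
# Toy event tables (hypothetical data, not taken from Eunomia).
index_events <- data.frame(
  subject_id = c(1, 1, 2, 3),
  cohort_start_date = as.Date(c("2000-01-10", "2001-05-01",
                                "2002-03-15", "1999-07-01"))
)
marker_events <- data.frame(
  subject_id = c(1, 2, 2, 4),
  cohort_start_date = as.Date(c("2000-06-01", "2001-12-01",
                                "2003-01-01", "2005-02-02"))
)

# Hypothetical helper: keep only the first event per subject and
# rename its start date to `new_name`.
first_event <- function(events, new_name) {
  events <- events[order(events$subject_id, events$cohort_start_date), ]
  out <- events[!duplicated(events$subject_id), ]
  names(out)[names(out) == "cohort_start_date"] <- new_name
  out
}

index_first <- first_event(index_events, "index_date")
marker_first <- first_event(marker_events, "marker_date")

# Inner join: only subjects present in both tables remain.
sequence_cohort <- merge(index_first, marker_first, by = "subject_id")

# cohort_start_date is the earlier of the two dates, cohort_end_date the later.
sequence_cohort$cohort_start_date <- pmin(sequence_cohort$index_date,
                                          sequence_cohort$marker_date)
sequence_cohort$cohort_end_date <- pmax(sequence_cohort$index_date,
                                        sequence_cohort$marker_date)

print(sequence_cohort)
```

In this toy data, subjects 3 and 4 drop out because they appear in only one of the two tables, and subject 2 has the marker before the index, so the cohort_start_date equals the marker_date.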
The user may also wish to subset the index table and marker table based on their cohort_definition_id using indexId and markerId respectively. For example, setting indexId = 1 and markerId = 1 includes only cohort_definition_id = 1 from both the index and the marker table.
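A sketch of such a call is given here. It assumes that the cdm object already contains the aspirin and acetaminophen cohort tables built earlier and that the CohortSymmetry package is installed; the remaining arguments simply mirror the earlier example.

```r
# Assumes `cdm` already holds the "aspirin" and "acetaminophen" cohort tables.
cdm <- CohortSymmetry::generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "acetaminophen",
  name = "intersect",
  indexId = 1,   # keep only cohort_definition_id 1 of the index table
  markerId = 1,  # keep only cohort_definition_id 1 of the marker table
  combinationWindow = c(0, Inf)
)
```

Since each of the two cohort tables here contains a single cohort_definition_id, this should select the same cohorts as the call without indexId and markerId.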
We can restrict the study period of the analysis to only include episodes or events happening during a specific period of time. The figure below shows an example of an analysis containing six different participants.
Notice that, by imposing a restriction on the study period, some of the participants might be excluded. For example, participant 4 is excluded because their only index episode is outside of the study period, whereas participant 6 is included because they do have an index episode within the study period.
The study period can be restricted using the cohortDateRange argument, which is defined as:
cohortDateRange = c(start_of_the_study_period, end_of_the_study_period)
See an example of the usage below, where cohortDateRange is restricted to the period from 1950-01-01 to 1969-01-01. Consequently, the cohort dates fall within the pre-specified period:
cdm <- CohortSymmetry::generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "acetaminophen",
  name = "intersect",
  cohortDateRange = as.Date(c("1950-01-01", "1969-01-01")),
  combinationWindow = c(0, Inf)
)
# Check that all cohort dates fall inside the requested study period
cdm$intersect %>%
  dplyr::summarise(
    min_cohort_start_date = min(cohort_start_date),
    max_cohort_start_date = max(cohort_start_date),
    min_cohort_end_date = min(cohort_end_date),
    max_cohort_end_date = max(cohort_end_date)
  ) %>%
  dplyr::glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpA5cYib\file3c807582296c.duckdb]
#> $ min_cohort_start_date <date> 1950-01-02
#> $ max_cohort_start_date <date> 1968-09-08
#> $ min_cohort_end_date <date> 1950-07-19
#> $ max_cohort_end_date <date> 1969-01-01
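The effect of the study period on individual participants can be sketched with a toy example in base R. This is an illustration under stated assumptions rather than the package's own code: the data are hypothetical, the participant numbers only echo the description above, and both ends of the period are assumed to be inclusive.

```r
study_period <- as.Date(c("1950-01-01", "1969-01-01"))

# Hypothetical index episodes: participant 4 has a single episode before the
# study period, participant 6 has one episode before it and one inside it.
index_events <- data.frame(
  subject_id = c(4, 6, 6),
  cohort_start_date = as.Date(c("1940-03-01", "1945-02-01", "1955-08-20"))
)

# Keep only the episodes that start within the study period
# (assumption: both boundaries are inclusive).
in_period <- index_events$cohort_start_date >= study_period[1] &
  index_events$cohort_start_date <= study_period[2]
eligible <- index_events[in_period, ]

# The first eligible episode per participant is the one used in the analysis.
eligible <- eligible[order(eligible$subject_id, eligible$cohort_start_date), ]
first_in_period <- eligible[!duplicated(eligible$subject_id), ]

print(first_in_period)
```

Participant 4 disappears because no episode falls inside the period, while participant 6 is retained with the later episode that does.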
We can also specify the minimum prior observation history that an individual must have before the start of the first event, using the daysPriorObservation argument. Individuals without enough prior history will be excluded. See the figure below: if the required prior observation is set to 31 days, participant 5 would be excluded because the first event happening within the study period has fewer than 31 days of prior history.
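The prior-history check can be sketched in base R as follows. This is a toy illustration, not the package's implementation: the data and the column names first_event_date and prior_days are hypothetical, and prior history is assumed to be measured from the start of the observation period to the first event.

```r
days_prior_observation <- 31

# Hypothetical first events and observation period start dates.
first_events <- data.frame(
  subject_id = c(1, 5),
  first_event_date = as.Date(c("1960-06-20", "1960-06-20")),
  observation_period_start_date = as.Date(c("1959-01-01", "1960-06-01"))
)

# Days of observation available before the first event.
first_events$prior_days <- as.numeric(difftime(
  first_events$first_event_date,
  first_events$observation_period_start_date,
  units = "days"
))

# Keep participants with at least the required prior observation.
kept <- first_events[first_events$prior_days >= days_prior_observation, ]

print(kept)
```

Here participant 5 has only 19 days of observation before the first event and is therefore dropped, while participant 1 is kept.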