Brings Seurat to the tidyverse!
website: stemangiola.github.io/tidyseurat/
Please also have a look at
tidyseurat provides a bridge between the Seurat single-cell package [@butler2018integrating; @stuart2019comprehensive] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Seurat object as a tidyverse tibble, and provides Seurat-compatible dplyr, tidyr, ggplot and plotly functions.
Seurat-compatible Functions | Description |
---|---|
all |
After all tidyseurat is a Seurat object, just better |
tidyverse Packages | Description |
---|---|
dplyr |
All dplyr APIs like for any tibble |
tidyr |
All tidyr APIs like for any tibble |
ggplot2 |
ggplot like for any tibble |
plotly |
plot_ly like for any tibble |
Utilities | Description |
---|---|
tidy |
Add tidyseurat invisible layer over a Seurat object |
as_tibble |
Convert cell-wise information to a tbl_df |
join_features |
Add feature-wise information, returns a tbl_df |
From CRAN
install.packages("tidyseurat")
From Github (development)
devtools::install_github("stemangiola/tidyseurat")
library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)
library(ggplot2)
library(Seurat)
library(tidyseurat)
tidyseurat
, the best of both worlds!This is a seurat object but it is evaluated as tibble. So it is fully compatible both with Seurat and tidyverse APIs.
data("pbmc_small")
It looks like a tibble
pbmc_small
## # A Seurat-tibble abstraction: 80 × 15
## [90m# Features=230 | Active assay=RNA | Assays=RNA[39m
## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
## 1 ATGC… SeuratPro… 70 47 0 A g2
## 2 CATG… SeuratPro… 85 52 0 A g1
## 3 GAAC… SeuratPro… 87 50 1 B g2
## 4 TGAC… SeuratPro… 127 56 0 A g2
## 5 AGTC… SeuratPro… 173 53 0 A g2
## 6 TCTG… SeuratPro… 70 48 0 A g1
## 7 TGGT… SeuratPro… 64 36 0 A g1
## 8 GCAG… SeuratPro… 72 45 0 A g1
## 9 GATA… SeuratPro… 52 36 0 A g1
## 10 AATG… SeuratPro… 100 41 0 A g1
## # … with 70 more rows, and 8 more variables: RNA_snn_res.1 <fct>, PC_1 <dbl>,
## # PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
But it is a Seurat object after all
pbmc_small@assays
## $RNA
## Assay data with 230 features for 80 cells
## Top 10 variable features:
## PPBP, IGLL5, VDAC3, CD1C, AKR1C3, PF4, MYL9, GNLY, TREML1, CA2
Set colours and theme for plots.
# Use colourblind-friendly colours
friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")
# Set theme
my_theme <-
list(
scale_fill_manual(values = friendly_cols),
scale_color_manual(values = friendly_cols),
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(),
panel.grid.major = element_line(size = 0.2),
panel.grid.minor = element_line(size = 0.1),
text = element_text(size = 12),
legend.position = "bottom",
aspect.ratio = 1,
strip.background = element_blank(),
axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
)
)
We can treat pbmc_small
effectively as a normal tibble for plotting.
Here we plot number of features per cell.
pbmc_small %>%
tidyseurat::ggplot(aes(nFeature_RNA, fill = groups)) +
geom_histogram() +
my_theme
Here we plot total features per cell.
pbmc_small %>%
tidyseurat::ggplot(aes(groups, nCount_RNA, fill = groups)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.1) +
my_theme
Here we plot abundance of two features for each group.
pbmc_small %>%
join_features(features = c("HLA-DRA", "LYZ")) %>%
ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
scale_y_log10() +
my_theme
Also you can treat the object as Seurat object and proceed with data processing.
pbmc_small_pca <-
pbmc_small %>%
SCTransform(verbose = FALSE) %>%
FindVariableFeatures(verbose = FALSE) %>%
RunPCA(verbose = FALSE)
pbmc_small_pca
## # A Seurat-tibble abstraction: 80 × 17
## [90m# Features=220 | Active assay=SCT | Assays=RNA, SCT[39m
## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
## 1 ATGC… SeuratPro… 70 47 0 A g2
## 2 CATG… SeuratPro… 85 52 0 A g1
## 3 GAAC… SeuratPro… 87 50 1 B g2
## 4 TGAC… SeuratPro… 127 56 0 A g2
## 5 AGTC… SeuratPro… 173 53 0 A g2
## 6 TCTG… SeuratPro… 70 48 0 A g1
## 7 TGGT… SeuratPro… 64 36 0 A g1
## 8 GCAG… SeuratPro… 72 45 0 A g1
## 9 GATA… SeuratPro… 52 36 0 A g1
## 10 AATG… SeuratPro… 100 41 0 A g1
## # … with 70 more rows, and 10 more variables: RNA_snn_res.1 <fct>,
## # nCount_SCT <dbl>, nFeature_SCT <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>,
## # PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
If a tool is not included in the tidyseurat collection, we can use as_tibble
to permanently convert tidyseurat
into tibble.
pbmc_small_pca %>%
as_tibble() %>%
select(contains("PC"), everything()) %>%
GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
my_theme
We proceed with cluster identification with Seurat.
pbmc_small_cluster <-
pbmc_small_pca %>%
FindNeighbors(verbose = FALSE) %>%
FindClusters(method = "igraph", verbose = FALSE)
pbmc_small_cluster
## # A Seurat-tibble abstraction: 80 × 19
## [90m# Features=220 | Active assay=SCT | Assays=RNA, SCT[39m
## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
## 1 ATGC… SeuratPro… 70 47 0 A g2
## 2 CATG… SeuratPro… 85 52 0 A g1
## 3 GAAC… SeuratPro… 87 50 1 B g2
## 4 TGAC… SeuratPro… 127 56 0 A g2
## 5 AGTC… SeuratPro… 173 53 0 A g2
## 6 TCTG… SeuratPro… 70 48 0 A g1
## 7 TGGT… SeuratPro… 64 36 0 A g1
## 8 GCAG… SeuratPro… 72 45 0 A g1
## 9 GATA… SeuratPro… 52 36 0 A g1
## 10 AATG… SeuratPro… 100 41 0 A g1
## # … with 70 more rows, and 12 more variables: RNA_snn_res.1 <fct>,
## # nCount_SCT <dbl>, nFeature_SCT <int>, SCT_snn_res.0.8 <fct>,
## # seurat_clusters <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
## # PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>
Now we can interrogate the object as if it was a regular tibble data frame.
pbmc_small_cluster %>%
tidyseurat::count(groups, seurat_clusters)
## # A tibble: 8 × 3
## groups seurat_clusters n
## <chr> <fct> <int>
## 1 g1 0 17
## 2 g1 1 14
## 3 g1 2 9
## 4 g1 3 4
## 5 g2 0 13
## 6 g2 1 12
## 7 g2 2 6
## 8 g2 3 5
We can identify cluster markers using Seurat.
# Identify top 10 markers per cluster
markers <-
pbmc_small_cluster %>%
FindAllMarkers(only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25) %>%
group_by(cluster) %>%
top_n(10, avg_log2FC)
# Plot heatmap
pbmc_small_cluster %>%
DoHeatmap(
features = markers$gene,
group.colors = friendly_cols
)
We can calculate the first 3 UMAP dimensions using the Seurat framework.
pbmc_small_UMAP <-
pbmc_small_cluster %>%
RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L, )
And we can plot them using 3D plot using plotly.
pbmc_small_UMAP %>%
plot_ly(
x = ~`UMAP_1`,
y = ~`UMAP_2`,
z = ~`UMAP_3`,
color = ~seurat_clusters,
colors = friendly_cols[1:4]
)
We can infer cell type identities using SingleR [@aran2019reference] and manipulate the output using tidyverse.
# Get cell type reference data
blueprint <- celldex::BlueprintEncodeData()
# Infer cell identities
cell_type_df <-
GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
log1p() %>%
Matrix::Matrix(sparse = TRUE) %>%
SingleR::SingleR(
ref = blueprint,
labels = blueprint$label.main,
method = "single"
) %>%
as.data.frame() %>%
as_tibble(rownames = "cell") %>%
select(cell, first.labels)
# Join UMAP and cell type info
pbmc_small_cell_type <-
pbmc_small_UMAP %>%
left_join(cell_type_df, by = "cell")
# Reorder columns
pbmc_small_cell_type %>%
tidyseurat::select(cell, first.labels, everything())
## # A Seurat-tibble abstraction: 80 × 23
## [90m# Features=220 | Active assay=SCT | Assays=RNA, SCT[39m
## cell first.labels orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
## <chr> <chr> <fct> <dbl> <int> <fct>
## 1 ATGC… CD4+ T-cells SeuratPro… 70 47 0
## 2 CATG… CD8+ T-cells SeuratPro… 85 52 0
## 3 GAAC… CD8+ T-cells SeuratPro… 87 50 1
## 4 TGAC… CD4+ T-cells SeuratPro… 127 56 0
## 5 AGTC… CD4+ T-cells SeuratPro… 173 53 0
## 6 TCTG… CD4+ T-cells SeuratPro… 70 48 0
## 7 TGGT… CD4+ T-cells SeuratPro… 64 36 0
## 8 GCAG… CD4+ T-cells SeuratPro… 72 45 0
## 9 GATA… CD4+ T-cells SeuratPro… 52 36 0
## 10 AATG… CD4+ T-cells SeuratPro… 100 41 0
## # … with 70 more rows, and 17 more variables: letter.idents <fct>,
## # groups <chr>, RNA_snn_res.1 <fct>, nCount_SCT <dbl>, nFeature_SCT <int>,
## # SCT_snn_res.0.8 <fct>, seurat_clusters <fct>, PC_1 <dbl>, PC_2 <dbl>,
## # PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>,
## # UMAP_1 <dbl>, UMAP_2 <dbl>, UMAP_3 <dbl>
We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.
pbmc_small_cell_type %>%
count(seurat_clusters, first.labels)
## # A tibble: 9 × 3
## seurat_clusters first.labels n
## <fct> <chr> <int>
## 1 0 CD4+ T-cells 8
## 2 0 CD8+ T-cells 10
## 3 0 NK cells 12
## 4 1 Macrophages 1
## 5 1 Monocytes 25
## 6 2 B-cells 10
## 7 2 Macrophages 1
## 8 2 Monocytes 4
## 9 3 Erythrocytes 9
We can easily reshape the data for building information-rich faceted plots.
pbmc_small_cell_type %>%
# Reshape and add classifier column
pivot_longer(
cols = c(seurat_clusters, first.labels),
names_to = "classifier", values_to = "label"
) %>%
# UMAP plots for cell type and cluster
ggplot(aes(UMAP_1, UMAP_2, color = label)) +
geom_point() +
facet_wrap(~classifier) +
my_theme
We can easily plot gene correlation per cell category, adding multi-layer annotations.
pbmc_small_cell_type %>%
# Add some mitochondrial abundance values
mutate(mitochondrial = rnorm(n())) %>%
# Plot correlation
join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
geom_point() +
facet_wrap(~first.labels, scales = "free") +
scale_x_log10() +
scale_y_log10() +
my_theme
A powerful tool we can use with tidyseurat is nest
. We can easily perform independent analyses on subsets of the dataset. First we classify cell types in lymphoid and myeloid; then, nest based on the new classification
pbmc_small_nested <-
pbmc_small_cell_type %>%
filter(first.labels != "Erythrocytes") %>%
mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
nest(data = -cell_class)
pbmc_small_nested
## # A tibble: 2 × 2
## cell_class data
## <chr> <list>
## 1 lymphoid <Seurat[,40]>
## 2 myeloid <Seurat[,31]>
Now we can independently for the lymphoid and myeloid subsets (i) find variable features, (ii) reduce dimensions, and (iii) cluster using both tidyverse and SingleCellExperiment seamlessly.
pbmc_small_nested_reanalysed <-
pbmc_small_nested %>%
mutate(data = map(
data, ~ .x %>%
FindVariableFeatures(verbose = FALSE) %>%
RunPCA(npcs = 10, verbose = FALSE) %>%
FindNeighbors(verbose = FALSE) %>%
FindClusters(method = "igraph", verbose = FALSE) %>%
RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
))
pbmc_small_nested_reanalysed
## # A tibble: 2 × 2
## cell_class data
## <chr> <list>
## 1 lymphoid <Seurat[,40]>
## 2 myeloid <Seurat[,31]>
Now we can unnest and plot the new classification.
pbmc_small_nested_reanalysed %>%
# Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
mutate(data = map(data, ~ .x %>% as_tibble())) %>%
unnest(data) %>%
# Define unique clusters
unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%
# Plotting
ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
geom_point() +
facet_wrap(~cell_class) +
my_theme
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS: /stornext/System/data/apps/R/R-4.1.0/lib64/R/lib/libRblas.so
## LAPACK: /stornext/System/data/apps/R/R-4.1.0/lib64/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyseurat_0.5.1 ttservice_0.1.2 SeuratObject_4.0.3 Seurat_4.0.5
## [5] ggplot2_3.3.5 magrittr_2.0.1 purrr_0.3.4 tidyr_1.1.4
## [9] dplyr_1.0.7 knitr_1.36
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-2 deldir_1.0-6
## [4] ellipsis_0.3.2 ggridges_0.5.3 markdown_1.1
## [7] spatstat.data_2.1-0 rstudioapi_0.13 farver_2.1.0
## [10] leiden_0.3.9 listenv_0.8.0 rstan_2.26.6
## [13] ggrepel_0.9.1 RSpectra_0.16-0 fansi_0.5.0
## [16] codetools_0.2-18 splines_4.1.0 polyclip_1.10-0
## [19] jsonlite_1.7.2 ica_1.0-2 cluster_2.1.2
## [22] png_0.1-7 uwot_0.1.10 spatstat.sparse_2.0-0
## [25] shiny_1.7.1 sctransform_0.3.2 compiler_4.1.0
## [28] httr_1.4.2 assertthat_0.2.1 Matrix_1.3-4
## [31] fastmap_1.1.0 lazyeval_0.2.2 limma_3.50.0
## [34] cli_3.1.0 later_1.3.0 htmltools_0.5.2
## [37] prettyunits_1.1.1 tools_4.1.0 igraph_1.2.8
## [40] gtable_0.3.0 glue_1.5.0 RANN_2.6.1
## [43] reshape2_1.4.4 V8_3.6.0 Rcpp_1.0.7
## [46] scattermore_0.7 vctrs_0.3.8 nlme_3.1-153
## [49] lmtest_0.9-39 xfun_0.28 stringr_1.4.0
## [52] globals_0.14.0 ps_1.6.0 mime_0.12
## [55] miniUI_0.1.1.1 lifecycle_1.0.1 irlba_2.3.3
## [58] goftest_1.2-3 future_1.23.0 MASS_7.3-54
## [61] zoo_1.8-9 scales_1.1.1 spatstat.core_2.3-1
## [64] spatstat.utils_2.2-0 promises_1.2.0.1 parallel_4.1.0
## [67] inline_0.3.19 RColorBrewer_1.1-2 curl_4.3.2
## [70] reticulate_1.22 pbapply_1.5-0 gridExtra_2.3
## [73] loo_2.4.1 StanHeaders_2.26.6 rpart_4.1-15
## [76] reshape_0.8.8 stringi_1.7.5 highr_0.9
## [79] pkgbuild_1.3.1 rlang_0.4.12 pkgconfig_2.0.3
## [82] matrixStats_0.61.0 evaluate_0.14 lattice_0.20-45
## [85] tensor_1.5 ROCR_1.0-11 labeling_0.4.2
## [88] patchwork_1.1.1 htmlwidgets_1.5.4 cowplot_1.1.1
## [91] processx_3.5.2 tidyselect_1.1.1 GGally_2.1.2
## [94] parallelly_1.29.0 RcppAnnoy_0.0.19 plyr_1.8.6
## [97] R6_2.5.1 generics_0.1.1 DBI_1.1.2
## [100] mgcv_1.8-38 pillar_1.6.4 withr_2.4.3
## [103] fitdistrplus_1.1-6 abind_1.4-5 survival_3.2-13
## [106] tibble_3.1.6 future.apply_1.8.1 crayon_1.4.2
## [109] KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_2.3-0
## [112] plotly_4.10.0 grid_4.1.0 data.table_1.14.2
## [115] callr_3.7.0 digest_0.6.28 xtable_1.8-4
## [118] httpuv_1.6.3 RcppParallel_5.1.4 stats4_4.1.0
## [121] munsell_0.5.0 viridisLite_0.4.0