scRNA-seq Dataset Integration using R-based tools

## Warning: package 'knitr' was built under R version 4.0.5

The code below will install some packages if you don’t have them already.

if (!require("RColorBrewer"))   install.packages("RColorBrewer")
if (!require("knitr"))     install.packages("knitr")
if (!require("dplyr"))     install.packages("dplyr")
if (!require("cowplot"))   install.packages("cowplot")
if (!require("harmony"))   install.packages("harmony")
if (!require("rliger"))    install.packages("rliger")
if (!require("remotes"))    install.packages("remotes")

if (!require("SeuratWrappers"))     remotes::install_github('satijalab/seurat-wrappers')

## sitmo        (2.0.1   -> 2.0.2 ) [CRAN]
## dqrng        (0.2.1   -> 0.3.0 ) [CRAN]
## irlba        (2.3.3   -> 2.3.5 ) [CRAN]
## RcppAnnoy    (0.0.18  -> 0.0.19) [CRAN]
## goftest      (1.2-2   -> 1.2-3 ) [CRAN]
## fs           (1.5.0   -> 1.5.2 ) [CRAN]
## digest       (0.6.28  -> 0.6.29) [CRAN]
## jquerylib    (0.1.3   -> 0.1.4 ) [CRAN]
## sass         (0.3.1   -> 0.4.0 ) [CRAN]
## cachem       (1.0.4   -> 1.0.6 ) [CRAN]
## bslib        (0.2.4   -> 0.3.1 ) [CRAN]
## glue         (1.4.2   -> 1.5.1 ) [CRAN]
## crayon       (1.4.1   -> 1.4.2 ) [CRAN]
## later        (1.1.0.1 -> 1.3.0 ) [CRAN]
## htmltools    (0.5.1.1 -> 0.5.2 ) [CRAN]
## fontawesome  (NA      -> 0.2.2 ) [CRAN]
## mime         (0.10    -> 0.12  ) [CRAN]
## httpuv       (1.5.5   -> 1.6.3 ) [CRAN]
## parallelly   (1.24.0  -> 1.29.0) [CRAN]
## future       (1.21.0  -> 1.23.0) [CRAN]
## future.apply (1.7.0   -> 1.8.1 ) [CRAN]
## bitops       (1.0-6   -> 1.0-7 ) [CRAN]
## gtools       (3.8.2   -> 3.9.2 ) [CRAN]
## here         (NA      -> 1.0.1 ) [CRAN]
## openssl      (1.4.3   -> 1.4.5 ) [CRAN]
## generics     (0.1.0   -> 0.1.1 ) [CRAN]
## crosstalk    (1.1.1   -> 1.2.0 ) [CRAN]
## htmlwidgets  (1.5.3   -> 1.5.4 ) [CRAN]
## shiny        (1.6.0   -> 1.7.1 ) [CRAN]
## igraph       (1.2.6   -> 1.2.9 ) [CRAN]
## reticulate   (1.18    -> 1.22  ) [CRAN]
## uwot         (0.1.10  -> 0.1.11) [CRAN]
## SeuratObject (4.0.0   -> 4.0.4 ) [CRAN]
## plotly       (4.9.3   -> 4.10.0) [CRAN]
## pbapply      (1.4-3   -> 1.5-0 ) [CRAN]
## lmtest       (0.9-38  -> 0.9-39) [CRAN]
## leiden       (0.3.7   -> 0.3.9 ) [CRAN]
## fitdistrplus (1.1-3   -> 1.1-6 ) [CRAN]
## Seurat       (4.0.1   -> 4.0.5 ) [CRAN]
## rsvd         (1.0.3   -> 1.0.5 ) [CRAN]
##   
   checking for file ‘/tmp/RtmpsTai4O/remotesc7b2357539a/satijalab-seurat-wrappers-8510069/DESCRIPTION’ ...
  
✓  checking for file ‘/tmp/RtmpsTai4O/remotesc7b2357539a/satijalab-seurat-wrappers-8510069/DESCRIPTION’
## 
  
─  preparing ‘SeuratWrappers’:
##    checking DESCRIPTION meta-information ...
  
✓  checking DESCRIPTION meta-information
## 
  
─  checking for LF line-endings in source and make files and shell scripts
## 
  
─  checking for empty or unneeded directories
## 
  
─  building ‘SeuratWrappers_0.3.0.tar.gz’
## 
  
   
##

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!require("glmGamPoi")) BiocManager::install("glmGamPoi")

Other packages required for this tutorial should already be installed in your JupyterHub by default:

library(Seurat)
library(SeuratWrappers)
library(DropletUtils)
library(ggplot2)
library(RColorBrewer)
library(knitr)
library(dplyr)
library(harmony)
library(rliger)
library(glmGamPoi)
library(patchwork)
library(cowplot)

Let’s also load a custom function, plot_integrated_clusters, that is stored in /data as custom_seurat_functions.R:

source("../data/custom_seurat_functions.R")

Introduction

As more and more scRNA-seq datasets become available, carrying merged_seurat comparisons between them is key. There are two main approaches to comparing scRNASeq datasets. The first approach is “label-centric” which is focused on trying to identify equivalent cell-types/states across datasets by comparing individual cells or groups of cells. The other approach is “cross-dataset normalization” which attempts to computationally remove experiment-specific technical/biological effects so that data from multiple experiments can be combined and jointly analyzed.

We will illustrate this process using two very different datasets: 1) 10k PBMC (3’ v3 chemistry) downloaded from 10x Genomics website; 2) major cell populations isolated from whole human blood using FACS, and sequenced using STRT-seq (GSE149938).

Although we already have all the necessary files in our /data folder, we can download the necessary files from GEO database:

download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE149nnn/GSE149938/suppl/GSE149938_umi_matrix.csv.gz",
              destfile = "GSE149938_umi_matrix.csv.gz")
download.file("https://cf.10xgenomics.com/samples/cell-exp/4.0.0/Parent_NGSC3_DI_PBMC/Parent_NGSC3_DI_PBMC_filtered_feature_bc_matrix.h5",
              destfile = "pbmc10k_filt.h5")

Let’s read the processed file available via GEO, and the 10x file we’ve processed using SoupX (see another notebook in this repository):

umi_gz <- gzfile("../data/GSE149938_umi_matrix.csv.gz",'rt')  
umi <- read.csv(umi_gz,check.names = F,quote = "")
matrix_3p    <- Read10X("../data/soupX_pbmc10k_filt")

Next, let’s make Seurat objects and re-define some of the metadata columns (GEO dataset simply puts the cell type into the orig.ident slot, which will interfere with what we want to do next):

srat_wb <- CreateSeuratObject(t(umi),project = "whole_blood")

## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

srat_3p <- CreateSeuratObject(matrix_3p,project = "pbmc10k_3p")
rm(umi_gz)
rm(umi)
rm(matrix_3p)

colnames(srat_wb@meta.data)[1] <- "cell_type"
srat_wb@meta.data$orig.ident <- "whole_blood"
srat_wb@meta.data$orig.ident <- as.factor(srat_wb@meta.data$orig.ident)
head(srat_wb[[]])

##                    cell_type nCount_RNA nFeature_RNA  orig.ident
## BNK_spBM1_L1_bar25       BNK      24494         1869 whole_blood
## BNK_spBM1_L1_bar26       BNK      61980         2051 whole_blood
## BNK_spBM1_L1_bar27       BNK     124382         3872 whole_blood
## BNK_spBM1_L1_bar28       BNK       8144         1475 whole_blood
## BNK_spBM1_L1_bar29       BNK      53612         2086 whole_blood
## BNK_spBM1_L1_bar30       BNK      33582         2038 whole_blood

Do basic quality control. STRT-Seq is quite different from 10x and has a lot more detected genes per cell. Also, for some reason we don’t have the MT genes in the quantified matrix of the whole blood dataset. That’s unfortunate, but not critical.

srat_wb <- SetIdent(srat_wb,value = "orig.ident")

srat_wb[["percent.mt"]] <- PercentageFeatureSet(srat_wb, pattern = "^MT-")
srat_wb[["percent.rbp"]] <- PercentageFeatureSet(srat_wb, pattern = "^RP[SL]")
srat_3p[["percent.mt"]] <- PercentageFeatureSet(srat_3p, pattern = "^MT-")
srat_3p[["percent.rbp"]] <- PercentageFeatureSet(srat_3p, pattern = "^RP[SL]")

VlnPlot(srat_wb, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)

VlnPlot(srat_3p, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)

The annotation that was used to process the GEO whole blood dataset is quite different from the Cell Ranger GRCh38-2020A. Let’s see how many common genes are there:

table(rownames(srat_3p) %in% rownames(srat_wb))

## 
## FALSE  TRUE 
## 18050 18551

common_genes <- rownames(srat_3p)[rownames(srat_3p) %in% rownames(srat_wb)]

Let’s filter the cells with too high or too low number of genes, or too high MT gene content. Generally speaking, cleaner datasets (both in terms of ambient RNA and in terms of low-quality cells) integrate better.

Also, let’s limit the individual matrices to common genes only:

srat_3p <- subset(srat_3p, subset = nFeature_RNA > 1000 & nFeature_RNA < 6000 & percent.mt < 15)
srat_wb <- subset(srat_wb, subset = nFeature_RNA > 1000)

srat_3p <- srat_3p[rownames(srat_3p) %in% common_genes,]
srat_wb <- srat_wb[rownames(srat_wb) %in% common_genes,]

As previously for Seurat v3, let’s make a list and normalize/find HVG for each object:

wb_list <- list()
wb_list[["pbmc10k_3p"]]   <- srat_3p
wb_list[["whole_blood"]]  <- srat_wb

for (i in 1:length(wb_list)) {
  wb_list[[i]] <- NormalizeData(wb_list[[i]], verbose = F)
  wb_list[[i]] <- FindVariableFeatures(wb_list[[i]], selection.method = "vst", nfeatures = 2000, verbose = F)
}

Here we actually do the integration. Seurat 3 does it in two steps.

wb_anchors <- FindIntegrationAnchors(object.list = wb_list, dims = 1:30)

## Computing 2000 integration features

## Scaling features for provided objects

## Finding all pairwise anchors

## Running CCA

## Merging objects

## Finding neighborhoods

## Finding anchors

##  Found 11481 anchors

## Filtering anchors

##  Retained 3178 anchors

wb_seurat  <- IntegrateData(anchorset = wb_anchors, dims = 1:30)

## Merging dataset 2 into 1

## Extracting anchors for merged samples

## Finding integration vectors

## Finding integration vector weights

## Integrating data

rm(wb_list)
rm(wb_anchors)

Let’s do the basic processing and visualization of the uncorrected dataset:

DefaultAssay(wb_seurat) <- "RNA"
wb_seurat <- NormalizeData(wb_seurat, verbose = F)
wb_seurat <- FindVariableFeatures(wb_seurat, selection.method = "vst", nfeatures = 2000, verbose = F)
wb_seurat <- ScaleData(wb_seurat, verbose = F)
wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F)

## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

DimPlot(wb_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")

DimPlot(wb_seurat, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

Now, let’s take a look at the integrated dataset (it’s already normalized and HVGs are selected):

DefaultAssay(wb_seurat) <- "integrated"
wb_seurat <- ScaleData(wb_seurat, verbose = F)
wb_seurat <- RunPCA(wb_seurat, npcs = 30, verbose = F)
wb_seurat <- RunUMAP(wb_seurat, reduction = "pca", dims = 1:30, verbose = F)

DimPlot(wb_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and white blood cells, after integration (Seurat 3)")

Let’s look at some markers:

FeaturePlot(wb_seurat,c("MS4A1","LYZ","NKG7","PPBP","LTF","HBA1","FCER1A","IL7R","FCGR3B")) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "Spectral")))

From the plot we can see that there are some significant cell types that are absent from PBMC dataset, but exist in the whole blood dataset. LTF gene is the most prominent marker of neutrophils, and HBA1 is a hemoglobin gene expressed in erythrocytes.

Now let’s cluster the integrated matrix and look how clusters are distributed between the two sets:

wb_seurat <- FindNeighbors(wb_seurat, dims = 1:30, k.param = 10, verbose = F)
wb_seurat <- FindClusters(wb_seurat, verbose = F)
DimPlot(wb_seurat,label = T) + NoLegend()

Cluster composition shows many clusters unique to the whole blood dataset:

count_table <- table(wb_seurat@meta.data$seurat_clusters, wb_seurat@meta.data$orig.ident)
count_table

##     
##      pbmc10k_3p whole_blood
##   0        1466         217
##   1        1553          94
##   2        1326         197
##   3        1213         109
##   4        1087         107
##   5           0         906
##   6         354         460
##   7           7         677
##   8         437         216
##   9           1         555
##   10        348         187
##   11        321         157
##   12          0         450
##   13        364          83
##   14          3         413
##   15        288         124
##   16          0         377
##   17         11         366
##   18          8         353
##   19          0         341
##   20         13         313
##   21        204           8
##   22          0         205
##   23          0         203
##   24         19         178
##   25          0         185
##   26        118          17
##   27         99          19
##   28          0          68
##   29          0          58

plot_integrated_clusters(wb_seurat)

## Using cluster as id variables

We can take advantage of the metadata that was present in GSE149938:

meta <- wb_seurat[[]]
table(meta[meta$seurat_clusters == '5',]$cell_type) ## erythrocytes

## 
##  ery immB  MEP preB proB 
##  900    2    1    2    1

table(meta[meta$seurat_clusters == '20',]$cell_type) ## neutrophils

## 
##    BNK    CLP    ery    GMP   immB kineNK   LMPP    MLP    NKP   preB   proB 
##     74     75      6      8      1      1      2     35     30     33     28 
##   regB 
##     20

table(meta[meta$seurat_clusters == '24',]$cell_type) ## plasma B cells

## 
##    ery   naiB plasma 
##     11      1    166

rm(wb_seurat)

Harmony, 3’ 10k PBMC cells and whole blood STRT-Seq

Similarly to the previous approaches, let’s make a merged Seurat dataset, normalize and process it:

wb_harmony    <- merge(srat_3p,srat_wb)

wb_harmony <- NormalizeData(wb_harmony, verbose = F)
wb_harmony <- FindVariableFeatures(wb_harmony, selection.method = "vst", nfeatures = 2000, verbose = F)
wb_harmony <- ScaleData(wb_harmony, verbose = F)
wb_harmony <- RunPCA(wb_harmony, npcs = 30, verbose = F)
wb_harmony <- RunUMAP(wb_harmony, reduction = "pca", dims = 1:30, verbose = F)

We can take a look at the PCA plot for a change, as well as distributions along the first principal component:

p1 <- DimPlot(object = wb_harmony, reduction = "pca", pt.size = .1, group.by = "orig.ident") + NoLegend()
p2 <- VlnPlot(object = wb_harmony, features = "PC_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
plot_grid(p1,p2)

UMAP also shows clear differences between the datasets (we saw this plot above already):

DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, before integration")

Let’s run harmony using a simple wrapper named RunHarmony from SeuratWrappers library:

wb_harmony <- wb_harmony %>% RunHarmony("orig.ident", plot_convergence = T)

## Harmony 1/10

## Harmony 2/10

## Harmony 3/10

## Harmony converged after 3 iterations

## Warning: Invalid name supplied, making object name syntactically valid. New
## object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details
## on syntax validity

This generates the embeddings that we shall later use for all downstream analysis.

harmony_embeddings <- Embeddings(wb_harmony, 'harmony')
harmony_embeddings[1:5, 1:5]

##                     harmony_1  harmony_2 harmony_3  harmony_4  harmony_5
## AAACCCACATAACTCG-1  3.2402720  0.5517086 -4.014592 -0.2490492 -0.4555420
## AAACCCACATGTAACC-1  4.3992278 -0.2653914  8.905441  5.7085193 -2.6650973
## AAACCCAGTGAGTCAG-1  3.6924188  1.4199533 -2.323807 -1.7352863  1.2648235
## AAACGAACAGTCAGTT-1 -0.8957237 -3.2318868 -2.398822 -0.1342351 -1.9238917
## AAACGAACATTCGGGC-1  4.6218704  1.6066620  1.914114 -1.7354947  0.8168312

Corrected PCA and distribution:

p1 <- DimPlot(object = wb_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident") + NoLegend()
p2 <- VlnPlot(object = wb_harmony, features = "harmony_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
plot_grid(p1,p2)

Run UMAP and perform Louvain clustering:

wb_harmony <- wb_harmony %>% 
  RunUMAP(reduction = "harmony", dims = 1:30, verbose = F) %>% 
  FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:30) %>% 
  FindClusters() %>% 
  identity()

## Computing nearest neighbor graph

## Computing SNN

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16883
## Number of edges: 275342
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9231
## Number of communities: 29
## Elapsed time: 1 seconds

wb_harmony <- SetIdent(wb_harmony,value = "orig.ident")
DimPlot(wb_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and whole blood, after integration (Harmony)")

DimPlot(wb_harmony, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

Corrected results for this dataset appear to be very similar to Seurat v3:

wb_harmony <- SetIdent(wb_harmony,value = "seurat_clusters")
DimPlot(wb_harmony,label = T) + NoLegend()

More detailed cluster examination also seems to confirm this:

plot_integrated_clusters(wb_harmony)

## Using cluster as id variables

rm(wb_harmony)

LIGER, 3’ 10k PBMC cells and whole blood STRT-Seq

Finally, let’s do data integration with LIGER. This step takes several minutes to run:

wb_liger    <- merge(srat_3p,srat_wb)

wb_liger    <- NormalizeData(wb_liger)
wb_liger    <- FindVariableFeatures(wb_liger)
wb_liger    <- ScaleData(wb_liger, split.by = "orig.ident", do.center = F)

## Scaling data matrix

## Scaling data from split pbmc10k_3p

## Scaling data from split whole_blood

wb_liger    <- RunOptimizeALS(wb_liger, k = 30, lambda = 5, split.by = "orig.ident")

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Finished in 10.07576 mins, 30 iterations.
## Max iterations set: 30.
## Final objective delta: 4.18028e-05.
## Best results with seed 1.

## Warning: No columnames present in cell embeddings, setting to 'riNMF_1:30'

wb_liger    <- RunQuantileNorm(wb_liger, split.by = "orig.ident")

We will then perform Louvain clustering (FindNeighbors and FindClusters) with the settings similar to what we have been using before:

wb_liger    <- FindNeighbors(wb_liger,reduction = "iNMF",k.param = 10,dims = 1:30)

## Computing nearest neighbor graph

## Computing SNN

wb_liger    <- FindClusters(wb_liger)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16883
## Number of edges: 276592
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9233
## Number of communities: 38
## Elapsed time: 0 seconds

Let’s look at the corrected UMAP plot in a couple of different ways:

wb_liger    <- RunUMAP(wb_liger, dims = 1:ncol(wb_liger[["iNMF"]]), reduction = "iNMF",verbose = F)
wb_liger <- SetIdent(wb_liger,value = "orig.ident")
DimPlot(wb_liger,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, after integration (LIGER)")

DimPlot(wb_liger, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()

Finally, a look at distribution of datasets per cluster:

plot_integrated_clusters(wb_liger)

## Using cluster as id variables

rm(wb_liger)
rm(srat_wb)
rm(srat_3p)

sessionInfo()

View session info

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] reshape2_1.4.4              rliger_1.0.0               
##  [3] patchwork_1.1.1             Matrix_1.4-0               
##  [5] ggplot2_3.3.5               DropletUtils_1.10.3        
##  [7] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
##  [9] Biobase_2.50.0              GenomicRanges_1.42.0       
## [11] GenomeInfoDb_1.26.4         IRanges_2.24.1             
## [13] S4Vectors_0.28.1            BiocGenerics_0.36.1        
## [15] MatrixGenerics_1.2.1        matrixStats_0.61.0         
## [17] SeuratWrappers_0.3.0        SeuratObject_4.0.4         
## [19] Seurat_4.0.5                glmGamPoi_1.2.0            
## [21] remotes_2.4.2               harmony_1.0                
## [23] Rcpp_1.0.7                  cowplot_1.1.1              
## [25] dplyr_1.0.7                 RColorBrewer_1.1-2         
## [27] knitr_1.36                 
## 
## loaded via a namespace (and not attached):
##   [1] plyr_1.8.6                igraph_1.2.6             
##   [3] lazyeval_0.2.2            splines_4.0.4            
##   [5] BiocParallel_1.24.1       listenv_0.8.0            
##   [7] scattermore_0.7           digest_0.6.29            
##   [9] foreach_1.5.1             htmltools_0.5.2          
##  [11] fansi_0.5.0               magrittr_2.0.1           
##  [13] doParallel_1.0.16         tensor_1.5               
##  [15] cluster_2.1.1             ROCR_1.0-11              
##  [17] limma_3.46.0              globals_0.14.0           
##  [19] R.utils_2.10.1            spatstat.sparse_2.0-0    
##  [21] colorspace_2.0-2          ggrepel_0.9.1            
##  [23] xfun_0.28                 crayon_1.4.2             
##  [25] RCurl_1.98-1.3            jsonlite_1.7.2           
##  [27] spatstat.data_2.1-0       iterators_1.0.13         
##  [29] survival_3.2-10           zoo_1.8-9                
##  [31] glue_1.5.1                polyclip_1.10-0          
##  [33] gtable_0.3.0              zlibbioc_1.36.0          
##  [35] XVector_0.30.0            leiden_0.3.9             
##  [37] DelayedArray_0.16.3       Rhdf5lib_1.12.1          
##  [39] future.apply_1.8.1        HDF5Array_1.18.1         
##  [41] abind_1.4-5               scales_1.1.1             
##  [43] edgeR_3.32.1              DBI_1.1.1                
##  [45] miniUI_0.1.1.1            riverplot_0.10           
##  [47] viridisLite_0.4.0         xtable_1.8-4             
##  [49] dqrng_0.3.0               reticulate_1.22          
##  [51] spatstat.core_2.3-2       mclust_5.4.8             
##  [53] bit_4.0.4                 rsvd_1.0.5               
##  [55] htmlwidgets_1.5.4         httr_1.4.2               
##  [57] FNN_1.1.3                 ellipsis_0.3.2           
##  [59] ica_1.0-2                 farver_2.1.0             
##  [61] scuttle_1.0.4             pkgconfig_2.0.3          
##  [63] R.methodsS3_1.8.1         sass_0.4.0               
##  [65] uwot_0.1.11               deldir_1.0-6             
##  [67] locfit_1.5-9.4            utf8_1.2.2               
##  [69] labeling_0.4.2            tidyselect_1.1.1         
##  [71] rlang_0.4.12              later_1.3.0              
##  [73] munsell_0.5.0             tools_4.0.4              
##  [75] generics_0.1.1            ggridges_0.5.3           
##  [77] evaluate_0.14             stringr_1.4.0            
##  [79] fastmap_1.1.0             yaml_2.2.1               
##  [81] goftest_1.2-3             bit64_4.0.5              
##  [83] fitdistrplus_1.1-6        purrr_0.3.4              
##  [85] RANN_2.6.1                sparseMatrixStats_1.2.1  
##  [87] pbapply_1.5-0             future_1.23.0            
##  [89] nlme_3.1-152              mime_0.12                
##  [91] R.oo_1.24.0               hdf5r_1.3.3              
##  [93] compiler_4.0.4            plotly_4.10.0            
##  [95] png_0.1-7                 spatstat.utils_2.2-0     
##  [97] tibble_3.1.6              bslib_0.3.1              
##  [99] stringi_1.7.6             highr_0.9                
## [101] RSpectra_0.16-0           lattice_0.20-41          
## [103] vctrs_0.3.8               rhdf5filters_1.2.0       
## [105] pillar_1.6.4              lifecycle_1.0.1          
## [107] BiocManager_1.30.16       spatstat.geom_2.3-0      
## [109] lmtest_0.9-39             jquerylib_0.1.4          
## [111] RcppAnnoy_0.0.19          data.table_1.14.2        
## [113] bitops_1.0-7              irlba_2.3.5              
## [115] httpuv_1.6.3              R6_2.5.1                 
## [117] promises_1.2.0.1          KernSmooth_2.23-18       
## [119] gridExtra_2.3             parallelly_1.29.0        
## [121] codetools_0.2-18          MASS_7.3-53.1            
## [123] assertthat_0.2.1          rhdf5_2.34.0             
## [125] withr_2.4.3               sctransform_0.3.2        
## [127] GenomeInfoDbData_1.2.4    mgcv_1.8-34              
## [129] beachmat_2.6.4            grid_4.0.4               
## [131] rpart_4.1-15              tidyr_1.1.4              
## [133] DelayedMatrixStats_1.12.3 rmarkdown_2.7            
## [135] Rtsne_0.15                shiny_1.7.1

Dataset integration using Seurat and other R tools

scRNA-seq Dataset Integration using R-based tools

Introduction

Harmony, 3’ 10k PBMC cells and whole blood STRT-Seq

LIGER, 3’ 10k PBMC cells and whole blood STRT-Seq

sessionInfo()