This vignette should introduce you to some typical tasks, using Seurat eco-system. Seurat vignettes are available here. An alternative to this vignette in Python (using scanpy) is also available in the same repository. The interconversion and exploration of datasets from Python to Seurat (and SCE) is described in a separate vignette. Let’s install the packages currently missing in a default JupyterHub environment.

Only run the chunk below if you’re running the code for the first time, and do not have the packages installed.

remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("celldex")
BiocManager::install("SingleR")
BiocManager::install("glmGamPoi")

Let’s now load all the libraries that will be needed for the tutorial.

library(Seurat)
library(DropletUtils)
library(ggplot2)
library(DoubletFinder)
library(SingleR)
library(dplyr)
library(celldex)
library(knitr)
library(RColorBrewer)

PART 1. Basic quality control and filtering.

We start the analysis after two preliminary steps have been completed: 1) ambient RNA correction using soupX; 2) doublet detection using scrublet. Both vignettes can be found in this repository.

To start the analysis, let’s read in the corrected matrices:

adj.matrix <- Read10X("soupX_pbmc10k_filt")

After this, we will make a Seurat object. Seurat object summary shows us that 1) number of cells (“samples”) approximately matches the description of each dataset (10194); 2) there are 36601 genes (features) in the reference.

srat <- CreateSeuratObject(adj.matrix,project = "pbmc10k") 
srat
## An object of class Seurat 
## 36601 features across 10194 samples within 1 assay 
## Active assay: RNA (36601 features, 0 variable features)

Let’s look at the combined object a bit closer. Str allows us to see all fields of the class:

str(srat)
## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
##   ..@ assays      :List of 1
##   .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
##   .. .. .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 43 44 51 59 60 62 ...
##   .. .. .. .. .. ..@ p       : int [1:10195] 0 4803 7036 11360 11703 15846 18178 20413 22584 27802 ...
##   .. .. .. .. .. ..@ Dim     : int [1:2] 36601 10194
##   .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
##   .. .. .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
##   .. .. .. .. .. ..@ x       : num [1:24330253] 1 2 1 1 1 3 1 1 1 1 ...
##   .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 43 44 51 59 60 62 ...
##   .. .. .. .. .. ..@ p       : int [1:10195] 0 4803 7036 11360 11703 15846 18178 20413 22584 27802 ...
##   .. .. .. .. .. ..@ Dim     : int [1:2] 36601 10194
##   .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
##   .. .. .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
##   .. .. .. .. .. ..@ x       : num [1:24330253] 1 2 1 1 1 3 1 1 1 1 ...
##   .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ scale.data   : num[0 , 0 ] 
##   .. .. .. ..@ key          : chr "rna_"
##   .. .. .. ..@ assay.orig   : NULL
##   .. .. .. ..@ var.features : logi(0) 
##   .. .. .. ..@ meta.features:'data.frame':   36601 obs. of  0 variables
##   .. .. .. ..@ misc         : list()
##   ..@ meta.data   :'data.frame': 10194 obs. of  3 variables:
##   .. ..$ orig.ident  : Factor w/ 1 level "pbmc10k": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ nCount_RNA  : num [1:10194] 22196 7630 21358 857 15007 ...
##   .. ..$ nFeature_RNA: int [1:10194] 4734 2191 4246 342 4075 2285 2167 2151 5134 3037 ...
##   ..@ active.assay: chr "RNA"
##   ..@ active.ident: Factor w/ 1 level "pbmc10k": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "names")= chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
##   ..@ graphs      : list()
##   ..@ neighbors   : list()
##   ..@ reductions  : list()
##   ..@ images      : list()
##   ..@ project.name: chr "pbmc10k"
##   ..@ misc        : list()
##   ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
##   .. ..$ : int [1:3] 4 0 0
##   ..@ commands    : list()
##   ..@ tools       : list()

Meta.data is the most important field for next steps. It can be acessed using both @ and [[]] operators. Right now it has 3 fields per celL: dataset ID, number of UMI reads detected per cell (nCount_RNA), and the number of expressed (detected) genes per same cell (nFeature_RNA).

meta <- srat@meta.data
dim(meta)
## [1] 10194     3
head(meta)
##                    orig.ident nCount_RNA nFeature_RNA
## AAACCCACATAACTCG-1    pbmc10k      22196         4734
## AAACCCACATGTAACC-1    pbmc10k       7630         2191
## AAACCCAGTGAGTCAG-1    pbmc10k      21358         4246
## AAACCCAGTGCTTATG-1    pbmc10k        857          342
## AAACGAACAGTCAGTT-1    pbmc10k      15007         4075
## AAACGAACATTCGGGC-1    pbmc10k       9855         2285
summary(meta$nCount_RNA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     499    5549    7574    8902   10730   90732
summary(meta$nFeature_RNA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      47    1725    2113    2348    2948    7265

Let’s add several more values useful in diagnostics of cell quality. Michochondrial genes are useful indicators of cell state. For mouse datasets, change pattern to “Mt-”, or explicitly list gene IDs with the features = … option.

srat[["percent.mt"]] <- PercentageFeatureSet(srat, pattern = "^MT-")

Similarly, we can define ribosomal proteins (their names begin with RPS or RPL), which often take substantial fraction of reads:

srat[["percent.rb"]] <- PercentageFeatureSet(srat, pattern = "^RP[SL]")

Now, let’s add the doublet annotation generated by scrublet to the Seurat object metadata.

doublets <- read.table("scrublet_calls.tsv",header = F,row.names = 1)
colnames(doublets) <- c("Doublet_score","Is_doublet")
srat <- AddMetaData(srat,doublets)
head(srat[[]])
##                    orig.ident nCount_RNA nFeature_RNA percent.mt percent.rb
## AAACCCACATAACTCG-1    pbmc10k      22196         4734   5.275725  25.067580
## AAACCCACATGTAACC-1    pbmc10k       7630         2191   8.833552  33.853211
## AAACCCAGTGAGTCAG-1    pbmc10k      21358         4246   6.283360  19.276149
## AAACCCAGTGCTTATG-1    pbmc10k        857          342  31.388565   1.750292
## AAACGAACAGTCAGTT-1    pbmc10k      15007         4075   7.916306  14.986340
## AAACGAACATTCGGGC-1    pbmc10k       9855         2285   7.762557  41.024860
##                    Doublet_score Is_doublet
## AAACCCACATAACTCG-1    0.30542385       True
## AAACCCACATGTAACC-1    0.01976120      False
## AAACCCAGTGAGTCAG-1    0.03139876      False
## AAACCCAGTGCTTATG-1    0.02755040      False
## AAACGAACAGTCAGTT-1    0.36867997       True
## AAACGAACATTCGGGC-1    0.09723644      False

Let’s make violin plots of the selected metadata features. Note that the plots are grouped by categories named identity class. Identity class can be seen in , or using Idents() function. Active identity can be changed using SetIdents().

VlnPlot(srat, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rb"),ncol = 4) & 
  theme(plot.title = element_text(size=10))