0%

scRNA-seq分析之Seurat流程

本文介绍Seurat进行单细胞聚类的流程。

1. 创建对象

首先需要读取数据,并创建Seurat对象,同时过滤掉在少于3个细胞中表达的基因以及检测到的feature少于200的细胞。

1
2
3
4
5
mydata <- Read10X(data.dir = "D:/000 MyWork/000 MyProject/000 scVariants/sample1",
gene.column = 1)
mydata <- CreateSeuratObject(counts = mydata,project = "sample1",
min.cells = 3,min.features = 200,
assay = "RNA")

2. 数据质控

常用的质控指标包括:

  • 每个细胞的唯一基因数目 feature number
    • 低质量或空液泡往往只能检测到少量基因
    • 双液泡(doublet)或多液泡(multiplets)会具有异常多的基因数目
  • 每个细胞的总counts数(相当于每个细胞的测序深度)
    • 线粒体基因占比
    • 低质量或死细胞会具有异常高的线粒体基因表达

这里计算线粒体基因的表达比例,并且查看feature number、count number、线粒体基因占比的小提琴图。

1
2
3
4
5
6
7
8
9
mydata[["percent.mt"]] <- PercentageFeatureSet(mydata,pattern = "^mt:")
VlnPlot(object = mydata,
features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
ncol = 3)
VlnPlot(object = mydata,
features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
idents = NULL,assay = NULL,pt.size = 0,ncol = 3)
mydata <- subset(mydata, subset = nFeature_RNA > 500
& nFeature_RNA < 5000 & percent.mt < 5)

根据小提琴图的情况来决定过滤的标准,有个更好的质控方法是根据质控指标的分位数进行过滤,例如过滤掉 nFeature_RNA 上四分位数和下四分位数的细胞。但是有时候也会感觉标准略微严格,如果我想尽可能多地保留细胞,那也可以只过滤掉一些离群值。

3. normalization

质控之后,进行counts的normalization,默认使用 “LogNormalize” 的方法,即将每个基因的counts除以细胞总的counts数,乘上10,000,再进行对数转换。这种转换的目的?

1
mydata <- NormalizeData(mydata,scale.factor = 10000)

Seurat提供了另外的normalization方法,通过 normalization.method 指定, 包括:

  • “CLR”: centered log ratio transformation
  • “RC”: equals to “LogNormalize” without log-transformation

4. 特征选择

鉴定高变基因(代表了细胞之间主要的生物学差异)进行后续分析,默认选择前2000个。

1
2
3
4
5
6
7
8
mydata <- FindVariableFeatures(mydata, 
selection.method = "vst", nfeatures = 2000)
top10 <- head(VariableFeatures(mydata), 10)
plot1 <- VariableFeaturePlot(mydata)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE,
xnudge = 0,ynudge = 0)
plot1
plot2

5. 数据缩放

对数据进行缩放scaling,目的是使数据的均值为0,方差为1。

1
2
3
4
all.genes <- rownames(mydata)
mydata[["percent.mt"]] <- PercentageFeatureSet(mydata,
pattern = "^mt:",assay="RNA")
mydata <- ScaleData(mydata, features = all.genes,vars.to.regress = "percent.mt")

Seurat v4.0版本中新的标准化方法SCTransform,这个函数可以替代三个函数normalization、Scale、FindVariableFeatures的功能。

6. 线性降维

只用2000个高变基因进行PCA线性降维,根据ElbowPlot函数的拐点确定后续用于分析的PCA主成分数。

1
2
3
4
5
6
7
mydata <- RunPCA(mydata, npcs = 20, verbose = FALSE)
DimPlot(mydata, reduction = "pca")
DimHeatmap(mydata, dims = 1:20, cells = 500, balanced = TRUE)
mydata <- JackStraw(mydata, num.replicate = 100)
mydata <- ScoreJackStraw(mydata, dims = 1:20)
JackStrawPlot(mydata, dims = 1:20)
ElbowPlot(mydata)

7. 细胞聚类

1
2
3
4
mydata <- FindNeighbors(mydata, reduction = "pca", dims = 1:20)
mydata <- FindClusters(mydata, resolution = 0.4)
mydata <- RunUMAP(mydata, reduction = "pca", dims = 1:20)
mydata <- RunTSNE(mydata,reduction = "pca",dims = 1:20)

8. UMAP/tSNE降维

1
2
3
4
5
6
7
8
p1 <- DimPlot(mydata, reduction = "umap")
p2 <- DimPlot(mydata, reduction = "umap", label = TRUE, repel = TRUE)
p1+p2

p3 <- DimPlot(mydata, reduction = "tsne")
p4 <- DimPlot(mydata, reduction = "tsne", label = TRUE, repel = TRUE)
p3+p4

9. 鉴定细胞类群的marker

FindMarkers返回指定指定cluster之间的marker。
FindAllMarkers 可以一次寻找所有clusters的markers,但只返回上调的markers。
VlnPlot绘制表达情况的小提琴图;FeaturePlot绘制表达情况的细胞降维图。

1
2
3
4
5
6
7
8
9
10
11
12
# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(mydata, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)

# find markers for every cluster compared to all remaining cells, report only the positive ones
mydata.markers <- FindAllMarkers(mydata, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25
mydata.markers %>%
group_by(cluster) %>%
slice_max(n = 2, order_by = avg_log2FC)

VlnPlot(mydata, features = c("MS4A1", "CD79A"))
FeaturePlot(mydata, features = c("MS4A1", "CD79A"))

10. 细胞注释

进行细胞注释,重命名cluster ID为细胞类型。

1
2
3
4
new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono","NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(mydata)
mydata <- RenameIdents(mydata, new.cluster.ids)
DimPlot(mydata, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()