Merging multiple sample datasets with the Seurat package

In many cases we need to combine several sample datasets before continuing with downstream analysis. Seurat v4 provides two functions for integrating datasets: FindIntegrationAnchors() and IntegrateData().

1. Load the data and create the Seurat objects

First read in the data, then create a SeuratObject for each sample.

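library(Seurat)

# Read the 10X output of each sample. gene.column = 1 takes gene names from the
# first column of the features file (Read10X's default is column 2).
data1 <- Read10X(data.dir = "D:/000 MyWork/000 MyProject/000 scVariants/sample1",
                 gene.column = 1)
data2 <- Read10X(data.dir = "D:/000 MyWork/000 MyProject/000 scVariants/sample2",
                 gene.column = 1)

# Keep genes detected in at least 3 cells and cells with at least 200 detected genes
data1 <- CreateSeuratObject(counts = data1, project = "sample1",
                            min.cells = 3, min.features = 200,
                            assay = "RNA")
data2 <- CreateSeuratObject(counts = data2, project = "sample2",
                            min.cells = 3, min.features = 200,
                            assay = "RNA")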

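Next, normalize the two datasets separately (each one must be normalized on its own) and identify the variable features of each. Finally, select the features that are repeatedly variable across both datasets and use them for integration.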

# Collect the objects in a plain list so that lapply() can process them one by one
data.list <- list(data1, data2)
data.list <- lapply(X = data.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
  x
})

features <- SelectIntegrationFeatures(object.list = data.list)
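
To make "repeatedly variable" more concrete, here is a toy sketch in base R that counts in how many datasets each feature was flagged as variable. The gene names are made up for illustration, and this is only an approximation of the idea: SelectIntegrationFeatures() ranks features rather than taking a strict intersection, so do not read this as Seurat's actual implementation.

```r
# Toy illustration in base R (hypothetical gene names, not Seurat's implementation)
var.features <- list(
  sample1 = c("GeneA", "GeneB", "GeneC", "GeneD"),
  sample2 = c("GeneB", "GeneC", "GeneE", "GeneF")
)

# Count in how many datasets each feature was flagged as variable
counts <- table(unlist(var.features))

# Features flagged in every dataset are the "repeatedly variable" ones
shared <- names(counts)[as.vector(counts) == length(var.features)]
print(shared)
```

With real data, length(features) and head(features) on the result of SelectIntegrationFeatures() are a quick way to see what was selected.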

2. Perform the integration

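Then use FindIntegrationAnchors() to identify the anchors. This function takes a list of Seurat objects as input. The resulting anchors are then passed to IntegrateData() to carry out the integration.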

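# Identify anchors between the datasets using the selected features
data.anchors <- FindIntegrationAnchors(object.list = data.list,
                                       anchor.features = features)

# Use the anchors to build the integrated assay
data.combined <- IntegrateData(anchorset = data.anchors)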

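The integrated assay produced this way contains only the features selected above (2000 by default). Later steps such as scaling work on these features only, so the data they handle is much smaller. The original RNA assay is still kept in the object.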
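
A quick way to check this is to compare the dimensions of the assays stored in the object. The helper below is a minimal sketch: assay_dims is a hypothetical name introduced here for illustration, and it assumes the Seurat package is installed and that data.combined is the object returned by IntegrateData() above.

```r
# Hypothetical helper (not part of Seurat): features x cells for each assay
assay_dims <- function(obj) {
  d <- sapply(Seurat::Assays(obj), function(a) dim(obj[[a]]))
  rownames(d) <- c("features", "cells")
  d
}

# Usage, assuming data.combined from the integration step:
# assay_dims(data.combined)
```

If the integration ran with the default settings, the "integrated" column should show far fewer features than the "RNA" column, with the same number of cells.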

3. Analysis after integration

The downstream analysis can now be run on the integrated data.

DefaultAssay(data.combined) <- "integrated"

# Run the standard workflow for visualization and clustering
data.combined <- ScaleData(data.combined, verbose = FALSE)
data.combined <- RunPCA(data.combined, npcs = 30, verbose = FALSE)
data.combined <- FindNeighbors(data.combined, reduction = "pca", dims = 1:20)
data.combined <- FindClusters(data.combined, resolution = 0.5)
data.combined <- RunUMAP(data.combined, reduction = "pca", dims = 1:20)

# Visualization
p1 <- DimPlot(data.combined, reduction = "umap")
p2 <- DimPlot(data.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1
p2

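This integration approach uses a lot of memory and time on larger datasets. I have learned that there is another integration method, Harmony, which I will cover in a later post.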

Reference: https://satijalab.org/seurat/articles/integration_introduction.html