从FCA数据库中的loom文件中选择想要的cell构建SeuratObject

从FCA数据库中下载的原始.loom文件,选择自己选择想要的cells转换为Seurat对象。

将.loom文件格式数据转换为SeuratObject，需要SeuratDisk包。

1. SeuratDisk包安装

SeuratDisk包只能通过github安装，因此，需要下载devtools，然后再用devtools通过github安装。

1 2	install.packages("devtools") devtools::install_github("mojaveazure/SeuratDisk")

2. .loom文件格式

loom文件格式是HDF5文件的文件结构，可以有效地储存大型单细胞数据集。可以使用HDFview来查看其数据格式。
打开之后，数据格式如下：

/matrix中储存基因*细胞矩阵，/attrs中储存数据的一些相关信息，如数据创建日期，/col_attrs中储存矩阵列（细胞）的信息，如CellID、ClusterID、Embeddings信息等，/row_attrs中储存矩阵行（基因）的信息，如基因ID、细胞marker等。/layers中储存另外的基因*矩阵，例如scale.data。

3. 将loom转换为Seurat

由于Seurat::as.Seurat()所识别的loom文件和FCA数据库中的loom文件格式不太一样，因此不能直接转换，只能自己构建。
导入loom文件，并读取基因*细胞矩阵，这样读取的矩阵是raw counts，可以直接根据矩阵的属性访问。

> library(Seurat)
> library(SeuratDisk)
> ds <- Connect("r_fca_biohub_body_10x.loom",mode = "r")
# 读取全部矩阵
> fca_body_mat <- ds[["/matrix"]][,]
# 访问矩阵第一行第一列元素
> ds[["/matrix"]][1,1]
	0
# 读取矩阵的维度
> ds[["/matrix"]]$dims

读取loom文件中储存的细胞ID和基因ID信息，分别为矩阵的列名和行名。

fca_body_cellid <- ds[["/col_attrs/CellID"]][]
fca_body_geneid <- ds[["/row_attrs/Gene"]][]
colnames(fca_body_mat) <- fca_body_geneid
rownames(fca_body_mat) <- fca_body_cellid

将自己所需要的细胞ID存储于一个文件中并进行读取。

mycell <- read.table(file = "attrs_myline.txt",header = T,
sep = "\t")
mycell <- mycell$CellID
mycell <- as.array(mycell)

提取自己所需要的细胞的矩阵。

1 2	fca_my_mat <- fca_body_mat[rownames(fca_body_mat) %in% mycell,] rm(fca_body_mat)

将矩阵转换为稀疏矩阵(节省空间)并进行转置。

1 2	fca_my_mat <- Matrix::Matrix(fca_my_mat, sparse=T) fca_my_mat <- Matrix::t(fca_my_mat)

提取所需要的meta信息。

attrs <- c('CellID', 'ClusterID', 'n_counts', 'n_genes', 'percent_mito','sex',
'annotation', 'annotation_broad', 'S_annotation_broad_extrapolated')
attrs_df <- map_dfc(attrs, ~ ds[[paste0("col_attrs/", .)]][]) %>% as.data.frame()
colnames(attrs_df) <- attrs
rownames(attrs_df) <- fca_body_cellid

attrs_df <- attrs_df[rownames(attrs_df) %in% mycell,]
write.table(attrs_df,"attrs.txt",sep="\t")

如果我们不想要数据库已经注释好的信息，想要自己的注释，例如合并某些cluster，怎么办呢？
我们可以在输出的attrs.txt中加上一列my_annotation，然后就可以啦。

attrs <- c('CellID', 'ClusterID', 'n_counts', 'n_genes', 'percent_mito','sex',
'annotation', 'my_annotation','annotation_broad', 'S_annotation_broad_extrapolated')

attrs_df <- read.table(file = "attrs_my.txt",sep = "\t",header = T)
colnames(attrs_df) <- attrs
rownames(attrs_df) <- attrs_df$CellID

fca_my_seurat <- CreateSeuratObject(counts = fca_my_mat,
	meta.data = attrs_df)

fca_my_seurat@active.ident <- as.factor(fca_my_seurat$my_annotation)

saveRDS(fca_my_seurat,file = "fca_my_seurat.rds")

这样构建出的SeuratObject就有了自己想要的细胞的表达矩阵以及meta信息，可以进行接下来的分析了。