0%

GATK/bcftools变异检测

本文总结利用RNA-seq数据进行snp calling的流程。

1. 构建参考基因组index

1
2
3
bwa index Dmel.dna.fa
samtools faidx Dmel.dna.fa
picard CreateSequenceDictionary -R ~/00.reference/01.Dmel/Dmel.dna.fa -O Dmel.dna.dict

2. bam文件预处理

对bam文件进行排序。

1
2
samtools merge -@ 10 -l 9 merged.bam sample1.bam sample2.bam sample3.bam
samtools sort -@ 10 -l 9 merged.bam -o merged.sort.bam

3. 去重复mark duplicates

去除掉建库、测序过程中偏好性导致的一些”假阳性”地增高的reads。

1
2
3
4
picard MarkDuplicates -I merged.sort.bam
\ -M merged_mark_metrics.txt
\ --REMOVE_DUPLICATES 1
\ -O merged.sort.removed.bam

4-1. GATK snp calling

1. SplitNCigarReads

这一步是针对RNA-seq数据call snp特有的步骤。mRNA转录本主要是由DNA的外显子exon的可变剪切,不包括intron部分。因此,在RNA-seq数据call snp时,如果直接比对到基因组上,若比对到intron的N区域,就会有假阳性的snp,为了减少假阳性,需要将reads进行切割,跨越intron区域。若为n个intron的read片段,则切割为n+1个子read片段。

1
2
3
gatk SplitNCigarReads -R ~/00.reference/01.Dmel/Dmel.dna.fa
\ -I merged.sort.removed.bam
\ -O merged.sort.removed.splited.bam

2. GATK snp calling

HaplotypeCaller进行变异检测,之后,只保留SNP位点,并对SNP进行过滤:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
gatk HaplotypeCaller -R ~/00.reference/01.Dmel/Dmel.dna.fa
\ --ploidy 2 -I 03.bam/merged.sort.removed.bam
\ -O 04.call_snp/all.merged.vcf.gz

gatk SelectVariants -R ~/00.reference/01.Dmel/Dmel.dna.fa
\ -O 04.call_snp/all.merged.snp.vcf
\ --variant 04.call_snp/all.merged.vcf.gz
\ --select-type-to-include SNP

gatk VariantFiltration -R ~/00.reference/01.Dmel/Dmel.dna.fa
\ -V 04.call_snp/all.merged.snp.vcf.gz
\ --cluster-size 4 --cluster-window-size 10
\ --mask-name aroundIndel --mask 04.call_snp/all.merged.indel.vcf.gz -mask-extension 3
\ --filter-name "lowMQRankSum" --filter-expression "MQRankSum < 12.5"
\ --filter-name "highFS" --filter-expression "FS > 60.0"
\ --filter-name "lowReadPosRankSum" --filter-expression "ReadPosRankSum < -8.0"
\ --filter-name "lowMQ" --filter-expression "MQ < 40.0"
\ --filter-name "lowQD" --filter-expression "QD < 2.0"
\ --genotype-filter-name "lowDP" --genotype-filter-expression "DP < 8.0"
\ -O 05.filter_snp/all.merged.filered.snp.vcf.gz

4-2. bcftools snp calling

去重复之后,也可利用bcftools进行snp calling,并进行过滤。

1
2
3
bcftools mpileup -Ou -f ~/00.reference/01.Dmel/Dmel.dna.fa 03.bam/merged.sort.removed.bam | bcftools call -mv -o all.merged.vcf.gz
bcftools filter all.merged.vcf.gz -e 'QUAL<30' --SnpGap 5 -Oz -o all.filtered.merged.vcf.gz;
bcftools filter all.filtered.merged.vcf.gz -e 'DP<10' -Oz -o all.filtered2.merged.vcf.gz;