and build an index with tabix, so that vcftools and GATK can use the vcf.gz file directly.
bgzip and tabix are included in samtools/htslib.
samtools/htslib/bgzip -c file.vcf > file.vcf.gz
samtools/htslib/tabix -p vcf file.vcf.gz
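Small modules
The small programs in vcftools/vcftools-build/bin/ each implement one simple, specific function. A few of the most commonly used ones are listed here; there are more besides these.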
Before using them, add the Perl module lib directory to PERL5LIB:
export PERL5LIB=/home/wanglizhong/software/vcftools/vcftools-build/lib/perl5/site_perl/5.8.8:$PERL5LIB;
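If the export line is placed in a shell startup file, the same directory can end up in PERL5LIB several times. The sketch below shows one way to avoid that. The helper name `prepend_perl5lib` is hypothetical, and the path is simply copied from the line above, so it must be adapted to the local installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PERL5LIB only if it is not already listed.
prepend_perl5lib() {
    local dir=$1
    case ":${PERL5LIB:-}:" in
        *":$dir:"*) ;;                                  # already present, nothing to do
        *) PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}" ;;    # no trailing colon when PERL5LIB was empty
    esac
    export PERL5LIB
}

# Path copied from the export line above; adjust it to the local installation.
vcf_lib=/home/wanglizhong/software/vcftools/vcftools-build/lib/perl5/site_perl/5.8.8
prepend_perl5lib "$vcf_lib"
prepend_perl5lib "$vcf_lib"   # the second call is a no-op
echo "$PERL5LIB"
```

Once the directory really contains the vcftools Perl module, `perl -MVcf -e 1` should exit silently; that check is left out of the sketch because it depends on the installation.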
fill-an-ac recalculates AN and AC and adds them to the INFO field.
zcat in.vcf.gz|fill-an-ac > out.vcf
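To make the two tags concrete: AN is the number of called alleles at a site and AC is the number of those that are the ALT allele. The awk sketch below recomputes both from the GT fields of a single made-up biallelic record. It illustrates the definitions only and is not how fill-an-ac is implemented; the record and its genotypes are invented for the example.

```shell
#!/usr/bin/env bash
# Illustration only: recompute AN and AC from the genotypes of one made-up biallelic record.
record=$(printf 'chr2\t131\t.\tT\tC\t.\tPASS\t.\tGT\t0/1\t1/1\t./.\t0|0')

an_ac=$(printf '%s\n' "$record" | awk -F'\t' '
{
    an = 0; ac = 0
    for (i = 10; i <= NF; i++) {                 # sample columns start at field 10
        split($i, fmt, ":")                      # GT is assumed to be the first FORMAT key
        n = split(fmt[1], alleles, "[/|]")       # unphased (/) or phased (|) separator
        for (j = 1; j <= n; j++) {
            if (alleles[j] == ".") continue      # missing alleles are not counted
            an++
            if (alleles[j] != "0") ac++          # biallelic site: non-reference means ALT
        }
    }
    printf "AN=%d;AC=%d\n", an, ac
}')
echo "$an_ac"
```

For a multi-allelic site AC holds one count per ALT allele, which this sketch deliberately does not handle.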
Convert a VCF into a consensus sequence.
cat ref.fa | vcf-consensus file.vcf.gz > out.fa
# a single individual
cat ref.fa | vcf-consensus -s sampleA_ID file.vcf.gz > out.sampleA_ID.fa
# only one haplotype (1 or 2)
cat ref.fa | vcf-consensus -H 1 file.vcf.gz > out.hap1.fa
vcf-subset -c keeps only the listed columns (samples):
vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz
vcf-query file.vcf.gz 1:10327-10330
vcf-query file.vcf -f '%CHROM:%POS %REF %ALT [ %DP]\n'
Usage: vcf-query [OPTIONS] file.vcf.gz
Options:
   -c, --columns <file|list>      List of comma-separated column names or one column name per line in a file.
   -f, --format <string>          The default is '%CHROM:%POS\t%REF[\t%SAMPLE=%GT]\n'
   -l, --list-columns             List columns.
   -r, --region chr:from-to       Retrieve the region. (Runs tabix.)
       --use-old-method           Use old version of API, which is slow but more robust.
   -h, -?, --help                 This help message.
Expressions:
   %CHROM          The CHROM column (similarly also other columns)
   %GT             Translated genotype (e.g. C/A)
   %GTR            Raw genotype (e.g. 0/1)
   %INFO/TAG       Any tag in the INFO column
   %LINE           Prints the whole line
   %SAMPLE         Sample name
   []              The brackets loop over all samples
   %*<A><B>        All format fields printed as KEY<A>VALUE<B>
Examples:
   vcf-query file.vcf.gz 1:1000-2000 -c NA001,NA002,NA003
   vcf-query file.vcf.gz -r 1:1000-2000 -f '%CHROM:%POS\t%REF\t%ALT[\t%SAMPLE:%*=,]\n'
   vcf-query file.vcf.gz -f '[%GT\t]%LINE\n'
   vcf-query file.vcf.gz -f '[%GT\ ]%LINE\n'
   vcf-query file.vcf.gz -f '%CHROM\_%POS\t%INFO/DP\t%FILTER\n'
Reorder columns
Reorder the columns of file.vcf.gz so that they follow the order in template.vcf.gz.
vcf-shuffle-cols -t template.vcf.gz file.vcf.gz > out.vcf
vcf-sort file.vcf.gz > out.vcf
zcat file.vcf.gz | vcf-sort > out.vcf
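For readers who want to see what sorting a VCF involves, here is a rough equivalent built from standard tools only: header lines stay on top, and records are ordered by chromosome name and then by numeric position. It is a sketch on a made-up three-record file, not a description of how vcf-sort works internally.

```shell
#!/usr/bin/env bash
# Illustration with standard tools only (not vcf-sort itself).
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/in.vcf"
printf 'chr2\t437\t.\tG\tA\t.\t.\t.\n'  >> "$tmp/in.vcf"
printf 'chr1\t900\t.\tC\tT\t.\t.\t.\n'  >> "$tmp/in.vcf"
printf 'chr2\t131\t.\tT\tC\t.\t.\t.\n'  >> "$tmp/in.vcf"

tab=$(printf '\t')
# Header first, then records sorted by column 1 (text) and column 2 (numeric).
(grep '^#' "$tmp/in.vcf"
 grep -v '^#' "$tmp/in.vcf" | sort -t "$tab" -k1,1 -k2,2n) > "$tmp/sorted.vcf"

first_line=$(head -n 1 "$tmp/sorted.vcf")
sorted_keys=$(grep -v '^#' "$tmp/sorted.vcf" | cut -f 1,2 | tr '\t' ':' | paste -sd ' ' -)
echo "$sorted_keys"
rm -rf "$tmp"
```

The `-k2,2n` key matters: without the numeric flag, position 900 would sort after 1000.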
zcat file.vcf.gz | vcf-to-tab > out.tab
The output looks like this:
#CHROM  POS  REF  05537
chr2    131  T    T/T
chr2    437  G    G/G
chr2    453  G    G/G
chr2    469  G    G/G
chr2    526  G    G/G
chr2    618  G    G/G
chr2    745  A    A/A
chr2    756  T    T/T
chr2    760  T    T/T
cat file.vcf | vcf-tstv
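vcf-tstv reports the transition/transversion (Ts/Tv) ratio. As a reminder of what is being counted, the awk sketch below classifies biallelic SNPs in a few made-up records: A<->G and C<->T are transitions, every other single-base change is a transversion. It shows the definition only and makes no claim about how vcf-tstv treats indels or multi-allelic sites.

```shell
#!/usr/bin/env bash
# Illustration only: count transitions and transversions among biallelic SNPs.
tstv=$(printf 'chr2\t131\t.\tT\tC\nchr2\t437\t.\tG\tA\nchr2\t453\t.\tG\tT\nchr2\t469\t.\tAT\tA\n' |
awk -F'\t' '
/^#/ { next }                                              # skip header lines
toupper($4) ~ /^[ACGT]$/ && toupper($5) ~ /^[ACGT]$/ {    # single-base REF and ALT only
    pair = toupper($4 $5)
    if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") ts++
    else tv++
}
END { printf "ts=%d tv=%d ratio=%.2f\n", ts, tv, (tv ? ts / tv : 0) }')
echo "$tstv"
```

The fourth record is a deletion (AT to A), so it is ignored by the single-base filter.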
vcf-validator file.vcf.gz
The results can be used to draw a Venn diagram.
# run
vcf-compare test.vcf.gz test2.vcf.gz test3.vcf.gz | grep ^VN | cut -f 2-
# results
VN 94 test.vcf.gz (9.5%) test2.vcf.gz (22.3%) test3.vcf.gz (48.5%)
VN 100 test.vcf.gz (10.1%) test3.vcf.gz (51.5%)
VN 327 test.vcf.gz (32.9%) test2.vcf.gz (77.7%)
VN 473 test.vcf.gz (47.6%)
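Before plotting, it usually helps to reduce each VN line to a count plus the set of files that share those sites. The sketch below does this with awk on the four result lines shown above. The `count<TAB>file1&file2` output shape is an arbitrary choice for illustration, not a format required by any particular plotting tool.

```shell
#!/usr/bin/env bash
# Illustration: reshape VN lines into "count<TAB>file1&file2..." rows.
vn_lines='VN 94 test.vcf.gz (9.5%) test2.vcf.gz (22.3%) test3.vcf.gz (48.5%)
VN 100 test.vcf.gz (10.1%) test3.vcf.gz (51.5%)
VN 327 test.vcf.gz (32.9%) test2.vcf.gz (77.7%)
VN 473 test.vcf.gz (47.6%)'

venn_rows=$(printf '%s\n' "$vn_lines" | awk '
{
    start = ($1 == "VN") ? 2 : 1        # tolerate input with or without the VN tag
    count = $start
    sets = ""
    for (i = start + 1; i <= NF; i++) {
        if ($i ~ /^\(/) continue        # drop the "(9.5%)" percentage fields
        sets = sets (sets == "" ? "" : "&") $i
    }
    print count "\t" sets
}')
echo "$venn_rows"
```

Each row is one region of the Venn diagram: 94 sites are shared by all three files, 473 are unique to test.vcf.gz, and so on. File names containing spaces would break the whitespace splitting used here.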
1. The individuals (header) must be exactly the same.
vcf-concat chr1.vcf.gz chr2.vcf.gz | bgzip -c > all.vcf.gz
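The requirement above can be checked before concatenating by comparing the `#CHROM` lines, which list the sample names. The sketch below does this on made-up files and assumes that "exactly the same" means the same names in the same order. The helpers `header_line` and `same_samples` are hypothetical, and `gzip -dc` is used so that the check runs on both plain gzip and bgzip output.

```shell
#!/usr/bin/env bash
# Illustration: verify that two compressed VCFs list the same samples in the same order.
tmp=$(mktemp -d)

header_line() {   # hypothetical helper: print a #CHROM line for the given sample names
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT'
    printf '\t%s' "$@"
    printf '\n'
}

{ printf '##fileformat=VCFv4.1\n'; header_line NA0001 NA0002; } | gzip -c > "$tmp/chr1.vcf.gz"
{ printf '##fileformat=VCFv4.1\n'; header_line NA0001 NA0002; } | gzip -c > "$tmp/chr2.vcf.gz"
{ printf '##fileformat=VCFv4.1\n'; header_line NA0002 NA0001; } | gzip -c > "$tmp/other.vcf.gz"

same_samples() {  # hypothetical helper: succeed when both files have identical #CHROM lines
    [ "$(gzip -dc "$1" | grep -m 1 '^#CHROM')" = "$(gzip -dc "$2" | grep -m 1 '^#CHROM')" ]
}

if same_samples "$tmp/chr1.vcf.gz" "$tmp/chr2.vcf.gz"; then check_same=ok; else check_same=mismatch; fi
if same_samples "$tmp/chr1.vcf.gz" "$tmp/other.vcf.gz"; then check_diff=ok; else check_diff=mismatch; fi
echo "chr1 vs chr2:  $check_same"
echo "chr1 vs other: $check_diff"
rm -rf "$tmp"
```

In the mismatch case the files hold the same samples in a different order, which is the situation where reordering the columns against a template file becomes useful before concatenation.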
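2. Merge multiple VCFs that contain different individuals
It is best if the files share the same positions; genotypes that are absent are filled with the missing value (.) by default.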
vcf-merge A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
Fill the missing genotypes with a custom genotype (0|0, 0/0, 1/1, and so on):
vcf-merge -R '0|0' A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
bcftools
samtools/bcftools/bcftools merge --merge all --no-version --threads 10 file1.vcf.gz file2.vcf.gz |bgzip -c > merge.vcf.gz
3. Merge VCFs with a one-line command, making clever use of parentheses
# Merge (that is, concatenate) two VCF files into one, keeping the header
# from first one only.
(zcat A.vcf.gz | grep ^#; zcat A.vcf.gz | grep -v ^#; zcat B.vcf.gz | grep -v ^#; ) | bgzip -c > out.vcf.gz
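The trick works because every command inside `( ... )` writes to the same standard output, so a single pipe after the closing parenthesis receives the header and both record streams in order. The sketch below replays the idea on two made-up one-record files. As an assumption made only so that it runs without htslib, gzip stands in for bgzip; the real pipeline should keep bgzip whenever the result is to be indexed with tabix. The helper `make_vcf` is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of the parenthesised command group on two made-up files.
# Assumption: gzip replaces bgzip here so the sketch has no htslib dependency.
tmp=$(mktemp -d)

make_vcf() {   # hypothetical helper: header plus one record (chrom pos ref alt)
    printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '%s\t%s\t.\t%s\t%s\t.\t.\t.\n' "$1" "$2" "$3" "$4"
}
make_vcf chr1 100 A G | gzip -c > "$tmp/A.vcf.gz"
make_vcf chr2 200 C T | gzip -c > "$tmp/B.vcf.gz"

# Header from A only, then the records of A, then the records of B, through one pipe.
(gzip -dc "$tmp/A.vcf.gz" | grep '^#'
 gzip -dc "$tmp/A.vcf.gz" | grep -v '^#'
 gzip -dc "$tmp/B.vcf.gz" | grep -v '^#') | gzip -c > "$tmp/out.vcf.gz"

n_header=$(gzip -dc "$tmp/out.vcf.gz" | grep -c '^#')
n_records=$(gzip -dc "$tmp/out.vcf.gz" | grep -vc '^#')
echo "header lines: $n_header, records: $n_records"
rm -rf "$tmp"
```

Quoting the pattern as `'^#'` keeps the shell from treating `#` as the start of a comment in contexts where it is not glued to `^`.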
# output the sites shared by both VCFs
vcf-isec -n +2 A.vcf.gz B.vcf.gz | bgzip -c > out.vcf.gz
# output the sites that are in the first VCF but in none of the others (B, C, and any further files)
vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
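The two options are plain set operations on site positions: `-n +2` keeps sites found in at least two files, and `-c` keeps sites of the first file that occur in no other file. The sketch below reproduces that logic with standard tools on made-up `CHROM:POS` keys. It illustrates the semantics only, assumes each key appears once per file, and does not reproduce what vcf-isec does with the full records.

```shell
#!/usr/bin/env bash
# Illustration of the set logic only; the site keys are made up.
tmp=$(mktemp -d)
printf 'chr1:100\nchr1:200\nchr1:300\n' > "$tmp/A.keys"
printf 'chr1:200\nchr1:400\n'           > "$tmp/B.keys"
printf 'chr1:300\n'                     > "$tmp/C.keys"

# Like -n +2: sites present in at least two of the files.
in_two_or_more=$(cat "$tmp/A.keys" "$tmp/B.keys" "$tmp/C.keys" |
    sort | uniq -c | awk '$1 >= 2 { print $2 }' | paste -sd ' ' -)

# Like -c: sites of the first file that appear in none of the others.
only_in_first=$(cat "$tmp/B.keys" "$tmp/C.keys" | sort -u |
    grep -vxFf - "$tmp/A.keys" | paste -sd ' ' -)

echo "in >= 2 files:   $in_two_or_more"
echo "only in A:       $only_in_first"
rm -rf "$tmp"
```

With exactly two input files, "at least two" is the same as "shared by both", which is how the first vcf-isec command above is used.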