Tuesday, December 3, 2013

grep: select non-matching lines

I recently discovered grep's -v option, which selects non-matching lines. It simply inverts the match, so grep prints every line that does not match the pattern.

For example, to look through a .vcf file for lines that don't have the QD annotation:
grep -v QD variants.vcf
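You can also count such lines. A first grep -v drops the header lines (which start with #), and adding the -c option to the second grep prints a count instead of the lines themselves. I used the -P option here for a Perl regex, although a simple pattern like "^#" also works without it: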
grep -v -P "^#" variants.vcf | grep -v -c QD
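To try this without a real call set, here is a minimal sketch on a throwaway file. The records, the temporary file, and the variable names (tmp_vcf, missing_qd, missing_qd_count) are all made up for illustration, and matching on 'QD=' rather than plain QD is my own assumption to avoid hits elsewhere on the line:

```shell
# Build a tiny, made-up VCF-like file (hypothetical records, illustration only).
tmp_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '##INFO=<ID=QD,Number=1,Type=Float,Description="Quality by depth">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;QD=12.5\n'
  printf 'chr1\t200\t.\tC\tT\t30\tPASS\tDP=8\n'
  printf 'chr1\t300\t.\tG\tA\t40\tPASS\tDP=12;QD=3.1\n'
} > "$tmp_vcf"

# Drop header lines first, then keep only records with no QD= annotation.
missing_qd=$(grep -v '^#' "$tmp_vcf" | grep -v 'QD=')

# Same pipeline, but -c prints the number of selected lines instead.
missing_qd_count=$(grep -v '^#' "$tmp_vcf" | grep -v -c 'QD=')

echo "$missing_qd"
echo "$missing_qd_count"

# Clean up the temporary file.
rm -f "$tmp_vcf"
```

Only the record at position 200 lacks a QD value in this toy file, so that is the single line the first pipeline keeps.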
This post came about because I'm working with the Broad's GATK on a variant calling pipeline for an organism lacking an established variant database. While I was testing different variant hard-filters to bootstrap a "truth" set for variant calling, the GATK VariantFiltration walker threw some warnings that I wanted to look into further. I found grep's -v option in a post on the GATK forums.
