Count number of reads in a SAM file above or below a mapping quality score with awk
Another newbie use of awk. This is a one-line program that counts the reads in a SAM file whose mapping quality score (MAPQ, column 5) passes a threshold. It can easily be modified to count lines on some other condition.
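In this example, we test whether column 5 of each alignment line is greater than or equal to 20, and increment a counter when it is. Mapping quality is -10 times the log10 of the probability that the mapping position is wrong, so a score of 20 corresponds to a 99% probability that the read was mapped correctly. The "!/^@/" test skips the SAM header lines, which start with "@"; a header line with five or more fields (a typical @PG line, for instance) has text in column 5, which awk compares as a string and can count by mistake. The condition can be changed to anything else that is of interest to count.
$ awk 'BEGIN{i=0} !/^@/ && $5>=20 {i=1+i} END{print i}' inputsam.sam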
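Since the goal is often to know how many reads fall on each side of the threshold, here is a minimal sketch that tallies both in a single pass. The sample file, the read names r1 to r5, and the variable names "min", "above" and "below" are made up for illustration; the threshold is passed in with -v so it can be changed without editing the program, and header lines are skipped by matching a leading "@".

```shell
# Build a tiny, made-up SAM file (tab-separated) purely for illustration.
sam=$(mktemp)
printf '@HD\tVN:1.6\tSO:coordinate\n' > "$sam"
printf '@PG\tID:bwa\tPN:bwa\tVN:0.7.17\tCL:bwa mem ref.fa reads.fq\n' >> "$sam"
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r2\t0\tchr1\t200\t20\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r3\t16\tchr1\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r5\t0\tchr1\t500\t40\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

# Skip header lines, then count each read on one side of the threshold.
# "above+0" forces a printed 0 if a counter was never incremented.
counts=$(awk -F'\t' -v min=20 '
    /^@/      { next }            # header line, not a read
    $5 >= min { above++; next }   # MAPQ at or above the threshold
              { below++ }         # everything else is below it
    END       { print above+0, below+0 }
' "$sam")

echo "$counts"   # prints "3 2" for the sample data above
rm -f "$sam"
```

On a real file, replace "$sam" with the path to the SAM file and set min to the threshold you care about.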
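awk also has the logical operators "&&" (and) and "||" (or), so several tests can be combined in one condition. For example, to count reads with a mapping quality from 20 to 30 inclusive: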
$ awk 'BEGIN{i=0} $5>=20 && $5<=30 {i=1+i} END{print i}' inputsam.sam
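The "||" operator works the same way. As a sketch under stated assumptions, the snippet below counts reads that one might treat as unreliable: those with a mapping quality below 20, or equal to 255, the value the SAM specification reserves for "mapping quality not available". The sample data and the variable names "n" and "flagged" are hypothetical, and whether 255 should be grouped with low-quality reads depends on the aligner and the analysis.

```shell
# Tiny made-up SAM file for illustration only.
sam=$(mktemp)
printf '@HD\tVN:1.6\n' > "$sam"
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r3\t0\tchr1\t300\t7\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
printf 'r4\t0\tchr1\t400\t25\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

# Parentheses group the "or" so the header test applies to both halves.
flagged=$(awk -F'\t' '!/^@/ && ($5 < 20 || $5 == 255) { n++ } END { print n+0 }' "$sam")

echo "$flagged"   # prints 2 (r2 and r3) for the sample data above
rm -f "$sam"
```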