awk print every other line (or every Nth line) in fasta file
This specific line of awk doesn't have much general utility, but it was written to pull out every other sequence record in a .fasta file. It can select every Nth record instead by changing the divisor in the modulo test (i%2 becomes i%N). It only works on .fasta files in which each sequence string sits on a single line rather than being wrapped across multiple lines.
Here it is in its one-liner form:
awk 'BEGIN{i=0} (substr($0,1,1) == ">") { if (i%2 == 0) {print $0; getline; print $0} i++}' test.fa
And it makes a bit more sense when formatted:
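awk 'BEGIN{i=0}
(substr($0,1,1) == ">") {   # header line: a new record starts
    if (i%2 == 0) {         # keep records 0, 2, 4, ... (every other one)
        print $0            # print the header
        getline             # read the sequence line into $0
        print $0            # print the sequence
    }
    i++                     # count every record, kept or not
}' test.fa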
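Swapping the 2 in the modulo test for another number is all it takes to get every Nth record. Here is a minimal sketch that makes N and the starting record command-line parameters, assuming the same single-line layout. The helper name every_nth and the awk variables n and start are hypothetical, chosen for this example. It also checks getline's return value, so a header sitting on the very last line isn't printed twice. Calling every_nth 2 0 reproduces the one-liner's selection (records 1, 3, 5, ...).

```shell
#!/bin/sh
# Sketch: print every Nth record of a FASTA file whose sequences sit on one line.
# every_nth is a hypothetical helper name. Arguments: step N, 0-based index
# of the first record to keep, input file.
every_nth() {
    awk -v n="$1" -v start="$2" '
    substr($0, 1, 1) == ">" {           # header line starts a new record
        if (i >= start && (i - start) % n == 0) {
            print $0                    # print the header
            if ((getline) > 0)          # read the following sequence line
                print $0
        }
        i++                             # count records, kept or not
    }' "$3"
}

# Small example file with five single-line records.
printf '>seq1\nAAAA\n>seq2\nCCCC\n>seq3\nGGGG\n>seq4\nTTTT\n>seq5\nACGT\n' > example.fa

every_nth 3 0 example.fa
```

With N = 3 and a start of 0, this keeps the first and fourth records (seq1 and seq4). every_nth 2 1 example.fa would keep seq2 and seq4 instead.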
This assumes the .fasta file is of the format:
>SequenceID1
ATGACTA
>SequenceID2
AGGCATG
and the sequence string is contained entirely on one line.
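If your sequences are wrapped across several lines, the getline trick breaks because it only grabs one line after each header. One way around this, sketched below and not part of the original post, is to decide at each header whether the record is kept and then print every line until the next header. The helper name every_nth_wrapped and the awk variables n, start and keep are hypothetical. Arguments: step N, 0-based index of the first record to keep, input file.

```shell
#!/bin/sh
# Sketch: every Nth FASTA record, sequences may be wrapped over several lines.
# every_nth_wrapped is a hypothetical helper name. Arguments: step N,
# 0-based index of the first record to keep, input file.
every_nth_wrapped() {
    awk -v n="$1" -v start="$2" '
    substr($0, 1, 1) == ">" {                   # a header starts a new record
        keep = (i >= start && (i - start) % n == 0)
        i++                                     # count records, kept or not
    }
    keep                                        # print every line of a kept record
    ' "$3"
}

# Example file with wrapped sequences.
printf '>seq1\nAAAA\nCC\n>seq2\nGGGG\n>seq3\nTT\nAA\nGG\n' > wrapped.fa

every_nth_wrapped 2 0 wrapped.fa
```

The bare keep pattern prints the current line whenever keep is nonzero, so both the header and all of its sequence lines come through. Lines before the first header are skipped because keep starts out unset.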