awk print every other line (or every Nth line) in fasta file

Posted by Frogee August 21, 2013

This awk one-liner doesn't have much general utility; it pulls out every other sequence record (the header line plus its sequence line) from a .fasta file, starting with the first record. To take every Nth record instead, change the 2 in the modulo test (i%2 == 0) to N. It only works for .fasta files in which each sequence sits on a single line rather than being wrapped across multiple lines.

Here it is in its one-liner form:

awk 'BEGIN{i=0} (substr($0,1,1) == ">") { if (i%2 == 0) {print $0; getline; print $0} i++}' test.fa

And it makes a bit more sense when formatted:

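awk 'BEGIN{i=0} (substr($0,1,1) == ">") {   # act on header lines only
 if (i%2 == 0) {   # keep records 0, 2, 4, ...
  print $0         # header line
  getline          # advance to the sequence line
  print $0         # sequence line
 }
 i++               # count every header seen
}' test.fa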
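
For the every-Nth-record case, one way to avoid editing the script each time is to pass N in as an awk variable. The following is only a sketch: the variable name n, the file names records.fa and every_third.fa, and the sample records are all made up for illustration, and it still assumes one sequence line per record. It also checks the return value of getline so that a header on the last line of the file isn't printed twice.

```shell
# Hypothetical sample input: five single-line records.
printf '>seq1\nAAAA\n>seq2\nCCCC\n>seq3\nGGGG\n>seq4\nTTTT\n>seq5\nACGT\n' > records.fa

# n is an assumed variable name: keep records 1, 1+n, 1+2n, ...
awk -v n=3 '
/^>/ {
    if (i % n == 0) {               # i counts the headers seen so far, starting at 0
        print                       # header line
        if ((getline) > 0) print    # sequence line, only if one was actually read
    }
    i++
}' records.fa > every_third.fa

cat every_third.fa
```

With n=3 this should keep the first and fourth records of the sample file; setting n=2 gives the same selection as the one-liner above.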

This assumes the .fasta file is of the format:

>SequenceID1
ATGACTA
>SequenceID2
AGGCATG

and the sequence string is contained entirely on one line.
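
If the sequences are wrapped across several lines, the getline approach would only grab the first line of each kept record. A possible alternative, offered as a sketch rather than something tested on real data, is to set a flag at each header and print every line while the flag is on. The flag name keep and the file names wrapped.fa and wrapped_every_other.fa are invented for this example.

```shell
# Hypothetical sample input: three records, two of them wrapped onto two lines.
printf '>seq1\nAAAA\nAA\n>seq2\nCCCC\n>seq3\nGGGG\nGG\n' > wrapped.fa

awk '
/^>/ { keep = (i % 2 == 0); i++ }   # at each header, decide whether this record is kept
keep                                # print the header and all sequence lines of kept records
' wrapped.fa > wrapped_every_other.fa

cat wrapped_every_other.fa
```

Because nothing is printed until the first header sets the flag, any lines before the first record are skipped. This form also works on unwrapped files.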
