Saturday, October 19, 2013

awk and GNU parallel: problems with quotes

Embarrassing as it is to admit, I spent about two hours trying to work out how to parallelize an awk command with GNU parallel. The conclusion, I think, is that I don't understand quoting in bash as well as I should.

My goal was to run the awk code from my previous post in parallel. Credit goes to this post on Stack Overflow for getting me to the solution.

One of the answers in the Stack Overflow post suggested storing the awk command in a string, and that worked for me:
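#!/bin/bash

# The awk program lives in a single-quoted string, so bash leaves $0 and "@" alone.
# Data lines lose their first 5 characters; "@" headers and "+" lines pass through.
awk_body='(substr($0, 1, 1) != "@") && ($0 != "+") {print substr($0, 6)}
          (substr($0, 1, 1) == "@") || ($0 == "+") {print $0}'

# Double quotes expand $awk_body here; the inner single quotes protect it in each job's shell.
parallel --gnu -j-2 --eta "awk '$awk_body' {} > trimmed{}" ::: *.fastq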
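As far as I can tell, this works because the command string is parsed twice: once by my script's bash, which expands $awk_body inside the double quotes, and once by the shell that runs each job, which sees the single quotes. Here is a minimal sketch of that double parsing. It uses bash -c as a stand-in for the shell GNU parallel starts, and the sample file and its contents are made up for illustration:

```shell
#!/bin/bash
# Hypothetical sample data: one FASTQ record with a 5-base prefix to trim.
tmpdir=$(mktemp -d)
printf '@read1\nAAAAACGTACGT\n+\nIIIIIFFFFFFF\n' > "$tmpdir/sample.fastq"

awk_body='(substr($0, 1, 1) != "@") && ($0 != "+") {print substr($0, 6)}
          (substr($0, 1, 1) == "@") || ($0 == "+") {print $0}'

# First parse (this shell): the double quotes expand $awk_body and $tmpdir.
# The single quotes are just literal characters at this point.
cmd="awk '$awk_body' '$tmpdir/sample.fastq'"

# Second parse (child shell): now the single quotes do their job and keep
# $0 and "@" away from the shell, so awk receives the program untouched.
result=$(bash -c "$cmd")
echo "$result"

rm -r "$tmpdir"
```

The trick only holds as long as the awk program itself contains no single quotes; one stray ' inside awk_body would end the quoting early in the second parse.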

Kudos to Ole Tange for GNU parallel.

Edit:

Ole Tange stopped by and left some good pointers in the comments:

The GNU Parallel man page has a section dedicated to quoting: http://www.gnu.org/software/parallel/man.html#quoting

Often you can simply add \' around every '.

Or use --shellquote to generate a quoted version of your string.

For readability you might want to look into writing a Bash function instead: http://www.gnu.org/software/parallel/man.html#example__calling_bash_functions

Thanks for the suggestions, Ole!
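Here is a rough sketch of the Bash function approach, under a few assumptions: the function name trim_fastq and the sample file are mine, not from the man page, and the fallback loop is only there so the script still runs on a machine without GNU parallel. Because the awk program sits inside a function body, it needs only one ordinary layer of single quotes:

```shell
#!/bin/bash
# Hypothetical helper (the name is made up): trim the first 5 characters of
# every data line, leaving "@" headers and "+" separator lines intact.
trim_fastq() {
    awk '(substr($0, 1, 1) != "@") && ($0 != "+") {print substr($0, 6)}
         (substr($0, 1, 1) == "@") || ($0 == "+") {print $0}' "$1" > "trimmed$1"
}
# Export the function so the shells started by parallel can see it.
export -f trim_fastq

# Made-up sample input in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '@read1\nAAAAACGTACGT\n+\nIIIIIFFFFFFF\n' > sample.fastq

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Exported functions are a bash feature, so make sure jobs run under bash.
    SHELL=$(command -v bash) parallel -j-2 trim_fastq ::: *.fastq
else
    # Serial fallback when GNU parallel is not installed.
    for f in *.fastq; do trim_fastq "$f"; done
fi

cat trimmedsample.fastq
```

The output redirection moves into the function as well, so the command handed to parallel is just the function name and needs no quoting at all.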

