Monday, July 15, 2013

Compression of files in parallel using GNU parallel

*Update*

In the comments, Ole Tange (GNU Parallel's author!) himself weighed in with a much more elegant method than what I previously wrote.
parallel --gnu -j-2 --eta gzip ::: *.fastq

parallel --gnu -j-2 --eta gunzip ::: *.fastq.gz
He also pointed out the helpful --bibtex option:
parallel --gnu --bibtex
This returns the BibTeX citation:
@article{Tange2011a,
 title = {GNU Parallel - The Command-Line Power Tool},
 author = {O. Tange},
 address = {Frederiksberg, Denmark},
 journal = {;login: The USENIX Magazine},
 month = {Feb},
 number = {1},
 volume = {36},
 url = {http://www.gnu.org/s/parallel},
 year = {2011},
 pages = {42-47}
}


*Resuming previous post*

I'm very fond of GNU parallel for making it easy to parallelize commands. Here is an example of using it to parallelize gzip compression and decompression of a set of FASTQ files. The --gnu option forces GNU parallel's own behavior rather than its compatibility mode for the older parallel from moreutils, the -j-2 option runs two fewer simultaneous jobs than the number of available processors (never fewer than one), and --eta prints an estimate of the time remaining.
#!/bin/bash
#Gzip compress files with GNU parallel

#Use this for compression
ls ./*.fastq | parallel --gnu -j-2 --eta \
'gzip {}'

#Use this for decompression
ls ./*.fastq.gz | parallel --gnu -j-2 --eta \
'gzip -d {}'
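To make the -j-2 arithmetic concrete, here is a minimal sketch in plain bash of the job count it works out to. The function name jobs_minus_two is hypothetical and is not part of GNU parallel, and the processor count is read with getconf as an assumption; GNU parallel does its own detection, which may count cores or threads depending on the version.

```shell
#!/bin/bash
# Hypothetical helper (not part of GNU parallel): mimic the job count
# that -j-2 resolves to, given a processor count as the first argument.
jobs_minus_two() {
  local cores=$1
  local jobs=$(( cores - 2 ))
  # GNU parallel never drops below one job, so clamp the result.
  if (( jobs < 1 )); then
    jobs=1
  fi
  echo "$jobs"
}

# Assumption: getconf reports the number of online processors.
cores=$(getconf _NPROCESSORS_ONLN)
echo "processors: $cores, jobs with -j-2: $(jobs_minus_two "$cores")"
```

On an 8-processor machine this reports 6 jobs, which leaves two processors free for other work while the compression runs.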
Helpful documentation for GNU parallel can be found here.

2 comments:

  1. It may be easier to read:

    parallel --gnu -j-2 --eta gzip ::: *.fastq

    parallel --gnu -j-2 --eta gunzip ::: *.fastq.gz

    Also when using GNU Parallel for research remember --bibtex.

    1. Thank you for the more elegant solution, and for writing GNU Parallel!
