Thursday, June 26, 2014

sra to gzipped fastq

One way to convert NCBI's SRA format to a gzipped FASTQ file (BWA accepts gzipped FASTQ as input) is to pipe the output of the SRA Toolkit's fastq-dump through gzip and redirect that to an output file. Here's an example that uses the -X flag to convert only 10 spots from the SRA file as a quick test:
#!/bin/bash
#Get an sra file:
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads//BySample/sra/SRS/SRS399/SRS399719/SRR771638/SRR771638.sra

#Designate where the SRAToolkit executables are
SRATOOLKITBIN=./sratoolkit.2.3.5-2-ubuntu64/bin/

#fastq-dump converts the sra format to fastq format. -X sets the maximum number of spots to convert, and -Z writes the output to stdout.
#The output is FASTQ, so the file is named .fastq.gz rather than .fasta.gz.
${SRATOOLKITBIN}fastq-dump -X 10 -Z SRR771638.sra | gzip > SRR771638.fastq.gz

echo "Finished compression"

#Take a peek at the gzipped file.
gunzip -c SRR771638.fastq.gz | head -n 10
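
A quick sanity check on the gzipped output is to count its records, since each FASTQ record normally spans four lines. The sketch below is only an illustration: count_fastq_records is a hypothetical helper name, and the two-read file is made-up stand-in data so the example runs without the SRA Toolkit installed. It also assumes the usual four-line record layout with no wrapped sequence lines.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SRA Toolkit): count FASTQ records
# in a gzipped file, assuming exactly 4 lines per record.
count_fastq_records() {
    local lines
    lines=$(gunzip -c "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Made-up stand-in data: two tiny reads written to a temporary gzipped file.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.fastq.gz"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip > "$demo"

records=$(count_fastq_records "$demo")
echo "Records: $records"   # prints "Records: 2"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Pointed at the file produced by the script above, the same function should report 10 when -X 10 was used on a single-read run.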
