Thursday, December 12, 2013

Embedding Python in C++: converting C++ vectors to numpy arrays, and plotting C++ vector contents using matplotlib

Edit: A comment on StackOverflow from user4815162342 gave a helpful suggestion:
You really should look into using PyArray_SimpleNewFromData, as the OP proposed in the question. For larger vectors it is very important to avoid creating a Python list or tuple, because they require all of their elements to be Python objects, which is very memory- and CPU-inefficient for large vectors.
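For reference, the zero-copy approach the comment describes looks roughly like this. This is a sketch I haven't folded into the code below; it assumes the NumPy C API header is available and that import_array() was called once after Py_Initialize():

```cpp
//Sketch of PyArray_SimpleNewFromData: wrap the vector's buffer directly,
//with no per-element Python objects.
//Requires #include <numpy/arrayobject.h> and import_array() at startup.
npy_intp dims[1] = { (npy_intp) xvec.size() };
PyObject *pXArray = PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE,
                                              (void*) xvec.data());
//The array borrows xvec's memory: xvec must outlive pXArray.
```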


I haven't found a plotting library for C++ that I like as much as Python's matplotlib, but the prospect of writing data calculated in a C++ program to a file and then reading it into a Python program for plotting was not appealing. Fortunately, there's a Python/C API that enables Python to be embedded into C/C++.

I managed to cobble together code that makes two C++ vectors, converts them to Python tuples, passes them to Python, converts them to NumPy arrays, then plots them using matplotlib. There is probably a better way that folks with more experience can provide, but this seems to work.

Credit goes to these pages for helping me cobble it together:

A Makefile along with C++ and Python code can be found at my Github, but here are some of the highlights.

Here is a bit of the important stuff from the .cpp file:
     //Make some vectors containing the data
     static const double xarr[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14};
     std::vector<double> xvec (xarr, xarr + sizeof(xarr) / sizeof(xarr[0]) );
     static const double yarr[] = {0,0,1,1,0,0,2,2,0,0,1,1,0,0};
     std::vector<double> yvec (yarr, yarr + sizeof(yarr) / sizeof(yarr[0]) );

     //Transfer the C++ vector to a Python tuple
     pXVec = PyTuple_New(xvec.size());
     for (i = 0; i < xvec.size(); ++i) {
          pValue = PyFloat_FromDouble(xvec[i]);
          if (!pValue) {
               fprintf(stderr, "Cannot convert array value\n");
               return 1;
          }
          PyTuple_SetItem(pXVec, i, pValue); //PyTuple_SetItem steals the reference to pValue
     }

     //Transfer the other C++ vector to a Python tuple
     pYVec = PyTuple_New(yvec.size());
     for (i = 0; i < yvec.size(); ++i) {
          pValue = PyFloat_FromDouble(yvec[i]);
          if (!pValue) {
               fprintf(stderr, "Cannot convert array value\n");
               return 1;
          }
          PyTuple_SetItem(pYVec, i, pValue); //PyTuple_SetItem steals the reference to pValue
     }

     //Set the argument tuple to contain the two input tuples
     PyTuple_SetItem(pArgTuple, 0, pXVec);
     PyTuple_SetItem(pArgTuple, 1, pYVec);

     //Call the python function
     pValue = PyObject_CallObject(pFunc, pArgTuple);
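The fragment above assumes the interpreter and module were already set up; that boilerplate looks roughly like this (a sketch using the Python 2 C API, matching the post's era, with error checks omitted — see the full source on Github):

```cpp
//Sketch of the setup that precedes the calls above.
Py_Initialize();
PyObject *pName = PyString_FromString("pythonToEmbed");
PyObject *pModule = PyImport_Import(pName);
Py_DECREF(pName);
PyObject *pFunc = PyObject_GetAttrString(pModule, "plotStdVectors");
PyObject *pArgTuple = PyTuple_New(2);  //will hold pXVec and pYVec
```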
Here's the entire .py file:
def plotStdVectors(x, y):
    import numpy as np
    import matplotlib.pyplot as plt
    print "Printing from Python in plotStdVectors()"
    print x
    print y
    x = np.fromiter(x, dtype = np.float)
    y = np.fromiter(y, dtype = np.float)
    print x
    print y
    plt.plot(x, y)
    plt.show()  # display the figure window
    return 0
And, after compiling with the Makefile (which is for Ubuntu 12.10 using the system's default Python installation), the program can be run with:
$ ./testEmbed pythonToEmbed plotStdVectors
Hello from main
Hello from runPython()
Printing from Python in plotStdVectors()
(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0)
(0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0)
[  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.]
[ 0.  0.  1.  1.  0.  0.  2.  2.  0.  0.  1.  1.  0.  0.]
Result of call: 0
Program finished
And the plot:

Thursday, December 5, 2013

Getting started with FFTW for computing the Discrete Fourier Transform

The FFTW (Fastest Fourier Transform in the West) is a library for, among other things, computing the Discrete Fourier Transform (DFT) of real data.

I have been using Scipy for computing the DFT of real data, but I have a love/hate relationship with Python so I wanted to have it working with C/C++. Additionally, my understanding is that the FFTW is supposed to have exceptional performance when computing the DFT of many arrays of the same size because it invests in an initialization step that tailors the computation to the hardware (the "plan").
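To make the plan idea concrete, here is a minimal sketch of FFTW's plan/execute pattern, using the same 16-point input as the results below. It assumes libfftw3 is installed (compile with -lfftw3 -lm); the variable names are my own:

```cpp
#include <fftw3.h>
#include <cstdio>

int main() {
    const int n = 16;
    double *in = (double*) fftw_malloc(sizeof(double) * n);
    fftw_complex *out =
        (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (n/2 + 1));

    //Plan once: this is the up-front step FFTW invests in.
    //FFTW_MEASURE may overwrite in/out while planning, so fill the
    //input only after the plan is created.
    fftw_plan p = fftw_plan_dft_r2c_1d(n, in, out, FFTW_MEASURE);

    for (int i = 0; i < n; ++i)
        in[i] = (i % 4 < 2) ? 0.0 : 1.0;  //the 0 0 1 1 ... pattern

    fftw_execute(p);  //reuse this same plan for every same-sized array

    for (int i = 0; i <= n/2; ++i)
        printf("(%g,%g) ", out[i][0], out[i][1]);
    printf("\n");

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```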

The installation on Ubuntu 12.10 went smoothly after downloading the package from their website. The usual round of
./configure
make
sudo make install
took care of everything.

The FFTW documentation was very helpful and got me up and running. My code and Makefile are posted on Github. The code generates an array of real numbers and computes its DFT twice: the first function returns an array of complex numbers, and the second returns the "halfcomplex" format (check the FFTW documentation for more on the "halfcomplex" format):
Hello from main

Hello from real2complex()

This takes in a real number array and returns a complex number array:
Input array: 
 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
Output array: 
 (8,0) (0,0) (0,0) (0,0) (-4,4) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0)

Hello from real2real()

This takes in a real number array and returns a real number array:
Input array
 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
"Halfcomplex" output array: 
 8 0 0 0 -4 0 0 0 0 0 0 0 4 0 0 0
Power spectrum
 0 0 0 5.65685 0 0 0
Frequencies of spectrum 
 0.0625 0.125 0.1875 0.25 0.3125 0.375 0.4375
Please let me know if you find any errors, and use the code at your own risk; my theoretical understanding of the DFT certainly isn't reliable!

Tuesday, December 3, 2013

grep: select non-matching lines

I recently discovered the -v option from grep to select non-matching lines. It simply inverts the matching of the grep command.

For example, to look through a .vcf file for lines that don't have the QD annotation:
grep -v QD variants.vcf
You can also count such lines: first exclude the header lines with a Perl regex via the -P option, then use the -c option to count the non-matching lines:
grep -v -P "^#" variants.vcf | grep -v -c QD
This post came about because I'm working with the Broad's GATK on a variant calling pipeline for an organism lacking an established variant database. While working on testing different variant hard-filters to bootstrap a "truth" set for variant calling, the GATK VariantFiltration walker threw some warnings that I wanted to look into further; I found grep's -v option from this post on the GATK forums.