Wednesday, April 20, 2016

Getting started with Intel's Threading Building Blocks: installing and compiling an example that uses a std::vector and the Code::Blocks IDE

I hit a problem where it looks like I'll have to parallelize execution. After reading a bit about it, I opted to use Intel's Threading Building Blocks over OpenMP. The consensus seemed to be that Intel TBB is better suited for slotting into C++ projects. TBB also has a bunch of birds on its webpage, which is a plus in my book.

The most excellent TBB bird logo.

To download and install, I got the .tgz corresponding to the source code at the download page: https://www.threadingbuildingblocks.org/download

After unarchiving and decompressing, I invoked "make" in the directory (this was on an Ubuntu 14.04 OS). This built the debug and release targets in the "build" directory.

You can also invoke "make" in the "examples" directory; it builds and runs a number of fun examples that use TBB.

To set that up to compile with the Code::Blocks IDE, I did the following:

Go to the Project's Build Options -> Search Directories -> Linker and add the debug and release build directories (e.g. tbb44_20160128oss/build/linux_intel64_gcc_cc4.8_libc2.19_kernel3.19.0_debug).

Go to the Project's Build Options -> Search Directories -> Compiler and add the include directory (e.g. tbb44_20160128oss/include).

Go to the Project's Build Options -> Linker settings, and, under "Other linker options" add "-ltbb" (don't include the quotes).

To test the setup with a simple use case, I modified one of the examples in the TBB parallel_for documentation so that it operates on a std::vector:

#include <iostream>
#include <vector>

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Example adapted from: https://www.threadingbuildingblocks.org/docs/help/reference/algorithms/parallel_for_func.htm
class AverageWithVectors {
public:
    std::vector<float> *input;
    std::vector<float> *output;
    void operator()( const tbb::blocked_range<int>& range ) const {
        for( int i=range.begin(); i!=range.end(); ++i )
            (*output)[i] = ( (*input)[i-1] + (*input)[i] + (*input)[i+1]) * (1/3.f);
    }
};

int main() {
    std::vector<float> inputVector = {30.0, 70.0, 90.0, 120.0, 133.0, 199.0, 245.0, 266.0, 289.0};
    std::vector<float> outputVector = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
    AverageWithVectors avg;
    avg.input = &inputVector;
    avg.output = &outputVector;
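    // The range is half-open: [1, 8) averages elements 1 through 7 and leaves the two end elements at 0.
    tbb::parallel_for( tbb::blocked_range<int>( 1, 8 ), avg );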
    for (std::size_t i = 0; i < outputVector.size(); i++) {
        std::cout << "output of element " << i << ": " << outputVector[i] << std::endl;
    }

    return(0);
}

The output should be something like:
output of element 0: 0
output of element 1: 63.3333
output of element 2: 93.3333
output of element 3: 114.333
output of element 4: 150.667
output of element 5: 192.333
output of element 6: 236.667
output of element 7: 266.667
output of element 8: 0

Note that this doesn't actually prove anything is being executed in parallel, only that it compiles and successfully calls tbb::parallel_for().

Finally, I came across a helpful post on the Intel forums from a user named Jim Dempsey. Most of it is reproduced below.

Generally speaking: 
Use for (or for_each) when the amount of computation is small (not worth the overhead to parallelize). 
Use parallel_for when you have a large number of objects and each object has an equal, small amount of work. 
Use parallel_for with an appropriate grain size when you have a large number of objects and each object has an unequal, small amount of work. 
Use parallel_for_each when each object's work is relatively large, the number of objects is small, and each object has unequal work. 
When the number of objects is very small and known, consider using switch(nObjects), with parallel_invoke in each case that you wish to implement and parallel_for_each for the default case.

Saturday, April 9, 2016

Access point correspondences from registration method in Point Cloud Library (a.k.a. access protected member in library class without modifying library source).

This post was inspired by an answer by user D.J.Duff on StackOverflow.

The goal was to get the point correspondences between two point clouds that were identified during registration using the Point Cloud Library (PCL). The correspondences appear to be saved, but as a protected member of the Registration class. I wasn't interested in modifying the PCL source to add a getter function, so I took the route of exposing it via inheritance. This should work with any registration method that inherits from the Registration class. I suspect that reaching into a class like this is considered poor form, but I'm not sure what other options are available without modifying the library source.
/** \brief This is a mock class with the sole purpose of accessing a protected member of a class it inherits from.
*
* Some of the relevant documentation for correspondences: http://docs.pointclouds.org/trunk/correspondence_8h_source.html#l00092
*/
template <typename PointSource, typename PointTarget, typename Scalar = float>
class IterativeClosestPointNonLinear_Exposed : public pcl::IterativeClosestPointNonLinear<PointSource, PointTarget, Scalar> {
  public:
    pcl::CorrespondencesPtr getCorrespondencesPtr() {
      for (uint32_t i = 0; i < this->correspondences_->size(); i++) {
        pcl::Correspondence currentCorrespondence = (*this->correspondences_)[i];
        std::cout << "Index of the source point: " << currentCorrespondence.index_query << std::endl;
        std::cout << "Index of the matching target point: " << currentCorrespondence.index_match << std::endl;
        std::cout << "Distance between the corresponding points: " << currentCorrespondence.distance << std::endl;
        std::cout << "Weight of the confidence in the correspondence: " << currentCorrespondence.weight << std::endl;
      }
      return this->correspondences_;
    }
};
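
For anyone who wants to see the trick without PCL in the way, here's a library-free sketch of the same pattern. MockRegistration and MockRegistrationExposed are hypothetical classes I made up for illustration; they are not PCL types, and the "alignment" is fake.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a library class (NOT part of PCL). It stores its
// results in a protected member and offers no public getter for them.
class MockRegistration {
public:
    MockRegistration() : matches_(std::make_shared<std::vector<int>>()) {}
    virtual ~MockRegistration() = default;

    // Pretend alignment: records one match index per input point.
    void align(std::size_t pointCount) {
        matches_->clear();
        for (std::size_t i = 0; i < pointCount; ++i)
            matches_->push_back(static_cast<int>(i));
    }

protected:
    std::shared_ptr<std::vector<int>> matches_;
};

// Derived class whose only job is to add the missing getter. Protected
// members are visible to derived classes, so no change to the base is needed.
class MockRegistrationExposed : public MockRegistration {
public:
    std::shared_ptr<std::vector<int>> getMatches() const { return matches_; }
};
```

With PCL itself, the exposed class above should be usable like any other registration object: set the source and target clouds with setInputSource and setInputTarget, call align, and then call getCorrespondencesPtr().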

And a bit of additional content from my answer on StackOverflow:

Here are some lines drawn between corresponding points between two point clouds obtained in this manner. Notably, the pcl::Correspondence.distance value between two points doesn't always seem to be strictly less than the value of the registration method's maxCorrespondenceDistance, so you may have to check the distances to make sure you're getting only the correspondences you want.
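
Checking the distances can be as simple as copying out the correspondences under a threshold. Here's a minimal sketch. CorrespondenceLike is a stand-in struct of mine that borrows the field names used above, and keepWithinDistance is a hypothetical helper, not a PCL function. The threshold has to be in the same units as the stored distance field, so it's worth checking whether your PCL version stores a plain or a squared distance there.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Stand-in for pcl::Correspondence (assumption: only these three fields matter here).
struct CorrespondenceLike {
    int index_query;   // index of the source point
    int index_match;   // index of the matching target point
    float distance;    // distance value stored for the pair
};

// Hypothetical helper: returns only the correspondences whose stored distance
// is less than or equal to maxDistance, preserving their original order.
std::vector<CorrespondenceLike> keepWithinDistance(
        const std::vector<CorrespondenceLike>& all, float maxDistance) {
    std::vector<CorrespondenceLike> kept;
    std::copy_if(all.begin(), all.end(), std::back_inserter(kept),
                 [maxDistance](const CorrespondenceLike& c) {
                     return c.distance <= maxDistance;
                 });
    return kept;
}
```

The same lambda should work on a real pcl::Correspondences container, since the sketch only reads the distance field.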