Monday, August 22, 2016

Postmortem: 3D sorghum reconstructions from depth images identify QTL regulating shoot architecture

One of the approaches that researchers are using to improve plant productivity is high-throughput plant phenotyping. The objective is to use computing and robotics to increase the number of plants that can be evaluated for traits of interest. As the number of plants that can be screened increases, the rate at which the genetic basis of traits can be identified increases, and we can make better selection decisions for breeding. The way I see it, the future of plant phenotyping is going to be large numbers of cheap, autonomous robots that can crawl or fly through a field or greenhouse, gather data, and send that data off for genetic analysis and breeding decisions (as well as precision agriculture). I think the major hurdles are a lack of software, both for robot autonomy and for data analysis, and the energy density of batteries.

The project that ultimately led to our recent manuscript "3D sorghum reconstructions from depth images identify QTL regulating shoot architecture" originated sometime around October 2014. Given our results with the RIG manuscript, I agreed with the general sentiment in the field that genotyping was not the bottleneck for crop improvement, and that phenotyping was the limitation. While industry platforms like Lemnatec and Traitmill and public platforms like PlantScan had been out for a little while, we were looking for something that required less static infrastructure, especially since one of our major interests is bioenergy sorghum; 3+ meter tall bioenergy sorghum plants require more flexible acquisition platforms.

Every Fall, I tend to get on a game development kick (notice that last Fall, in November 2015, there are posts on procedural content generation, and this year the Llama and I are registered for the September Chillenium game jam at Texas A&M). Fall 2014 was no exception, and I was interested in the idea of the Microsoft Kinect as an input device (version 2 to be specific). Since the Kinect (and related cameras that use structured light or time-of-flight) directly samples information about the geometry of scenes at high rates and was relatively cheap, it seemed like a good fit for building a plant morphology phenotyping platform. Even if it didn't work, we would only have put the boss out ~$150, and we would still have learned a little about hardware interfaces and image processing.

3D sorghum - we have more than a thousand of these things now.
As it turns out, it worked pretty well, and after about 6 months of tooling around with some prototypes, we had a workflow in place and had enough working test cases to merit a larger scale experiment. This was around the time of the ARPA-E TERRA project proposals, so plant phenotyping was starting to get more attention in the United States, and our boss was onboard.

We grew the plants, imaged them loads of times, processed the data, and wrote the manuscript. The review process was productive; the editor and reviewers provided very helpful and critical feedback that improved the manuscript substantially.

The publication of this manuscript marks the completion of much of what the Llama and I had hoped to accomplish in our graduate studies. When we entered the TAMU Genetics program 4 years ago, we wanted to become proficient in interdisciplinary research and use mathematics and computing to answer questions in genetics. We now have publications that use modeling, high-performance computing, and image processing to understand the genetic basis of quantitative traits and plant physiology, we can program in multiple languages, and we can comfortably talk shop with a range of disciplines. We're happy with how things turned out in graduate school, and we're grateful for the training opportunities and interactions that we've had while we've been here. Over the next year we'll try to polish off some additional manuscripts in the queue and try to get a job in research.

Now for what worked and what didn't.

What worked:
  • Open source libraries - Particularly RapidXML, OpenCV, and PCL. Open source is the best. Being able to browse the source code to step through the code was priceless for learning and debugging. In this spirit, we put all of our code on GitHub.
  • GitHub for code hosting - Free, simple, publicly accessible, and has a good user interface. If you write non-trivial code for an academic publication, please just put it online somewhere so that folks like me can look at it. Ideas are cheap; implementations are valuable.
  • Dryad for data hosting - Acting as a data repository for research data seems to be an impossible and thankless task, particularly as data sets are exploding in size. Dryad was professional and efficient.
What didn't:
  • Getting bogged down in the details - The first draft of the manuscript that we sent for review was heavy-handed with extraneous detail. We didn't develop any elegant algorithms, but that didn't stop me from describing our methods at excessive length and using far too much jargon; both got in the way of the overall message of our manuscript. Pruning this out and moving some of it to the supplemental material benefited the manuscript.
  • Not merging RGB and depth and not using templating properly - Originally, I was only interested in the depth images since I figured the RGB data wouldn't provide much useful information (the plant is rather uniformly green after all, and variation in environmental lighting was a complexity I didn't want to handle). As a result, most of the code base was designed specifically for clouds with the point type pcl::PointXYZ rather than being templated on the point type. If I were to do it again, I would map the RGB to the depth data for figure generation and use templates to make the code base more generic (the same way PCL uses templates).
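
To make the templating point concrete, here is a minimal sketch of what "templated on the point type" could look like. It is not code from our repository. To keep it compilable without PCL installed, PointXYZ, PointXYZRGB, and PointCloud below are simplified stand-ins for the PCL types of the same names, and filterByDepth and meanZ are hypothetical helpers invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for pcl::PointXYZ and pcl::PointXYZRGB (assumption:
// the real PCL types also carry padding and alignment that is omitted here).
struct PointXYZ {
    float x, y, z;
};

struct PointXYZRGB {
    float x, y, z;
    std::uint8_t r, g, b;
};

// Hypothetical minimal container, loosely modeled on pcl::PointCloud<PointT>.
template <typename PointT>
struct PointCloud {
    std::vector<PointT> points;
};

// Keep only the points whose depth (z) lies in [z_min, z_max].
// Because the function is templated on PointT, any extra per-point fields
// (such as color) are carried along without a second implementation.
template <typename PointT>
PointCloud<PointT> filterByDepth(const PointCloud<PointT>& cloud,
                                 float z_min, float z_max) {
    PointCloud<PointT> kept;
    for (const PointT& p : cloud.points) {
        if (p.z >= z_min && p.z <= z_max) {
            kept.points.push_back(p);
        }
    }
    return kept;
}

// Mean depth of a cloud; returns 0 for an empty cloud.
// Only the z member is required, so this works for both point types above.
template <typename PointT>
float meanZ(const PointCloud<PointT>& cloud) {
    if (cloud.points.empty()) {
        return 0.0f;
    }
    double sum = 0.0;
    for (const PointT& p : cloud.points) {
        sum += p.z;
    }
    return static_cast<float>(sum / static_cast<double>(cloud.points.size()));
}
```

The same filterByDepth call then works on a depth-only cloud and on a colored cloud, which is the flexibility we gave up by hard-coding pcl::PointXYZ throughout.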