Friday, September 27, 2013

Journal club: publishable units?

Paper: UV-C-irradiated Arabidopsis and Tobacco emit volatiles that trigger genomic instability in neighboring plants from Yao et al. (2011).

After reading the paper, Frogee told me that the HRF (homologous recombination frequency) detection method used as a proxy for genomic instability was not well supported, and actually seems not to be widely accepted.

This paper's story hinges on plant signalling that induces genomic instability; however, it seems that they have not validated that the detection assay is reliable. That is, there is no evidence that "a single recombination event in the recombination cassette restores luciferase activity".


The authors' interpretations are hard to swallow. I don't believe they showed what they claim, that is, that "a single recombination event in the recombination cassette [will] restore luciferase activity". Nonetheless, the genetic experimentation with mutants compromised for SA and/or JA synthesis and/or perception supported that these two compounds were volatile, excreted under certain stresses, and perceived by plants. Whether or not these findings were a publishable unit without the stretch to high-frequency recombination, I'm not sure.

The presenter did a good job: gave relevant background information and fielded the questions phenomenally well.

Journal Club: Homologous recombination reporter lines

This week's journal club paper: UV-C-irradiated Arabidopsis and Tobacco emit volatiles that trigger genomic instability in neighboring plants from Yao et al. (2011).

I was pretty surprised that this one got through peer review. Everything in this paper hinges on the ability of the reporter line to faithfully report homologous recombination events, but I was unable to find any evidence suggesting this was the case. I was also unable to find this evidence in any of the citations.

Fortunately, I wasn't the only one. A strong rebuttal regarding publications using these somatic homologous recombination reporter lines was published a year later in the same journal: Reevaluation of the reliability and usefulness of the somatic homologous recombination reporter lines from Ulker et al. (2012).

I suspect everyone's comment in journal club will be the same, and something along the lines of:

"This paper presents an interesting idea, but I think it fails to demonstrate that their reporter line actually reports only homologous recombination. I think that a 2012 rebuttal paper from Ulker et al. about these somatic homologous recombination reporter lines successfully defends the idea that there are many other interpretations of reporter activity in these lines, including stress-induced read-through transcription."

Monday, September 23, 2013

Change MySQL data file directory (Ubuntu 12.10 64-bit)

Since our MySQL database is stored on an individual hard drive, and the database is growing beyond the size limits of the hard drive, a temporary solution is to move it to a larger drive. Another temporary measure is to clear the binary logs as described in a previous post. I suspect the real solution is to use a distributed database (also see the Wikipedia entry), but I don't think our use case is quite ready for that yet. This post is on how to change the MySQL data file directory.

This is using a MySQL 5.5.29 server installed via the apt package manager on Ubuntu 12.10 64-bit (MySQL server version 5.5.29-0ubuntu0.12.10.1-log)

Credit goes to this post and this forum post for the information.

If you don't already know where on the hard drive your MySQL database files are stored, identify their location, and then stop the server:
#Enter the mysql shell
$ mysql

#Ask the server where its data directory is
> SHOW VARIABLES LIKE 'datadir';
| Variable_name | Value           |
| datadir       | /var/lib/mysql/ |

#Exit the shell and shut down the server
> exit
$ sudo service mysql stop
Now we want to copy the database over to the new location and retain the previous permissions. I'm choosing to copy instead of move in case something goes wrong (although I recommend keeping additional backups of your database in case of something like drive failure).
sudo cp -p -r /var/lib/mysql /path/to/new/location
Next we want to tell mysql where to look for the data directory on startup. I prefer to modify the my.cnf configuration file to save having to specify it when starting. My text editor of choice is vim.
sudo vim /etc/mysql/my.cnf
Here you want to change the datadir line under the [mysqld] section
#I'm choosing to comment out the old line in case disaster strikes and I need to get back to start.
#datadir     = /var/lib/mysql
datadir      = /path/to/new/location/mysql
Save those edits. We also need to modify the apparmor profile:
sudo vim /etc/apparmor.d/usr.sbin.mysqld
Add the following near the end before the closing bracket:
#You should see similar directives for your previous data directory, so you can follow their lead (i.e. /var/lib/mysql/ r, and /var/lib/mysql/** rwk,)
/path/to/new/location/mysql/ r,
/path/to/new/location/mysql/** rwk,
There was a report that having lines for the previous data directory would cause problems. Leaving those lines didn't seem to cause any adverse effects for me, but they may need to be removed or commented out in other cases.
Save the edits and then restart apparmor:
sudo /etc/init.d/apparmor restart
If that went well, you can start the mysql server and make sure everything is as expected.
sudo service mysql start

Saturday, September 21, 2013

Journal club: sweet plots

This week's journal club paper is: Genome-wide analysis of histone H3.1 and H3.3 variants in Arabidopsis thaliana Stroud et al. (2012)

I thought it was very resourceful of Stroud et al. to pool together large datasets to make connections for H3.1 and H3.3 functional depositions. I have a couple of questions:

Is it possible that their use of Myc-tagged H3.1/H3.3 transformed into the plant limits their ability to assay native H3.1/H3.3? If so, maybe the absence of a signal isn't necessarily evidence of absence. Is there concern about differential binding of the Myc-tags?

Frogee said that making Myc-tags is probably easier than finding specific antibodies for H3.1/H3.3. He's probably right, but I'm still curious how much more difficult making antibodies would be, and whether doing ChIP-seq with those antibodies would have yielded differences.

This is the first time I've encountered or maybe paid attention to plots that show values plotted against enrichment and distances from enrichment. I have no understanding of how this is done, but I think that I need to explore this technique of clustering these regions, because at least for me the figures look convincing. 

I'm curious as to how the H3.1 and H3.3 have been shown to evolve separately in plants and animals.

My comment:

This is the first time I've encountered plots that show values plotted against enrichment and distances from enrichment. I don't know how this technique of clustering these regions is done, but I want to find out, because, at least for me, the figures look like a convincing way to summarize data from many different regions of the genome.

Friday, September 20, 2013

Journal Club: Nucleosome prediction

This week's journal club paper was: Genome-wide analysis of histone H3.1 and H3.3 variants in Arabidopsis thaliana Stroud et al. (2012)

The number of existing datasets that Stroud et al. were able to pull genomic data from was, I think, a nice example of putting Arabidopsis' rich research history to good use. It's also nice to see when some general chromatin features are conserved between plants and animals, although it's quite surprising/borderline unbelievable that H3.1 and H3.3's functions in both systems are a product of convergent evolution.

My comment:

"It's not really the purpose of the paper, but I was excited to see that there exists a nucleosome prediction algorithm that appears to work well in C. elegans and, as this paper suggests, in Arabidopsis.  I think the idea that DNA sequence is a good predictor of nucleosomal location across large evolutionary distances could permit searching for higher order chromatin structures on the basis of DNA sequence."

Monday, September 16, 2013

Gamasutra Post: Overview of Motion Planning by Matthew Klingensmith

Motion planning is one of the research interests of one of my committee members. This post on Gamasutra written by Matthew Klingensmith discusses the use of motion planning for AI pathfinding; it was a helpful introduction for me. This stuff is amazing.

An excerpt from the article:

"In game development literature, the act of getting from point A to point B is usually called "pathfinding," yet I've been calling it "motion planning." Basically, motion planning is a generalization of pathfinding with a looser definition of "point" and "path" to include higher dimensional spaces (like rotations, or joints).

Motion planning can be thought of as pathfinding in the configuration space of the Agent."
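To make the "pathfinding in configuration space" idea concrete, here is a minimal sketch, not taken from the article. It assumes a toy agent on a grid whose configuration is its cell plus a heading, so turning in place is a move just like stepping forward. The names `Config` and `planMoves` and the grid encoding are made up for illustration, and the start configuration is assumed to lie inside a rectangular grid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// A configuration is (row, column, heading); heading is 0=N, 1=E, 2=S, 3=W.
// Including the heading is what turns a plain 2D grid of points into a
// (tiny) configuration space.
struct Config {
    int row;
    int col;
    int heading;
};

// Breadth-first search over configurations. Moves: turn left, turn right,
// or step one cell forward along the current heading. Returns the number of
// moves in a shortest plan, or -1 if the goal configuration is unreachable.
// '#' marks an obstacle; every other character is free space.
// Assumes all rows have the same length and that start is inside the grid.
int planMoves(const std::vector<std::string>& grid, Config start, Config goal) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows > 0 ? static_cast<int>(grid[0].size()) : 0;
    if (rows == 0 || cols == 0) return -1;

    const std::array<int, 4> dRow = {-1, 0, 1, 0};
    const std::array<int, 4> dCol = {0, 1, 0, -1};

    // dist holds -1 for configurations that have not been visited yet.
    std::vector<int> dist(static_cast<std::size_t>(rows) * cols * 4, -1);
    auto index = [cols](const Config& c) {
        return (static_cast<std::size_t>(c.row) * cols + c.col) * 4 + c.heading;
    };

    std::queue<Config> frontier;
    dist[index(start)] = 0;
    frontier.push(start);

    while (!frontier.empty()) {
        const Config cur = frontier.front();
        frontier.pop();
        if (cur.row == goal.row && cur.col == goal.col && cur.heading == goal.heading) {
            return dist[index(cur)];
        }
        const Config candidates[3] = {
            {cur.row, cur.col, (cur.heading + 3) % 4},  // turn left
            {cur.row, cur.col, (cur.heading + 1) % 4},  // turn right
            {cur.row + dRow[cur.heading], cur.col + dCol[cur.heading], cur.heading},  // step forward
        };
        for (const Config& next : candidates) {
            if (next.row < 0 || next.row >= rows || next.col < 0 || next.col >= cols) continue;
            if (grid[next.row][next.col] == '#') continue;
            if (dist[index(next)] != -1) continue;
            dist[index(next)] = dist[index(cur)] + 1;
            frontier.push(next);
        }
    }
    return -1;
}
```

The search is ordinary breadth-first search; only the definition of a "point" changed. On an open 2x2 grid, reaching the neighbouring cell to the east takes one move if the agent already faces east and two if it starts facing north, because the turn is itself a step through configuration space.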

Saturday, September 14, 2013

Journal club: Specialist phenomenon stems from differential concentrations

Journal article for 09/13/2013:
An amino acid substitution inhibits specialist herbivore production of an antagonist effector and recovers insect-induced plant defenses by Schmelz et al. (2012)

Overall, the research presented seems reasonably executed as well as interpreted.

It's a misnomer to call [Vu-In$^{-A}$] "inactive" when it has antagonistic repressive functionality.

The dynamics of the plant's reaction to the different ratios of inceptin concentrations produced by FAW and VBC reminded me of a somewhat recent news article, Plants perform molecular maths. While this is not as complex as the rate of starch degradation discussed in the news article, since these inceptin classes might just be competing for receptor real estate, the data suggest that there might be something more complex occurring.

After finding that VBC repressed the plant's reaction to its own predation by producing more [Vu-In$^{-A}$] than [Vu-In], they performed a titration of the two compounds to gauge how the plant's reaction depends on their ratio.

Figure 3B shows the results of different concentrations. I found it interesting that when [Vu-In$^{-A}$] : [Vu-In] was 1:1, the ET production was the same as when the entire treatment was [Vu-In]. More interestingly, the pseudo dose-response of each of the inceptins can be seen in Figure 3A, which can describe the paired dose response.

So, this specialist phenomenon seems to be the result of different concentrations of inceptins. It would have been nice to see the paired dose response for Vu-In$^{-A}$ and Vu-In$^{\bigtriangleup V}$ to see if the defense response is considerably antagonized by the ~77% Vu-In$^{-A}$ as it was in Fig 3B.

TL;DR (comment):
With respect to Figure 3B, I found it interesting that in the paired dose response, when the two inceptins were 1:1, the ET production was similar to when the entire treatment was [Vu-In]. I agree with the authors that, based on their data, it looks like this specialist phenomenon stems from differential concentrations of inceptins in the OS. I think it would have been nice to see the paired dose response for Vu-In$^{-A}$ and Vu-In$^{\bigtriangleup V}$ to see if the defense response is again considerably antagonized by the ~77% Vu-In$^{-A}$ as it was in Fig 3B.

Friday, September 13, 2013

Journal club - Green leafy volatiles

Green leafy volatiles got voted in as one of the topics for this semester's journal club.

Today's journal club paper is An amino acid substitution inhibits specialist herbivore production of an antagonist effector and recovers insect-induced plant defenses by Schmelz et al. (2012).

In my opinion, this paper was reasonably solid but poorly delivered. The experiments and concepts presented were simple, but the authors seemed to go out of their way to make them opaque; the introduction and discussion were a bit aimless. I think a simple re-write would increase the cogency and leave more room to make more intuitive figures.

The conclusions seem fine, although I'm surprised the authors didn't also follow up with the aspartic acid substitution (Fig 4B) a bit further since the 11-mer dramatically reduced the Vu-In$^{-A}$ recovery (even if the 19-mer version didn't perform as well).

The Llama, Precocious as she is, suggested that it would have also been nice to see a plot like Fig 3B (the paired dose response) for Vu-In$^{-A}$ and Vu-In$^{\bigtriangleup V}$ to see if the defense response is considerably antagonized by the ~77% Vu-In$^{-A}$ as it was in Fig 3B. However, I suppose we'd expect it to be similar to the Vu-In$^{-A}$ and Vu-In in Fig 3B since it looks like the response difference is caused by differences in proteolysis efficiency that change the relative abundance in the saliva (Fig 4B and C).

And for my comment:

"I thought the authors' interpretations of their findings were reasonable, although showing what caused the VBC larvae to produce the altered inceptin ratio would be more convincing. This paper reminded me of the Red Queen hypothesis for host/parasite or predator/prey relationships, and for me it's amusing to think that plants evolved to detect digested bits of themselves; it seems reasonable that this provides more information regarding the source of the damage and what the appropriate defense response should be rather than just general mechanical damage sensing."

Thursday, September 12, 2013

Gamasutra and Ars Technica posts on WebGL and JavaScript performance

From a post on Gamasutra by Jasmine Kent about 3D rendering in the browser using WebGL (found here), I ended up on a post on Ars Technica by Peter Bright claiming Mozilla's ability to produce near native performance using JavaScript (found here).

An excerpt from the Gamasutra article:

"WebGL is OpenGL for the browser, providing access to the power of the GPU, but with some constraints. Importantly, CPU-side data processing with JavaScript is slower than in a native app. Transferring data to the GPU involves more security checks than in a native app, to keep Web users safe. However, once the data is on the GPU, drawing with it is fast."

An excerpt from the Ars Technica article:

"The fact that asm.js doesn't look like JavaScript any human would produce might seem like a problem. Scant few developers of native code programs use assembler, and asm.js is even more feature-deprived than most real assembly languages. Mozilla doesn't really intend for developers to write asm.js programs directly, however. Instead, the idea is that compilers use asm.js as the target, with programs themselves written in some other language.

That language is typically C or C++, and the compiler used to produce asm.js programs is another Mozilla project: Emscripten. Emscripten is a compiler based on the LLVM compiler infrastructure and the Clang C/C++ front-end. The Clang compiler reads C and C++ source code and produces an intermediate platform-independent assembler-like output called LLVM Intermediate Representation. LLVM optimizes the LLVM IR. LLVM IR is then fed into a backend code generator—the part that actually produces executable code. Traditionally, this code generator would emit x86 code. With Emscripten, it's used to produce JavaScript."

I'd only come across WebGL in passing, and I had never heard of asm.js, nor of the ability to compile C/C++ to JavaScript. 3D rendering and native performance, all in the browser? Awesome-o-meter set to maximum indeed if this stuff works as intended.

Tuesday, September 10, 2013

Gamasutra post: Watch the intriguing '3-Sweep' 3D modeling technique in action

I read some pretty interesting articles today, mostly on DNA methylation, but this post from Gamasutra containing a video of a 3D modeling technology called the "3-Sweep" method topped the charts of the awesome-o-meter for today.

Friday, September 6, 2013

Questions about models to analyze selective pressures

We attend a journal club where we read an article and watch a presentation given by the student that chose the paper, once a week. This is my second semester in a journal club, and I want to start documenting my thoughts/questions on certain papers.

Today's paper is Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato by Koenig et al. (2013). This paper presented some new-to-me analyses of gene expression, and in particular I hope to learn more about the models of evolution that they used in their analysis of selective pressures.

Currently, I'm having a difficult time understanding the conclusions they draw with respect to their results of fitting three models of evolution: Brownian motion single rate, Ornstein-Uhlenbeck, and Brownian motion two rate model. As a disclaimer, I'm not all that familiar with these models, or their analyses, so my concern may be totally off-base.  
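For reference, these are the textbook forms of the three models and of the Akaike information criterion, written for a trait value $X(t)$ (here, a gene's expression level) evolving along a branch. The symbols are standard notation and an assumption on my part, not taken from the paper, which may parameterize things differently.

```latex
% Brownian motion (single rate): the trait wanders with no preferred value,
% and its variance grows linearly with time at rate \sigma^{2}.
dX(t) = \sigma \, dW(t), \qquad \operatorname{Var}[X(t) \mid X(0)] = \sigma^{2} t

% Brownian motion (two rate): the same process, but with rate \sigma_{1}^{2}
% on one set of branches and \sigma_{2}^{2} on the remaining branches.

% Ornstein-Uhlenbeck: Brownian motion plus a pull of strength \alpha
% toward an optimum \theta.
dX(t) = \alpha \, \bigl(\theta - X(t)\bigr) \, dt + \sigma \, dW(t)

% Akaike information criterion for a model with k free parameters and
% maximized likelihood \hat{L}; the model with the lowest AIC is preferred.
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

Written this way, the two rate model says only that the rate of change differs between branches; by itself it does not say what process produced the faster rate.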

My interpretation: After fitting the models and comparing them using the Akaike information criterion, they found genes that best fit the Brownian motion two rate model, and for those genes:
  •  S. pennellii branch had the largest number of genes 
  •  S. lycopersicum branch had the largest proportion of unique genes 
(Like I said, I'm unclear on how those were calculated.) So, on the off chance that I understood that correctly, my question is why they only hypothesized that the "rapid divergence in gene expression that has occurred in S. pennelli can be explained by neutral process."  

My confusion:
  • Divergence is calculated/assessed by comparing two objects, in this case branches. So, shouldn't there be an equally reasonable argument for evolution on the other branch, e.g. that human selection (domestication) occurred on the other branches and led to the "rapid divergence"?
  • And why do they choose the neutral process? My limited understanding of Brownian motion is that it describes not only random drift; other mechanisms, such as randomly changing selective regimes and continued change in independent additive factors of small effect, follow the same trend.

Another reason I am totally aware that I could be way off-base is that in their discussion, given what I believe are results of the same type of analysis, they draw conclusions the way I expect them to be drawn:

"The most extensive network rewiring that we discovered in S. lycopersicum relates to light responsiveness. Loss of connectivity in this network may reflect selection for reduced light response in S. lycopersicum or may reflect a more robust response in the desert-adapted S. pennellii..."

Their acknowledgement that either of those changes may be what we are seeing is reasonable, whereas their earlier, seemingly one-sided conclusion seems unreasonable.

Halp, please.

TL;DR (comment for journal club)

This paper presents a lot of new-to-me analyses. In particular I think it was innovative to use models of evolution on gene expression data when I believe that they are more commonly used on more traditional phenotype data like physical characteristics. However, their interpretation of the results with respect to fitting the models caused some confusion for me:

My limited understanding of the Brownian motion model is that it does not only describe random drift. Other mechanisms follow this trend such as randomly changing selective regimes or continued change in independent additive factors of small effects. However, of the multiple interpretations of the Brownian motion model, the authors only chose the neutral process interpretation without defending it.

Non-synonymous base substitutions

The Precocious Llama had the idea to keep track of some thoughts for journal club, and I'm joining in because I thought it was a good idea.

The paper was Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato by Koenig et al. (2013).

I thought the paper was a fun introduction to quite a few analyses I hadn't encountered before, including the models the Precocious Llama discusses in her post. I can't speak to their technical merits since I know very little about them, but it didn't seem like terribly many actionable conclusions were drawn. The Llama pointed out these examples of vague conclusions: "Enrichment for these categories indicates that abiotic and biotic stresses have played a major role driving transcriptional variation among these species" (do any stresses exist outside of these two categories?), and "Taken together, our studies highlight both parallels and contrasts between natural and artificial selection and their effects on genome evolution".

My specific comment for journal club:
  • At least in the mammalian and yeast literature, there is evidence that some synonymous base substitutions (which the authors call neutral divergence) are actually under selection, especially in regions of condensed chromatin structure. I think this provides an alternate explanation for what the authors say is a reduction in neutral divergence near the centromere shown in Figure 1. 
References for my comment:

Thursday, September 5, 2013

Nature Jobs article: Two-body blessing by J.T. Neal

This post was written by J.T. Neal and posted at Nature Jobs. It's commonly referred to as the "two-body problem", but I agree with Dr. Neal; it's completely awesome to get to work together all the time. I wouldn't trade it for anything.

Wednesday, September 4, 2013

Gamasutra post: Modeling by numbers by Jayelinda Suridge

This post on procedural geometry posted on Gamasutra written by Jayelinda Suridge looks to have a useful tutorial for building 3D models from code. I suspect C++ has mesh building libraries as well.


Additional parts:

Sunday, September 1, 2013

Postmortem: stacksToVCF

Now that the stacksToVCF program is, for the most part, complete, I wanted to do a brief postmortem on it.

What worked:

  • External library: It was the first time I've used a library outside of the C++ Standard Library. I used mysql++, which is a wrapper for the MySQL C API. Using an external library was much easier than I had thought it would be, probably thanks to the excellent documentation for mysql++.
  • Object Oriented Programming: It was the first time I've tried to organize and implement a program using classes. I had an easier time thinking about the code with this abstraction in place.
  • C++: I had considered doing this in Python, but since it was a slightly bigger project than my typical throw away code, I went with C++ because I believe it forces me to be less sloppy (that's not to say that I write clean code in C++). With Python, I almost never think about memory or data structures, and I like that C/C++ are a little less forgiving. For example, I had an issue where I was getting really strange behavior using sprintf(); it was changing the value of a different variable. As it turns out, it was because I was accessing memory outside of the bounds of the array I was interested in, and this was causing undefined behavior. I think programming in C/C++ is much more instructive than Python (although I think it's tough to beat Python in terms of time to write).
  • vim: I used gvim as an IDE, using the c.vim plugin. For my purposes, it worked great, and I hope to get better with it.
  • Git: Using git and GitHub was useful and relatively painless. I hope to get more familiar with these tools.
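The sprintf() problem mentioned in the C++ item above is easy to reproduce: sprintf() has no idea how large the destination array is, so a string that is too long is written past the end and can land on a neighbouring variable. Below is a minimal sketch of the bounded alternative, std::snprintf(). The function name `formatLabel` and the label format are made up for illustration and are not from stacksToVCF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats "<prefix>_<id>" into a fixed-size buffer without writing past its end.
// Returns true if the whole string fit, false if it had to be truncated.
bool formatLabel(char* buffer, std::size_t bufferSize, const char* prefix, int id) {
    // snprintf writes at most bufferSize bytes (including the terminating '\0')
    // and returns the length the full string would have needed.
    const int needed = std::snprintf(buffer, bufferSize, "%s_%d", prefix, id);
    return needed >= 0 && static_cast<std::size_t>(needed) < bufferSize;
}
```

With an 8-byte buffer, formatting "chr" and 12 fits as "chr_12". With a 4-byte buffer, formatting "locus" and 7 is truncated to "loc" and the function returns false, instead of silently overwriting whatever sits after the array.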

What didn't work:

  • My understanding of C/C++: I definitely need to work on my understanding of the relationship between pointers, arrays, and strings; i.e. char * , char array[], and std::string. This caused me a bit of trouble, and I eventually got lazy and went with std::string even though that required a bit of type conversion from mysqlpp::String to std::string (mysqlpp::String has an automatic type conversion to a const char * , but not to a std::string).
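A minimal sketch of how the three types relate, using only the standard library. A char array[] owns its characters, a char * merely points at characters owned by something else (an array decays to such a pointer when passed to a function), and a std::string owns its own copy. The helper names `toStdString` and `sameText` are made up for illustration. Since mysqlpp::String converts to a const char * , a value of that type could presumably be handed to a helper like the first one, but the mysql++ side is not shown here, and the sketch assumes the text is null-terminated with no embedded '\0'.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies a C string into an owning std::string. A null pointer becomes "".
std::string toStdString(const char* cString) {
    return cString != nullptr ? std::string(cString) : std::string();
}

// Returns true when two C strings hold the same characters. Comparing the
// pointers themselves with == would compare addresses instead.
bool sameText(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

For example, given `char array[] = "locus";` and `const char* ptr = array;`, the string returned by `toStdString(ptr)` still reads "locus" after `array[0]` is changed to 'L', while `ptr` sees "Locus" because it refers to the array's storage. Going the other way, `c_str()` on a std::string gives a const char * that is valid only while that string is alive and unmodified.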

Lastly, I asked Precocious Llama's father, who spent his career doing systems software engineering, to take a look at my code and provide suggestions on how to improve. He suggested the following:
Here are a few things that I would do differently:

1)  Command line error outputs:  You have 3 different error outputs.  I would only have one.  When any part of the command line is incorrect, I would just output the "manual".  The manual just basically says:  Here's how to use this program...

2)  for (int i = 0; ...)
While this is perfectly correct for C++, it's not acceptable by most C compilers.  And it offers no advantage over the traditional C way.  So I would do:

main (...)
   int i;

   for (i = 0; ...)

3) C++ allows variable redefinition.  I don't see it as an advantage.  That feature can lead to very confusing (and thus hard to debug) code.  While you are not doing it now, it could become a habit later on:  You declare new variables inside your code and just before they are needed:

  int outputCounter = 0;
  int errCounter = 0;

I like the following better:
routine (...)
   // All variables declared here

   // Code begins here.  No variable definition from this point on