October, 2010 - Chad Burrus's Blog

Last week, I spent Thursday and Friday at the Strange Loop conference in St. Louis, Missouri. It turned out to be a pretty great conference, even though I missed a lot of interesting talks because of a school project I had to finish and a lot of time spent volunteering. The ones I did make it to were amazing, however. (You can access the slides for each of the talks at the Strange Loop presentations page.)

Thursday

I started off with Hilary Mason’s talk on machine learning, and even though I missed the first 10-15 minutes or so, I arrived in time to hear her (a) run through Bayes’ Theorem in about a minute without going too far over the audience’s heads (or so I thought), (b) give a high-level overview of her Twitter Commander (GitHub link) Python application (which uses machine learning techniques to filter tweets from the 640 people she follows on Twitter), (c) say “Cheat as much as you can” with regard to solving difficult problems (which is basically one of my own programming mantras), and (d) mention both of my machine learning textbooks (Machine Learning by Thomas Mitchell and Pattern Recognition and Machine Learning by Christopher Bishop) casually in the Q&A session after her talk. Not bad for only catching 45 minutes or so of her talk.
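For anyone who wants the one-minute version of Bayes’ Theorem in code, here is a minimal sketch. It is not taken from the talk or from Twitter Commander; the function name and all of the probabilities are made up purely for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) using Bayes' Theorem.

    prior                -- P(H), belief in the hypothesis before seeing evidence
    likelihood           -- P(E | H)
    likelihood_given_not -- P(E | not H)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of tweets are interesting, and a given word
# shows up in 30% of interesting tweets but only 5% of the rest.
p_interesting = posterior(prior=0.2, likelihood=0.3, likelihood_given_not=0.05)
print(f"P(interesting | word) = {p_interesting:.2f}")
```

With these invented numbers, seeing the word raises the chance that a tweet is interesting from 20% to about 60%, which is the kind of update a tweet filter would be making over and over.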

Then, I had to take a hiatus to finish up my project proposal for, ironically enough, my Machine Learning class. That took a while.

The next session I attended was by Kyle Simpson, better known as Getify in the webdev world. (I hadn’t heard of him, but it’s a big place and I’ve been out of the loop for a while.) He gave a talk promoting using JavaScript for what he referred to as the “middle end” of webapp development–the part of the webapp where the browser code meets the server code. The basic idea is rooted in the DRY (Don’t Repeat Yourself) principle–rewriting code is bad, but when writing web applications, we typically use two different programming languages (one on the client, one on the server) and repeat ourselves all over the place. Getify’s premise is that since we can’t run PHP, ASP, or any other server-side language in the browser, we should change the server to run the browser’s language, JavaScript. He actually built a proof-of-concept JS-based server on top of the V8 JavaScript engine called BikechainJS (GitHub link). He then used BikechainJS to build a basic but functional URL shortener called shortie.me and walked us through the basic code behind the app. Pretty cool stuff, and I like the idea, though I think he’s got a long way to go to get this off the ground.

The next talk I heard was on Clojure’s solution to the Expression Problem by Chris Houser (or Chouser, as I knew him at Sentry Data Systems), co-author of the new book The Joy of Clojure. The Expression Problem is kind of complicated (I’m still not sure I understand all the nuances), but the basic idea is that when we try to make an existing data type do something it wasn’t designed to do (to “express” a new trait), there are major issues. Chouser’s example involved extending two custom classes (which we have complete control over) with methods to display themselves in a report, and then trying to back-port those display methods to built-in classes (such as Vectors in Java–see his talk for full details, because I’m glossing over a lot). Most languages provide some way to do this (wrapper classes, monkey patching), but Clojure has a couple of really cool features called multimethods and protocols. Multimethods are a sort of function overloading, while protocols feel like advanced monkey patching with some extra safeguards. While those features are cool, I think I’ll stick with monkey patching, since the languages I use most often don’t have those tricks and I haven’t run into any of monkey patching’s problems. (Yet.)
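To give a rough feel for the idea outside of Clojure, here is a small Python sketch using functools.singledispatch, which is only a loose analogue of multimethods and protocols and not what Chouser showed. The class names (Invoice, Customer) and the report function are hypothetical stand-ins for the two custom classes and the display method in his example.

```python
from functools import singledispatch


class Invoice:
    def __init__(self, total):
        self.total = total


class Customer:
    def __init__(self, name):
        self.name = name


@singledispatch
def report(obj):
    """Fallback used when no implementation is registered for the type."""
    raise NotImplementedError(f"no report for {type(obj).__name__}")


# Implementations for our own classes.
@report.register(Invoice)
def _(obj):
    return f"Invoice: {obj.total:.2f}"


@report.register(Customer)
def _(obj):
    return f"Customer: {obj.name}"


# The same operation extended to a built-in type, without subclassing
# list, wrapping it, or patching anything onto it.
@report.register(list)
def _(obj):
    return "[" + ", ".join(report(item) for item in obj) + "]"


print(report([Invoice(9.5), Customer("Ada")]))
```

The point of the sketch is that the new behavior lives outside both the custom classes and the built-in list, so nothing existing has to be edited to support it.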

The last talk on Thursday was by Guy Steele, one of the original Lisp hackers and one of the original authors of the Jargon File, along with numerous other things. His talk was on parallel programming and how not to do it. After a long breakdown of a program he wrote on a punch card (which was fascinating and hugely relevant to the talk, but probably a little on the long side), he introduced a pretty simple idea: instead of manually mucking around with the details of parallel programming (analogous to what he had to do with his punch card), why don’t we just let the compiler figure it out? We already let the compiler (or really low level libraries) do most of the really annoying work like register management, anyway (unless we’re writing C or C++), and the compilers do a great job, much better than most people can do and tremendously faster. Why can’t we do that with parallel programming? We have to be a little smarter about how we design programs in the first place, and many (if not all) of the tricks we’ve been learning for the past 50 years or so no longer work, but the benefits are usually worth it in the long run, except for the smallest of toy programs. Once we’ve done the high-level work for the compiler, which it can’t do, we let it handle the nitty-gritty details of the memory and processor management. While Steele made his case, he also introduced his current work at Sun Labs: a programming language called Fortress where running code in parallel is almost trivial (so long as you design it right from the beginning, which is usually the problem). It does show a ton of potential and is really very cool–Fortress is definitely on my list of languages to check out.
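One way to picture “designing it right from the beginning” is to write a computation as independent pieces plus a combining step whose grouping doesn’t matter, so that something other than the programmer can decide how to split up the work. The sketch below is plain Python, not Fortress, and is an illustration of that general idea under my own assumptions rather than anything from the talk; the function names are invented, and a thread pool is used only to keep the example self-contained (in CPython it shows the structure rather than a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sum_of_squares_chunk(chunk):
    """Work on one chunk; it needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(numbers, chunk_size=1000, workers=4):
    # The executor, not the caller, decides which worker handles which chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares_chunk, chunked(numbers, chunk_size)))
    # Addition is associative, so the partial results can be combined
    # in any grouping without changing the answer.
    return reduce(operator.add, partials, 0)


print(parallel_sum_of_squares(list(range(10)), chunk_size=3))
```

The program only states what the pieces are and how to combine them; how many workers run, and in what order, is left to the runtime.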

Ok, that’s enough for now, as this post is getting quite long. I’ve covered all the talks I saw on Thursday, so now is a good time to take a break. I’ll finish up the talks from Friday later this week.

A couple of weeks ago, I ran across an article in the New York Times on some recent discoveries that totally rewrote the known history of HIV. Needless to say, I was intrigued. The article was actually about simian immunodeficiency virus (or SIV), the precursor of HIV that affects other primates. It is thought that SIV crossed into humans (thus becoming HIV) sometime in the early part of the 20th century, though the exact time frame isn’t known.

According to the NYT article, the scientific community thought that SIV was a relatively new virus, emerging in primates sometime in the last few hundred years, and this new research basically overturns that idea. However, that’s only half the story, as you’ll discover if you actually read the paper presenting the research, which was published in Science on September 17 of this year.

In the paper, Worobey et al. (the authors) set out to clear up the history of SIV. It turns out there were two competing theories for the evolution of SIV, one being the few-centuries version mentioned in the NYT article. The other theory, which the new research supports, basically just said that SIV was old–probably very old, as some research suggests that similar viruses arose as much as 14 million years ago (see http://www.pnas.org/content/105/51/20362).

To solve this mystery, Worobey et al. looked into six monkey species from Bioko Island, an island that separated from mainland Africa 10-12 thousand years ago. Interestingly enough, they found four species-specific strains of SIV. Each of those monkey species (the red-eared guenon, the black colobus, the drill, and Preuss’s guenon) has a relative on the mainland that also possesses a strain of SIV, which makes it relatively easy to build a phylogenetic tree to help figure out just how long ago the strains split off from each other.

From what Worobey et al. discovered, some strains of SIV have been around for at least approximately 33,000 years, and quite possibly as long as 133,000. Here’s how.

Ignoring the possibility of human contamination, we know that the island strains diverged from their mainland counterparts at least 10,000 years ago, since that’s when the island became an island.
They then used amino acid sequence differences to estimate the time to the most recent common ancestor (TMRCA) of the SIV variants. (Presumably they used standard methods to make the estimates, but it’s not spelled out how they got their figures.) That gives us an estimate for sequence divergence of roughly 77,000 years ago, with a 95% confidence interval ranging from ~33,000 to ~133,000 years ago.
They repeated their analysis using nucleotide differences and third codon position differences. Each of those estimates came in much lower than the amino acid estimate (though still considerably more than the supposed SIV age of a few hundred years–so much for that theory).
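The general logic of using a known split date to put a clock on sequence differences can be sketched in a few lines. This is a deliberately simplified strict-clock illustration, not the authors’ method (which, as noted, isn’t spelled out here); the function names and both genetic distances are hypothetical, and only the 10,000-year island separation comes from the text above.

```python
def calibrate_rate(distance, years_since_split):
    """Substitutions per site per year, assuming a strict molecular clock.

    The factor of 2 reflects that both lineages have been accumulating
    changes independently since they split.
    """
    return distance / (2.0 * years_since_split)


def estimate_tmrca(distance, rate):
    """Years back to the common ancestor of two sequences at a given rate."""
    return distance / (2.0 * rate)


ISLAND_SPLIT_YEARS = 10_000   # minimum age of the island/mainland split
island_vs_mainland = 0.02     # hypothetical distance between paired strains
deepest_pair = 0.10           # hypothetical distance between the most divergent strains

rate = calibrate_rate(island_vs_mainland, ISLAND_SPLIT_YEARS)
tmrca = estimate_tmrca(deepest_pair, rate)
print(f"rate: {rate:.2e} per site per year, TMRCA: {tmrca:,.0f} years")
```

Because the island split is a minimum age, the calibrated rate in a sketch like this is an upper bound, which in turn makes the resulting TMRCA a lower bound.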

So, why is this important? The authors give two main reasons.

If we’ve just pushed back the origin of SIV 33 thousand years or more, what does that mean for HIV, which we think only arose about 100 years ago? Could HIV also have a much longer history than we currently imagine? We should investigate.
We now know (or are extremely confident) that SIV is ancient, giving its hosts thousands of years to adapt to its effects. This probably explains why monkeys infected with SIV seem to exhibit such relatively minor problems from the virus, at least for some strains. All other things being equal (which is a huge assumption), that means that humans are not likely to develop a major resistance to HIV any time soon. That’s frustrating, but not totally surprising.

Needless to say, this research gives us a lot of insight into the origins of SIV and HIV, even if it doesn’t really help us from a practical perspective. It does bring up a couple of new points for research, as mentioned by the authors, plus at least one more I thought of while researching some background information for this article. It seems the strain of SIV that actually causes major problems for its host resides primarily in chimpanzees. Since chimpanzees are the closest relatives of humans, genetically speaking, I wonder if the solution to stopping SIV or HIV might lie in the regions of the genome that we share with chimpanzees but that differ from the other primate species that serve as hosts for SIV. It’s a stretch, I’ll admit, but it could narrow the search space for a cure substantially, and I, at least, think that’s a good option to have.

Strange Loop 2010: Thursday

Article Summary: Island Biogeography Reveals the Deep History of SIV