# Strange Loop 2010: Thursday

Last week, I spent Thursday and Friday at the Strange Loop conference in St. Louis, Missouri.  It turned out to be a pretty great conference, even if I missed out on a lot of interesting talks due to a school project I had to finish and a lot of time spent volunteering.  The ones I did make it to were amazing, however.  (You can access the slides for each of the talks at the Strange Loop presentations page).

### Thursday

I started off with Hilary Mason’s talk on machine learning, and even though I missed the first 10-15 minutes or so, I arrived in time to hear her (a) run through Bayes’ Theorem in about a minute without going too far over the audience’s heads (or so I thought); (b) give a high-level overview of her Twitter Commander (Github link) Python application, which uses machine learning techniques to filter tweets from the 640 people she follows on Twitter; (c) say “Cheat as much as you can” with regard to solving difficult problems (which is basically one of my own programming mantras); and (d) casually mention both of my machine learning textbooks (Machine Learning by Thomas Mitchell and Pattern Recognition and Machine Learning by Christopher Bishop) in the Q&A session after her talk.  Not bad for only catching 45 minutes or so of her talk.
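That one-minute Bayes’ Theorem run-through fits in a few lines of Python. Here’s a minimal sketch of the theorem in action; the test-and-condition numbers are made up purely for illustration and have nothing to do with her talk:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: a diagnostic test that is 99% sensitive and 95%
# specific, for a condition with 1% prevalence.

def bayes(p_b_given_a, p_a, p_b):
    """Posterior probability P(A|B)."""
    return p_b_given_a * p_a / p_b

p_a = 0.01              # prior: P(condition)
p_b_given_a = 0.99      # sensitivity: P(positive | condition)
p_b_given_not_a = 0.05  # false positive rate: P(positive | no condition)

# Total probability of a positive test, via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(round(posterior, 3))  # about 0.167: most positives are false positives
```

The counterintuitive result (a positive test still means you probably don’t have the condition) is exactly the kind of thing that makes a fast Bayes walkthrough worth doing.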

Then, I had to take a hiatus to finish up my project proposal for, ironically enough, my Machine Learning class.  That took a while.

The next session I attended was by Kyle Simpson, better known as Getify in the webdev world.  (I hadn’t heard of him, but it’s a big place and I’ve been out of the loop for a while.)  He gave a talk promoting the use of Javascript for what he referred to as the “middle end” of webapp development–the part of the webapp where the browser code meets the server code.  The basic idea is rooted in the DRY (Don’t Repeat Yourself) principle–rewriting code is bad, but when writing web applications, we typically use two different programming languages (one on the client, one on the server) and repeat ourselves all over the place.  Getify’s premise is that since we can’t run PHP, ASP, or any other server-side language in the browser, we should change the server to run the browser’s language, Javascript.  He actually built a proof-of-concept JS-based server on top of the V8 Javascript engine called BikechainJS (Github link).  He built a basic but functional URL shortener called shortie.me using BikechainJS and walked us through the basic code behind the app.  Pretty cool stuff, and I like the idea, though I think he’s got a long way to go to get this off the ground.

The next talk I heard was on Clojure’s solution to the Expression Problem by Chris Houser (or Chouser, as I knew him at Sentry Data Systems), co-author of the new The Joy of Clojure book.  The Expression Problem is kind of complicated (I’m still not sure I understand all the nuances), but the basic idea is that when we try to make an existing data type do something it’s not meant to do (to “express” a new trait), there are major issues.  Chouser’s example involved extending two custom classes (which we have complete control over) with methods to display themselves in a report, and then trying to back-port those display methods to built-in classes (such as Vectors in Java–see his talk for full details, because I’m glossing over a lot).  Most languages provide some way to do this (wrapper classes, monkey patching), but Clojure has a couple of really cool mechanisms called multimethods and protocols.  Multimethods are a sort of function overloading, while protocols feel like advanced monkey patching with some extra safeguards.  While those mechanisms are cool, I think I’ll stick with monkey patching, since the languages I use most often don’t have those tricks and I haven’t run into any of monkey patching’s problems.  (Yet.)
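For anyone who hasn’t seen monkey patching before, it’s easy to show in a few lines of Python. This is my own minimal sketch, not from the talk; the `Invoice` class and `to_report_row` method are hypothetical names:

```python
# Monkey patching: attaching a new method to an existing class at runtime,
# so instances gain behavior (they "express" a new trait) that the class
# author never planned for.

class Invoice:
    def __init__(self, total):
        self.total = total

# The "display in a report" trait, back-ported onto the class after the fact:
def to_report_row(self):
    return "Invoice: %.2f" % self.total

Invoice.to_report_row = to_report_row  # the monkey patch itself

row = Invoice(19.99).to_report_row()
print(row)  # Invoice: 19.99
```

Note that Python won’t let you do this to true built-ins like `list`, which is precisely the gap Chouser’s protocols example closes in Clojure.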

The last talk on Thursday was by Guy Steele, one of the original Lisp hackers and one of the original authors of the Jargon File, along with numerous other things.  His talk was on parallel programming and how not to do it.  After a long breakdown of a program he wrote on a punch card (which was fascinating and hugely relevant to the talk, but probably a little on the long side), he introduced a pretty simple idea:  instead of manually mucking around with the details of parallel programming (analogous to what he had to do with his punch card), why don’t we just let the compiler figure it out? We already let the compiler (or really low level libraries) do most of the really annoying work like register management, anyway (unless we’re writing C or C++), and the compilers do a great job, much better than most people can do and tremendously faster.  Why can’t we do that with parallel programming?  We have to be a little smarter about how we design programs in the first place, and many (if not all) of the tricks we’ve been learning for the past 50 years or so no longer work, but the benefits are usually worth it in the long run, except for the smallest of toy programs.  Once we’ve done the high-level work for the compiler, which it can’t do, we let it handle the nitty-gritty details of the memory and processor management.  While Steele made his case, he also introduced his current work at Sun Labs:  a programming language called Fortress where running code in parallel is almost trivial (so long as you design it right from the beginning, which is usually the problem).  It does show a ton of potential and is really very cool–Fortress is definitely on my list of languages to check out.
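Steele’s argument is essentially “describe the work, let the tooling schedule it.” A small everyday analogue of that mindset, sketched in Python (my illustration, not from the talk): express the computation as an order-independent map plus a reduction, and let an executor decide how to run the pieces.

```python
# Instead of hand-managing threads (the analogue of Steele's punch card),
# describe the work as independent pieces and let the executor schedule them.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor() as pool:
    # map() distributes the independent calls; sum() is the reduction step.
    total = sum(pool.map(square, range(10)))

print(total)  # 285
```

The key design constraint is the one Steele emphasized: the pieces must be independent (no shared mutable state), or no scheduler can save you.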

Ok, that’s enough for now, as this post is getting quite long.  I’ve covered all the talks I saw on Thursday, so now is a good time to take a break.  I’ll finish up the talks from Friday later this week.

# Article Summary: Island Biogeography Reveals the Deep History of SIV

A couple of weeks ago, I ran across an article in the New York Times on some recent discoveries that totally rewrote the known history of HIV.  Needless to say, I was intrigued.  The article was actually about simian immunodeficiency virus (or SIV), the precursor for HIV that affects other primates.  It is thought that SIV crossed into humans (thus becoming HIV) sometime in the early part of the 20th century, though the exact time frame isn’t known.

According to the NYT article, the scientific community thought that SIV was a relatively new virus, emerging in primates sometime in the last few hundred years, and that this new research basically trumps that idea.  However, that’s only half the story, as you’ll discover if you actually read the paper presenting the research, which was published in Science on September 17 of this year.

In the paper, Worobey et al. (the authors) set out to clear up the history of SIV.  It turns out there were two competing theories for the evolution of SIV, one being the few-centuries version mentioned in the NYT article.  The other theory, which the new research supports, basically just said that SIV was old–probably very old, as some research suggests that similar viruses arose as much as 14 million years ago (see http://www.pnas.org/content/105/51/20362).

To solve this mystery, Worobey et al. looked into six monkey species from Bioko Island, an island that separated from mainland Africa 10-12 thousand years ago.  Interestingly enough, they found four species-specific strains of SIV.  Each of those monkey species (the red-eared guenon, the black colobus, the drill, and Preuss’s guenon) has a relative on the mainland that also possesses a strain of SIV, which makes it relatively easy to build a phylogenetic tree to help figure out just how long ago the strains split off from each other.

From what Worobey et al. discovered, some strains of SIV have been around for approximately 33,000 years, and quite possibly as much as 133,000. Here’s how they got there.

1. Ignoring the possibility of human contamination, we know that the island strains diverged from their mainland counterparts at least 10,000 years ago, since that’s when the island became an island.
2. They then used amino acid sequence differences to estimate the time to the most recent common ancestor (TMRCA) of the SIV variants.  (Presumably using standard methods to make the estimates, but it’s not spelled out how they get their figures.)  That gives an estimated sequence divergence roughly 77,000 years ago, with a 95% confidence interval ranging from ~33,000 to ~133,000 years ago.
3. They repeated their analysis using nucleotide differences and third codon differences.  Each of those estimates came in much lower than the amino acid estimate (though still considerably more than the supposed SIV age of a few hundred years–so much for that theory).
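The arithmetic behind a molecular-clock TMRCA estimate is simple enough to sketch. The numbers below are made up for illustration; the paper’s actual substitution rates and distances are not reproduced here:

```python
# Molecular-clock arithmetic behind a TMRCA estimate: divergence accumulates
# along BOTH lineages after a split, so the split time is t = d / (2 * r),
# where d is the fraction of differing sites and r is the substitution rate
# per site per year.  All inputs here are hypothetical.

def tmrca_years(divergence, rate_per_site_per_year):
    """Years since two sequences shared their most recent common ancestor."""
    return divergence / (2 * rate_per_site_per_year)

d = 0.154  # hypothetical divergence between island and mainland strains
r = 1e-6   # hypothetical substitutions per site per year

years = tmrca_years(d, r)
print(round(years))  # on the order of 77,000 years for these made-up inputs
```

The real difficulty, and the reason the paper’s three estimates disagree, is pinning down `r`: rates calibrated over short timescales tend to overestimate long-term change.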

So, why is this important?  The authors give two main reasons.

1. If we’ve just pushed back the origin of SIV 33 thousand years or more, what does that mean for HIV, which we think arose only about 100 years ago?  Could it, too, have a much longer history than we currently imagine?  We should investigate.
2. We now know (or are extremely confident) that SIV is ancient, giving its hosts thousands of years to adapt to its effects.  This probably explains why monkeys infected with SIV seem to exhibit such relatively minor problems from the virus, at least for some strains.  All other things being equal (which is a huge assumption), that means that humans are not likely to develop a major resistance to HIV any time soon.  That’s frustrating, but not totally surprising.

Needless to say, this research gives us a lot of insight into the origins of SIV and HIV, even if it doesn’t really help us from a practical perspective.  It does bring up a couple of new points for research, as mentioned by the authors, plus at least one more I thought of while researching background information for this article.  It seems the strain of SIV that actually causes major problems for its host resides primarily in chimpanzees.  Since chimpanzees are the closest relatives of humans, genetically speaking, I wonder if the solution to stopping SIV or HIV might lie in the regions of the genome where we overlap with, yet differ from, the other primate species that serve as hosts for SIV.  It’s a stretch, I’ll admit, but it could narrow the search space for a cure substantially, and I, at least, think that’s a good option to have.

# “Undoing” an SVN revision

This is the second in a series of posts I began this summer and didn’t have time to finish.

Edit 2011/01/17: I finally noticed and fixed a small glitch in the raw command line I was using, probably caused by my use of “<>” in the example text. Sorry, folks.

We all know how it feels, right? You’ve been coding away happily, decide it’s time to push everything to your version control system of choice, type out a brilliant commit message, and hit Enter–only to find out five seconds later that you just committed something you shouldn’t have, and everyone else will be in trouble when they happen to update. What do you do now?

Well, if you have a decent version control system (VCS) such as Mercurial or Git, you take advantage of the built-in undo or undo-like command and simply undo your commit. Most modern distributed VCS’s have one, giving you at least one chance to fix your work before you hand it out for everyone else to use. However, some less-than-fun VCS’s make it difficult (mostly because they’re designed to keep the information you give them forever, warts and all, which is not a bad thing). However, it’s still usually possible to fake an undo.

Take Subversion, for example. It’s not my favorite VCS for reasons I may go into later, but it’s still a very solid one if you like and/or need the centralized type of thing. I used it a lot this summer while doing an internship with Sentry Data Systems, and I found myself needing to roll back a commit on some files a time or two. So, I did some research.

Arguably the best way to rollback a commit in Subversion is to perform what they call a reverse-merge. In a normal merge, you take two versions of a file, compare them to figure out what’s different between them, and then create a new file containing the consensus content of both files (see picture to the right).

In a reverse-merge, you figure out what changes were made in a given revision, and then commit a new revision that reverts all of those changes back to the version before you committed your bad revision (see picture to the right), which means your bad revision will be simply skipped over as far as your comrades are concerned. Subversion provides a simple series of commands for this:

`svn merge -c -{bad revision number} [{files to revert}]`

Here’s how it works.

1. The `svn merge` portion of the command basically tells Subversion you want to merge something. svn is the command-line tool to interact with a Subversion repository, and merge…well, you get the idea.
2. `-c -{bad revision number}` tells Subversion that we want to work with the changes related to the revision numbered `{bad revision number}`. In this case, since we’re passing in a negative sign in front, we’re saying we want to remove those changes from the working directory. If you left out that negative sign, you’d actually pull the changes from that revision into the current working directory, which is usually only useful if you’re cherry-picking across branches. Whether that’s a good idea or not is left to the reader.
3. `[{files to revert}]` is an optional list of files in which to undo changes. Basically, if you pass in a list of files here, only those files will have their changes from the revision reverted–any other files changed in that revision will not be affected.

That’s pretty much it. Once you run that command (assuming there are no conflicts in your merge), you will be able to simply commit (with a helpful commit message, of course) and everything will be back to normal–your comrades in arms will be able to keep working without the overhead of your bad commit cluttering up their working environment, which is always a good thing.

# Tar Pipe

This is the first in a series of posts I began this summer and only now have time to finish.

Every once in a while, I find myself needing to copy a large number of files from one Linux machine to another, ideally as fast as possible.  There are a lot of ways to do this, but the most common method usually goes something like this:

• Tar everything up (with some form of compression if your network connection is slow).
• (S)FTP/SCP  the file to the new server.
• Move the file to the new location, making directories as needed.
• Extract the tar file into the new directory.

This is all well and good, and it tends to work well in most cases–it’s just kind of laborious.  I prefer a simpler method that basically wraps everything up into a single step, affectionately known as a tar pipe.  The (admittedly somewhat complex) command follows.

```
SRCDIR=  # fill in with your source directory
DESTDIR= # fill in with your destination directory--note that your
         # uploaded directory will appear inside this one
USER=    # fill in with your remote user name
HOST=    # fill in with your remote host name
tar -cvzf - $SRCDIR | ssh $USER@$HOST "mkdir -p $DESTDIR; tar -xz -C $DESTDIR"
```

The variables are just to make things a little easier to read (feel free to ignore them if you like), and I do recommend using a full path for the `DESTDIR` directory, but the basic process is ridiculously easy.  Here’s the breakdown of how the whole thing works.

1. The `tar -cvzf - $SRCDIR` very obviously tars everything up, just like you normally would.  The key difference from the normal tar procedure is that the “file” you’re creating with tar is actually sent to `stdout` (by the `-f -` option) instead of being written to the file system.  We’ll see why later.
2. The `|` (pipe) passes everything on `stdout` on as `stdin` for the next command, just like normal.
3. The `ssh` command is the fun part.
1. We start an `ssh` session with `$HOST` as `$USER`.
2. Once that’s established, we run two commands.
1. `mkdir -p $DESTDIR` to make the destination directory, if needed.
2. `tar -xz -C $DESTDIR` to untar something. What, we’re not sure yet.

What it untars is a bit of a mystery, as we don’t really tell it what it’s supposed to work on.  Or do we?  As it turns out, `ssh` passes whatever it receives on `stdin` on to the command it runs on the server.  I.e., all that stuff we just tar’red up gets passed along through the magic of piping from the local machine to the remote machine, then extracted on the fly once it gets to that machine.

You can see the benefit of this, I trust–instead of that whole four-command process we detailed above, including manually logging into the remote server to actually extract the new file, we have one fairly simple command that handles tarring, uploading, and extracting for us, with the added benefit of not requiring us to create any intermediate files.  That’s kind of cool, right?

Note:  I’ve seen other implementations of the tar pipe, but this is the one I’ve been using recently.  It’s worked for me on Red Hat 5, but your mileage may vary.

# Résumé

• Experience
• Research Assistant, School of Informatics and Computing, Indiana University, Bloomington, Indiana.  Jan. 2010 – present.
• Studying the principles of metagenomics.
• Developing methods to compare metagenomic samples.
• Writing Python and C++ code to analyze the differences in metagenomic samples.
• Developer Intern, Sentry Data Systems, Deerfield Beach, Florida.  May 2010-Aug. 2010.
• Used PHP and PL/SQL to develop internal tools, including wrapping a set of PL/SQL classes with easy-to-use PHP classes.
• Built PHPUnit tests for the various classes I built and/or extended.
• Configured a Hudson build to ensure one of our products stayed in good working order.
• Extended Natural Docs-style comments on previously un- or under-documented classes.
• Associate Instructor, School of Informatics and Computing, Indiana University, Bloomington, Indiana.  Sept. 2009-Dec. 2009.
• Taught two labs a week and handled grading for those labs.
• Assisted in two other labs by fielding questions and helping students learn the material covered.
• Held office hours once a week where I answered questions and helped students understand the material.
• Maintained discipline during one lecture section per week.
• System Developer, Cognitive Solutions, LLC, Clearwater, Florida.  Sept. 2008-Aug. 2009.
• Developed components and features (both front-end and back-end) for a toy-tracking web application as part of a distributed development team.
• Developer/Architect, LAT, Inc., Marion, Indiana.
• Developed components and features (both front-end and back-end) for a wide range of web applications, ranging from an online course management tool to a full-blown digital signage application and several things in between.
• Managed several other developers in the technical aspects of their work.
• Served as technical support and system administrator as needed.
• Education
• MS in Bioinformatics, Indiana University, Bloomington, Indiana.
• GPA:  3.95 (as of 12 Sept. 2010)
• Anticipated Graduation Date:  May 2011
• BS in Computer Science, Minor in Mathematics, Indiana Wesleyan University, Marion, Indiana.
• GPA: 3.7 (overall)
• Skills
• Operating Systems:  Linux (Ubuntu, Red Hat), Mac OS X, Windows 98/ME/XP/Vista/7
• Computer Languages:
• Proficient in HTML, Java, Javascript (including AJAX), CSS, PHP, Python, XML, XSLT
• Familiar with C, C++, Freemarker, LaTeX, Perl, SQL
• Tools and Systems
• Proficient in CVS, Mercurial, MySQL, Microsoft SQL Server, Scite, Subversion
• Familiar with Apache Struts and HTTP Server, Eclipse, Joomla, Git, Make, Unix Shell Scripting, VIM
• Achievements and Activities
• Contributed code to the Scintilla open source project (2007).
• Awarded Eagle Scout Rank, Boy Scouts of America (2000).
• References
• Available on request.
# Mathematical glossary

Preface:  I’m not a mathematician, just a mere user of mathematics.  All the terms tend to confuse me, so here’s my personal glossary containing links to Wikipedia and MathWorld, a formal definition from one of those two places (which are usually overcomplicated if you’re just trying to use them), and my understanding of what each term means from a practical standpoint and/or an example of it that illustrates the main idea of the term.  This list should not be considered to be authoritative or complete, as I’m simply going to add terms as I run across them.  Feel free to correct me if you see any problems.

Linear Algebra

1. Affine space – a point set with a faithful freely transitive vector space action for the vector space.  Essentially, it’s a normal vector space without an origin point, meaning we can treat any point we want as an origin during analysis (so long as we’re aware of the fact that we have no origin and will have to translate our “coordinates” if someone decides to move the origin on us).  (Wikipedia) (MathWorld)
2. Eigenvalue – (no formal definition from either place except for in mathematical symbols I can’t copy) the amount by which an eigenvector changes when multiplied by its associated matrix. So, formally, if I have a square matrix $A$, eigenvector $v$, and scalar $\lambda$, then $\lambda$ is an eigenvalue if:

$Av = \lambda v$

(Wikipedia) (MathWorld)

3. Eigenvector – (no formal definition from either place except for in mathematical symbols I can’t copy) a vector that, when multiplied by a specific square matrix, changes only in magnitude, not in direction. So, formally, if I have a square matrix $A$, vector $v$, and scalar $\lambda$ (formally known as an eigenvalue), $v$ is an eigenvector if:

$Av = \lambda v$

(Wikipedia) (MathWorld)

4. Vector space – a mathematical structure formed by a collection of vectors with a single point of origin.  An example would be normal Euclidean space. (Wikipedia) (MathWorld)
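Since the eigenvalue and eigenvector entries above share the same defining equation, a quick numeric check makes it concrete. This is pure Python (no libraries); the matrix and vectors are made-up examples:

```python
# Check the relation Av = lambda*v for a small example:
# A = [[2, 1], [1, 2]] has eigenvectors (1, 1) and (1, -1)
# with eigenvalues 3 and 1, respectively.

def mat_vec(A, v):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[2, 1], [1, 2]]

for v, lam in [((1, 1), 3), ((1, -1), 1)]:
    Av = mat_vec(A, list(v))
    # Av should equal lam * v componentwise
    assert Av == [lam * x for x in v]

print(mat_vec(A, [1, 1]))   # [3, 3], i.e. 3 * (1, 1)
```

Note how each eigenvector only gets scaled: the direction never changes, which is exactly the distinction the two entries draw.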

Statistical Basics

1. Covariance – how much two variables change together.   (Wikipedia) (MathWorld)
2. Cross-covariance (matrix) – sometimes used to refer to the covariance cov(X, Y) between two random vectors X and Y, in order to distinguish that concept from the “covariance” of a single random vector X. So the key is that cross-covariance refers to the covariance between two vectors, not the covariance within a single vector.  (Wikipedia) (MathWorld)
3. Expected value (expectation or expectation value) – the integral of the random variable with respect to its probability measure.  Basically, it’s the long-run average value you expect to get out of the random variable.  (Wikipedia) (MathWorld)
4. Moment – a quantitative measure of the shape of a set of points.   (E.g., the “width” (the second moment) or “height” of the set of points, or the mean (the first moment) of the set of points.)  (Wikipedia) (MathWorld)
5. Variance – a special case of covariance when the two variables are identical.  (I.e., when there is only one variable you’re really looking at.)  This measures how far values of the variable are from the mean of the variable.   (Wikipedia) (MathWorld)
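A small Python sketch of these definitions, showing that variance is just the covariance of a variable with itself (the data values are made up for illustration):

```python
# Sample covariance and variance from their definitions. Note that
# cov(x, x) reduces to var(x), matching the variance entry above.

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (dividing by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, so cov(x, y) = 2 * cov(x, x)

print(cov(x, x))   # the variance of x
print(cov(x, y))
```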

Statistical Analysis Techniques

1. Canonical correlation analysis (CCA–be careful, there are two of these) — a method of analysis that enables us to find linear combinations of two sets of correlated variables which have maximum correlation with each other.   This can be repeated up to n times, where n is the size of the smallest set.  (Wikipedia) (MathWorld)
2. Principal component analysis (PCA) – a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components.  You can think of this as taking a set of data and finding the best perspective to look at it, where the “best” perspective is the perspective that shows the most variation.  (Wikipedia) (MathWorld)
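The “best perspective” idea in the PCA entry can be made concrete: the first principal component is the dominant eigenvector of the data’s covariance matrix. Here is a pure-Python sketch that finds it by power iteration; the sample points are made up for illustration, and a real analysis would use a library such as NumPy or scikit-learn.

```python
# PCA, illustrated: build the 2x2 covariance matrix of some 2-D points,
# then find its dominant eigenvector (the first principal component)
# by power iteration.

import math

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
          (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]

n = len(points)
mx = sum(p[0] for p in points) / n
my = sum(p[1] for p in points) / n

# Covariance matrix entries (sample covariance, n - 1 denominator).
cxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
cyy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)

# Power iteration: repeatedly apply the matrix and renormalize; the
# vector converges to the eigenvector with the largest eigenvalue,
# i.e. the direction showing the most variation.
v = (1.0, 0.0)
for _ in range(100):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(w[0], w[1])
    v = (w[0] / norm, w[1] / norm)

print("first principal component direction:", v)
```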

Topology

1. Bijection (bijective map) – a function f from a set X to a set Y with the property that, for every y in Y, there is exactly one x in X such that f(x) = y.  Alternatively, f is bijective if it is a one-to-one correspondence between those sets. Basically, you have two sets, X and Y, and a function f that pairs every value in X with exactly one value in Y, in such a way that every value in Y gets used exactly once.  That function f is the bijection.  (Wikipedia) (MathWorld – contains a great illustration if this is confusing)
2. Diffeomorphic (diffeomorphism)-  an isomorphism in the category of smooth manifolds. It is an invertible function that maps one differentiable manifold to another, such that both the function and its inverse are smooth. (Wikipedia) (MathWorld)
3. Embedding – one instance of some mathematical structure (call it X) contained within another instance (call it Y) where X maps to Y via an injective and structure-preserving map f.  Examples include the natural numbers (X) within the integers (Y), or the integers (X) within the rational numbers (Y).  This is similar to, but distinct from, the idea of a subset.  (Embedding seems to be more general.)  (Wikipedia) (MathWorld)
4. Homomorphism –  a structure-preserving map between two algebraic structures (such as groups, rings, or vector spaces). (Wikipedia) (MathWorld)
5. Injective – a function that preserves distinctness: it never maps distinct elements of its domain to the same element of its codomain.  (Wikipedia) (MathWorld)
6. Isomorphism – a bijective map f such that both f and its inverse f −1 are homomorphisms. (Wikipedia) (MathWorld)
7. Manifold – a mathematical space that on a small enough scale resembles the Euclidean space of a specific dimension.  A line is a one-dimensional manifold, since if you look at a small area around any given point of a line, that area resembles one-dimensional space.  Likewise, a sphere (specifically the surface, not the volume) is a two-dimensional manifold, since that surface can be represented by a set of two-dimensional maps, according to Wikipedia, though to me that doesn’t quite make sense.  You can indeed represent a sphere as two-dimensional maps (check out your handy road map if you don’t believe me), but not without distorting the map somehow, usually via some sort of projection or a non-rectangular drawing.  (Wikipedia) (MathWorld)
8. Morphism – an abstraction derived from structure-preserving mappings between two mathematical structures. (Wikipedia) (MathWorld)
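For finite sets, the injective and bijective definitions above reduce to simple counting checks. A small Python sketch (the function names and example sets are mine, purely illustrative):

```python
# Classify a finite function as injective / surjective / bijective.

def classify(f, X, Y):
    """Classify a function given as a dict mapping elements of X into Y."""
    values = [f[x] for x in X]
    injective = len(set(values)) == len(values)   # no two x share a y
    surjective = set(values) == set(Y)            # every y is hit
    return injective, surjective, injective and surjective

X = {1, 2, 3}
Y = {"a", "b", "c"}

f = {1: "a", 2: "b", 3: "c"}   # a bijection
g = {1: "a", 2: "a", 3: "b"}   # neither injective nor surjective

print(classify(f, X, Y))   # (True, True, True)
print(classify(g, X, Y))   # (False, False, False)
```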

# RAMMCAP: CD-HIT and ORF_FINDER

This is the second part of a series of posts on the RAMMCAP suite of bioinformatics tools.

Last time, we left off with a freshly compiled version of RAMMCAP ready for testing.  Like last time, we’ll start with the README in the current directory, which is the `rammcap` directory inside the main directory (named `RAMMCAP-20091106` in my case) from the RAMMCAP download.

The new README has its own test script, so I’m going to follow this one. The first thing to run is the CD-HIT-EST program.

### CD-HIT-EST

The CD-HIT-EST program is the clustering program. It takes in a FASTA-formatted file and clusters that data according to a greedy algorithm that uses simple word counting and indexing to help speed things up considerably. Basically, it:

1. Sorts the sequences from longest to shortest.
2. Begins clustering:
   1. Compare the current sequence to the list of known clusters.
      1. If it matches an existing cluster, it is added to that cluster and made the “representative” sequence for that cluster if it is longer than the existing “representative” sequence.
      2. If it doesn’t match an existing cluster, a new cluster is made with this sequence as the representative sequence.
   2. Repeat until all the sequences are clustered.

(None of which I knew at the time I ran this for the first time, but it’s information that makes sense here.)
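The steps above can be sketched in a few lines of Python. This is only a toy illustration of the greedy loop, not CD-HIT’s actual implementation: the real program computes identity via word counting and alignment, while this sketch stands in a naive position-by-position identity function, and the example sequences are made up.

```python
# Toy version of the greedy clustering loop CD-HIT uses.

def identity(a, b):
    """Fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(1 for i in range(n) if a[i] == b[i]) / n

def greedy_cluster(seqs, threshold=0.95):
    # 1. Sort the sequences from longest to shortest.
    seqs = sorted(seqs, key=len, reverse=True)
    clusters = []   # each cluster: {"rep": seq, "members": [...]}
    for s in seqs:
        for c in clusters:
            # 2a. If it matches an existing cluster, join it. (Because
            # we sorted longest-first, the representative is always the
            # longest member already.)
            if identity(s, c["rep"]) >= threshold:
                c["members"].append(s)
                break
        else:
            # 2b. Otherwise it founds a new cluster as representative.
            clusters.append({"rep": s, "members": [s]})
    return clusters

seqs = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA", "TTTTGGGG"]
for c in greedy_cluster(seqs):
    print(c["rep"], len(c["members"]))
```

The first two sequences differ in one position out of twenty (95% identity), so they land in one cluster; the third starts its own.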

I pull out the really long test command (written below), prepare myself, and hit Enter. It takes a long time to complete.

```
../cd-hit/cdhit-est -i testdb -o testdb_95 -M 50 -B 1 -d 0 -n 10 \
-l 11 -r 1 -p 1 -g 1 -G 0 -c 0.95 -aS 0.8 > testdb_95.log
```

While that’s going, here’s what that command means:

• `../cd-hit/cdhit-est` – use the CD-HIT-EST command, which is the CD-HIT command for DNA/RNA comparison tasks. The original CD-HIT was written for protein comparison.
• `-i testdb` – use the testdb file as the input file. This file is a FASTA file with 1,000,000 sequences at most 361 bases long pulled from various metagenomic samples by the author.
• `-o testdb_95` – write the output to a file called testdb_95
• `-M 50` – limit memory usage to 50 MB. The README describes this as the amount of RAM I have free (which I don’t think really makes sense), but the rest of the documentation clarifies that it’s actually the maximum memory the program is allowed to use, not the amount of free memory.
• `-B 1` – sequences are stored on the hard drive (1) instead of in RAM (0)
• `-d 0` – the length of the description in the .clstr file. Since it’s 0, it just takes the FASTA sequence description up to the first space.
• `-n 10` – the word length
• `-l 11` – length of the throw_away_sequences
• `-r 1` – compare both strands of DNA
• `-p 1` – print the alignment overlap in the .clstr file (if 0, it’s not printed)
• `-g 1` – cluster each sequence into the most similar cluster, not the first matching one CD-HIT finds (which is the default, 0).
• `-G 0` – don’t use global sequence identity, i.e., treat each input sequence individually when calculating identity.
• `-c 0.95` – cluster at 95% sequence identity
• `-aS 0.8` – alignment coverage for the shorter sequence. In this case, it means that the alignment between the longer and shorter sequence must cover at least 80% of the shorter sequence.
• `> testdb_95.log` – write the standard output to the testdb_95.log file
• There are other options available, which I’m not going to go into much for right now.

Long story short, it takes around 200 minutes to complete its test data processing. That’s roughly 3 hours and 20 minutes, which I think is pretty long for a test, though I did limit the memory it could use to 50 MB because that’s what the README seemed to call for–upping that limit would probably speed things up substantially. In retrospect, especially given what I’m reading in the other files, I think raising the limit is just fine. (UPDATE: Upping the memory limit to 400 MB, the default, drops the execution time down to about 67 minutes. An eightfold memory increase for a threefold time decrease–not great, but not too shabby.)

### ORF_FINDER

The next program to run is ORF_FINDER. As the name suggests, it scans the sequences for ORFs (open reading frames), which are what most people mean when they talk about their “genes” and their DNA (kind of).

`../orf_finder/orf_finder -l 30 -L 30 -t 11  -i testdb -o testorf`

This command takes far fewer options.

• `../orf_finder/orf_finder` – runs the orf_finder command.
• `-l 30` – the minimal length of the ORF.
• `-L 30` – the minimal length of the ORF between stop codons. I’m not sure exactly how this differs from the -l option. Maybe you need to move at least 30 bases past the end of the previous stop codon before starting to look for another stop codon?
• `-t 11` – the translation table. Again, not sure what this represents. (My guess is that it selects the genetic code; NCBI translation table 11 is the bacterial/archaeal code, which would make sense for metagenomic data.)
• `-i testdb` – Use the testdb file as input.
• `-o testorf` – Write the output to the testorf file.
• Again, there are other options I’m not going to talk about much.

In addition to having far fewer options, ORF_FINDER takes far less time to execute: roughly a minute and a half on the same dataset, which is a pretty huge improvement. ORF finding is a much simpler task than sequence clustering, obviously. Now for the fun part–clustering the ORFs with the real CD-HIT program.
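To make the idea concrete, here is a minimal Python sketch of ORF scanning: find an ATG, read in-frame until a stop codon, and keep anything long enough. This is not what orf_finder actually does internally (the real tool also scans the reverse strand and supports alternative translation tables); the example sequence is made up.

```python
# Minimal forward-strand ORF scanner using the standard stop codons.

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=30):
    orfs = []
    for frame in range(3):          # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i           # open an ORF at the start codon
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append(seq[start:i + 3])
                start = None        # close the ORF at the stop codon
    return orfs

dna = "CCATGAAACCCGGGTTTAAACCCGGGAAATAGCC"
print(find_orfs(dna))   # one 30-base ORF from ATG to TAG
```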

### CD-HIT

Now, CD-HIT works basically the same way that CD-HIT-EST does, except it scans for amino acids instead of nucleotides. The full commands I’m running are:

```
../cd-hit/cdhit -i testorf -o testorf_95 -M 400 -d 0 -n 5 -p 1 \
-g 1 -G 0 -c 0.95 -aS 0.8 > testorf_95.log
../cd-hit/cdhit -i testorf_95 -o testorf_60 -M 400 -d 0 -n 4 -p 1 \
-g 1 -G 0 -c 0.60 -aS 0.8 > testorf_60.log
../cd-hit/clstr_rev.pl testorf_95.clstr testorf_60.clstr \
> testorf_60full.clstr
```

It has much the same options as CD-HIT-EST, so I’m not going to go into a huge amount of detail on those. Instead, notice how I’m going to:

1. Cluster the ORFs at 95% identity.
2. Re-cluster the ORFs with less stringent criteria (60% identity and a shorter word length) to help cluster the non-redundant sequences.
3. Combine the two clustering runs into a single cluster file.

Running things this way (according to the documentation) can help generate a sort of hierarchical structure of clusters. This makes sense, since the ones at 95% identity at the very least are closely related and may actually be redundant sequences while those at 60% are more distant relatives and may be homologs from divergent species or something similar.

The first run of CD-HIT took around 98 minutes, roughly half an hour longer than clustering the nucleotide sequences as a whole. I’m guessing the ORFs take longer to cluster because they’re more similar to each other than the full sequences are.

The second run of CD-HIT took around 235 minutes, probably because the less stringent criteria took longer to process. Combining the two cluster files only took around 15 seconds, so at least that’s an easy task.

That takes care of the basic tools included with RAMMCAP.  I’ll explore some of the graphical tools in a later post.


This is the first part of a series of posts on the RAMMCAP suite of bioinformatics tools.

The Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) is a tool for analyzing metagenomic data. It tries to cluster and functionally annotate a set of metagenomic data: it takes the data, groups like pieces together into clusters, and then tries to figure out what the various clusters do. It’s made up of several tools, only two of which I’ve actually used; I’ll talk about those later. CAMERA, the organization behind RAMMCAP, provides a web service where you can use RAMMCAP without installing it, but it has data limits that my datasets will blow past easily, and it requires registration, which seems non-functional right now (at least, I can’t get a new account set up, and I’ve tried several times over the last few days). So, I downloaded RAMMCAP a few weeks ago and worked on getting it to run over the course of several days.  This post, and the others in this series, is a record of that process, including some initial missteps.  If you have any questions or see other places where I went wrong, leave a comment and let me know.

### First Impressions

First off, the RAMMCAP download (found on the page linked above) is huge–the source code alone was a roughly 760 MB download, which extracts to around 3 gigs. That three gigs might contain some duplicate data–the folder structure is pretty disorganized and a casual glance shows a lot of folders with the same names. There are a lot of symlinks, though, so I could be wrong there. (The more I see, the more I’m convinced I’m right, though.)

Second, it looks like the source bundled a bunch of tools along with the main RAMMCAP code, including versions of BLAST, HMMER, and Metagene. That added a lot to the bulk of the download (roughly 450 MB). There’s also a huge amount of data here, including a version of the Pfam and TIGRFAM libraries (772 and 444 MB, respectively), and a couple other tools I haven’t heard of before that might be part of RAMMCAP.

### Compiling, Phase 1

The README file in the main directory contains basic information on how to compile some of the tools, including CD-HIT, ORF_FINDER, and CD-HIT-454, as well as the optional HMMERHEAD extension to HMMER. The instructions are pretty basic–just do the standard “make clean; make” and things should be good. I wrote a little build script to handle this, just in case I need to do it again for some reason. Everything seems to build fine with the exception of HMMERHEAD, but I’m just going to ignore that for now. Time for testing this puppy out.

### Testing, Phase 1

The README indicates that there should be an examples folder somewhere with some basic test data I can use, but I don’t see it anywhere. Looking around…not seeing it. Turns out, it’s inside the rammcap directory inside the main directory.

### Compiling, Phase 2

Inside the rammcap directory, I find a new README with some major differences from the one outside this directory, plus what looks like symlinks with the same names as some of the directories outside. Looks like they point to the same directories as the other ones, but I’ll recompile things, anyway, just in case. Good thing I wrote that build script.

Except that the build instructions aren’t the same–I don’t have to build CD-HIT-454, but I do need to make sure gnuplot and ImageMagick are installed. They are, which is good, because otherwise I’d either have to ask one of the tech guys to install them on this machine or install them in my user directory, which I’ve already done for several other tools I need. Once I pull the CD-HIT-454 reference out of my build script, the build works fine.

Since my second round of testing was a much larger task, I’ll leave that for a later post.