This is the second part of a series of posts on the RAMMCAP suite of bioinformatics tools.
Last time, we left off with a freshly compiled version of RAMMCAP ready for testing. As before, we’ll start with the README in the current directory, which is the rammcap directory inside the main directory (named RAMMCAP-20091106 in my case) from the RAMMCAP download.
The new README has its own test script, so I’m going to follow this one. The first thing to run is the CD-HIT-EST program.
The CD-HIT-EST program is the clustering program. It takes in a FASTA-formatted file and clusters that data according to a greedy algorithm that uses simple word counting and indexing to help speed things up considerably. Basically, it:
- Sorts the sequences from longest to shortest.
- Begins clustering:
  - Compares the current sequence to the list of known clusters.
  - If it matches an existing cluster, adds it to that cluster and makes it the “representative” sequence for that cluster if it is longer than the existing “representative” sequence.
  - If it doesn’t match an existing cluster, makes a new cluster with this sequence as the representative sequence.
  - Repeats until all the sequences are clustered.
(None of which I knew at the time I ran this for the first time, but it’s information that makes sense here.)
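To make those steps concrete, here is a minimal Python sketch of greedy, longest-first clustering. It is not CD-HIT's actual implementation: the `toy_identity` function is a stand-in I made up (shared 3-letter words rather than a real alignment), and all the names are hypothetical. Because the input is sorted longest first, the first member of each cluster is already its longest sequence, so the sketch simply keeps it as the representative.

```python
def kmer_set(seq, k=3):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_identity(short, long_, k=3):
    """Fraction of the shorter sequence's words also found in the longer one.

    Assumption: a crude stand-in for alignment-based identity,
    not the measure CD-HIT actually computes.
    """
    short_words = kmer_set(short, k)
    if not short_words:
        return 0.0
    return len(short_words & kmer_set(long_, k)) / len(short_words)


def greedy_cluster(seqs, threshold=0.95):
    """Cluster sequences greedily, longest first.

    Returns a list of clusters; the first element of each cluster
    is its representative.
    """
    clusters = []
    for seq in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            # Compare only against the representative of each cluster.
            if toy_identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing cluster matched: start a new one.
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    for cluster in greedy_cluster(["ACGTACGTAC", "ACGTACGT", "TTTTGGGGCC"]):
        print(cluster[0], "<-", cluster[1:])
```

This version stops at the first matching cluster it finds, which is the cheap way to do it.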
I pull out the really long test command (written below), prepare myself, and hit Enter. It takes a long time to complete.
../cd-hit/cdhit-est -i testdb -o testdb_95 -M 50 -B 1 -d 0 -n 10 \
    -l 11 -r 1 -p 1 -g 1 -G 0 -c 0.95 -aS 0.8 > testdb_95.log
While that’s going, here’s what that command means:
../cd-hit/cdhit-est – use the CD-HIT-EST command, which is the CD-HIT command for DNA/RNA comparison tasks. The original CD-HIT was written for protein comparison.
-i testdb – use the testdb file as the input file. This file is a FASTA file with 1,000,000 sequences at most 361 bases long pulled from various metagenomic samples by the author.
-o testdb_95 – write the output to a file called testdb_95
-M 50 – limit memory use to 50 MB. The README describes this as the amount of free RAM, which didn’t make much sense to me; according to the rest of the documentation, it’s actually the maximum memory the program may use, not the amount of free memory.
-B 1 – sequences are stored on the hard drive (1) instead of in RAM (0)
-d 0 – the length of the description in the .clstr file. Since it’s 0, it just takes the FASTA sequence description up to the first space.
-n 10 – the word length
-l 11 – the length of the throw_away_sequences, i.e., sequences this short are discarded rather than clustered
-r 1 – compare both strands of DNA
-p 1 – print the alignment overlap in the .clstr file (if 0, it’s not printed)
-g 1 – clusters each sequence into the most similar cluster, not the first one CD-HIT finds, which is the default (0).
-G 0 – use local rather than global sequence identity, i.e., identity is calculated over the aligned region only instead of over the full length of the sequence.
-c 0.95 – cluster at 95% sequence identity
-aS 0.8 – alignment coverage for the shorter sequence. In this case, it means that the alignment between the longer and shorter sequence must cover at least 80% of the shorter sequence.
> testdb_95.log – write the standard output to the testdb_95.log file
There are other options available, which I’m not going to go into much for right now.
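Here is a rough Python sketch of how I understand -c 0.95, -aS 0.8, and -G 0 to work together. It assumes an ungapped alignment, so the aligned length stands in for both the alignment length and the covered part of the shorter sequence. The function and parameter names are mine, not CD-HIT's.

```python
def passes_thresholds(identical, aligned_len, shorter_len,
                      min_identity=0.95, min_short_coverage=0.8):
    """Decide whether one pairwise alignment satisfies both cutoffs.

    identical   -- number of identical positions in the alignment
    aligned_len -- length of the aligned region (assumed ungapped)
    shorter_len -- full length of the shorter sequence
    """
    if aligned_len == 0 or shorter_len == 0:
        return False
    # Local identity (-G 0): identical positions over the aligned region.
    identity = identical / aligned_len
    # Coverage (-aS): share of the shorter sequence inside the alignment.
    coverage = aligned_len / shorter_len
    return identity >= min_identity and coverage >= min_short_coverage


if __name__ == "__main__":
    # 82 of 85 aligned bases identical, shorter sequence is 100 bases long.
    print(passes_thresholds(82, 85, 100))
    # Perfect match, but it only covers 70% of the shorter sequence.
    print(passes_thresholds(70, 70, 100))
```

The second call fails even though every aligned base matches, which is the whole point of the coverage option: a short perfect overlap isn’t enough on its own.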
Long story short, it takes around 200 minutes to process the test data. That’s roughly 3 hours and 20 minutes, which I think is pretty long for a test, though I did limit the amount of memory it could use to 50 MB. Raising that limit should speed things up substantially, but going by the README I didn’t think I could. In retrospect, raising the limit looks just fine, especially given what I’m reading in the other files. (UPDATE: Raising the memory limit to 400 MB, the default, drops the execution time to about 67 minutes. An eightfold memory increase for a threefold time decrease: not great, but not too shabby.)
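For anyone who wants to check my arithmetic, here it is in a few lines of Python, using only the numbers quoted above (the variable names are just for illustration):

```python
slow_minutes = 200   # run time with -M 50
fast_minutes = 67    # run time with -M 400

hours, minutes = divmod(slow_minutes, 60)
memory_ratio = 400 / 50
speedup = slow_minutes / fast_minutes

print(f"{slow_minutes} minutes = {hours} h {minutes} min")
print(f"memory increase: {memory_ratio:.0f}x")
print(f"speedup: {speedup:.2f}x")
```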
The next program to run is ORF_FINDER. As the name suggests, it scans the sequences for ORFs (open reading frames), which are what most people mean when they talk about their “genes” and their DNA (kind of).
../orf_finder/orf_finder -l 30 -L 30 -t 11 -i testdb -o testorf
This command takes far fewer options.
../orf_finder/orf_finder – runs the orf_finder command.
-l 30 – the minimal length of the ORF.
-L 30 – the minimal length of the ORF between stop codons. I’m not sure exactly how this differs from the -l option. Maybe you need to move at least 30 bases past the end of the previous stop codon before starting to look for another stop codon?
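-t 11 – the translation table. I’m not sure, but this presumably selects the genetic code used for translation; in NCBI’s numbering, table 11 is the bacterial, archaeal, and plant plastid code.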
-i testdb – Use the testdb file as input.
-o testorf – Write the output to the testorf file.
Again, there are other options I’m not going to talk about much.
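I don’t know how ORF_FINDER works internally, but the basic idea of ORF scanning can be sketched in Python. This sketch assumes an ORF is simply a stretch of codons between stop codons, checks all six reading frames, and measures the minimum length in codons. All three are my assumptions, not ORF_FINDER’s documented behavior. TAA, TAG, and TGA are the stop codons of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_stretches(seq, min_codons=30):
    """Find stop-free stretches in all six reading frames.

    Returns (strand, frame, nucleotides) tuples for every stretch
    of at least min_codons complete codons (min_codons >= 1).
    """
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        results.append((strand, frame, s[start:pos]))
                    start = pos + 3  # resume after the stop codon
                pos += 3
            # Stretch running off the end of the read.
            if (pos - start) // 3 >= min_codons:
                results.append((strand, frame, s[start:pos]))
    return results


if __name__ == "__main__":
    for hit in stop_free_stretches("ATGAAACCCTAAGGG", min_codons=3):
        print(hit)
```

The sketch keeps stretches that run off either end of the read, on the assumption that short metagenomic reads often contain only part of a gene.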
In addition to having far fewer options, ORF_FINDER takes far less time to execute: roughly a minute and a half on the same dataset, which is a pretty huge improvement. ORF finding is a much simpler task than sequence clustering, obviously. Now for the fun part: clustering the ORFs with the real CD-HIT program.
Now, CD-HIT works basically the same way that CD-HIT-EST does, except it compares amino acid sequences instead of nucleotide sequences. The full commands I’m running are:
../cd-hit/cdhit -i testorf -o testorf_95 -M 400 -d 0 -n 5 -p 1 \
    -g 1 -G 0 -c 0.95 -aS 0.8 > testorf_95.log
../cd-hit/cdhit -i testorf_95 -o testorf_60 -M 400 -d 0 -n 4 -p 1 \
    -g 1 -G 0 -c 0.60 -aS 0.8 > testorf_60.log
../cd-hit/clstr_rev.pl testorf_95.clstr testorf_60.clstr
It has much the same options as CD-HIT-EST, so I’m not going to go into a huge amount of detail on those. Instead, notice how I’m going to:
- Cluster the ORFs at 95% identity.
- Re-cluster the ORFs with less stringent criteria (60% identity and a shorter word length) to help cluster the non-redundant sequences.
- Combine the two clustering runs into a single cluster file.
Running things this way (according to the documentation) can help generate a sort of hierarchical structure of clusters. This makes sense, since the ones at 95% identity at the very least are closely related and may actually be redundant sequences while those at 60% are more distant relatives and may be homologs from divergent species or something similar.
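To picture what combining the two runs gives you, here is a small Python sketch. It doesn’t read real .clstr files or reproduce what clstr_rev.pl does internally. It just assumes each run has already been boiled down to a dictionary mapping a representative to its members, and the ORF names are made up.

```python
def merge_levels(fine, coarse):
    """Expand coarse clusters of representatives into full member lists.

    fine   -- 95% run: representative -> all members (including itself)
    coarse -- 60% run: representative -> the 95% representatives it absorbed
    """
    merged = {}
    for coarse_rep, fine_reps in coarse.items():
        members = []
        for fine_rep in fine_reps:
            # Pull in everything the 95% representative stood for.
            members.extend(fine.get(fine_rep, [fine_rep]))
        merged[coarse_rep] = members
    return merged


if __name__ == "__main__":
    fine = {"orf1": ["orf1", "orf2"], "orf3": ["orf3"], "orf4": ["orf4", "orf5"]}
    coarse = {"orf1": ["orf1", "orf3"], "orf4": ["orf4"]}
    for rep, members in merge_levels(fine, coarse).items():
        print(rep, members)
```

The 60% run only ever sees the 95% representatives, so the merge step is what brings the near-duplicates back into the picture.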
The first run of CD-HIT took around 98 minutes, roughly half an hour longer than the 67 or so minutes it took to cluster the original sequences. I’m guessing that the ORFs are more similar to each other than the original sequences are, so they take longer to cluster.
The second run of CD-HIT took around 235 minutes, probably because the less stringent criteria took longer to process. Combining the two cluster files only took around 15 seconds, so at least that’s an easy task.
That takes care of the basic tools included with RAMMCAP. I’ll explore some of the graphical tools in a later post.