Announcing Seal 0.1.0: BWA alignment on Hadoop
Luca Pireddu, 2011-05-09, 10:47
Hello everyone. If you're working on short DNA read alignment, you may be interested in this message. We've just released Seal (http://biodoop-seal.sourceforge.net/), a Hadoop-based toolkit for distributed short read alignment and analysis. Seal currently includes tools for read alignment (based on BWA), duplicate read removal, and sorting of read mappings.

Seal scales well and easily handles terabytes of data. If you're aligning read data sets larger than a few hundred MB and you have a cluster of computers (anything from a small one of 4 or 5 nodes up to hundreds of nodes), then Seal might be for you. On a 16-node Hadoop cluster with 8 cores and 16 GB of RAM per node, we have measured a throughput of 13 Gbp/hour for alignment plus duplicate removal (map+rmdup), and 19 Gbp/hour in map-only mode. Scalability tests show that throughput per node is maintained as the cluster grows up to 128 nodes.

We have been developing Seal to support the needs of the CRS4 Sequencing laboratory, which operates 5 Illumina sequencing machines and therefore generates a lot of data to process. Our previous workflow was being overwhelmed despite the additional computers made available to it, and it regularly overloaded our Lustre shared storage volume. Now all data processing at the lab starts with Seal, with very positive results in terms of both speed and maintenance effort.

We're eager to get people to try our new tool. Please visit the Seal web site (http://biodoop-seal.sourceforge.net/) and feel free to contact me or the other Seal authors if you have any questions or problems.

-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452