Re: Hadoop for Bioinformatics
Luca Pireddu 2011-03-29, 13:39
On March 28, 2011 04:51:14 Franco Nazareno wrote:
> Good day everyone!
And a good day to you Franco!
> First, I want to congratulate the group on this wonderful project. It has
> opened up new ideas and solutions in computing and technology. I'm
> excited to learn more about it and discover the possibilities of Hadoop and
> its components.
> Well, I just want to ask this with regard to my studies. I'm currently
> studying for my PhD in Bioinformatics, and my question is: can you
> give me a (rough) idea of whether it's possible to use a Hadoop cluster to
> perform DNA sequence alignment? My basic idea is something like a
> string search over huge data files stored in HDFS, with the application
> using MapReduce for the searching and computing. As the Hadoop paradigm
> implies, it doesn't serve interactive applications well, and I think this
> kind of searching is a "write-once, read-many" application.
> I hope you don't mind my question. And it'll be great hearing your comments
> or suggestions about this.
> Thanks and more power!
The short answer is yes! At CRS4 we are working on this very problem.
We have implemented a Hadoop-based workflow to perform short read alignment to
support DNA sequencing activities in our lab. Its alignment operation is
based on (and therefore equivalent to) BWA. We have written a paper about it
which will appear in the coming months, and we are working on an open source
release, but alas we haven't completed that task yet.
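To make the basic "string search with MapReduce" idea from the question concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it is not the CRS4 workflow and not BWA's algorithm (BWA uses a Burrows-Wheeler index and tolerates mismatches, whereas this does exact substring matching only). The function names (map_read, reduce_hits, run_local) are hypothetical, and the map/shuffle/reduce phases are simulated in a single process rather than run on a Hadoop cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Hit = Tuple[int, int]  # (record number, offset of the match within the record)


def map_read(record_no: int, sequence: str, query: str) -> Iterator[Tuple[str, Hit]]:
    """Map step: emit (query, (record_no, offset)) for every exact occurrence.

    Overlapping occurrences are reported too, because the search resumes
    one character after the previous match.
    """
    start = sequence.find(query)
    while start != -1:
        yield query, (record_no, start)
        start = sequence.find(query, start + 1)


def reduce_hits(query: str, hits: Iterable[Hit]) -> Tuple[str, List[Hit]]:
    """Reduce step: gather all hits for one query into a sorted list."""
    return query, sorted(hits)


def run_local(records: Iterable[str], query: str) -> Dict[str, List[Hit]]:
    """Simulate map -> shuffle -> reduce locally (an assumption for illustration;
    on a cluster, the framework would distribute records and group by key)."""
    query = query.upper()
    grouped: Dict[str, List[Hit]] = defaultdict(list)
    for record_no, record in enumerate(records):
        for key, hit in map_read(record_no, record.strip().upper(), query):
            grouped[key].append(hit)  # "shuffle": group values by key
    return dict(reduce_hits(key, hits) for key, hits in grouped.items())


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTTT", "ggacgt"]
    print(run_local(reads, "ACGT"))
    # {'ACGT': [(0, 0), (0, 4), (2, 2)]}
```

Each record is searched independently, which is why the map step parallelizes naturally across the blocks of a large file; the reduce step only has to collect the hits.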
We have also implemented a Hadoop-based distributed BLAST alignment program,
in case you're working with long fragments. It's currently being used by our
collaborators to align viral DNA segments.
If you're interested, we can give you an advance release of either program
so you can try it out.
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452