Hadoop, mail # user - Past meeting: July Houston Hadoop Meetup - Genomic data analysis with hadoop


Past meeting: July Houston Hadoop Meetup - Genomic data analysis with hadoop
Mark Kerzner 2012-07-17, 21:29
Hi, all,

Here is what it was about:

July Houston Hadoop Meetup - Genomic data analysis with
hadoop<http://shmsoft.blogspot.com/2012/07/july-houston-hadoop-meetup-genomic-data.html>

<http://2.bp.blogspot.com/-LQOZ0kppE7Y/UATvSSC-CyI/AAAAAAAAKT0/3cVl_S83Tkg/s1600/Genome.png>

Dianhui (Dennis) Zhu presented "Genomic data analysis with hadoop". He
talked about using the Hadoop framework to do pattern search in genomic
sequence datasets. This is based on his three-year project at Baylor, which
started using Hadoop a year ago. Dennis is a Senior Scientific Programmer
at HGSC.

Dianhui covered the following topics:

1. Setting up a Hadoop test cluster with 4 nodes.
2. Code walkthrough and unit testing with Mockito and MRUnit.
3. Live demo: running the Hadoop application on the 4-node cluster.

The interesting technical problem that Dennis showed was how to break a
sequence into chunks before it gets to the Mapper. This is usually trivial
in regular applications, but it is quite hard with the unbounded,
unstructured data of the genome. The audience analyzed the actual code,
asked many questions, and wanted to compare it to existing open source
projects.
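
To illustrate why chunking is tricky for pattern search, here is a minimal
sketch in Python. It is not Dennis's code, and the function names
(chunk_sequence, find_pattern) are made up for this example. It assumes one
common approach: cut the sequence into fixed-size windows that overlap by
one character less than the pattern length, so that a match straddling a
chunk boundary still falls entirely inside some chunk.

```python
def chunk_sequence(sequence, chunk_size, overlap):
    """Yield (offset, chunk) pairs: fixed-size windows that overlap.

    Hypothetical helper for illustration; offset is the chunk's start
    position in the full sequence.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    for start in range(0, len(sequence), step):
        yield start, sequence[start:start + chunk_size]
        # Stop once a window has reached the end of the sequence.
        if start + chunk_size >= len(sequence):
            break


def find_pattern(sequence, pattern, chunk_size):
    """Return sorted start positions of pattern, searching chunk by chunk."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    # Overlap of len(pattern) - 1 means no match can hide on a boundary.
    overlap = len(pattern) - 1
    hits = set()
    for offset, chunk in chunk_sequence(sequence, chunk_size, overlap):
        pos = chunk.find(pattern)
        while pos != -1:
            hits.add(offset + pos)  # translate back to global coordinates
            pos = chunk.find(pattern, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_pattern("ACGTACGTAC", "GTA", chunk_size=4))
```

In a Hadoop job, the equivalent of chunk_sequence would presumably live in
the input-splitting code that runs before the Mapper, and each Mapper call
would do the per-chunk search. Without the overlap, a match that spans two
chunks would be missed by both.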

Indeed, there is an article on the Cloudera blog,
http://www.cloudera.com/blog/2009/10/analyzing-human-genomes-with-hadoop/,
and it refers to the Crossbow open source project,
http://bowtie-bio.sourceforge.net/crossbow/index.shtml. It will be
interesting to see how that compares to Dennis's work.