MapReduce >> mail # user >> Extension points available for data locality


Tharindu Mathew 2012-08-21, 09:06
Harsh J 2012-08-21, 09:39
Re: Extension points available for data locality
Hi Tharindu,

Maybe you can try Gora. The Apache Gora open source framework provides an
in-memory data model and persistence for big data. Gora supports persisting
to column stores, key-value stores, document stores, and RDBMSs, and
analyzing the data with extensive Apache Hadoop MapReduce support.

It now supports MySQL through the gora-sql module.

 http://gora.apache.org/

On Tue, Aug 21, 2012 at 5:39 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Tharindu,
>
> (Am assuming you've done enough research to know that there's benefit
> in what you're attempting to do.)
>
> The locality of a task is determined by the job's InputFormat class.
> Specifically, the locality information returned by the InputSplit
> objects via the InputFormat#getSplits(…) API is what the MR scheduler
> looks at when trying to launch data-local tasks.
>
> You can tweak your InputFormat (the one that uses this DB as input?)
> to return relevant locations based on your "DB Cluster", in order to
> achieve this.
>
> On Tue, Aug 21, 2012 at 2:36 PM, Tharindu Mathew <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> >
> > I'm doing some research that involves pulling data stored in a MySQL
> > cluster directly for a MapReduce job, without storing the data in HDFS.
> >
> > I'd like to run Hadoop TaskTracker nodes directly on the MySQL cluster
> > nodes. The purpose of this is to start mappers directly on the node
> > closest to the data where possible (data locality).
> >
> > I notice that with HDFS, since the NameNode knows exactly where each
> > data block is, it uses this to achieve data locality.
> >
> > Is there a way to achieve my requirement, possibly by extending the
> > NameNode or otherwise?
> >
> > Thanks in advance.
> >
> > --
> > Regards,
> >
> > Tharindu
> >
> > blog: http://mackiemathew.com/
> >
>
>
>
> --
> Harsh J
>

--
Don't Grow Old, Grow Up... :-)
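Harsh's quoted suggestion, that the InputFormat's splits carry the locality hints the scheduler uses, can be sketched as below. This is a minimal, self-contained illustration, not Hadoop's real API: the class names, the shard-per-host layout, and the `db1`/`db2`/`db3` hostnames are all hypothetical, and a real implementation would instead extend `org.apache.hadoop.mapreduce.InputFormat` and `InputSplit` (overriding `InputSplit#getLocations()`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the locality idea only; not Hadoop's actual classes.
public class DbLocalitySketch {

    // Stand-in for an InputSplit covering one row range of a MySQL table.
    static class DbSplit {
        final long startRow;
        final long endRow;
        final String[] hosts; // nodes holding this range (the locality hint)

        DbSplit(long startRow, long endRow, String[] hosts) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.hosts = hosts;
        }

        // Mirrors InputSplit#getLocations(): the scheduler compares these
        // hostnames against TaskTracker hostnames to place data-local tasks.
        String[] getLocations() {
            return hosts;
        }
    }

    // Stand-in for InputFormat#getSplits(): one split per shard, each
    // tagged with the DB node assumed to own that shard.
    static List<DbSplit> getSplits(long totalRows, String[] shardHosts) {
        List<DbSplit> splits = new ArrayList<>();
        long rowsPerShard = totalRows / shardHosts.length;
        for (int i = 0; i < shardHosts.length; i++) {
            long start = i * rowsPerShard;
            long end = (i == shardHosts.length - 1) ? totalRows
                                                    : start + rowsPerShard;
            splits.add(new DbSplit(start, end, new String[] { shardHosts[i] }));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical three-node MySQL cluster, 3000 rows sharded evenly.
        for (DbSplit s : getSplits(3000, new String[] { "db1", "db2", "db3" })) {
            System.out.println("rows " + s.startRow + "-" + s.endRow
                    + " -> " + Arrays.toString(s.getLocations()));
        }
    }
}
```

Running `main` prints one line per shard, e.g. `rows 0-1000 -> [db1]`. The key point is only that each split names the host(s) holding its data; everything else (how shards map to hosts) depends on how the DB cluster is actually partitioned.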
Tharindu Mathew 2012-08-21, 12:44
Michael Segel 2012-08-21, 13:28
Tharindu Mathew 2012-08-21, 13:54
Michael Segel 2012-08-21, 14:19
Tharindu Mathew 2012-08-21, 18:40
Harsh J 2012-08-22, 02:16
Tharindu Mathew 2012-08-22, 06:30
Dino Kečo 2012-08-21, 09:22
Minh Duc Nguyen 2012-08-21, 19:17