MapReduce >> mail # user >> Reading from HDFS from inside the mapper
Re: Reading from HDFS from inside the mapper
Hi,

You could check DistributedCache (
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html#DistributedCache).
It would allow you to distribute data to the nodes where your tasks are run.

Thanks
Hemanth
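
A rough sketch of the DistributedCache pattern described in the linked tutorial, using the Hadoop 1.x API that was current at the time. The file path and class name here are illustrative assumptions, not from the original thread:

```java
// Sketch: ship a dataset-B file to every task node via DistributedCache.
// The path "hdfs://namenode/data/datasetB/part-00000" is a placeholder.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetupExample {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CacheSetupExample.class);

        // At job-submission time: register the file to be cached.
        // The framework copies it to local disk on each task node
        // before any map task of this job runs there.
        DistributedCache.addCacheFile(
            URI.create("hdfs://namenode/data/datasetB/part-00000"), conf);
    }
}
```

Inside the mapper's `configure()` method, `DistributedCache.getLocalCacheFiles(conf)` then returns the local `Path`s of the cached copies, so the task reads from local disk instead of going back to HDFS. Note this fits when dataset B is small enough to replicate in full to every node; the original question concerns B being large, which is why the HDFS-read approach below is on the table.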

On Mon, Sep 10, 2012 at 3:27 PM, Sigurd Spieckermann <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I would like to perform a map-side join of two large datasets where
> dataset A consists of m*n elements and dataset B consists of n elements.
> For the join, every element in dataset B needs to be accessed m times. Each
> mapper would join one element from A with the corresponding element from B.
> Elements here are actually data blocks. Is there a performance problem (and
> difference compared to a slightly modified map-side join using the
> join-package) if I set dataset A as the map-reduce input and load the
> relevant element from dataset B directly from HDFS inside the mapper? I
> could store the elements of B in a MapFile for faster random access. In the
> second case without the join-package I would not have to partition the
> datasets manually which would allow a bit more flexibility, but I'm
> wondering if HDFS access from inside a mapper is strictly bad. Also, does
> Hadoop have a cache for such situations by any chance?
>
> I appreciate any comments!
>
> Sigurd
>
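The second approach Sigurd describes, reading the matching element of B from an HDFS `MapFile` inside each mapper, might be sketched as follows with the Hadoop 1.x `mapreduce` API. The class name, path, and `joinBlocks` helper are hypothetical placeholders; only the `MapFile.Reader` lookup pattern is the point:

```java
// Sketch: each mapper opens a MapFile of dataset B on HDFS and looks up
// the block matching its input key. A MapFile keeps a sorted index file
// alongside the data file, so get() seeks rather than scanning.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BJoinMapper
        extends Mapper<Text, BytesWritable, Text, BytesWritable> {

    private MapFile.Reader bReader;
    private final BytesWritable bBlock = new BytesWritable();

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        // "/data/datasetB.map" is a placeholder MapFile directory on HDFS.
        bReader = new MapFile.Reader(fs, "/data/datasetB.map", conf);
    }

    @Override
    protected void map(Text key, BytesWritable aBlock, Context context)
            throws IOException, InterruptedException {
        // Random-access lookup of the B element for this A element.
        if (bReader.get(key, bBlock) != null) {
            context.write(key, joinBlocks(aBlock, bBlock));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        bReader.close();
    }

    // Placeholder for the actual join logic on two data blocks.
    private BytesWritable joinBlocks(BytesWritable a, BytesWritable b) {
        return a;
    }
}
```

Opening the reader once in `setup()` rather than per record matters here: the index is loaded into memory on open, so each `get()` costs one seek into the data file. Hadoop itself does not cache these reads across tasks, which is the caching question the original message raises.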