MapReduce user mailing list: partition as block?


Jay Vyas 2013-04-30, 18:46
Re: partition as block?
Hello Jay,

    What are you going to do in your custom InputFormat and partitioner? Is
your InputFormat going to create larger splits that overlap with larger
blocks? If that is the case, IMHO, you are going to reduce the number of
mappers and thus the parallelism. Also, a much larger block size will add
extra overhead when it comes to disk I/O.
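
A minimal sketch of what such an InputFormat could look like with the newer
org.apache.hadoop.mapreduce API; the class name and the blocks-per-split
factor are illustrative assumptions, not something taken from this thread:

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical sketch: stretch each input split across several HDFS blocks
// so the job launches fewer map tasks.
public class MultiBlockTextInputFormat extends TextInputFormat {

    // How many HDFS blocks one split should cover (assumption; tune per job).
    private static final int BLOCKS_PER_SPLIT = 4;

    @Override
    protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
        // FileInputFormat's default is max(minSize, min(maxSize, blockSize));
        // returning a multiple of the block size cuts the mapper count by
        // roughly that factor.
        return blockSize * BLOCKS_PER_SPLIT;
    }
}

Roughly the same effect is available without code by raising the minimum split
size (mapred.min.split.size in the old property names,
mapreduce.input.fileinputformat.split.minsize in the new ones). Either way a
split then spans blocks that may live on different datanodes, so map-side data
locality suffers.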

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:16 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Hi guys:
>
> I'm wondering: if I'm running mapreduce jobs on a cluster with large block
> sizes, can I increase performance with one of the following:
>
> 1) A custom FileInputFormat
>
> 2) A custom partitioner
>
> 3) -DnumReducers
>
> Clearly, (3) will be an issue, since it might overload tasks and add
> network traffic... but maybe (1) or (2) would be a precise way to "use"
> partitions as a "poor man's" block.
>
> Just a thought; I'm not sure if anyone has tried (1) or (2) before to
> simulate blocks and increase locality using the partition API.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
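
To make options (2) and (3) above concrete, here is a minimal sketch against
the newer org.apache.hadoop.mapreduce API; the BucketPartitioner name, the
Text/IntWritable key and value types, and the bucketing rule are illustrative
assumptions only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical option (2): a custom Partitioner that routes related keys to
// the same reducer, so each reducer's output file behaves like one coarse,
// "block"-sized chunk.
public class BucketPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Group by the first character of the key; swap in whatever grouping
        // matches the "blocks" being simulated.
        String k = key.toString();
        int bucket = k.isEmpty() ? 0 : k.charAt(0);
        return (bucket & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) throws Exception {
        // Option (3): fix the reducer count explicitly, the Job-API
        // counterpart of a -D reducer-count flag (mapred.reduce.tasks in the
        // old property names).
        Job job = Job.getInstance(new Configuration(), "partition-as-block");
        job.setPartitionerClass(BucketPartitioner.class);
        job.setNumReduceTasks(8); // 8 reducers is just an example value
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}

Note that a partitioner only controls which reducer each key lands on; it does
not change how the input is split, so on its own it will not improve map-side
locality. It mostly shapes the size and grouping of the reduce-side output.
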
Jay Vyas 2013-05-01, 00:00