MapReduce user mailing list

partition as block?
Hi guys:

I'm wondering: if I'm running MapReduce jobs on a cluster with large block
sizes, can I increase performance with any of the following?

1) A custom FileInputFormat

2) A custom partitioner

3) More reducers (e.g. -D mapred.reduce.tasks=N)

Clearly, (3) could be a problem, since it might overload the cluster with
tasks and network traffic... but maybe (1) or (2) would be a more precise way
to "use" partitions as a "poor man's" block.
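
To make option (1) concrete, here is a minimal sketch of the split-size arithmetic, written in plain Java so it runs without Hadoop on the classpath. It assumes the rule used by FileInputFormat in the newer org.apache.hadoop.mapreduce API, where the split size is max(minSize, min(maxSize, blockSize)). The class name, the 512 MB block size and the 2 GB file size are made up for illustration, and the small slack Hadoop allows on the last split is ignored.

```java
// Sketch only: mirrors the split-size rule of FileInputFormat in the
// org.apache.hadoop.mapreduce API. All sizes below are assumed values.
final class SplitSizeSketch {

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for one file (ceiling division). Hadoop itself lets
    // the last split run slightly over; that slack is ignored here.
    static long countSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 512 * mb;  // assumed "large" block size
        long fileSize = 2048 * mb;  // assumed input file size

        // No cap on the split size: one split per block.
        long uncapped = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // Split size capped at 128 MB: several splits per block.
        long capped = computeSplitSize(blockSize, 1, 128 * mb);

        System.out.println("uncapped splits: " + countSplits(fileSize, uncapped));
        System.out.println("capped splits:   " + countSplits(fileSize, capped));
    }
}
```

With these assumed numbers the uncapped case gives 4 splits and the capped case gives 16. If that rule holds on the cluster in question, capping the maximum split size (for example through FileInputFormat.setMaxInputSplitSize) may already give smaller-than-block map inputs without a custom FileInputFormat.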

Just a thought. I'm not sure if anyone has tried (1) or (2) before in order to
simulate blocks and increase locality by using the partition API.
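
As a rough illustration of option (2), the sketch below shows the kind of logic that could sit inside a custom Partitioner's getPartition method, written in plain Java with no Hadoop dependency. Everything in it is hypothetical: the class name, the 64 MB chunk size, and the convention that each record's key is its byte offset in the input. It only shows how contiguous offset ranges could be mapped to partitions. Keep in mind that in Hadoop the partitioner is applied to map output, so it decides which reducer receives a record.

```java
// Sketch only: stand-in for the body of a custom
// Partitioner.getPartition(key, value, numPartitions).
// The offset-as-key convention and the chunk size are assumptions.
final class OffsetRangePartitionerSketch {

    private final long chunkSize;

    OffsetRangePartitionerSketch(long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Maps a non-negative byte offset to a partition so that each partition
    // covers one contiguous chunk. Offsets beyond the last chunk are clamped
    // into the last partition.
    int getPartition(long byteOffset, int numPartitions) {
        if (byteOffset < 0 || numPartitions <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and numPartitions > 0");
        }
        long chunk = byteOffset / chunkSize;
        return (int) Math.min(chunk, numPartitions - 1L);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        OffsetRangePartitionerSketch p = new OffsetRangePartitionerSketch(64 * mb);
        System.out.println(p.getPartition(0, 4));        // first chunk
        System.out.println(p.getPartition(64 * mb, 4));  // second chunk
        System.out.println(p.getPartition(640 * mb, 4)); // clamped to last partition
    }
}
```

Unlike the default hash partitioning, this keeps neighbouring records together in the same partition, which is the "partition as block" idea in its simplest form.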

Jay Vyas