Hadoop, mail # user - Which strategy is proper to run in this environment?


Re: Which strategy is proper to run in this environment?
Ted Dunning 2011-02-12, 19:33
This sounds like it will be very inefficient.  There is considerable
overhead in starting Hadoop jobs.  As you describe it, you will be starting
thousands of jobs and paying this penalty many times.

Is there a way that you could process all of the directories in one
map-reduce job?  Can you combine these directories into a single directory
with a few large files?
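
For what it's worth, here is a minimal sketch of that idea (the parent/output paths, class names, and the trivial line-count mapper and reducer are mine, just placeholders for your real logic): list the sub-directories once in the driver and add each one as an input path of a single job, so one submission covers all of them instead of one job per directory.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleJobDriver {

  // Placeholder mapper: emits one count per input line.
  // Substitute your real per-directory processing logic here.
  public static class LineCountMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text outKey = new Text("lines");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      context.write(outKey, ONE);
    }
  }

  // Placeholder reducer: sums the per-line counts.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "process-all-dirs");  // pre-2.x style constructor
    job.setJarByClass(SingleJobDriver.class);
    job.setMapperClass(LineCountMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // Add every sub-directory under a (hypothetical) parent directory as an
    // input path, so a single job reads all of them instead of one job each.
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path("/data/input-dirs"))) {
      if (status.isDir()) {
        FileInputFormat.addInputPath(job, status.getPath());
      }
    }
    FileOutputFormat.setOutputPath(job, new Path("/data/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If the many small 1 MB files become a problem, a CombineFileInputFormat can pack them into fewer splits, but even the plain approach above turns thousands of job submissions into one.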

On Fri, Feb 11, 2011 at 8:07 PM, Jun Young Kim <[EMAIL PROTECTED]> wrote:

> Hi.
>
> I have a small cluster (9 nodes) running Hadoop here.
>
> On this cluster, Hadoop will process thousands of directories sequentially.
>
> In each directory, there are two input files for the map/reduce job. Input
> file sizes range from 1 MB to 5 GB.
> In summary, each Hadoop job will take one of these directories.
>
> To get the best performance, which strategy is proper for us?
>
> Could you suggest something?
> Which configuration is best?
>
> PS) Physical memory size is 12 GB per node.
>
>