Re: Which strategy is proper to run in this environment?
This sounds like it will be very inefficient.  There is considerable
overhead in starting Hadoop jobs.  As you describe it, you will be starting
thousands of jobs and paying this penalty many times.

Is there a way that you could process all of the directories in one
map-reduce job?  Can you combine these directories into a single directory
with a few large files?
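To make the first suggestion concrete, here is a minimal sketch (not from the original thread) of a single job driver that takes all of the directories at once, using the new MapReduce API. A glob pattern in the input path expands to every matching directory, so the job-startup cost is paid once rather than thousands of times. The /data/runs/* and /data/output paths are hypothetical placeholders for your own layout.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessAllDirs {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "process-all-dirs");
    job.setJarByClass(ProcessAllDirs.class);

    // One job over every directory: the glob expands to all matching
    // directories under a (hypothetical) common parent, so you submit
    // one job instead of one per directory.
    FileInputFormat.setInputPaths(job, new Path("/data/runs/*"));

    // Or add the directories explicitly, one by one:
    //   FileInputFormat.addInputPath(job, new Path("/data/runs/dir0001"));

    FileOutputFormat.setOutputPath(job, new Path("/data/output"));

    // Set your Mapper/Reducer as usual with job.setMapperClass(...)
    // and job.setReducerClass(...) before submitting.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If many of the inputs are at the small (1 MB) end, an input format that packs several files into one split, such as CombineFileInputFormat, avoids paying a map task per tiny file. For physically merging directories into a few large files, hadoop fs -getmerge or a simple identity map-reduce job are the usual options.
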

On Fri, Feb 11, 2011 at 8:07 PM, Jun Young Kim <[EMAIL PROTECTED]> wrote:

> Hi.
>
> I have a small cluster (9 nodes) running Hadoop here.
>
> On this cluster, Hadoop will process thousands of directories sequentially.
>
> In each directory, there are two input files for the map/reduce job. Input
> file sizes range from 1 MB to 5 GB.
> In summary, each Hadoop job will take one of these directories.
>
> To get the best performance, which strategy is proper for us?
>
> Could you advise me on this?
> Which configuration is best?
>
> PS) Physical memory is 12 GB on each node.
>