Hadoop user mailing list: Running 145K maps, zero reduces - does Hadoop scale?


Re: Running 145K maps, zero reduces - does Hadoop scale?
In this particular example, the record reader emits a single number per
split as both key and value.
Regards
S
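To make the reply concrete, here is a minimal, dependency-free sketch of a record reader that, like the one described, reads nothing from disk and emits each number in its split's range [start, end) as both key and value. `RangeRecordReader`, `KeyValue`, and `countRecords` are hypothetical names for illustration, not Hadoop's `RecordReader` interface. It assumes, per the `//==145K` comment in the quoted code, that n = 145,000 and that the job asks for 145,000 splits, so chunkSize = n / numSplits = 1 and every split holds exactly one number.

```java
// Hypothetical model of a record reader that reads nothing from disk and
// emits each number in its split's range; this is NOT Hadoop's RecordReader API.
public class RangeRecordReader {

    /** One emitted record; in this example key and value are the same number. */
    public static final class KeyValue {
        public final long key;
        public final long value;

        KeyValue(long key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final long end; // exclusive upper bound of the split
    private long current;   // next number to emit

    public RangeRecordReader(long start, long end) {
        this.current = start;
        this.end = end;
    }

    /** Returns the next record, or null once the split [start, end) is exhausted. */
    public KeyValue next() {
        if (current >= end) {
            return null;
        }
        long v = current++;
        return new KeyValue(v, v);
    }

    /** Counts how many records the reader emits for the range [start, end). */
    public static int countRecords(long start, long end) {
        RangeRecordReader reader = new RangeRecordReader(start, end);
        int count = 0;
        while (reader.next() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long n = 145_000;               // assumed total length, per "//==145K"
        int numSplits = 145_000;        // assumed: one split per number
        long chunkSize = n / numSplits; // evaluates to 1
        System.out.println("chunkSize = " + chunkSize);
        System.out.println("records in first split = " + countRecords(0, chunkSize));
    }
}
```

Running `main` prints `chunkSize = 1` and `records in first split = 1`. The last split, [144999, 145000), also yields a single record, so under these assumptions each map task does almost no work per split.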

On Fri, Jul 31, 2009 at 1:55 AM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:

> Hello,
> Does Hadoop scale well for 100K+ input splits?
> I have not tried with sequence files. My custom InputFormat generates 145K
> splits.
> The record reader emits about 15 bytes as key and 8 bytes as value.
> It doesn't do anything else; in fact, it doesn't read from disk (basically,
> it emits splitbeginning ... splitend for every split).
> So essentially, my InputFormat is creating 145K InputSplit objects (see
> below).
>
> However, I got this:
> 09/07/31 01:41:41 INFO mapred.JobClient: Running job: job_200907251335_0005
> 09/07/31 01:41:42 INFO mapred.JobClient:  map 0% reduce 0%
> 09/07/31 01:43:06 INFO mapred.JobClient: Job complete: job_200907251335_0005
> And the job does not end! It hangs here.
>
> Very strange. The JobTracker does not respond to web requests.
> This is on Hadoop 0.20, though I am using the 0.19.1 API.
> The master is 64-bit with 4 cores and 16GB RAM and is not running any
> TaskTrackers.
>
> Any pointers would be appreciated.
>
> Regards
> Saptarshi
>
>
>     // Basically FileInputSplit reworded
>     public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
>         long n = the_length_of_something; // == 145K
>         // Guard against a zero split hint for both the division and the array size
>         int count = (numSplits <= 0) ? 1 : numSplits;
>         long chunkSize = n / count;
>         InputSplit[] splits = new InputSplit[count];
>         for (int i = 0; i < count; i++) {
>             long start = i * chunkSize;
>             // The last split absorbs the remainder of n / count
>             long end = (i + 1 == count) ? n : start + chunkSize;
>             splits[i] = new MySplit(start, end);
>         }
>         return splits;
>     }
>
>
>
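The split arithmetic in the quoted getSplits() can be checked without a cluster. A minimal sketch, assuming splits are modeled as plain [start, end) pairs rather than Hadoop InputSplit objects (`SplitRanges`, `computeSplits`, and `tiles` are hypothetical names): it divides [0, n) into numSplits contiguous chunks, lets the last chunk absorb the remainder, treats a zero hint as one split (applying the original `numSplits == 0 ? 1 : numSplits` guard to the array size as well as the division), and verifies that the chunks tile the range with no gaps or overlaps.

```java
// Hypothetical, Hadoop-free model of the getSplits() arithmetic in the question.
public class SplitRanges {

    /** Returns contiguous [start, end) ranges covering [0, n). */
    public static long[][] computeSplits(long n, int numSplits) {
        int count = (numSplits <= 0) ? 1 : numSplits; // treat a zero hint as one split
        long chunkSize = n / count;
        long[][] splits = new long[count][];
        for (int i = 0; i < count; i++) {
            long start = i * chunkSize; // int * long is evaluated as long
            // The last split absorbs the remainder n % count.
            long end = (i + 1 == count) ? n : start + chunkSize;
            splits[i] = new long[] {start, end};
        }
        return splits;
    }

    /** True if the ranges start at 0, end at n, and each begins where the previous ended. */
    public static boolean tiles(long[][] splits, long n) {
        long expectedStart = 0;
        for (long[] s : splits) {
            if (s[0] != expectedStart || s[1] < s[0]) {
                return false;
            }
            expectedStart = s[1];
        }
        return expectedStart == n;
    }

    public static void main(String[] args) {
        long[][] splits = computeSplits(145_000, 145_000);
        System.out.println(splits.length + " splits, tiles = " + tiles(splits, 145_000));
    }
}
```

Running `main` prints `145000 splits, tiles = true`. One edge worth checking when the split count comes from a hint: if numSplits exceeds n, chunkSize is 0 and every split except the last is empty, with the last one covering all of [0, n).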