Hadoop, mail # user - Running 145K maps, zero reduces- does Hadoop scale?


Re: Running 145K maps, zero reduces- does Hadoop scale?
Saptarshi Guha 2009-07-31, 06:37
In this particular example, the record reader emits a single number per
split as both key and value.
Regards
S

On Fri, Jul 31, 2009 at 1:55 AM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:

> Hello,
> Does Hadoop scale well for 100K+ input splits?
> I have not tried with sequence files. My custom InputFormat generates 145K
> splits.
> The record reader emits about 15 bytes as key and 8 bytes as value.
> It doesn't do anything else; in fact, it doesn't read from disk (it simply
> emits splitbeginning ... splitend for every split).
> So essentially, my InputFormat is creating 145K InputSplit objects (see
> below).
>
> However, I got this:
> 09/07/31 01:41:41 INFO mapred.JobClient: Running job: job_200907251335_0005
> 09/07/31 01:41:42 INFO mapred.JobClient:  map 0% reduce 0%
> 09/07/31 01:43:06 INFO mapred.JobClient: Job complete: job_200907251335_0005
> And the job does not end! It hangs here.
>
> Very strange. The jobtracker does not respond to web requests.
> This is on Hadoop 0.20, though I am using the 0.19.1 API.
> The master is 64-bit with 4 cores and 16GB RAM and is not running any
> tasktrackers.
>
> Any pointers would be appreciated.
>
> Regards
> Saptarshi
>
>
>     // Basically FileInputSplit reworded
>     public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
>         long n = the_length_of_something; // == 145K
>         long chunkSize = n / (numSplits == 0 ? 1 : numSplits);
>         InputSplit[] splits = new InputSplit[numSplits];
>         for (int i = 0; i < numSplits; i++) {
>             long start = i * chunkSize;
>             // The last split absorbs the remainder so the ranges end exactly at n.
>             long end = ((i + 1) == numSplits) ? n : start + chunkSize;
>             splits[i] = new MySplit(start, end);
>         }
>         return splits;
>     }
>
>
>
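
For anyone who wants to check the split arithmetic quoted above without a cluster, here is a minimal standalone sketch. It has no Hadoop dependencies. The class name SplitRanges and the method ranges are made up for illustration and are not part of the original job, and each split is represented simply as a {start, end} pair rather than a real InputSplit.

```java
public class SplitRanges {
    /**
     * Divides [0, n) into numSplits contiguous ranges, mirroring the quoted
     * getSplits logic. Each row of the result is {start, end}. The last range
     * ends at n, so it absorbs the remainder of the integer division.
     * (Hypothetical helper; it does not use the Hadoop API.)
     */
    public static long[][] ranges(long n, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        long chunkSize = n / numSplits;
        long[][] out = new long[numSplits][2];
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == numSplits) ? n : start + chunkSize;
            out[i][0] = start;
            out[i][1] = end;
        }
        return out;
    }
}
```

Note that if numSplits is larger than n, chunkSize becomes 0 and every range except the last is empty, so it is worth checking what value of numSplits the framework actually passes in.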