Running 145K maps, zero reduces- does Hadoop scale?
Saptarshi Guha 2009-07-31, 05:55
Hello,
Does Hadoop scale well for 100K+ input splits?
I have not tried with sequence files. My custom InputFormat generates 145K
splits. The record reader emits about 15 bytes as key and 8 bytes as value.
It doesn't do anything else; in fact, it doesn't read from disk (basically, it
emits splitbeginning ... splitend for every split).
So essentially, my InputFormat is creating 145K InputSplit objects (see
below).
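
For concreteness, a minimal record reader matching that description under the
old (org.apache.hadoop.mapred) API might look like the sketch below; the class
name, constructor, and exact key format are assumptions, not taken from the
original code:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.RecordReader;

    public class MyRecordReader implements RecordReader<Text, LongWritable> {
        private final long start, end;
        private long pos;

        public MyRecordReader(long start, long end) { // the split's [start, end) range
            this.start = start;
            this.end = end;
            this.pos = start;
        }

        // Emit one synthetic record per position; nothing is read from disk.
        public boolean next(Text key, LongWritable value) throws IOException {
            if (pos >= end) return false;
            key.set("record-" + pos); // roughly 15 bytes of key
            value.set(pos);           // 8-byte long value
            pos++;
            return true;
        }

        public Text createKey() { return new Text(); }
        public LongWritable createValue() { return new LongWritable(); }
        public long getPos() { return pos - start; }
        public float getProgress() {
            return end == start ? 1.0f : (pos - start) / (float) (end - start);
        }
        public void close() { }
    }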

However, I got this:
09/07/31 01:41:41 INFO mapred.JobClient: Running job: job_200907251335_0005
09/07/31 01:41:42 INFO mapred.JobClient:  map 0% reduce 0%
09/07/31 01:43:06 INFO mapred.JobClient: Job complete: job_200907251335_0005
And the job does not end! It just hangs here.

Very strange. The jobtracker does not respond to web requests.
This is on Hadoop 0.20, though I am using the 0.19.1 API.
The master is 64-bit with 4 cores and 16 GB RAM, and is not running any
tasktrackers.

Any pointers would be appreciated.

Regards
Saptarshi
    // Basically FileInputFormat.getSplits, reworded
    public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
        long n = the_length_of_something; // == 145K
        long chunkSize = n / (numSplits == 0 ? 1 : numSplits);
        InputSplit[] splits = new InputSplit[numSplits];
        for (int i = 0; i < numSplits; i++) {
            MySplit split;
            if ((i + 1) == numSplits) {
                // last split absorbs the remainder left by integer division
                split = new MySplit(i * chunkSize, n);
            } else {
                split = new MySplit(i * chunkSize, (i * chunkSize) + chunkSize);
            }
            splits[i] = split;
        }
        return splits;
    }
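
The MySplit class referenced above is not shown; under the old mapred API it
would have to implement InputSplit (and therefore Writable). A minimal sketch,
assuming a split is just a [start, end) range of record indices with no
locality information:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.mapred.InputSplit;

    public class MySplit implements InputSplit {
        private long start, end;

        public MySplit() { } // no-arg constructor required for deserialization

        public MySplit(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public long getStart() { return start; }
        public long getEnd()   { return end; }

        public long getLength() throws IOException {
            return end - start; // number of synthetic records in this split
        }

        public String[] getLocations() throws IOException {
            return new String[0]; // no locality: nothing is read from disk
        }

        public void write(DataOutput out) throws IOException {
            out.writeLong(start);
            out.writeLong(end);
        }

        public void readFields(DataInput in) throws IOException {
            start = in.readLong();
            end = in.readLong();
        }
    }

Each such split serializes to just 16 bytes (two longs), so the split payload
itself stays tiny even at 145K splits.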