Hadoop >> mail # user >> Running 145K maps, zero reduces- does Hadoop scale?

Running 145K maps, zero reduces- does Hadoop scale?
Does Hadoop scale well for 100K+ input splits?
I have not tried with sequence files. My custom InputFormat generates 145K
splits. The record reader emits about 15 bytes as key and 8 bytes as value.
It doesn't do anything else; in fact, it doesn't read from disk (basically it
emits splitbeginning ... splitend for every split).
So essentially, my InputFormat is creating 145K InputSplit objects (see the
code below).
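For reference, here is a plain-Java mock (no Hadoop types) of what such a record reader does per split. The MySplit name and its begin/end fields are assumptions, since the real classes aren't shown; this only illustrates the "emit one tiny record per split, no disk I/O" behavior.

```java
// Mock of the record reader described above: each split yields a single
// (key, value) record derived from the split's begin/end offsets, with no
// disk reads at all. MySplit and its fields are hypothetical stand-ins.
public class MockReader {
    static final class MySplit {
        final long begin, end;
        MySplit(long begin, long end) { this.begin = begin; this.end = end; }
    }

    // Key is a short string, value encodes the end offset; in the real job
    // the key is ~15 bytes and the value 8 bytes.
    static String[] nextRecord(MySplit split) {
        return new String[] { "begin:" + split.begin, "end:" + split.end };
    }

    public static void main(String[] args) {
        String[] kv = nextRecord(new MySplit(0L, 145L));
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```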

However, I got this:
09/07/31 01:41:41 INFO mapred.JobClient: Running job: job_200907251335_0005
09/07/31 01:41:42 INFO mapred.JobClient:  map 0% reduce 0%
09/07/31 01:43:06 INFO mapred.JobClient: Job complete: job_200907251335_0005
And yet the job does not end! It hangs here.

Very strange. The jobtracker does not respond to web requests.
This is on Hadoop 0.20, though I am using the 0.19.1 API.
The master is 64-bit with 4 cores and 16GB RAM, and is not running any

Any pointers would be appreciated.

    //Basically FileInputSplit reworded
    public InputSplit[] getSplits(JobConf job, int numSplits) throws
IOException {
        long n = the_length_of_something; // == 145K
        long chunkSize = n / (numSplits == 0 ? 1 : numSplits);
        InputSplit[] splits = new InputSplit[numSplits];
        for (int i = 0; i < numSplits; i++) {
            MySplit split;
            if ((i + 1) == numSplits) {
                // last split runs to n so the division remainder is not lost
                split = new MySplit(i * chunkSize, n);
            } else {
                split = new MySplit(i * chunkSize, (i * chunkSize) + chunkSize);
            }
            splits[i] = split;
        }
        return splits;
    }
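As a standalone sanity check of the split arithmetic above (plain Java, no Hadoop; each split is reduced to a hypothetical (begin, end) pair of longs), the boundaries do cover [0, n) contiguously, with the last split absorbing the integer-division remainder:

```java
// Standalone check of the getSplits() arithmetic from the post.
// No Hadoop types: each split is represented as a {begin, end} long pair.
public class SplitCheck {
    public static long[][] computeSplits(long n, int numSplits) {
        long chunkSize = n / (numSplits == 0 ? 1 : numSplits);
        long[][] splits = new long[numSplits][2];
        for (int i = 0; i < numSplits; i++) {
            long begin = i * chunkSize;
            // last split runs to n so the division remainder is not lost
            long end = ((i + 1) == numSplits) ? n : begin + chunkSize;
            splits[i] = new long[] { begin, end };
        }
        return splits;
    }

    public static void main(String[] args) {
        long n = 145000L;        // "145K" from the post
        int numSplits = 145000;  // one split per range, as described
        long[][] s = computeSplits(n, numSplits);
        System.out.println(s.length + " splits, last ends at " + s[s.length - 1][1]);
    }
}
```

With n = 145000 and numSplits = 145000 each split covers exactly one offset, so the split-generation arithmetic itself is sound; the hang would have to come from the framework handling that many splits, not from this method.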