The number of maps for a job depends on the splits returned by the InputFormat.getSplits() API. You can write an input format that decides the number of maps for a job as needed, by returning that many splits.
If you use FileInputFormat, the number of splits depends on the job's input data, which is why the number of mappers is proportional to the job's input size.
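To illustrate the point about splits, here is a minimal Python sketch. It is a model, not Hadoop code. It assumes an input format that produces one split per line (Hadoop ships NLineInputFormat for this kind of use) and a small seed file with one line per desired map. The names make_seed_lines, lines_to_splits, and mapper are hypothetical, and the grouping function only models how splits translate into map tasks.

```python
import random

def make_seed_lines(num_maps):
    # One line per desired map task; the content is only used as an RNG seed.
    return [f"{i}\n" for i in range(num_maps)]

def lines_to_splits(lines, lines_per_split=1):
    # Illustrative model only: group lines into splits, one map task per split.
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]

def mapper(split, records=3):
    # Hypothetical mapper body: ignore the input apart from seeding the RNG.
    out = []
    for line in split:
        # Assumption: the line may arrive as "key<TAB>value"; take the last field.
        seed = int(line.strip().split("\t")[-1])
        rng = random.Random(seed)
        out.extend(rng.randint(0, 999) for _ in range(records))
    return out

splits = lines_to_splits(make_seed_lines(5))
print(len(splits))  # number of map tasks in this model
```

In a real streaming job the mapper would read its seed line from stdin and write the generated records to stdout, so the seed file controls only how many maps are started, not what they produce.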
From: Austin Chungath [mailto:[EMAIL PROTECTED]]
Sent: 16 July 2013 14:40
To: [EMAIL PROTECTED]
Subject: spawn maps without any input data - hadoop streaming
I am trying to generate random data using Hadoop streaming and Python. It's a map-only job and I need to run a number of maps. There is no input to the map, as it's just going to generate random data.
How do I specify the number of maps to run? (I am confused here because, if I am not wrong, the number of maps spawned is related to the input data size.)
Any ideas as to how this can be done?