Radim,
Alternatively, you could write your own input format that splits the data differently, but that will take some real digging into the sequence file format and is going to be error-prone. I would suggest that you create several smaller input files, as Justin said.
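Here is a rough sketch of the "several smaller files" approach. It assumes you can read the records out of the big file in order and write them back out, starting a new output file whenever the current one reaches a target size. In-memory byte arrays stand in for the records (a hypothetical setup). In a real job you would read with Hadoop's SequenceFile.Reader and write each chunk with a SequenceFile.Writer. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitIntoSmallerFiles {
    // Hypothetical helper: groups serialized records into "files" of roughly targetBytes each.
    // A new file is started once the current one has reached targetBytes, so a file can
    // overshoot the target by less than one record; records themselves are never split.
    static List<List<byte[]>> chunk(List<byte[]> records, long targetBytes) {
        List<List<byte[]>> files = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] rec : records) {
            if (currentBytes >= targetBytes && !current.isEmpty()) {
                files.add(current);          // close the current file
                current = new ArrayList<>(); // and start a new one
                currentBytes = 0;
            }
            current.add(rec);
            currentBytes += rec.length;
        }
        if (!current.isEmpty()) {
            files.add(current); // flush the last, possibly smaller, file
        }
        return files;
    }

    public static void main(String[] args) {
        // Hypothetical input: 32 records of 1 MB each, roughly one 32 MB input file.
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            records.add(new byte[1 << 20]);
        }
        // Target 4 MB per file; if each small file becomes one split, that is ~8 mappers.
        List<List<byte[]>> files = chunk(records, 4L << 20);
        System.out.println(files.size() + " files");
    }
}
```

With 32 hypothetical 1 MB records and a 4 MB target this produces 8 files. Pick the target from the number of mappers you want, keeping each file under the HDFS block size.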
--Bobby Evans
On 11/9/11 6:53 AM, "Justin Woody" <[EMAIL PROTECTED]> wrote:
Radim,
In this case, it doesn't matter how many mappers you request in your
job configuration. Hadoop will only give 1 mapper per split. Since
your files are smaller than 64MB (assuming you're using the default block
size of HDFS), each file is a single split, so you only have 2 splits.
If you really need more mappers, you need to create smaller input files.
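To make the split arithmetic concrete, here is a simplified model of how a FileInputFormat-style splitter carves up one file. It assumes the split size is computed as max(minSize, min(maxSize, blockSize)) and that a new split is only cut while more than 1.1 split sizes remain (a 10% slop). This is assumed to mirror the newer org.apache.hadoop.mapreduce FileInputFormat. The class and method names are hypothetical; this is not Hadoop's code.

```java
public class SplitCountModel {
    // Assumed tolerance: keep cutting full splits only while more than 1.1 splits' worth remains.
    static final double SPLIT_SLOP = 1.1;

    // Assumed formula: the block size, capped by maxSize and floored by minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) this model produces for one splittable file.
    static int splitsForFile(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the leftover tail becomes one final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;  // default HDFS block size assumed in the thread
        long fileLength = 32 * mb; // each of the two input files
        int withDefaults = 2 * splitsForFile(fileLength, blockSize, 1, Long.MAX_VALUE);
        int withCap = 2 * splitsForFile(fileLength, blockSize, 1, 1_000_000);
        System.out.println("default settings: " + withDefaults + " splits");
        System.out.println("max split size 1000000: " + withCap + " splits");
    }
}
```

With the defaults, each 32MB file is smaller than the block and yields 1 split, so 2 in total. With the max split size capped at 1,000,000 bytes, the model gives 34 splits per file. If the job's InputFormat does not consult mapred.max.split.size, the cap has no effect and the count stays at 2.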
Paragraph 1 under the Map heading on this page explains it as well:
http://wiki.apache.org/hadoop/HadoopMapReduce
Justin
2011/11/9 Radim Kolar <[EMAIL PROTECTED]>:
> I have 2 input seq files, 32MB each. I want to run them on as many mappers as
> possible.
>
> I appended -D mapred.max.split.size=1000000 as a command line argument to the
> job, but there is no difference. The job still runs on 2 mappers.
>
> How does split size work? Is max split size used for reading or writing files?
>
> Does it work like this? Set maxsplitsize, write files, and you will get a bunch
> of seq files as output; then you will get the same number of mappers as input
> files.
>