Real-life experience of forcing smaller input splits?

David Morel 2013-01-25, 06:16

I have seen many posts on various sites and mailing lists, but haven't
found a firm answer anywhere: is it possible, yes or no, to force a split
size smaller than a block on the mappers, from the client side? I'm not
after pointers to the docs (unless you're very, very sure :-)), but after
real-life experience along the lines of 'yes, it works this way, I've
done it like this...'

All the parameters that I could find (especially specifying a max input
split size) seem to have no effect, and the files that I have are so
heavily compressed that they completely saturate the mappers' memory
when processed.
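
For reference, here is a minimal sketch of the split-size arithmetic as I understand it from the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input), which picks max(minSize, min(maxSize, blockSize)). The Python function names and the file and block sizes are made up for illustration. The sketch ignores the slop Hadoop allows on the last split, and it assumes the file is splittable at all: a file compressed with a non-splittable codec ends up as a single split whatever the settings. The old mapred API, as far as I know, uses a goal size derived from the requested number of map tasks in place of the max size, which could be one reason a max split size appears to do nothing.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Mirror of the new-API rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def approx_num_splits(file_size, split_size):
    """Approximate split count for one splittable file (ceiling division).

    Simplification: ignores the slop Hadoop tolerates on the last split.
    """
    return -(-file_size // split_size)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the cluster in question.
    block_size = 128 * MB
    file_size = 640 * MB

    # No max split size configured: the split size falls back to the block size.
    default_split = compute_split_size(block_size, 1, 2**63 - 1)
    # Max split size capped at 16 MB: the cap wins over the block size.
    capped_split = compute_split_size(block_size, 1, 16 * MB)

    print(approx_num_splits(file_size, default_split))  # 5
    print(approx_num_splits(file_size, capped_split))   # 40
```

If this rule were the one actually in effect, lowering the max split size below the block size would be enough to get more mappers, so seeing no change suggests a different input format or a non-splittable file.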

One solution I can imagine for this specific issue is reducing the block
size, but changing the block size on a per-file basis is something I'd
like to avoid if at all possible. For now I have simply disabled in-file
compression for those files.

All the Hive settings that we tried only got us as far as raising the
number of mappers from 5 to 6 (yay!), where I would have needed at least
ten times as many.
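
To put a number on "ten times as many": a rough sketch of the largest split size that would yield at least a target number of mappers, assuming splits are cut purely by size over splittable input. The function name and the 640 MB total are hypothetical; only the target of about 50 mappers (ten times 5) comes from the figures above.

```python
MB = 1024 * 1024


def max_split_size_for(total_bytes, target_mappers):
    """Largest split size (in bytes) that gives at least target_mappers splits.

    Assumes splits are cut purely by size over splittable input.
    """
    if target_mappers < 1:
        raise ValueError("target_mappers must be at least 1")
    return max(1, total_bytes // target_mappers)


if __name__ == "__main__":
    total = 640 * MB   # hypothetical total input size
    target = 50        # ten times the 5 mappers currently obtained
    split = max_split_size_for(total, target)
    n_splits = -(-total // split)  # ceiling division
    print(split, n_splits)
```

With these made-up sizes the split size comes out a little under 13 MB, far below a typical block size, which is why the setting has to be honored for splits smaller than a block.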