Hive user mailing list

Re: Real-life experience of forcing smaller input splits?
Not all files are splittable. Sequence files are; raw gzip files are not.
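To see why a raw gzip file cannot be split, here is a small Python illustration (not Hadoop code). It simulates a mapper that is handed only the second half of a gzip file: without the header and the compressed stream that precedes its offset, it cannot decode anything, so the whole file has to go to a single mapper. The `block_compress` helper is a hypothetical toy layout, not the real SequenceFile format. It only shows the idea behind splittable compressed formats: independently compressed blocks (found via sync markers in a real SequenceFile) give every reader a valid place to start.

```python
import gzip
import zlib


def make_records(n):
    """Build n newline-terminated text records, similar to a log file."""
    return b"".join(f"record-{i:06d},some,payload\n".encode() for i in range(n))


def can_read_from(compressed, offset):
    """Simulate a mapper handed only the bytes from `offset` onward of a .gz file."""
    try:
        gzip.decompress(compressed[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile ("Not a gzipped file", CRC failures).
        return False


def block_compress(data, block_size):
    """Hypothetical toy layout, NOT the real SequenceFile format: each chunk is
    compressed independently, so every chunk boundary is a valid start point.
    (Real formats also align blocks to record boundaries; this toy does not.)"""
    return [gzip.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


data = make_records(10_000)
whole = gzip.compress(data)

print(can_read_from(whole, 0))                # the mapper that gets the first bytes
print(can_read_from(whole, len(whole) // 2))  # a mapper starting mid-file

blocks = block_compress(data, 64 * 1024)
# Each block decompresses on its own, and together they rebuild the input.
print(b"".join(gzip.decompress(b) for b in blocks) == data)
```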

On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> set mapred.min.split.size=1024000;
> set mapred.max.split.size=4096000;
> set hive.merge.mapfiles=false;
> I set the values above, and lowering the max split size did increase my
> number of maps. My block size was 128MB.
> The only caveat was that my files on HDFS were not heavily compressed, and
> I was using the RCFile format.
> If you have heavily compressed files, I would suggest checking what their
> size will be after decompression and allocating more memory to the maps.
> On Fri, Jan 25, 2013 at 11:46 AM, David Morel <[EMAIL PROTECTED]> wrote:
>> Hello,
>> I have seen many posts on various sites and MLs, but didn't find a firm
>> answer anywhere: is it possible, yes or no, to force the mappers to use a
>> split size smaller than a block, from the client side? I'm not after
>> pointers to the docs (unless you're very, very sure :-)), but rather after
>> real-life experience along the lines of 'yes, it works this way, I've
>> done it like this...'
>> All the parameters that I could find (especially specifying a max input
>> split size) seem to have no effect, and the files that I have are so
>> heavily compressed that they completely saturate the mappers' memory
>> when processed.
>> A solution I could imagine for this specific issue is reducing the block
>> size, but for now I simply went with disabling in-file compression for
>> those files. Changing the block size on a per-file basis is something I'd
>> like to avoid if at all possible.
>> All the Hive settings that we tried only got me as far as raising the
>> number of mappers from 5 to 6 (yay!), when I would have needed at least
>> ten times more.
>> Thanks!
>> D.Morel
> --
> Nitin Pawar
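The different outcomes in this thread follow from how Hadoop sizes input splits. The Python sketch below approximates the split logic of `FileInputFormat` in the newer MapReduce API: split size is `max(minSize, min(maxSize, blockSize))`, and the last split may run up to 10% over. It assumes plain `FileInputFormat` behaviour; Hive's `CombineHiveInputFormat` may group splits differently, so real map counts can differ, and the file sizes here are only examples. It shows why lowering the max split size multiplied the maps on splittable RCFile data, while a non-splittable gzip file still gets exactly one mapper whatever the settings, and why a smaller block size would only help for splittable files.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    """Same formula as FileInputFormat.computeSplitSize in the newer MapReduce API."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_len, block_size, min_size=1, max_size=2**63 - 1,
                  splittable=True):
    """Approximate the splits FileInputFormat.getSplits builds for one file."""
    if not splittable:
        # e.g. a raw .gz file: one split (one mapper), whatever the settings say
        return [file_len]
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)
    return splits


MB = 1024 * 1024
block = 128 * MB  # block size from Nitin's setup

# Defaults: one split per 128MB block.
print(len(split_lengths(128 * MB, block)))  # 1
# Nitin's settings: min 1024000, max 4096000 bytes.
print(len(split_lengths(128 * MB, block, 1024000, 4096000)))  # 33
# Same settings on a non-splittable file: still a single split.
print(len(split_lengths(128 * MB, block, 1024000, 4096000, splittable=False)))  # 1
```

With Nitin's values the split size becomes 4096000 bytes, so one 128MB block yields 32 full splits plus a 3145728-byte remainder.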