Hive >> mail # user >> Real-life experience of forcing smaller input splits?


Re: Real-life experience of forcing smaller input splits?
set mapred.min.split.size=1024000;
set mapred.max.split.size=4096000;
set hive.merge.mapfiles=false;

I set the values above, and lowering the max split size did increase my
number of maps. My block size was 128 MB.
The only difference is that my files on HDFS were not heavily compressed,
and I was using RCFile format.
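
To make the arithmetic concrete, here is a rough sketch in Python (not Hive or Hadoop code). It assumes the usual FileInputFormat-style rule of clamping the block size between the min and max split sizes, and it uses a hypothetical 1 GiB splittable file. Hive's own input formats may combine or adjust splits differently, so treat the result as an estimate only.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB block size, as in this thread
MIN_SPLIT = 1024000              # mapred.min.split.size from above
MAX_SPLIT = 4096000              # mapred.max.split.size from above
FILE_SIZE = 1024 ** 3            # hypothetical 1 GiB input file (assumption)


def compute_split_size(min_size, max_size, block_size):
    """Clamp the block size between the configured min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def estimate_num_splits(file_size, split_size):
    """Rough split count for one splittable file; ignores Hadoop's slop factor."""
    return math.ceil(file_size / split_size)


split_size = compute_split_size(MIN_SPLIT, MAX_SPLIT, BLOCK_SIZE)
print("split size:", split_size)
print("estimated maps:", estimate_num_splits(FILE_SIZE, split_size))
print("maps at one per block:", estimate_num_splits(FILE_SIZE, BLOCK_SIZE))
```

The point is only that a max split size below the block size becomes the effective split size, which is why the number of maps goes up.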

If your files are heavily compressed, I would suggest checking what their
size will be after decompression and allocating more memory to the maps
accordingly.
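
A back-of-the-envelope way to do that check, again as a Python sketch: the compression ratio, heap size, and headroom below are made-up numbers for illustration, not measurements from any real cluster. Mappers normally stream records rather than hold a whole split in memory, so this is a crude upper bound.

```python
def estimate_uncompressed_bytes(compressed_split_bytes, compression_ratio):
    """Estimate how much data a mapper sees once its split is decompressed."""
    return int(compressed_split_bytes * compression_ratio)


def fits_in_heap(uncompressed_bytes, heap_bytes, headroom=0.5):
    """True if the data fits within the given fraction of the mapper heap."""
    return uncompressed_bytes <= heap_bytes * headroom


RATIO = 20                        # hypothetical 20x compression ratio
HEAP = 1024 ** 3                  # hypothetical 1 GiB mapper heap

for split in (128 * 1024 * 1024, 4096000):
    expanded = estimate_uncompressed_bytes(split, RATIO)
    print(split, expanded, fits_in_heap(expanded, HEAP))
```

Under these assumed numbers a full 128 MB split expands well past the heap budget, while a 4096000-byte split stays within it.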
On Fri, Jan 25, 2013 at 11:46 AM, David Morel <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have seen many posts on various sites and MLs, but didn't find a firm
> answer anywhere: is it possible yes or no to force a smaller split size
> than a block on the mappers, from the client side? I'm not after
> pointers to the docs (unless you're very very sure :-) but after
> real-life experience along the lines of 'yes, it works this way, I've
> done it like this...'
>
> All the parameters that I could find (especially specifying a max input
> split size) seem to have no effect, and the files that I have are so
> heavily compressed that they completely saturate the mappers' memory
> when processed.
>
> A solution I could imagine for this specific issue is reducing the block
> size, but for now I simply went with disabling in-file compression for
> those. And changing the block size on a per-file basis is something I'd
> like to avoid if at all possible.
>
> All the hive settings that we tried only got me as far as raising the
> number of mappers from 5 to 6 (yay!) where I would have needed at least
> ten times more.
>
> Thanks!
>
> D.Morel
>

--
Nitin Pawar