Hive user mailing list: Real-life experience of forcing smaller input splits?


Thread:
David Morel 2013-01-25, 06:16
Mathieu Despriee 2013-01-25, 07:44
Nitin Pawar 2013-01-25, 06:47
Edward Capriolo 2013-01-25, 07:46
Bertrand Dechoux 2013-01-25, 09:37
David Morel 2013-01-25, 09:53
David Morel 2013-01-25, 12:28
Dean Wampler 2013-01-25, 13:39
Re: Real-life experience of forcing smaller input splits?
In most cases you want bigger splits, because having lots of small tasks
plays havoc with the job tracker. I have found that jobs with thousands of
short-lived map tasks tend to monopolize the slots. In other versions of
Hive the default was not CombineHiveInputFormat; I think in most cases you
want to make sure that is your default.
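A minimal sketch of how one might make that the default from a Hive session
(the property names are the Hive 0.x / Hadoop 1.x-era ones used elsewhere in
this thread; the size values are illustrative assumptions, not settings quoted
by anyone here):

-- use the combining input format as the default
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- cap each combined split at roughly 256MB (illustrative value)
set mapred.max.split.size=256000000;
-- minimum bytes per split on a single node, then within a rack
set mapred.min.split.size.per.node=128000000;
set mapred.min.split.size.per.rack=128000000;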

On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> set mapred.min.split.size=1024000;
> set mapred.max.split.size=4096000;
> set hive.merge.mapfiles=false;
>
> I had set the above values, and setting the max split size to a lower value
> did increase my number of maps. My block size was 128MB.
> The only thing was that my files on HDFS were not heavily compressed and I
> was using RCFileFormat.
>
> I would suggest that if you have heavily compressed files, you check what
> the size will be after decompression, and allocate more memory to the maps.
>
>
> On Fri, Jan 25, 2013 at 11:46 AM, David Morel <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I have seen many posts on various sites and MLs, but didn't find a firm
>> answer anywhere: is it possible, yes or no, to force a smaller split size
>> than a block on the mappers, from the client side? I'm not after
>> pointers to the docs (unless you're very very sure :-) but after
>> real-life experience along the lines of 'yes, it works this way, I've
>> done it like this...'
>>
>> All the parameters that I could find (especially specifying a max input
>> split size) seem to have no effect, and the files that I have are so
>> heavily compressed that they completely saturate the mappers' memory
>> when processed.
>>
>> A solution I could imagine for this specific issue is reducing the block
>> size, but for now I simply went with disabling in-file compression for
>> those. And changing the block size on a per-file basis is something I'd
>> like to avoid if at all possible.
>>
>> All the Hive settings that we tried only got me as far as raising the
>> number of mappers from 5 to 6 (yay!), where I would have needed at least
>> ten times more.
>>
>> Thanks!
>>
>> D.Morel
>>
>
>
>
> --
> Nitin Pawar
>
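For reference, a sketch of the client-side knobs discussed above for the
opposite goal, i.e. forcing smaller splits and more mappers. The values are
illustrative assumptions, and whether they take effect at all depends on the
input format in use and on whether the underlying files are splittable:

-- plain, non-combining input format
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- allow splits smaller than one block
set mapred.min.split.size=1;
-- request splits of roughly 16MB (illustrative value)
set mapred.max.split.size=16000000;
-- as in Nitin's settings: do not merge small output files afterwards
set hive.merge.mapfiles=false;

If the files themselves are stored with a non-splittable codec (whole-file
gzip, for example), no split setting will produce more mappers; smaller files,
a splittable storage format, or disabling in-file compression (as David ended
up doing) are the remaining options.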