Hive, mail # user - Real-life experience of forcing smaller input splits?


Re: Real-life experience of forcing smaller input splits?
Mathieu Despriee 2013-01-25, 07:44
Hi David,

What file format and compression type are you using?

Mathieu

On 25 Jan 2013, at 07:16, David Morel <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have seen many posts on various sites and mailing lists, but didn't find
> a firm answer anywhere: is it possible, yes or no, to force a smaller split
> size than a block on the mappers, from the client side? I'm not after
> pointers to the docs (unless you're very, very sure :-) but after
> real-life experience along the lines of 'yes, it works this way, I've
> done it like this...'
>
> All the parameters that I could find (especially specifying a max input
> split size; see the settings sketched after this message) seem to have no
> effect, and the files that I have are so heavily compressed that they
> completely saturate the mappers' memory when processed.
>
> A solution I could imagine for this specific issue is reducing the block
> size, but for now I simply went with disabling in-file compression for
> those. And changing the block size on a per-file basis is something I'd
> like to avoid if at all possible.
>
> All the Hive settings that we tried only got me as far as raising the
> number of mappers from 5 to 6 (yay!), whereas I would have needed at
> least ten times more.
>
> Thanks!
>
> D.Morel
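
The "max input split size" parameters mentioned above are, presumably, ones along the lines of the sketch below: per-session Hive client settings using the pre-YARN mapred.* property names of the Hive 0.9/0.10 era. The values are illustrative, and none of them can split a file stored as a single non-splittable compressed stream, which matches the behaviour described above:

    -- Sketch only: property names are real, values are illustrative, not a confirmed fix.
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    SET mapred.max.split.size=67108864;       -- cap each split at 64 MB
    SET mapred.min.split.size=1;              -- allow splits smaller than a block
    SET mapred.min.split.size.per.node=1;
    SET mapred.min.split.size.per.rack=1;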
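
On the compression side (the point of Mathieu's question): if the files are, say, gzip-compressed text, they are not splittable, and each file becomes a single map task no matter what the split settings say. One possible workaround, sketched here with hypothetical table names (raw_logs, logs_seq), is to rewrite the data into a block-compressed SequenceFile table, which remains splittable:

    -- Hypothetical sketch: raw_logs and logs_seq are made-up table names.
    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;   -- block-level SequenceFile compression
    -- DefaultCodec is always available; SnappyCodec needs the native Snappy libraries.
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;

    CREATE TABLE logs_seq
    STORED AS SEQUENCEFILE
    AS SELECT * FROM raw_logs;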