David Morel 2013-01-25, 06:16
Mathieu Despriee 2013-01-25, 07:44
Nitin Pawar 2013-01-25, 06:47
Edward Capriolo 2013-01-25, 07:46
Bertrand Dechoux 2013-01-25, 09:37
Re: Real-life experience of forcing smaller input splits?
David Morel 2013-01-25, 09:53
On 25 Jan 2013, at 10:37, Bertrand Dechoux wrote:
> It seems to me the question has not been answered:
> "is it possible, yes or no, to force a split size smaller
> than a block on the mappers"
> Not that I know (but you could implement something to do it) but why would
> you do it?
> By default if the split is set under the size of a block, it will be a
> One of the reasons is data locality. The second is that a block is written
> to a single hard drive (leaving replicas aside), so if n mappers were
> reading n parts of the same block, they would share that drive's
> bandwidth. So it is not a clear win.
> You can change the block size of the file you want to read, but using a
> smaller block size is really an anti-pattern. Most people increase the
> block size.
> (Note: the block size of a file is fixed when the file is written, and it
> can differ from one file to another.)
> Are you trying to handle data that is too small?
> If Hive supports multi-threading for mappers, it might be a solution. But I
> don't know the configuration for that.
The files are RCFiles with a block size of 128MB IIRC, but the file
compression achieves a ratio of nearly 1 to 100. When a file goes through
the mapper, the mapper simply does not have enough memory. Since the
compression scheme is BLOCK, I expected it would be possible to instruct
Hive to process only a limited number of fragments instead of everything
that's in the file in one go.
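
To put rough numbers on this, here is a small sketch of the arithmetic. It
assumes the split size follows the formula used by Hadoop's FileInputFormat,
max(minSize, min(maxSize, blockSize)), and it treats the 1-to-100 ratio as
exact. The function names and the 16MB cap are made up for illustration, and
whether Hive's input format and RCFile actually honour a maximum split size
below the block size is not confirmed anywhere in this thread.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size, max_size):
    # Assumed to mirror Hadoop's max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))

def uncompressed_bytes(split_size, compression_ratio):
    # Volume of data a mapper works through once a split is decompressed.
    return split_size * compression_ratio

block_size = 128 * MB   # block size reported for the RCFiles
ratio = 100             # "nearly 1 to 100", taken as exact here

# No maximum configured: the split is one full block.
whole_block = compute_split_size(block_size, 1, 2**63 - 1)

# Hypothetical 16MB cap on the split size.
capped = compute_split_size(block_size, 1, 16 * MB)

for label, split in (("whole block", whole_block), ("capped", capped)):
    total_mb = uncompressed_bytes(split, ratio) // MB
    print(f"{label}: split = {split // MB}MB, uncompressed = {total_mb}MB")
```

Under these assumptions a full-block split expands to 12800MB, against
1600MB for a 16MB split. That is the volume passing through the mapper, not
necessarily what is held in memory at any one moment.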
David Morel 2013-01-25, 12:28
Dean Wampler 2013-01-25, 13:39
Edward Capriolo 2013-01-25, 07:44