Re: Changing the Java heap
Michael Segel 2012-04-26, 20:56
Not sure of your question.
The heap size of the child JVM is independent of how files are split on HDFS.
I suggest you look at Tom White's book (Hadoop: The Definitive Guide) for how HDFS works and how files are split into blocks.
Files are split into blocks of a fixed size, 64 MB by default.
Your record boundaries do not necessarily fall on block boundaries, so one task may read the beginning of the last record in block A and then finish reading it from the start of block B. A different task that starts with block B skips the first n bytes until it reaches the start of the next record.
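
To make the boundary handling concrete, here is a minimal sketch in Python. It is not Hadoop's actual code: the function names (read_split, read_all_splits) are made up for illustration, records are assumed to be newline-terminated lines, and the whole file is held in memory as one bytes object. It only illustrates the rule described above: a reader that does not start at offset 0 skips ahead to the next record start, and each reader finishes the record that runs past the end of its own block.

```python
from typing import List


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the newline-delimited records owned by the split [start, start + length).

    Hypothetical helper for illustration only; not a Hadoop API.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: the bytes up to the next newline belong to a
        # record that the previous split's reader will finish, so skip them.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # Read every record that starts at or before the split's end offset,
    # running past the block boundary if needed to complete the last one.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)
        records.append(data[pos:newline])
        pos = newline + 1
    return records


def read_all_splits(data: bytes, block_size: int) -> List[List[bytes]]:
    """Cut data into fixed-size splits and read each one independently."""
    return [
        read_split(data, offset, block_size)
        for offset in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    sample = b"alpha\nbravo\ncharlie\ndelta\n"
    # An 8-byte "block" cuts through the middle of several records.
    for index, records in enumerate(read_all_splits(sample, 8)):
        print(index, records)
```

With the sample data and an 8-byte block size, every record is returned whole by exactly one split, even though the block boundaries fall in the middle of words. The same idea applies regardless of where the whitespace is.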
On Apr 26, 2012, at 3:46 PM, Barry, Sean F wrote:
> Within my small 2 node cluster I set up my 4 core slave node to have 4 task trackers and I also limited my java heap size to -Xmx1024m
> Is there a possibility that when the data gets broken up that it will break it at a place in the file that is not a whitespace? Or is that already handled when the data on HDFS is broken up into blocks?