HBase user mailing list: PutSortReducer memory threshold


Re: PutSortReducer memory threshold
So, if I have a lot of puts per row, say 100 times the memory threshold, will
100 different store files (at least) be written to the same region?
Will this trigger a major compaction for the region during/after the bulk load?
Is the trigger #storeFiles > hbase.hstore.compactionThreshold ?
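
For reference, a minimal sketch of reading that compaction setting through the
standard HBase client Configuration API; the class name is made up for the
example, and the default of 3 is only the property's documented default, not
something stated in this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionThresholdCheck {
  public static void main(String[] args) {
    // Loads hbase-default.xml and hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    // Documented meaning: once a store holds more than this many storefiles,
    // a compaction is requested; whether it runs as minor or major depends on
    // which files the selection picks, not on this property alone.
    int threshold = conf.getInt("hbase.hstore.compactionThreshold", 3);
    System.out.println("hbase.hstore.compactionThreshold = " + threshold);
  }
}
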
On Wed, Nov 6, 2013 at 1:01 PM, rajeshbabu chintaguntla <
[EMAIL PROTECTED]> wrote:

>
> When we execute context.write(null, null), we close the current writer
> (which had a storefile open), and on the next write request we create a
> new writer for another storefile.
> If a row key has puts whose total size exceeds the threshold, they will be
> written to multiple store files, so the same rowkey's data will be spread
> across multiple storefiles.
> In the outer while loop we continue the reduce from the point at which we
> flushed or rolled, so we do not omit any data.
>
> ________________________________________
> From: Amit Sela [[EMAIL PROTECTED]]
> Sent: Wednesday, November 06, 2013 3:54 PM
> To: [EMAIL PROTECTED]
> Subject: PutSortReducer memory threshold
>
> Looking at the code of PutSortReducer, I see that if my key has puts whose
> total size is bigger than the memory threshold, the iteration stops and all
> puts up to the threshold point are written to the context.
> If the iterator has more puts, context.write(null, null) is executed.
> Does this tell the bulk load tool to re-execute the reduce from that point
> in some way (if so, how?), or is the rest of the data just omitted?
>
> Thanks,
>
> Amit.
>
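
To make the quoted explanation concrete, below is a simplified sketch (not the
actual HBase source) of the control flow described: an inner loop buffers
KeyValues until a per-row memory threshold is reached, the batch is emitted,
and context.write(null, null) forces the output format to roll to a new
writer; the outer loop then resumes from the same iterator, so nothing is
skipped. It is modeled loosely on the 0.94-era API, and the property name
"putsortreducer.row.threshold" and the 1 GB default are assumptions for
illustration.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class PutSortReducerSketch
    extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> {

  @Override
  protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts, Context context)
      throws IOException, InterruptedException {
    // Assumed property name and 1 GB default, for illustration only.
    long threshold = context.getConfiguration()
        .getLong("putsortreducer.row.threshold", 1L * (1 << 30));
    Iterator<Put> iter = puts.iterator();

    // Outer loop: resumes from wherever the inner loop stopped, so no puts are lost.
    while (iter.hasNext()) {
      TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
      long curSize = 0;

      // Inner loop: buffer and sort KeyValues until the RAM threshold is hit.
      while (iter.hasNext() && curSize < threshold) {
        Put p = iter.next();
        for (List<KeyValue> kvs : p.getFamilyMap().values()) {
          for (KeyValue kv : kvs) {
            sorted.add(kv);
            curSize += kv.getLength();
          }
        }
      }

      // Emit the sorted batch for this row.
      for (KeyValue kv : sorted) {
        context.write(row, kv);
      }

      // More puts remain for this row: force a writer roll (i.e. a new
      // storefile), since sorted order across batches cannot be guaranteed
      // within a single file.
      if (iter.hasNext()) {
        context.write(null, null);
      }
    }
  }
}

The upshot, per the reply above: a row whose puts total many times the
threshold ends up split across several storefiles in the same region, but no
data is omitted.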