So, if I have a lot of puts per row, say 100 times the memory
threshold, 100 different store files will be written to the same region (at
Will this trigger a major compaction for the region during or after the bulk load?
Is the trigger #storeFiles > hbase.hstore.compactionThreshold?
On Wed, Nov 6, 2013 at 1:01 PM, rajeshbabu chintaguntla <
[EMAIL PROTECTED]> wrote:
> When we execute context.write(null, null), we close the current
> writer (which opened a storefile), and on the next write request we create
> a new writer for another storefile.
> If the puts for a row key exceed the threshold in total size, they are
> written to multiple storefiles, so data for the same row key is spread
> across several storefiles.
> In the outer while loop we continue the reduce from the point at which we
> flushed or rolled. No data is omitted.
> From: Amit Sela [[EMAIL PROTECTED]]
> Sent: Wednesday, November 06, 2013 3:54 PM
> To: [EMAIL PROTECTED]
> Subject: PutSortReducer memory threshold
> Looking at the code of PutSortReducer, I see that if my key has puts whose
> total size exceeds the memory threshold, the iteration stops and all puts up
> to the threshold point are written to the context.
> If the iterator has more puts, context.write(null, null) is executed.
> Does this tell the bulk load tool to re-execute the reduce from that point
> in some way (and if so, how?), or is the rest of the data just omitted?
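
For anyone following the thread, the loop structure described above can be sketched roughly as follows. This is only a simplified, standalone illustration and not the actual PutSortReducer source: the class name PutRollSketch and the method reduceRow are hypothetical, each put is reduced to its size in bytes, and each returned batch stands in for one storefile that the writer would roll to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: models the outer/inner loop of the reducer with plain
// numbers instead of HBase Put and KeyValue objects.
class PutRollSketch {

    // Groups the put sizes of ONE row into batches. Each batch stands in for
    // the data written before a roll, i.e. one storefile.
    static List<List<Long>> reduceRow(Iterable<Long> putSizes, long threshold) {
        List<List<Long>> storeFiles = new ArrayList<>();
        Iterator<Long> iter = putSizes.iterator();

        // Outer loop: resumes from wherever the inner loop stopped,
        // so no put is skipped.
        while (iter.hasNext()) {
            List<Long> batch = new ArrayList<>();
            long curSize = 0;

            // Inner loop: stop at the end of the row or at the threshold.
            while (iter.hasNext() && curSize < threshold) {
                long size = iter.next();
                batch.add(size);
                curSize += size;
            }

            // Stand-in for writing the batch to the context. In the real
            // reducer, if more puts remain, context.write(null, null) is
            // called here to make the writer roll to a new storefile.
            storeFiles.add(batch);
        }
        return storeFiles;
    }

    public static void main(String[] args) {
        // Five puts of 40 bytes each against a 100-byte threshold.
        List<List<Long>> files =
            reduceRow(Arrays.asList(40L, 40L, 40L, 40L, 40L), 100L);
        System.out.println(files); // prints [[40, 40, 40], [40, 40]]
    }
}
```

In this toy run all five puts end up in one of the two batches, which matches the point made in the reply: the row is split across storefiles, but nothing is dropped.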