Re: [HBase 0.92.1] Too many stores files to compact, compaction moving slowly
These M files will have to contain globally sorted entries (the first
entry in file 0 is the smallest key and the last entry in file M-1
is the largest key), no?
configureIncrementalLoad achieves this by peeking into the existing
table and preparing a partition file that enforces total order (it
reads the split points via table.getStartKeys()).
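
To make sure I understand that step, here is a rough sketch of how region start keys could become partitioner split points. It is Python purely for illustration (the real code is Java inside HFileOutputFormat), the function name is hypothetical, and it assumes the first region always has an empty start key, so N regions yield N-1 split points.

```python
def split_points_from_start_keys(start_keys):
    """Turn region start keys into the N-1 split points a total-order partitioner needs.

    Assumption: the first region of a table has an empty start key, which is
    not a real boundary and is therefore dropped.
    """
    ordered = sorted(start_keys)  # bytes compare lexicographically, byte by byte
    if not ordered or ordered[0] != b"":
        raise ValueError("first region is expected to have an empty start key")
    return ordered[1:]


# Hypothetical start keys for a table with three regions.
start_keys = [b"", b"g", b"p"]
print(split_points_from_start_keys(start_keys))
```

With three regions this leaves two split points, which is what a partitioner feeding three reducers would need.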

Like you said, in my case the table will be created after the MR job
completes. So I guess what I need to do is come up with a split file,
give it to both the MR job's partitioner and the create table command
(create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}), and finally use
bulk import.
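
A minimal sketch of what I mean by coming up with the split file, again in Python only to illustrate the idea. All function names are hypothetical, the keys are plain strings, and I am assuming the split file holds one split key per line. A key equal to a split point goes to the higher partition, since a split key is the start key of the next region.

```python
from bisect import bisect_right


def choose_split_points(sample_keys, num_regions):
    """Pick num_regions - 1 evenly spaced split points from a sample of row keys."""
    ordered = sorted(set(sample_keys))
    if num_regions < 2 or len(ordered) < num_regions:
        raise ValueError("need at least num_regions distinct sample keys")
    step = len(ordered) / num_regions
    return [ordered[int(i * step)] for i in range(1, num_regions)]


def partition_for(key, split_points):
    """Index of the partition (and region) that owns key."""
    return bisect_right(split_points, key)


def write_splits_file(path, split_points):
    """Write one split key per line (assumed layout of the splits file)."""
    with open(path, "w") as f:
        for point in split_points:
            f.write(point + "\n")


# Hypothetical sample of row keys, split into three regions.
sample = list("abcdefghijkl")
splits = choose_split_points(sample, 3)
print(splits)
print([partition_for(k, splits) for k in ["a", "e", "h", "i", "z"]])
```

Because the same split points drive both the partitioner and the table creation, every key in partition i sorts before every key in partition i+1, and each output file falls inside exactly one region.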

Unless there is a way in bulk import to enforce total order even if
the output of MR is not sorted that way. Coming up with this file
beforehand is not a problem in my case; I just want to check that I am
getting your point correctly.

Thanks Stack.

-Shrijeet

On Mon, May 14, 2012 at 2:46 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Mon, May 14, 2012 at 2:11 PM, Shrijeet Paliwal
> <[EMAIL PROTECTED]> wrote:
>> Ahh of course! Thank you. One question: what partition file do I give to
>> the top partitioner?
>> I am trying to parse your last comment.
>> "You could figure how many you need by looking at the output of your MR job"
>>
>> Chicken and egg? Or am I not following you correctly.
>>
>
> I was thinking that your MR job would not look to a table at all to
> figure where to partition the data.  Rather, your reducer would write
> out files of size N where size N is just under your region max file
> size.  After the MR is done, you'll then have M files.  You'll need to
> create a table w/ M region boundaries (or M+1?) to match the files
> produced (HFiles write out their first and last keys in metadata
> IIRC).  You'll have to override the likes of the
> configureIncrementalLoad in HFileOutputFormat methinks.
>
> It's just a suggestion.  I've not dug in on viability.
>
> St.Ack
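
On the "M or M+1?" question, a rough sketch under the assumption that the first and last key of each output file are already known (for example, read from the HFile metadata mentioned above). The function name is hypothetical and the keys are plain strings. A table pre-split on k keys has k+1 regions, so M files need M-1 split keys: the first key of every file except the first.

```python
def region_split_keys(file_key_ranges):
    """Given (first_key, last_key) per file, return split keys for pre-creating the table.

    Raises ValueError if the files overlap, because then they are not
    globally sorted and cannot map one-to-one onto regions.
    """
    ordered = sorted(file_key_ranges)
    for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
        if prev_last >= next_first:
            raise ValueError("files overlap, so they are not globally sorted")
    # The first file belongs to the first region, whose start key is empty.
    return [first for first, _ in ordered[1:]]


# Hypothetical key ranges for M = 3 output files.
ranges = [("a", "f"), ("g", "m"), ("n", "z")]
print(region_split_keys(ranges))
```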