Re: [HBase 0.92.1] Too many stores files to compact, compaction moving slowly
Shrijeet Paliwal 2012-05-14, 22:03
These M files will have to contain globally sorted entries (the first
entry in the 0th file will be the smallest key and the last entry of the
(M-1)th file will be the largest key), no? configureIncrementalLoad
achieves this by peeking into the existing table and preparing a file to
enforce total order (by reading split points via table.getStartKeys()).

Like you said, in my case the table will be created after the MR job
completes. So I guess what I need to do is come up with a split file and
give it to both the MR job's partitioner and the create table command
(create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}), then finally use bulk
import. Unless there is a way in bulk import to enforce total order even
if the output of the MR job is not ordered that way.

Coming up with this file beforehand is not a problem in my case, but I
just want to check that I am getting your point correctly.

Thanks Stack.
-Shrijeet

On Mon, May 14, 2012 at 2:46 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Mon, May 14, 2012 at 2:11 PM, Shrijeet Paliwal
> <[EMAIL PROTECTED]> wrote:
>> Ahh of course! Thank you. One question: what partition file do I give to
>> the top partitioner?
>> I am trying to parse your last comment.
>> "You could figure how many you need by looking at the output of your MR job"
>>
>> Chicken and egg? Or am I not following you correctly.
>>
>
> I was thinking that your MR job would not look to a table at all to
> figure where to partition the data. Rather, your reducer would write
> out files of size N where size N is just under your region max file
> size. After the MR is done, you'll then have M files. You'll need to
> create a table w/ M region boundaries (or M+1?) to match the files
> produced (HFiles write out their first and last keys in metadata
> IIRC). You'll have to override the likes of the
> configureIncrementalLoad in HFileOutputFormat methinks.
>
> It's just a suggestion. I've not dug in on viability.
>
> St.Ack
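
A minimal sketch of the split-file idea discussed in this message, assuming a sorted sample of row keys is available before the MR job runs. It is plain Python for illustration only and does not call any HBase or Hadoop API. The function names (choose_split_points, partition_for, split_file_text) and the sample keys are hypothetical. The point it shows: if the partitioner and the table creation both use the same split keys, every key lands in the partition whose range matches one region, so the M output files are globally ordered.

```python
from bisect import bisect_right


def choose_split_points(sorted_sample_keys, num_regions):
    """Pick up to num_regions - 1 split keys from an already sorted sample.

    Each split key is the start key of a region after the first one.
    Duplicate split keys are skipped, so fewer regions may result.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    if not sorted_sample_keys:
        return []
    n = len(sorted_sample_keys)
    splits = []
    for i in range(1, num_regions):
        key = sorted_sample_keys[i * n // num_regions]
        if not splits or key > splits[-1]:
            splits.append(key)
    return splits


def partition_for(key, split_points):
    """Return the partition (region) index for a key.

    Region start keys are inclusive, so a key equal to a split point
    belongs to the region that starts at that split point.
    """
    return bisect_right(split_points, key)


def split_file_text(split_points):
    """Render split keys one per line (assumes printable ASCII keys)."""
    return "".join(k.decode("ascii") + "\n" for k in split_points)


if __name__ == "__main__":
    # Made-up sample keys; bytes compare lexicographically, as row keys do.
    sample = [b"row%03d" % i for i in range(10)]
    splits = choose_split_points(sample, 3)
    for key in sample:
        print(key, "-> partition", partition_for(key, splits))
    print(split_file_text(splits), end="")
```

The text produced by split_file_text is meant to match the one-key-per-line layout that the shell's SPLITS_FILE option is assumed to read here, which is worth verifying against your HBase version. The MR job's partitioner would need the same keys written in its own partition-file format.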