HBase >> mail # user >> [HBase 0.92.1] Too many stores files to compact, compaction moving slowly


Shrijeet Paliwal 2012-05-13, 23:12
Stack 2012-05-14, 19:29
Shrijeet Paliwal 2012-05-14, 21:11
Stack 2012-05-14, 21:46
Re: [HBase 0.92.1] Too many stores files to compact, compaction moving slowly
These M files will have to contain globally sorted entries (the first
entry in the 0th file will be the smallest key and the last entry of
the (M-1)th file will be the largest key), no?
configureIncrementalLoad achieves this by peeking into the existing
table and preparing a file to enforce total order (by reading split
points via table.getStartKeys()).
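
To make the total-order requirement concrete, here is a minimal Java sketch of how a sorted list of split points maps each row key to exactly one partition, so that every key in partition i sorts before every key in partition i+1. The class SplitPointPartitioner and its methods are hypothetical names for illustration only. This is not Hadoop's TotalOrderPartitioner or any HBase API, just the same idea in plain Java, assuming unsigned byte-wise key ordering (the order HBase uses for row keys).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HBase or Hadoop.
final class SplitPointPartitioner {
    // Sorted ascending; M-1 split points define M partitions.
    private final byte[][] splitPoints;

    SplitPointPartitioner(byte[][] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns the number of split points that are <= key.
    // A key equal to a split point goes to the partition that starts there,
    // mirroring a region whose start key is inclusive.
    int partitionFor(byte[] key) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splitPoints[mid], key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SplitPointPartitioner p = new SplitPointPartitioner(
                new byte[][] { bytes("g"), bytes("p") });
        for (String k : new String[] { "apple", "g", "melon", "p", "zebra" }) {
            System.out.println(k + " -> partition " + p.partitionFor(bytes(k)));
        }
    }
}
```

With split points "g" and "p" there are three partitions: keys below "g", keys from "g" up to but not including "p", and keys from "p" onward.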

Like you said, in my case the table will be created after the MR job
completes. So I guess what I need to do is come up with a split file,
give it to both the MR job's partitioner and the create table command
(create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}), and finally use
bulk import.
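
One way such a split file could be produced beforehand is sketched below in plain Java: take a sample of row keys, sort it, pick evenly spaced keys as split points, and write them one per line. The class SplitFileWriter, its method names, and the sampling approach are assumptions for illustration, not anything from HBase. As far as I know the shell reads SPLITS_FILE as plain text with one split point per line, whereas TotalOrderPartitioner expects its partition file as a SequenceFile of keys, so the same split points would have to be written out in both forms. The sketch only covers the text form.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase or Hadoop.
final class SplitFileWriter {

    // Picks up to (regions - 1) evenly spaced keys from a non-empty sample.
    // String ordering matches HBase byte ordering only for ASCII keys;
    // that restriction is an assumption of this sketch.
    static List<String> chooseSplits(List<String> sampleKeys, int regions) {
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            String candidate = sorted.get(i * sorted.size() / regions);
            // Skip duplicates so the split points stay strictly increasing.
            if (splits.isEmpty()
                    || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    // Writes one split point per line.
    static void write(Path file, List<String> splits) throws IOException {
        Files.write(file, splits, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> sample = Arrays.asList("d", "a", "c", "b", "f", "e", "h", "g");
        List<String> splits = chooseSplits(sample, 4);
        write(Paths.get("splits.txt"), splits);
        System.out.println(splits);
    }
}
```

The quality of the split points depends entirely on how representative the sample is; a skewed sample gives unevenly sized regions.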

Unless there is a way in bulk import to enforce total order even if
the output of MR is not that way. Coming up with this file beforehand
is not a problem in my case, but I just want to check that I am
getting your point correctly.

Thanks Stack.

-Shrijeet

On Mon, May 14, 2012 at 2:46 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Mon, May 14, 2012 at 2:11 PM, Shrijeet Paliwal
> <[EMAIL PROTECTED]> wrote:
>> Ahh of course! Thank you. One question: what partition file do I
>> give to the top partitioner?
>> I am trying to parse your last comment.
>> "You could figure how many you need by looking at the output of your MR job"
>>
>> Chicken and egg? Or am I not following you correctly?
>>
>
> I was thinking that your MR job would not look to a table at all to
> figure where to partition the data.  Rather, your reducer would write
> out files of size N where size N is just under your region max file
> size.  After the MR is done, you'll then have M files.  You'll need to
> create a table w/ M region boundaries (or M+1?) to match the files
> produced (HFiles write out their first and last keys in metadata
> IIRC).  You'll have to override the likes of the
> configureIncrementalLoad in HFileOutputFormat methinks.
>
> It's just a suggestion.  I've not dug in on viability.
>
> St.Ack
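
The suggestion quoted above (derive the table's region boundaries from the M files after the job finishes) can be sketched roughly as follows. Given the first and last key of each output file, the code checks that the files are globally sorted and non-overlapping, then returns the first key of every file except the first as a split key. RegionBoundaries and splitKeys are hypothetical names, and the key ranges are passed in as plain strings. Reading them out of real HFile metadata is not shown, and ASCII keys are assumed so that String ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase or Hadoop.
final class RegionBoundaries {

    // Each element is {firstKey, lastKey} of one output file, in output order.
    // Returns M-1 split keys for M files, or throws if the files are not
    // globally sorted (a file's first key must sort after the previous
    // file's last key).
    static List<String> splitKeys(List<String[]> fileKeyRanges) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < fileKeyRanges.size(); i++) {
            String previousLast = fileKeyRanges.get(i - 1)[1];
            String first = fileKeyRanges.get(i)[0];
            if (previousLast.compareTo(first) >= 0) {
                throw new IllegalArgumentException(
                        "files " + (i - 1) + " and " + i
                        + " overlap or are out of order");
            }
            splits.add(first);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String[]> files = Arrays.asList(
                new String[] { "a", "f" },
                new String[] { "g", "m" },
                new String[] { "n", "z" });
        System.out.println(splitKeys(files)); // prints [g, n]
    }
}
```

For M files this yields M-1 split keys, which corresponds to M regions, or M+1 boundaries if the open start of the first region and the open end of the last are counted.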
Stack 2012-05-14, 22:26
Ramkrishna.S.Vasudevan 2012-05-14, 04:29
Shrijeet Paliwal 2012-05-14, 05:20
Ramkrishna.S.Vasudevan 2012-05-14, 05:48