HBase >> mail # user >> How to merge regions in HBase?


Re: How to merge regions in HBase?
Shouldn't it be possible for him to have empty regions if he has a TTL on his data?

--
Bryan Beaudreault
On Wednesday, July 18, 2012 at 9:58 AM, Kevin O'dell wrote:

> Also, depending on which version of HBase you are running, you may have
> to bring down the whole cluster to merge, not just the table:
>
> https://issues.apache.org/jira/browse/HBASE-1621
>
> On Tue, Jul 17, 2012 at 7:26 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>
> > You shouldn't have empty regions. Using a timestamp as the key will give
> > you regions that are always half filled, except the last one, to which
> > you are writing the current time range. Once that region fills up, it
> > splits, and you are again writing to the last region. How did you end up
> > with empty regions? Did you pre-split?
> >
> > On Jul 17, 2012, at 7:15 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> > > Find a different row key?
> > >
> > > The problem with merging regions is that once you merge the regions, any
> > > net new regions will still have the same problem. So you'll have to merge
> > > again, and again and again.
> > > You're always filling to the left of the last key.
> > >
> > > In order to merge, you have to take the table offline. At least that's
> > > my understanding. So it's not a good thing.
> > >
> > >
> > > On Jul 17, 2012, at 11:08 AM, Ionut Ignatescu wrote:
> > >
> > > > My use case: I have several tables whose keys start with a timestamp.
> > > > These tables also have data retention set to 30 days.
> > > > Table size is around 1 TB (3 TB replicated) and data is inserted
> > > > regularly (~200 MB every 5 minutes).
> > > > File size is set to 1 GB. I have had these tables in use for almost half
> > > > a year, and now one table has around 6k partitions, 40% of which are empty.
> > > > The problem: the number of regions per region server is now pretty high.
> > > > Questions:
> > > > Which approach is better?
> > > > - to merge adjacent empty partitions into a bigger one?
> > > > - to merge empty partitions into non-empty partitions?
> > > > Also, I'm wondering why region merging is not part of major compactions
> > > > and why it's necessary to stop the entire fleet to solve this problem.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Ionut I.
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
>
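
To illustrate the TTL point raised at the top of the thread, here is a minimal sketch (plain Python, not HBase code) of how a timestamp-leading row key combined with a retention period can leave the oldest regions empty. Everything in it is an assumption made for illustration: the helper name count_empty_regions, one region boundary per day, one row per hour, 50 days of writes and a 30-day TTL. Real region boundaries depend on split behavior and will differ.

```python
from bisect import bisect_right


def count_empty_regions(region_start_keys, live_row_keys):
    """Count regions whose key range holds no live rows.

    region_start_keys must be sorted; region i covers
    [region_start_keys[i], region_start_keys[i + 1]).
    """
    rows_per_region = [0] * len(region_start_keys)
    for key in live_row_keys:
        # Find the last region whose start key is <= this row key.
        idx = bisect_right(region_start_keys, key) - 1
        if idx >= 0:
            rows_per_region[idx] += 1
    return sum(1 for n in rows_per_region if n == 0)


# Hypothetical setup: keys are hour numbers, one region boundary per day.
HOURS_PER_DAY = 24
days_running = 50
ttl_days = 30

region_starts = [day * HOURS_PER_DAY for day in range(days_running)]
now = days_running * HOURS_PER_DAY

# Rows older than the TTL have expired; only recent keys remain.
live_keys = [k for k in range(now) if k >= now - ttl_days * HOURS_PER_DAY]

empty = count_empty_regions(region_starts, live_keys)
print(empty, "of", len(region_starts), "regions are empty")
```

Because new keys only ever land to the right of the existing ranges, nothing refills the expired regions, so under these assumptions the empty share keeps growing the longer the table stays in use.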