HBase >> mail # user >> reduce influence of auto-splitting region


RE: reduce influence of auto-splitting region
Yes.  Each generated row key should fall within the start and end key of one
of the pre-split regions.  HBase then takes care of routing the write to the
region server that hosts that region.
As mentioned in http://hbase.apache.org/book/perf.writing.html, we also need
to take care not to turn one particular region into a hot region.
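
To make the key-range point concrete, here is a minimal sketch in plain Java
(no HBase dependency).  It assumes row keys start with a two-digit hour such
as "13:sensor-42" and that the table was pre-split on the keys "01" to "23";
both the key layout and the RegionLookup helper are hypothetical and only
imitate how a row key is matched against region boundaries.  A byte[][] of
split keys like this is the kind of argument the createTable(desc, splitKeys)
example on the page above takes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the HBase API. */
final class RegionLookup {

    /** Compares byte arrays lexicographically as unsigned bytes, the order HBase uses for row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Assumed layout: one region per hour, so 23 split keys "01".."23" give 24 regions. */
    static byte[][] hourlySplitKeys() {
        byte[][] keys = new byte[23][];
        for (int h = 1; h <= 23; h++) {
            keys[h - 1] = String.format(Locale.ROOT, "%02d", h).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    /**
     * Returns the index of the region whose [startKey, endKey) range holds rowKey.
     * Region 0 ends at splitKeys[0]; the last region has no end key.
     * splitKeys must be sorted ascending.
     */
    static int regionIndex(byte[][] splitKeys, byte[] rowKey) {
        int idx = 0;
        while (idx < splitKeys.length && compareUnsigned(rowKey, splitKeys[idx]) >= 0) {
            idx++;
        }
        return idx;
    }

    public static void main(String[] args) {
        byte[][] splits = hourlySplitKeys();
        byte[] rowKey = "13:sensor-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("region index = " + regionIndex(splits, rowKey));
    }
}
```

With this layout every write for a given hour lands in a single region, which
is exactly the hot-region risk mentioned above.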

Suppose the data for a span of 30 mins is collected and then passed on to
HBase.  The client can then be written so that the puts are distributed
evenly across the regions that hold that 30 mins of data.
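
One way to sketch that client-side distribution, again in plain Java without
the HBase client: prefix each row key with the 30-minute window and a bucket
number derived from a hash of the record id.  The WindowSalter class, the
"window-bucket:id" key layout and the bucket count of 5 are assumptions for
illustration only, not something taken from the HBase book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the HBase API. */
final class WindowSalter {

    /** Picks a bucket in [0, buckets) from the record id; deterministic, so reads can rebuild the key. */
    static int bucketFor(String recordId, int buckets) {
        return Math.floorMod(recordId.hashCode(), buckets);
    }

    /** Builds a row key of the assumed form "<window>-<bucket>:<recordId>". */
    static String rowKey(String window, String recordId, int buckets) {
        return String.format(Locale.ROOT, "%s-%02d:%s", window, bucketFor(recordId, buckets), recordId);
    }

    /** Split keys that carve one window into `buckets` regions (buckets - 1 boundaries). */
    static List<String> splitKeys(String window, int buckets) {
        List<String> keys = new ArrayList<>();
        for (int b = 1; b < buckets; b++) {
            keys.add(String.format(Locale.ROOT, "%s-%02d", window, b));
        }
        return keys;
    }

    public static void main(String[] args) {
        String window = "201209052000"; // assumed label for the 30-minute window
        int buckets = 5;
        int[] counts = new int[buckets];
        for (int i = 0; i < 1000; i++) {
            counts[bucketFor("event-" + i, buckets)]++;
        }
        for (int b = 0; b < buckets; b++) {
            System.out.println("bucket " + b + ": " + counts[b] + " puts");
        }
        System.out.println("example key: " + rowKey(window, "event-17", buckets));
    }
}
```

A hash-based bucket is used here rather than a round-robin counter so that
the same record always maps to the same row key; the trade-off is that a scan
over one window has to visit every bucket.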

Hope this helps.

Regards
Ram

> -----Original Message-----
> From: jing wang [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 05, 2012 8:00 PM
> To: [EMAIL PROTECTED]
> Subject: Re: reduce influence of auto-splitting region
>
> Hi Ram,
>
>   How do I drive the data to the specific hourly region? With code like
> the example at http://hbase.apache.org/book/perf.writing.html?
>
>
> Thanks,
> Jing Wang
>
> 2012/9/5 Ramkrishna.S.Vasudevan <[EMAIL PROTECTED]>
>
> > Hi JingWang
> >
> > Region splits do not necessarily cause GC problems.  Based on your
> > use case we may need to configure the heap space for the RS.
> > Coming back to region splits, pre-splitting the tables when they are
> > created is a good option.
> > Assume a case where I know that the data coming into HBase arrives
> > on an hourly basis.  Then one option could be to pre-split your table
> > based on the hours and assign the regions in round-robin fashion to
> > every RS.
> > This will ensure that any particular hour's data goes only into the
> > region specified for that hour.  After that hour is over, the incoming
> > data moves over to another region server.
> > But here again, every hour can be split equally across the different
> > RSs, for example into 5 or 10 regions within an hour.
> > These are some options, but the choice should be made according to
> > the data that your cluster will be operating upon.
> >
> > Regards
> > Ram
> >
> > > -----Original Message-----
> > > From: jing wang [mailto:[EMAIL PROTECTED]]
> > > Sent: Wednesday, September 05, 2012 6:42 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: reduce influence of auto-splitting region
> > >
> > > Hi Ram,
> > >
> > > Thanks for your advice. We did consider what you said.
> > > HBase is used as realtime storage, just like MySQL/Oracle. When a
> > > region splits, HBase may cause a 'stop the world' GC or a long
> > > full GC.
> > > Our application can't accept this.
> > >
> > > Thanks,
> > > Jing Wang
> > >
> > > 2012/9/5 Ramkrishna.S.Vasudevan <[EMAIL PROTECTED]>
> > >
> > > > You can use the property hbase.hregion.max.filesize.  You can set
> > > > this to a higher value and control the splits through your
> > > > application.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > > -----Original Message-----
> > > > > From: jing wang [mailto:[EMAIL PROTECTED]]
> > > > > Sent: Wednesday, September 05, 2012 3:48 PM
> > > > > To: [EMAIL PROTECTED]
> > > > > Subject: reduce influence of auto-splitting region
> > > > >
> > > > > Hi there,
> > > > >
> > > > >   Using Hbase as a realtime storage(7*24h), how to reduce the
> > > influence
> > > > > of
> > > > > region auto-splitting?
> > > > >   Any advice will be appreciated!
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Jing
> > > >
> > > >
> >
> >