

Re: Presplit regions when creating a table
No, you need to know the key range for each split. If you don't, and you guess wrong, you may see no benefit at all, because your data may still end up going to a single region...
(It's data dependent.)
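
To make the "guess wrong" case concrete, here is a minimal sketch in plain Python (not HBase client code). It assumes row keys are compared as byte strings in sorted order, and it uses hypothetical keys in the typeString_date_Id shape from the original question together with the numeric split points from the shell example further down the thread. The function name region_index is made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

def region_index(row_key, split_keys):
    """Return the index of the region that would hold row_key.

    split_keys must be sorted. Region i covers the half-open range
    [split_keys[i-1], split_keys[i]), so a key equal to a split point
    belongs to the region that starts at that point.
    """
    return bisect_right(split_keys, row_key)

# Split points taken from the shell example in this thread.
splits = [b"111111", b"222222", b"333333", b"444444"]

# Hypothetical keys shaped like typeString_date_Id.
keys = [b"typeA_20120704_1", b"typeB_20120704_2", b"typeC_20120705_3"]

counts = Counter(region_index(k, splits) for k in keys)
print(counts)  # Counter({4: 3})
```

Every key starts with "t", which sorts after "4", so all three rows fall into the last of the five regions and the pre-split buys nothing.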

I am personally not a fan of pre-splitting a table.

The way I look at it, you only really have to deal with this when you first create a table. However, once your application is in a steady state, the tables will split naturally and you should have enough regions to get decent performance.

Of course YMMV...
On Jul 5, 2012, at 4:16 AM, Christian Schäfer wrote:

> Hi,
>
> I haven't heard of a way to split by regex. Maybe somebody else will post here if it's possible.
>
> But you could maybe work around that by mapping from regex to region in your client code.
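
One possible reading of this workaround, sketched in plain Python rather than HBase client code: the client checks each row key against a list of patterns and prepends a short prefix, and the table is pre-split on those prefixes (here that would mean split keys '2' and '3'). The patterns, the prefixes, the "|" separator and the name prefixed_key are all hypothetical.

```python
import re

# Hypothetical pattern-to-prefix table; the prefixes double as split keys.
PREFIX_RULES = [
    (re.compile(r"^typeA_"), "1"),
    (re.compile(r"^type[BC]_"), "2"),
]
DEFAULT_PREFIX = "3"  # used when no pattern matches

def prefixed_key(row_key):
    """Prepend the prefix of the first matching pattern to row_key."""
    for pattern, prefix in PREFIX_RULES:
        if pattern.search(row_key):
            return f"{prefix}|{row_key}"
    return f"{DEFAULT_PREFIX}|{row_key}"

print(prefixed_key("typeA_20120704_1"))  # 1|typeA_20120704_1
```

Readers have to apply the same mapping to find a row again, so the rules need to stay stable once data has been written.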
>
> If that's not an option and it's too difficult to decide how to pre-split, you could rely on the auto-splitting that occurs when hbase.hregion.max.filesize is reached.
>
>
> An often helpful online reference: http://hbase.apache.org/book.html -> see 2.8.2.7. Managed Splitting
>
> regards
> Chris
>
>
>
>
> ________________________________
> From: Prakrati Agrawal <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Christian Schäfer <[EMAIL PROTECTED]>
> Sent: 6:30 Thursday, July 5, 2012
> Subject: RE: Presplit regions when creating a table
>
> Hi
>
> Can I do splits on regular expressions instead of specific keys? For example, keys having a particular pattern go to node#1 and others go to node#2 etc.
>
> Thanks and Regards
> Prakrati
> -----Original Message-----
> From: Christian Schäfer [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 04, 2012 5:14 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Presplit regions when creating a table
>
> The simplest way to pre-split a table is at table creation in the hbase shell, by specifying the split keys.
>
> This could look like this: create 'mytable', 'myfamily', {SPLITS => ['111111', '222222', '333333', '444444']}
>
> resulting in 5 regions: [start, 111111), [111111, 222222), [222222, 333333), [333333, 444444), [444444, end)
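
The relationship between split keys and regions can be sketched in a few lines of plain Python (an illustration only, not part of the HBase shell or API). The helper name region_ranges is hypothetical; the empty string stands for "no bound", which mirrors the empty start key of the first region and the empty end key of the last one.

```python
def region_ranges(split_keys):
    """Turn N sorted split keys into N + 1 (start, end) ranges.

    start is inclusive and end is exclusive; an empty string means
    the range is unbounded on that side.
    """
    bounds = [""] + list(split_keys) + [""]
    return list(zip(bounds[:-1], bounds[1:]))

# Split keys from the create statement above.
for start, end in region_ranges(["111111", "222222", "333333", "444444"]):
    print(f"[{start or 'start'}, {end or 'end'})")
```

Four split keys always give five regions, and a row whose key equals a split key belongs to the region that begins at that key.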
>
> If you store only a limited number of attributes per row, you should consider OpenTSDB, which is built on top of HBase and is aimed at time series data.
>
> regards
> Chris
>
>
>
> ----- Original Message -----
> From: Prakrati Agrawal <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> CC:
> Sent: 13:23 Wednesday, July 4, 2012
> Subject: Presplit regions when creating a table
>
> Dear all,
>
> I am using HBase 0.90.6.
> I have streaming data which I want to store in an HBase table. I thought of a row key design of "typeString_date_Id", where typeString is one of 5 types. The problem is that the types are not evenly distributed, i.e. I have a lot more of one type than of another, so if I start inserting the data I will see hotspotting on some region servers compared to others. To avoid this, I thought I would presplit the regions. I don't understand how to use the region splitter to my benefit. Can I get a code snippet showing how to do it? I am using the RegionSplitter interface for this.
>
> Thanks
> Prakrati
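
For skewed types like these, one possible approach (an assumption on my part, not something proposed in the thread) is to derive the split keys from a sample of real row keys instead of guessing them, so that the heavy type gets more regions than the light ones. The sketch below is plain Python with a hypothetical helper split_points and made-up sample data; the resulting keys would then be passed as SPLITS when creating the table.

```python
def split_points(sample_keys, num_regions):
    """Pick up to num_regions - 1 split keys at evenly spaced ranks of a sample."""
    ordered = sorted(set(sample_keys))
    if num_regions < 2 or not ordered:
        return []
    step = len(ordered) / num_regions
    points = [ordered[int(i * step)] for i in range(1, num_regions)]
    # Drop duplicates that appear when the sample is smaller than num_regions.
    return sorted(set(points))

# Hypothetical skewed sample: typeA is far more common than typeB.
sample = [f"typeA_201207{d:02d}_{i}" for d in range(1, 9) for i in range(10)]
sample += [f"typeB_20120701_{i}" for i in range(5)]

print(split_points(sample, 4))
```

With this sample all three split keys fall inside the typeA range, which is where most of the data is. Because the date follows the type in the key, new rows for a type still sort at the end of that type's range, so a sample of past data only helps if it covers the key range expected for future writes.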
>