Expert suggestion needed to create table in Hbase - Banking
Ramasubramanian Narayanan... 2012-11-26, 07:04
Hi,
I have a requirement to physicalise a logical model. I have a client model which has 600+ entities.
I need suggestions on how to go about physicalising it.
I have a few other doubts:
1) Is it good to create a single table for all the 600+ columns?
2) Should we have different column families for different groups, or can everything be under a single column family? For example, can we keep customer address in a separate column family?
Please help with this. Regards, Rams
Re: Expert suggestion needed to create table in Hbase - Banking
Michael Segel 2012-11-26, 12:27
Rams,
I think you first need to go back and think about why you want to use Hadoop and HBase in the first place. Second, you need to think about your data and how you are planning to use it.
Beyond that, we can only give you fairly generic answers...
1) You can create a table with 600 columns; however, it depends on what you are trying to do. There are some limitations that you have to consider in your design, but for the specific use case you stated, they are not applicable.
2) You can have models with different column families. However, again, it depends on what you are trying to do. In your example, customer address is not a good example of when to use a column family. I was going to give a schema design course at a Hadoop conference next year, but it got turned down because it was considered too 'basic'. Maybe I'll propose it for the Hadoop conference in Amsterdam... sorry, I digressed.
Have you thought about using a schema on top of HBase? At a minimum, Avro, or possibly WibiData's Kiji? (Not that I'm plugging Aaron's project. ;-)
I am also curious... this isn't the first time this question has come up on the lists... class project?
HTH
-Mike
RE: Expert suggestion needed to create table in Hbase - Banking
Li, Min 2012-11-26, 07:40
When one CF needs to split, the other 599 CFs will split at the same time, so many small fragments will be produced if you use that many column families. Instead, many CFs can be merged into a single CF by adding specific tags to the row key. For example, the row key for a customer's address can be uid+'AD', and for the customer profile uid+'PR'.
Min
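Li's suggestion of folding several column families into one, with a record-type tag in the row key, can be sketched in plain Python as below. It only models the key layout (standard library, no HBase client); the helper names are hypothetical, and the 10-digit zero padding of the customer id is an assumption added so that one customer's key can never be a prefix of another's (without it, "42" would be a prefix of "420").

```python
# Hypothetical sketch: one column family, record type encoded in the row key.
# Tags 'AD' and 'PR' come from the thread; the fixed-width id is an assumption.

TAGS = {"address": b"AD", "profile": b"PR"}


def tagged_row_key(uid: int, record_type: str) -> bytes:
    """Row key = 10-digit customer id + two-byte record-type tag."""
    return f"{uid:010d}".encode("ascii") + TAGS[record_type]


def rows_for_customer(sorted_keys: list[bytes], uid: int) -> list[bytes]:
    """Mimic a prefix scan: row keys are kept sorted, so all rows for one
    customer are contiguous and a single scan over the uid prefix finds them."""
    prefix = f"{uid:010d}".encode("ascii")
    return [k for k in sorted_keys if k.startswith(prefix)]


keys = sorted(tagged_row_key(u, t) for u in (7, 42, 420) for t in TAGS)
print(rows_for_customer(keys, 42))  # prints [b'0000000042AD', b'0000000042PR']
```

The design choice here is that "which family" becomes "which row": all of a customer's data still sits together in the sorted key space, but splits and flushes only ever involve one column family.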
Re: Expert suggestion needed to create table in Hbase - Banking
Ramasubramanian Narayanan... 2012-11-26, 10:23
Hi, thanks! Can we use the customer number as the row key for the customer (client) master table? Please help me understand the advantages and disadvantages of having the customer number as the row key.
Also, we may need to implement SCD2 (Slowly Changing Dimension Type 2) in that table. Will that work if I design it like that?
Or
Is SCD2 not needed, since we can achieve the same thing by increasing the number of versions the table will hold?
Please suggest.
regards, Rams
Re: Expert suggestion needed to create table in Hbase - Banking
Mohammad Tariq 2012-11-26, 10:28
Hello sir,
You might become a victim of RegionServer (RS) hotspotting, since the customer IDs will be sequential (I assume). HBase keeps rows sorted by key, so rows with similar keys end up in the same region and therefore on the same RS. That becomes a bottleneck in the long run, as all the new data keeps going to the same region.
HTH
Regards, Mohammad Tariq
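To illustrate Mohammad's point: HBase divides a table into regions that each cover a contiguous range of sorted row keys, so a monotonically increasing customer number always sorts after every existing key and lands in the last region. The minimal sketch below models that lookup in plain Python; the four region start keys are hypothetical (real boundaries depend on how your table has split), and the zero padding is an assumption so that keys sort numerically.

```python
import bisect

# Hypothetical region layout: each region holds the sorted key range
# [its start key, the next region's start key).
REGION_START_KEYS = [b"", b"0000002500", b"0000005000", b"0000007500"]


def region_for(row_key: bytes) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def sequential_key(customer_id: int) -> bytes:
    """Plain customer number as the row key, zero-padded to 10 digits."""
    return f"{customer_id:010d}".encode("ascii")


# 100 newly created customers with consecutive IDs.
new_customers = range(9001, 9101)
print({region_for(sequential_key(c)) for c in new_customers})  # prints {3}
```

Every insert in that burst goes to region 3, and so to whichever single RegionServer hosts it, which is the hotspot described above.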
Re: Expert suggestion needed to create table in Hbase - Banking
Doug Meil 2012-11-26, 13:43
Hi there, somebody already wisely mentioned the link to the number-of-column-families entry, but here are a few other entries that can save you some heartburn if you read them ahead of time:
http://hbase.apache.org/book.html#datamodel
http://hbase.apache.org/book.html#schema
http://hbase.apache.org/book.html#architecture
Re: Expert suggestion needed to create table in Hbase - Banking
Michael Segel 2012-11-26, 12:30
If the row key is just the customer ID, then a simple MD5 or SHA-1 hash of it would suffice. That would clear up any risk of hotspotting once you do your initial load of data.
And that's probably a key point... hotspotting when you're first loading a very large table is really a moot point. It may be painful, but the pain lasts for less than an hour.
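A minimal sketch of Michael's hashing suggestion in plain Python (standard library only; `hashed_row_key` is a hypothetical helper, not an HBase API). A point lookup by customer number still works because the client can recompute the hash; the trade-off is that a range scan over consecutive customer numbers is no longer possible, since those rows are now scattered across the key space. Appending the raw ID after the digest is an assumption of this sketch, not something the thread specifies.

```python
import hashlib


def hashed_row_key(customer_id: str) -> bytes:
    """MD5 digest of the customer ID, followed by the ID itself.

    The 16-byte digest scatters consecutive IDs across the key space.
    Appending the raw ID (an assumption, not from the thread) keeps the key
    readable and guarantees distinct IDs never share a row key.
    """
    raw = customer_id.encode("utf-8")
    return hashlib.md5(raw).digest() + raw


# Consecutive IDs now start with unrelated bytes, so they sort far apart.
for cid in ("1000001", "1000002", "1000003"):
    print(hashed_row_key(cid).hex()[:8], cid)
```

SHA-1 (`hashlib.sha1`) would work the same way with a 20-byte prefix; MD5 is used here only because it yields shorter keys.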
Re: Expert suggestion needed to create table in Hbase - Banking
syed kather 2012-11-26, 11:55
Hello Sir, for solving RS hotspotting you can also try the approach below:
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
It works fine. Regarding column families, you can also try grouping similar columns into one family, based on the process you decide on.
Thanks and regards, Syed Abdul Kather
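The linked post describes HBaseWD, which spreads sequential writes by adding a small bucket prefix to each row key. Below is a generic, hash-based sketch of that salting idea in plain Python; it is written from scratch, is not HBaseWD's API, and the bucket count of 16 and all function names are hypothetical.

```python
import hashlib

BUCKETS = 16  # hypothetical; choose a value suited to your cluster size


def bucket_of(original_key: bytes) -> int:
    """Deterministic bucket id derived from the key, so point reads can recompute it."""
    return hashlib.md5(original_key).digest()[0] % BUCKETS


def salted_key(original_key: bytes) -> bytes:
    """Prefix the original key with a one-byte bucket id."""
    return bytes([bucket_of(original_key)]) + original_key


def unsalt(salted: bytes) -> bytes:
    """Strip the one-byte prefix to recover the original key."""
    return salted[1:]


def bucket_scan_ranges(start: bytes, stop: bytes) -> list[tuple[bytes, bytes]]:
    """A scan over original keys [start, stop) becomes one scan per bucket;
    the client must merge (and, if needed, re-sort) the partial results."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(BUCKETS)]


ids = [f"{i:010d}".encode("ascii") for i in range(9001, 9101)]
print(sorted({salted_key(k)[0] for k in ids}))  # buckets used by 100 sequential IDs
```

Writes for consecutive IDs now fan out over up to 16 key ranges instead of one; the cost is that a range read has to issue one scan per bucket and merge the results.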
Re: Expert suggestion needed to create table in Hbase - Banking
anil gupta 2012-11-26, 07:35
Hi Rams,
The description of your use case is very abstract, so I will try to answer your questions to the best of my ability.
1) Is it good to create a single table for all the 600+ columns? Anil: Yes, it is absolutely OK to have 600+ columns in a row in HBase (you can go up to a few million).
2) Should we have different column families for different groups, or can everything be under a single column family? For example, can we keep customer address in a separate column family? Anil: The usual HBase recommendation is not to have many column families (no more than 3 or 4). Having one column family is very standard practice. However, in some cases creating more than one CF is justified. For example, if around 95% of your lookups don't need to access the "Customer Address" data, then it would make sense to put that data into a separate column family.
HTH, Anil Gupta
-- Thanks & Regards, Anil Gupta
Re: Expert suggestion needed to create table in Hbase - Banking
anil gupta 2012-11-26, 07:40
More on the number of column families: http://hbase.apache.org/book/number.of.cfs.html
-- Thanks & Regards, Anil Gupta