-
Porting SQL DB into HBASE
kranthi reddy, 2010-03-31, 10:05
Hi all,
I have run into some trouble while trying to port a SQL database to HBase. The database has around 500 tables and is very badly designed. Around 45-50 of the tables could be denormalised into a single table, and the remaining tables are static. My questions are:

1) Is it possible to port this database (its tables) to HBase? If so, how?
2) How many tables can HBase support while remaining tolerant of failures?
3) With so many tables, how will performance be affected? Will it stay the same or degrade?

One possible solution, I think, is to use one column family per table. But as far as I know, and from my previous experiments, HBase isn't stable with more than 5 column families.

Since large quantities of data are loaded into the database every day, stability and a fail-proof system are the highest priority.

Hoping for a positive response.

Thank you,
kranthi
-
RE: Porting SQL DB into HBASE
Jonathan Gray, 2010-03-31, 16:53
Kranthi,
HBase can handle a good number of tables, but that means tens or maybe a hundred. If you have 500 tables you should definitely be rethinking your schema design. The issue is less about whether HBase can handle lots of tables and much more about whether scattering your data across lots of tables will be performant at read time.

1) Impossible to answer without knowing the schemas of the existing tables.

2) There is not really any relation between fault tolerance and the number of tables, except potentially for recovery time, but that would be the same with a few very large tables.

3) No difference in write performance. Read performance for simple key lookups would not be impacted, but most likely having data spread out like this will mean you'll need joins of some sort.

Can you tell us more about your data and queries?

JG
-
Re: Porting SQL DB into HBASE
kranthi reddy, 2010-04-12, 08:55
Hi Jonathan,

Sorry for the late response; I missed your reply.

Around 80% (400) of the tables are static, and the remaining 20% (100) are dynamic tables that are updated daily. Denormalising these dynamic tables is also extremely difficult, so we are planning to port them directly into HBase. Denormalising them would also lead to a lot of redundant data.

The static tables have a few hundred entries each, mostly fewer than 1,000 rows, whereas the dynamic tables have more than 20,000 entries, and each entry might be updated at least once a week.

Regards,
kranthi

--
Kranthi Reddy. B
Room No : 98
Old Boys Hostel
IIIT-HYD
-----------
I don't know the key to success, but the key to failure is trying to impress others.
-
Re: Porting SQL DB into HBASE
Amandeep Khurana, 2010-04-12, 08:58
Kranthi,
Your tables seem to be small. Why do you want to port them to HBase?

-Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
-
Re: Porting SQL DB into HBASE
Jonathan Gray, 2010-04-12, 10:07
Why split all the static data across 400 tables? You could combine things into fewer tables by prefixing the keys with something (maybe the original table names?).

Are your dynamic tables very large?

Can you address AK's question: why HBase?

JG
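The key-prefixing idea above can be sketched in plain Python. This is only an illustration: a dict plus a sorted key list stands in for one combined table, and the `:` separator, the table names `county` and `doctor`, and all helper names are assumptions, not anything from the thread or from the HBase API.

```python
from bisect import bisect_left, insort

rows = {}         # row_key -> value, stand-in for ONE combined table
sorted_keys = []  # row keys kept in sorted order, as a key-ordered store would


def make_key(source_table, original_key):
    """Build a composite row key: '<source table>:<original key>'."""
    return f"{source_table}:{original_key}"


def put(source_table, original_key, value):
    key = make_key(source_table, original_key)
    if key not in rows:
        insort(sorted_keys, key)
    rows[key] = value


def get(source_table, original_key):
    """Point lookup on the composite key; returns None when absent."""
    return rows.get(make_key(source_table, original_key))


def scan_prefix(source_table):
    """Return every row that came from one original table."""
    prefix = f"{source_table}:"
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the prefix range has ended
        result.append((key, rows[key]))
    return result


# Hypothetical sample data from two formerly separate static tables.
put("county", "001", {"name": "county_a"})
put("doctor", "001", {"name": "doctor_x"})
put("county", "002", {"name": "county_b"})
```

Here `get("doctor", "001")` is still a single key lookup, and `scan_prefix("county")` reads back what used to be a whole table, so small static tables that have nothing else in common can share one table without needing foreign keys between them.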
-
RE: Porting SQL DB into HBASE
Michael Segel, 2010-04-12, 18:47
Just an idea: take a look at a hierarchical design like Pick. I know it's doable, but I don't know how well it will perform.
-
Re: Porting SQL DB into HBASE
kranthi reddy, 2010-04-13, 05:17
Hi all,
@Amandeep: The main reason for porting to HBase is that it is open source. Currently the NGO is paying a high licensing fee for Microsoft SQL Server. To save money, we planned to port to HBase, which also scales to large datasets.

@Jonathan: The problem is that these static tables can't be combined. Each table describes a different entity. For example, one static table might contain information about all the counties in a country, and another might contain information about all the doctors in the country. That is why I don't think it is possible to combine these static tables: they don't have any primary/foreign keys referencing each other.

The dynamic tables are pretty large (though small compared to what HBase can support). These tables will grow and might contain up to 100 million rows in the future.

Thank you,
kranthi
-
Re: Porting SQL DB into HBASE
Amandeep Khurana, 2010-04-13, 05:31
You are mentioning 2 different reasons:

Open source... Well, get MySQL.

Large datasets? The table sizes that you reported in the earlier mails don't seem to justify a move to HBase. Keep in mind that to run HBase stably in production you would ideally want at least 10 nodes, and you will have no SQL available. Make sure you are aware of the trade-offs between HBase and an RDBMS before you decide. Even 100 million rows can be handled by a relational database if it is tuned properly.

-Amandeep
-
Re: Porting SQL DB into HBASE
kranthi reddy, 2010-04-14, 05:43
Hi Amandeep,
I get your point, but the situation is a bit more complex. I have tried to explain it better below.

We have around 10 databases (each may have 20-500 tables) which maintain information about the people of a state. Each database maintains information for a different kind of service. For example, the VAN DB holds information about users who used the facility through parked vans, and the TELECOMMUNICATION DB holds information about users who used the facility by telephone. Since a user can reach the facility through several services, he ends up with a different ID in each database.

We now plan to combine all these databases into a single database with one master table, based on a few heuristics such as username and date of birth: if the username and date of birth of a person match across databases, we treat him as a single user, and all his information from the different databases can be stored as one entry.

The problem at hand is that, with separate databases and data growing daily, it would be very hard to maintain and improve the system in future. We might also end up losing track of the databases and of the information about a particular user. This is why we were planning to use HBase.

I hope this is a bit clearer now :)

Regards,
kranthi
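The matching heuristic described above (same username and same date of birth means the same person) can be sketched as follows. The field names and sample records are made up for illustration, and exact matching on a normalised username is an assumption; real data would likely need fuzzier matching and a way to review false matches.

```python
from collections import defaultdict


def match_key(record):
    """Heuristic identity: normalised username plus date of birth."""
    return (record["username"].strip().lower(), record["dob"])


def build_master(databases):
    """Merge per-service records into one master entry per person.

    `databases` maps a database name to a list of records, each a dict with
    'id', 'username' and 'dob' keys (hypothetical field names).
    """
    master = defaultdict(lambda: {"source_ids": {}, "services": []})
    for db_name, records in databases.items():
        for record in records:
            entry = master[match_key(record)]
            entry["source_ids"][db_name] = record["id"]  # keep the old ID
            entry["services"].append(db_name)
    return dict(master)


# Hypothetical sample: the same person appears in two service databases.
databases = {
    "VAN": [{"id": 17, "username": "User A", "dob": "1980-05-02"}],
    "TELECOMMUNICATION": [
        {"id": 904, "username": "user a ", "dob": "1980-05-02"},
        {"id": 905, "username": "User B", "dob": "1975-11-20"},
    ],
}

master = build_master(databases)
```

Keeping the original per-database IDs in `source_ids` means the old records can still be traced after the merge, which addresses the worry about losing track of a user's information.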
-
Re: Porting SQL DB into HBASE
Imran M Yousuf, 2010-04-14, 06:03
On Mon, Apr 12, 2010 at 2:55 PM, kranthi reddy <[EMAIL PROTECTED]> wrote:
> <snip />
> The problem is denormalising these 20% tables is also extremely difficult
> and we are planning to port them directly into hbase. And also denormalising
> these tables would lead to a lot of redundant data.

Denormalisation implies redundant data. Since there are no joins, keeping redundant data lets you do a single lookup instead of N lookups (which would replace N joins). Furthermore, HBase is good at scaling to huge data sets.

Reading http://wiki.apache.org/hadoop/Hbase/FAQ#A20 helped me understand this further.

Hope this helps.

Best regards,

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: [EMAIL PROTECTED]
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
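The trade-off described here, N lookups in place of N joins versus one lookup against redundant data, can be illustrated with dictionaries standing in for tables. The table and field names below are hypothetical, loosely based on the doctor and county tables mentioned earlier in the thread.

```python
# Normalised layout: three "tables", each keyed by its own row key.
visits = {"v1": {"doctor_id": "d7", "date": "2010-04-12"}}
doctors = {"d7": {"name": "doctor_x", "county_id": "c3"}}
counties = {"c3": {"name": "county_y"}}


def read_normalised(visit_id):
    """Three lookups stand in for the two joins an RDBMS would do."""
    visit = visits[visit_id]
    doctor = doctors[visit["doctor_id"]]
    county = counties[doctor["county_id"]]
    return {
        "date": visit["date"],
        "doctor": doctor["name"],
        "county": county["name"],
    }


# Denormalised layout: the doctor and county names are copied into every
# visit row, so one lookup returns everything at the cost of redundancy.
visits_denormalised = {
    "v1": {"date": "2010-04-12", "doctor": "doctor_x", "county": "county_y"},
}


def read_denormalised(visit_id):
    """A single lookup returns the already-joined row."""
    return visits_denormalised[visit_id]
```

Both functions return the same data for `"v1"`; the difference is that the denormalised copy must be rewritten whenever a doctor or county name changes.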
-
Re: Porting SQL DB into HBASE
kranthi reddy, 2010-04-14, 09:08
Hi,
The amount of data being added is around 6-8 GB per day. If we keep redundant data, the size grows sharply, and we expect it to at least double, if not more.

For example, Table 1 has 50 columns with unique entries, and "Column X" is its primary key. Table 2 has 15 columns, including a foreign key referencing "Column X". Suppose that for an entry "Y" in Table 1 there are 15 entries in Table 2 with foreign key "Y". We then have 1 row in Table 1 (50 cells filled) and 15 rows in Table 2 (15*15 = 225 cells filled). If these 2 tables are denormalised, we end up with 15 rows holding redundant data (15*50 cells + 15*15 cells = 975 cells filled).

I hope my example is clear.

Regards,
kranthi

--
Kranthi Reddy. B
http://www.setusoftware.com/setu/index.htm
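The cell counts in this example can be checked with a few lines of Python. The numbers are the ones given in the example; the variable names are only for illustration.

```python
parent_columns = 50        # columns in Table 1
child_columns = 15         # columns in Table 2
children_per_parent = 15   # Table 2 rows that reference one Table 1 row

# Normalised: one parent row plus its child rows.
normalised_cells = parent_columns + children_per_parent * child_columns

# Denormalised: every child row also carries a full copy of the parent row.
denormalised_cells = children_per_parent * (parent_columns + child_columns)

growth = denormalised_cells / normalised_cells

print(normalised_cells)    # 275
print(denormalised_cells)  # 975
print(round(growth, 2))    # 3.55
```

For these numbers the denormalised form holds about 3.5 times as many cells. The factor depends on how many child rows each parent row has and on how wide the parent row is.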
-
RE: Porting SQL DB into HBASE
Michael Segel, 2010-04-14, 12:30
From reading his last post, I suspect it's less an issue of denormalization than one of poor database design.

Paraphrasing his example, he has one table for users who access his system by phone and one table for users who access it by van. Without looking at his table structures, it's hard to see why he can't combine the two and have a single field denote the access type (phone, van, etc.). Even if some fields are unique to phone and some are unique to van, that doesn't mean they can't be null.

Again, sometimes you have to look at alternatives for how you achieve the physical model of your database. If you have a parent/child relationship between data, you can easily use a hierarchical model like Pick (U2, Revelation, etc.). Not that I'm really a fan of Dick Pick (RIP), but this model would fit within HBase and work well. (I should add a caveat on column width and table size, but that's a different issue.)

Going back to the problem the OP is having, he really needs to rethink his design.

IMHO, one important issue that doesn't get addressed is thinking of your database as something more than a way to persist your objects. ;-)
[And that is one thing that you debate at a bar, over beers (or your favorite beverage) :-) ]

HTH

-Mike
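The suggestion of one combined table with an access-type field, where fields specific to one access type are simply left null, might look like the sketch below. The field names (`phone_number`, `van_location`) and the sample values are hypothetical; the thread does not show the actual table structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceUser:
    user_id: int
    username: str
    access_type: str                    # "phone", "van", ...
    phone_number: Optional[str] = None  # only set for phone access
    van_location: Optional[str] = None  # only set for van access


# Hypothetical rows that would previously have lived in two separate tables.
users = [
    ServiceUser(1, "user_a", "phone", phone_number="555-0100"),
    ServiceUser(2, "user_b", "van", van_location="Market Square"),
]

# Group usernames by access type using the single combined table.
by_type = {}
for user in users:
    by_type.setdefault(user.access_type, []).append(user.username)
```

In a column-oriented store such as HBase an unset field costs nothing to store, so the nullable fields in this layout are cheap.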