HBase >> mail # user >> Porting SQL DB into HBASE


Re: Porting SQL DB into HBASE
You are mentioning 2 different reasons:

Open source... Well, get MySQL..

Large datasets? The table sizes that you reported in the earlier mails don't
seem to justify a move to HBase. Keep in mind: to run HBase stably in
production you would ideally want to have at least 10 nodes. And you will
have no SQL available. Make sure you are aware of the trade-offs between
HBase and an RDBMS before you decide. Even 100 million rows can be handled
by a relational database if it is tuned properly.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
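
To make the "no SQL available" trade-off concrete, here is a minimal sketch in plain Python. It is not an HBase client: `SortedKV` is a hypothetical toy stand-in for a table sorted by row key, and the `county:` / `doctor:` key prefixes, the sample rows, and the `county` field linking a doctor to a county are all made up for illustration. It shows two things from the thread: several small static tables can share one table by prefixing the row key, and anything that SQL would express as a join becomes application code.

```python
from bisect import bisect_left


class SortedKV:
    """Toy table sorted by row key (an assumption, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, row):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = row

    def get(self, key):
        # Point lookup by row key.
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        # Range scan: walk the sorted keys that start with the prefix.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


# Two small "static tables" folded into one table via key prefixes.
table = SortedKV()
table.put("county:001", {"name": "Alpha"})
table.put("county:002", {"name": "Beta"})
table.put("doctor:17", {"name": "Dr. A", "county": "001"})
table.put("doctor:42", {"name": "Dr. B", "county": "002"})
table.put("doctor:43", {"name": "Dr. C", "county": "002"})


def doctors_with_county(kv):
    """Client-side join: one scan plus one point lookup per doctor."""
    result = []
    for _, doctor in kv.scan_prefix("doctor:"):
        county = kv.get("county:" + doctor["county"])
        result.append((doctor["name"], county["name"]))
    return result


if __name__ == "__main__":
    for name, county in doctors_with_county(table):
        print(name, county)
```

In an RDBMS the last function would be a single `JOIN`; here every such query has to be written, tested, and tuned by hand, which is part of the cost to weigh before moving a few hundred small tables off SQL.
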
On Mon, Apr 12, 2010 at 10:17 PM, kranthi reddy <[EMAIL PROTECTED]> wrote:

> Hi all,
>
>
> @Amandeep : The main reason for porting to HBase is that it is open
> source. Currently the NGO is paying a high licensing fee for Microsoft SQL
> Server. So in order to save money we planned to port to HBase because of
> its scalability for large datasets.
>
> @Jonathan : The problem is that these static tables can't be combined. Each
> table describes a different entity. For example, one static table might
> contain information about all the counties in a country, and another table
> might contain information about all the doctors in the country.
>
> That is why I don't think it is possible to combine these static
> tables: they don't have any primary/foreign keys referencing each other.
>
> The dynamic tables are pretty huge (though small compared to what HBase can
> support). But these tables will grow and might contain up to 100
> million rows in the near future.
>
> Thank you,
> kranthi
>
> On Tue, Apr 13, 2010 at 12:17 AM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > Just an idea: take a look at a hierarchical design like Pick.
> > I know it's doable, but I don't know how well it will perform.
> >
> >
> > > Date: Mon, 12 Apr 2010 14:25:48 +0530
> > > Subject: Re: Porting SQL DB into HBASE
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > >
> > > Hi Jonathan,
> > >
> > > Sorry for the late response. Missed your reply.
> > >
> > > The problem is, around 80% (400) of the tables are static tables and the
> > > remaining 20% (100) are dynamic tables that are updated on a daily basis.
> > > Denormalising these 20% of the tables is also extremely difficult,
> > > and we are planning to port them directly into HBase. Denormalising
> > > these tables would also lead to a lot of redundant data.
> > >
> > > Static tables have entry counts in the hundreds, mostly fewer
> > > than 1000 entries (rows), whereas the dynamic tables have more than 20,000
> > > entries, and each entry might be updated/modified at least once a week.
> > >
> > > Regards,
> > > kranthi
> > >
> > >
> > > On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> > >
> > > > Kranthi,
> > > >
> > > > HBase can handle a good number of tables, but tens or maybe a hundred.
> > > > If you have 500 tables you should definitely be rethinking your schema
> > > > design. The issue is less about HBase being able to handle lots of
> > > > tables, and much more about whether scattering your data across lots of
> > > > tables will be performant at read time.
> > > >
> > > >
> > > > 1)  Impossible to answer that question without knowing the schemas of
> > > > the existing tables.
> > > >
> > > > 2)  Not really any relation between fault tolerance and the number of
> > > > tables, except potentially for recovery time, but this would be the same
> > > > with few, very large tables.
> > > >
> > > > 3)  No difference in write performance.  Read performance if doing simple
> > > > key lookups would not be impacted, but most likely having data spread out
> > > > like this will mean you'll need joins of some sort.
> > > >
> > > > Can you tell more about your data and queries?
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: kranthi reddy [mailto:[EMAIL PROTECTED]]