Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> RDBMS to HBASE schema migration


Copy link to this message
-
Re: RDBMS to HBASE schema migration

Hi there-

I commend your enthusiasm for the Hbase project.  For the "ground rules of
Hbase" you probably want to read this closelyŠ

http://hbase.apache.org/book.html#datamodel

Š as it covers things like having one PK per table, no secondary indexes,
etc.  With a solid understanding of these rules the table relationship and
search questions you're asking will become clearer.
On 6/28/12 3:01 PM, "grashmi13" <[EMAIL PROTECTED]> wrote:

>
>Assets table is having numeric sequential ID and a one number out of
>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say
>10
>rows only.
>
>
>hmmm.. after some more surfing, i came to know that we have to manually
>denormalize a relational DB. there are no preset rules for
>denormalization.
>
>today I was doing a denormalization.. can you please confirm if rdbms to
>hbase change i have done is correct or not:
>These tables are from stock trading application:
>
>1. I have two tables...
>Assets
>=========
>AssetID | assetName
>
>
>second table is:
>
>INSTRUMENTS
>============
>InstrumentID | assetID | Symbol | name | created | modifiedAt
>
>
>While converting them into Hbase two questions were there in my mind:
>1. Will there be two tables or one? If we have a web GUI and have two
>screens, one for assets management and one for instuments management. In
>instrument management, there would be a combo having assets. An asset can
>be
>assigned to multiple instruments.
>
>So if user will first populate all assets thru assets screen. So there
>should be one master table for Assets. because it is not dependent on any
>other entity. Am I right here?
>
>And now user will populate instruments thru instrument management screen.
>Here another table would come into picture. "INSTRUMENTS". this table
>would
>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
>properties, all assets information should be duplicated here too. Now
>question here, if I delete or update some asset in future, and there are
>corresponding records in INSTRUMENTS table. what if asset is deletion or
>updation goes successful for assets table but fails for instruments table?
>How we can maintain consistancy and integrity here?
>
>
>HBASE table conversion here:
>
>ASSETS
>=================>row key - AssetID
>================
>Columns: CF1 - AssetName
>===============================================>
>INSTRUMENTS
>====================================>row key-- instrumentID(a sequential numeric value)
>===================================>InstrumentInfo(family) -
>Symbol
>name
>AssetsInfo(family2) -
>AssetID
>AssetName
>====================================>
>
>Please confirm, two tables would be required or one only? If one only, how
>assets population and after that instruments population would be possible?
>If two, how we can maintain integrity and consistancy during assets
>deletion\updation?
>
>Please confirm if my approach is correct?
>
>Yes, as read everywhere, in hbase, we must think about purpose of data in
>our application before creating tables. So purpose here is, to
>populate\delete\update assets and instruments thru web UI and then run a
>job
>which extract instruments by asset name. Also, assets by instrument name.
>
>=====================================>
>After above conversion, I got stuck on one more point. "SELF-JOIN"
>
>Lets say, I have a table InstrumentsStock having 10 columns in it, out of
>it
>one primary key stockID and other normal column BaseCurrencyID, both
>points
>to an instrumentID.
>
>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID
>is
>ID of an instrument whose assetID is 3.
>
>InstrumentsStock
>======================>stockID | AssetID | BaseCurrencyID | 10 more columns here
>=======================================>
>where StockID = InstrumentID of Instrument with AssetID 1
>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
>
>StockID is primary key
>
>
>E.g. I have below data in tables:
>Instruments:
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB