Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # user - RDBMS to HBASE schema migration


Copy link to this message
-
Re: RDBMS to HBASE schema migration
Doug Meil 2012-06-28, 20:54

Hi there-

I commend your enthusiasm for the Hbase project.  For the "ground rules of
Hbase" you probably want to read this closelyŠ

http://hbase.apache.org/book.html#datamodel

Š as it covers things like having one PK per table, no secondary indexes,
etc.  With a solid understanding of these rules the table relationship and
search questions you're asking will become clearer.
On 6/28/12 3:01 PM, "grashmi13" <[EMAIL PROTECTED]> wrote:

>
>Assets table is having numeric sequential ID and a one number out of
>(1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say
>10
>rows only.
>
>
>hmmm.. after some more surfing, i came to know that we have to manually
>denormalize a relational DB. there are no preset rules for
>denormalization.
>
>today I was doing a denormalization.. can you please confirm if rdbms to
>hbase change i have done is correct or not:
>These tables are from stock trading application:
>
>1. I have two tables...
>Assets
>=========
>AssetID | assetName
>
>
>second table is:
>
>INSTRUMENTS
>============
>InstrumentID | assetID | Symbol | name | created | modifiedAt
>
>
>While converting them into Hbase two questions were there in my mind:
>1. Will there be two tables or one? If we have a web GUI and have two
>screens, one for assets management and one for instuments management. In
>instrument management, there would be a combo having assets. An asset can
>be
>assigned to multiple instruments.
>
>So if user will first populate all assets thru assets screen. So there
>should be one master table for Assets. because it is not dependent on any
>other entity. Am I right here?
>
>And now user will populate instruments thru instrument management screen.
>Here another table would come into picture. "INSTRUMENTS". this table
>would
>be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID
>properties, all assets information should be duplicated here too. Now
>question here, if I delete or update some asset in future, and there are
>corresponding records in INSTRUMENTS table. what if asset is deletion or
>updation goes successful for assets table but fails for instruments table?
>How we can maintain consistancy and integrity here?
>
>
>HBASE table conversion here:
>
>ASSETS
>=================>row key - AssetID
>================
>Columns: CF1 - AssetName
>===============================================>
>INSTRUMENTS
>====================================>row key-- instrumentID(a sequential numeric value)
>===================================>InstrumentInfo(family) -
>Symbol
>name
>AssetsInfo(family2) -
>AssetID
>AssetName
>====================================>
>
>Please confirm, two tables would be required or one only? If one only, how
>assets population and after that instruments population would be possible?
>If two, how we can maintain integrity and consistancy during assets
>deletion\updation?
>
>Please confirm if my approach is correct?
>
>Yes, as read everywhere, in hbase, we must think about purpose of data in
>our application before creating tables. So purpose here is, to
>populate\delete\update assets and instruments thru web UI and then run a
>job
>which extract instruments by asset name. Also, assets by instrument name.
>
>=====================================>
>After above conversion, I got stuck on one more point. "SELF-JOIN"
>
>Lets say, I have a table InstrumentsStock having 10 columns in it, out of
>it
>one primary key stockID and other normal column BaseCurrencyID, both
>points
>to an instrumentID.
>
>StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID
>is
>ID of an instrument whose assetID is 3.
>
>InstrumentsStock
>======================>stockID | AssetID | BaseCurrencyID | 10 more columns here
>=======================================>
>where StockID = InstrumentID of Instrument with AssetID 1
>and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
>
>StockID is primary key
>
>
>E.g. I have below data in tables:
>Instruments: