HBase user mailing list

Re: RDBMS to HBASE schema migration

Hi there-

I commend your enthusiasm for the HBase project. For the "ground rules of
HBase", you probably want to read this closely…


… as it covers things like having one PK per table, no secondary indexes,
etc. With a solid understanding of these rules, the table-relationship and
search questions you're asking will become clearer.
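To make the "one PK per table, no secondary indexes" point concrete, here is a toy sketch in Python. It is an in-memory model, not the HBase client API. `SortedTable`, its methods, the `instruments_by_asset` index table, the column names, and the sample rows are all hypothetical. It assumes only two behaviours of HBase: rows are stored sorted by row key, and a write to a single row is atomic. Under those assumptions, a lookup by row key is direct and a lookup by any other column needs a full scan. An index table that the application maintains, with a composite key such as `assetName|instrumentID`, turns that full scan into a prefix scan.

```python
from bisect import bisect_left


class SortedTable:
    """Toy in-memory stand-in for an HBase table (NOT the HBase client API).

    Rows are kept sorted by row key; each row maps "family:qualifier" to a value.
    """

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, columns):
        # One put touches exactly one row (HBase makes single-row mutations atomic).
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        # Lookup by row key: the table's one and only "index".
        return self._rows.get(row_key)

    def full_scan(self):
        # Every row, in row-key order.
        for key in self._keys:
            yield key, self._rows[key]

    def scan_prefix(self, prefix):
        # Keys sharing a prefix are contiguous because the keys are sorted.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


instruments = SortedTable()
instruments_by_asset = SortedTable()  # hypothetical index table kept by the application


def add_instrument(instrument_id, asset_name, symbol):
    # Zero-pad the numeric ID so lexicographic key order matches numeric order.
    row_key = f"{instrument_id:010d}"
    instruments.put(row_key, {"InstrumentInfo:symbol": symbol,
                              "AssetsInfo:assetName": asset_name})
    # A second, separate put into another table: the two writes are not atomic together.
    instruments_by_asset.put(f"{asset_name}|{row_key}", {"ref:instrumentID": row_key})


add_instrument(1, "Equity", "IBM")
add_instrument(2, "Currency", "USD")
add_instrument(3, "Equity", "MSFT")

# Without an index, finding rows by a non-key column means reading every row.
slow = [k for k, row in instruments.full_scan()
        if row["AssetsInfo:assetName"] == "Equity"]

# With the hand-maintained index table, a prefix scan reads only matching rows.
fast = [row["ref:instrumentID"] for _, row in instruments_by_asset.scan_prefix("Equity|")]

print(fast)  # ['0000000001', '0000000003']
```

In this toy model, the second put in `add_instrument` could fail after the first has succeeded. In that case the index would simply lack that row. Both puts overwrite fixed keys, so re-running `add_instrument` is one possible (assumed) repair. In a design like this, keeping writes consistent across tables is the application's job, because HBase's atomicity guarantee covers a single row.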
On 6/28/12 3:01 PM, "grashmi13" <[EMAIL PROTECTED]> wrote:

>The Assets table has a numeric sequential ID and, for AssetName, one number
>out of (1, 2, 3, 4, 5, 6, 7, 8, 9, 10). This is a master table with say
>rows only.
>Hmm... after some more searching, I learned that we have to denormalize a
>relational DB manually; there are no preset rules for
>Today I was doing a denormalization. Can you please confirm whether the
>RDBMS-to-HBase change I have made is correct or not:
>These tables are from a stock trading application:
>1. I have two tables. The first table is:
>   AssetID | assetName
>The second table is:
>   InstrumentID | assetID | Symbol | name | created | modifiedAt
>While converting them into HBase, two questions came to mind:
>1. Will there be two tables or one? Suppose we have a web GUI with two
>screens, one for asset management and one for instrument management. On
>the instrument management screen there would be a combo box listing the
>assets. An asset can be assigned to multiple instruments.
>So the user will first populate all assets through the assets screen, which
>means there should be one master table for Assets, because it does not
>depend on any other entity. Am I right here?
>Now the user will populate instruments through the instrument management
>screen. Here another table, "INSTRUMENTS", comes into the picture. This
>table should be denormalized, as HBase doesn't support JOINs. Also, to
>maintain ACID properties, all asset information should be duplicated here
>too. Now the question is: if I delete or update some asset in the future
>and there are corresponding records in the INSTRUMENTS table, what if the
>asset deletion or update succeeds for the Assets table but fails for the
>INSTRUMENTS table? How can we maintain consistency and integrity here?
>HBase table conversion here:
>=================
>row key - AssetID
>Columns: CF1 - AssetName
>=================
>row key - instrumentID (a sequential numeric value)
>=================
>InstrumentInfo (family) -
>AssetsInfo (family2) -
>Please confirm: would two tables be required, or only one? If only one, how
>would populating assets, and after that instruments, be possible?
>If two, how can we maintain integrity and consistency during assets
>Please confirm whether my approach is correct.
>Yes, as I have read everywhere, in HBase we must think about the purpose of
>the data in our application before creating tables. The purpose here is to
>populate/delete/update assets and instruments through the web UI and then run a
>which extracts instruments by asset name, and also assets by instrument name.
>After the above conversion, I got stuck on one more point: "SELF-JOIN".
>Let's say I have a table InstrumentsStock with 10 columns in it, out of which
>one is the primary key stockID and another is a normal column BaseCurrencyID,
>both referring to an instrumentID.
>StockID is the ID of an instrument whose assetID is 1, and BaseCurrencyID is
>the ID of an instrument whose assetID is 3.
>======================
>stockID | AssetID | BaseCurrencyID | 10 more columns here
>where StockID = InstrumentID of the instrument with AssetID 1
>and BaseCurrencyID = InstrumentID of the instrument with AssetID 3
>StockID is the primary key.
>E.g., I have the data below in the tables: