Re: Limited cross row transactions
>
> one could have a table that hosts a parent child relationship in a single
> table, by prefixing all child row keys with the parent row key.
> Now it is possible to presplit the table (or use a custom local balancer)
> so that child rows are always in the same region as the parent rows.
I thought BigTable/Megastore handled this kind of thing by putting
everything into a single row, with the entity group id as the HBase row key.
Then you add all parent and child values to the same HBase row by pushing
their original row keys into the qualifiers. You build the qualifiers by
concatenating the table name with the original row key.

HBase should handle the arbitrarily wide rows and prevent the row from
splitting between regions.  Having the table name as a prefix of each
qualifier adds a lot of metadata, but good prefix compression should
eliminate that.
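
To make the single-row layout above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: plain dicts stand in for an HBase table, none of the names (`make_qualifier`, `put_entity`, `scan_table`) are HBase APIs, and the zero-byte separator between table name and original row key is my own choice, not something from Megastore or HBase.

```python
from typing import Dict, List, Tuple

# Assumed separator between the table name and the original row key.
# Table names must not contain this byte; original row keys may.
SEP = b"\x00"

# Hypothetical stand-in for a table: entity group id -> {qualifier: value}.
Store = Dict[bytes, Dict[bytes, bytes]]


def make_qualifier(table: str, original_row_key: bytes) -> bytes:
    """Build a qualifier by concatenating table name and original row key."""
    return table.encode("utf-8") + SEP + original_row_key


def split_qualifier(qualifier: bytes) -> Tuple[str, bytes]:
    """Recover (table name, original row key) from a qualifier."""
    table, _, key = qualifier.partition(SEP)
    return table.decode("utf-8"), key


def put_entity(store: Store, entity_group_id: bytes, table: str,
               original_row_key: bytes, value: bytes) -> None:
    """Write one logical row as a single cell in the entity group's row."""
    row = store.setdefault(entity_group_id, {})
    row[make_qualifier(table, original_row_key)] = value


def scan_table(store: Store, entity_group_id: bytes,
               table: str) -> List[Tuple[bytes, bytes]]:
    """Return (original row key, value) pairs for one logical table,
    ordered by qualifier, within a single entity group row."""
    prefix = table.encode("utf-8") + SEP
    row = store.get(entity_group_id, {})
    return [(split_qualifier(q)[1], v)
            for q, v in sorted(row.items())
            if q.startswith(prefix)]


store: Store = {}
put_entity(store, b"user42", "users", b"user42", b"Alice")
put_entity(store, b"user42", "orders", b"0001", b"book")
put_entity(store, b"user42", "orders", b"0002", b"pen")
```

All three writes land in the one row keyed by `user42`, so a store that guarantees single-row atomicity would cover the parent and its children together. The repeated `orders` prefix in the qualifiers is the metadata overhead that prefix compression would need to absorb.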
On Tue, Jan 17, 2012 at 4:57 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> >> so that child rows are always in the same region as the parent rows
> Should the user expect abnormal growth for certain parent(s) ?
>
> I think even HFile v2 has a limit on the file size beyond which operations
> would become less efficient.
>
> On Tue, Jan 17, 2012 at 4:48 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > Yes, it's a hard constraint, but the building blocks are there.
> > The user can disable automatic splitting and pre-split the table.
> >
> > For example, one could have a table that hosts a parent/child relationship
> > in a single table, by prefixing all child row keys with the parent
> > row key.
> > Now it is possible to presplit the table (or use a custom local balancer)
> > so that child rows are always in the same region as the parent rows.
> > And then it would be possible to do cross parent/child transactions.
> >
> > Using the same scheme it is possible to do consistent parent/child indexes
> > (consistent indexes within the same parent prefix).
> > (I just made this up, but this is somewhat similar to the Megastore
> > design, I think)
> >
> >
> > Anyway, I set out asking whether this would be a useful endeavor; it seems
> > the answer is a resounding "maybe". :)
> >
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Mikael Sitruk <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Cc:
> > Sent: Tuesday, January 17, 2012 3:07 PM
> > Subject: Re: Limited cross row transactions
> >
> > Well, I understand the limitation now; requiring everything to be in the
> > same region is a really hard constraint.
> > Even being on the same RS is not enough, because after a restart
> > regions may be assigned differently, and part of the data may then be in
> > one region under server A and the other part under server B.
> >
> > Perhaps we need a use case for better understanding, and perhaps for
> > finding an alternative.
> >
> > The first use case I was thinking of is as follows:
> > I need to insert data with different access criteria, but the data
> > should be inserted atomically.
> > In an RDBMS I would have two tables, insert data into the first one with
> > key #1 and then into the second one with key #2, then commit.
> > In HBase I need to use different column families with key #1 (for
> > atomicity), then manage a kind of secondary index to map key #2 to key #1
> > (perhaps via a coprocessor) to have quick access to the data of key #2.
> > With cross-row transactions, I would think of using different keys in the
> > same table (and probably different CFs too), without the need for a
> > secondary index, but again, with this limitation it does not seem to be
> > easily feasible.
> >
> > Mik.
> >
> > On Wed, Jan 18, 2012 at 12:22 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > People rely on RDBMS for the transaction support.
> > >
> > > Consider the following example:
> > > A highly de-normalized schema puts related users in the same region,
> > > where this 'limited cross row transactions' works.
> > > After some time, the region has to be split (maybe due to good business