HBase user mailing list


Re: hbase schema design
Unlike in an RDBMS, data in HBase is stored in HDFS as key-value pairs.
Hence the row key is stored again for every version of every cell.
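To put numbers on this, here is a rough size estimate. It assumes the classic KeyValue layout described in the HBase reference guide (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignores compression, data block encoding, tags and per-cell sequence IDs. The family, qualifier and value widths in `main` are hypothetical choices for the event table described further down the thread.

```java
/**
 * Back-of-the-envelope size of HBase cells, assuming the classic KeyValue
 * layout. Ignores compression, block encoding, tags and sequence IDs.
 */
public final class KeyValueSize {

    // Fixed-width fields in every KeyValue: key length (4), value length (4),
    // row length (2), family length (1), timestamp (8), key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Bytes for one cell (one version of one column); the row key is repeated in each. */
    static long cellSize(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_OVERHEAD + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    /** Bytes for a row holding {@code cells} single-version cells of the same shape. */
    static long rowSize(int cells, int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return cells * cellSize(rowKeyLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        int days = 7300;      // ~20 years, one column per day
        int familyLen = 1;    // hypothetical one-character family name
        int qualifierLen = 2; // hypothetical 2-byte day-number qualifier
        int valueLen = 8;     // a long occurrence counter

        long fullDescription = rowSize(days, 2048, familyLen, qualifierLen, valueLen);
        long surrogate = rowSize(days, 8, familyLen, qualifierLen, valueLen);
        System.out.println("2 KB description as row key: " + fullDescription + " bytes"); // 15176700
        System.out.println("8-byte surrogate row key:    " + surrogate + " bytes");       // 284700
    }
}
```

The row key dominates: roughly 15 MB versus under 0.3 MB per row before compression, which is why the book recommends keeping row keys (and family and qualifier names) short.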
On Tue, Sep 17, 2013 at 7:53 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> w.r.t. Data Block Encoding, you can find some performance numbers here:
>
>
> https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337
>
>
> On Tue, Sep 17, 2013 at 10:49 AM, Adrian CAPDEFIER <[EMAIL PROTECTED]> wrote:
>
> > Thank you for confirming that the row key is written for every cell value
> > (I was indeed referring to section 6.3.2). I have looked into data block
> > encoding, but I'm not sure it would help me (especially if I need to link
> > this table to a separate table later on).
> >
> > I will look into the surrogate value option.
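One option, not proposed in the thread itself, is a fixed-length hash of the description instead of a counter-generated ID. Any table can recompute it from the description without a lookup, which may make linking to a second table simpler. This minimal sketch keeps the first 8 bytes of an MD5 digest (MD5 is available on every Java platform); the key width is a hypothetical choice. Truncating the digest makes collisions unlikely but not impossible, and the description itself would still have to be stored (for example in a column) if it needs to be read back.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Derives a short, fixed-length row key from a long event description. */
public final class HashedRowKey {

    // Hypothetical key width: 8 bytes instead of a description of up to ~2 KB.
    static final int KEY_BYTES = 8;

    static byte[] rowKeyFor(String description) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(description.getBytes(StandardCharsets.UTF_8));
            // Keep only the leading bytes of the 16-byte digest.
            return Arrays.copyOf(digest, KEY_BYTES);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Lowercase hex rendering, handy for logging or debugging keys. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String description = "hypothetical event description";
        System.out.println(toHex(rowKeyFor(description)));
    }
}
```

Hashed keys spread writes evenly across regions, but rows no longer sort by description, so range scans over descriptions are no longer possible.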
> >
> >
> >
> >
> > On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > I guess you were referring to section 6.3.2.
> > >
> > > bq. rowkey is stored and/or read for every cell value
> > >
> > > The above is true.
> > >
> > > bq. the event description is a string of 0.1 to 2 KB
> > >
> > > You can enable Data Block encoding to reduce storage.
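Data block encoding pays off when consecutive keys in a block share long prefixes, which is exactly the case in a wide row where every cell repeats the same row key. The toy sketch below only illustrates that idea (store how many leading bytes a key shares with the previous key, plus the remaining suffix); it is not HBase's PREFIX, DIFF or FAST_DIFF code, and the key layout (row key, a separator, an 8-character date) is a simplification of the real KeyValue key. In the HBase shell the encoding is set per column family, e.g. `DATA_BLOCK_ENCODING => 'FAST_DIFF'`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy prefix encoding of sorted keys; illustrates the idea only. */
public final class PrefixEncodingDemo {

    /** An encoded key: number of bytes shared with the previous key, plus the new suffix. */
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Encodes keys that are already sorted, as keys inside an HFile block are. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds each key from the previously decoded key and the stored suffix. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical wide row: a 2000-character description plus one key per day.
        String row = "x".repeat(2000);
        List<byte[]> keys = new ArrayList<>();
        for (String day : new String[] {"20130915", "20130916", "20130917"}) {
            keys.add((row + "|" + day).getBytes(StandardCharsets.UTF_8));
        }
        List<Entry> encoded = encode(keys);
        int rawBytes = 0;
        int suffixBytes = 0; // ignores the few bytes needed to store each shared length
        for (int i = 0; i < keys.size(); i++) {
            rawBytes += keys.get(i).length;
            suffixBytes += encoded.get(i).suffix.length;
        }
        // Prints: raw key bytes: 6027, suffix bytes after encoding: 2011
        System.out.println("raw key bytes: " + rawBytes + ", suffix bytes after encoding: " + suffixBytes);
    }
}
```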
> > >
> > > Cheers
> > >
> > >
> > >
> > > On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER <[EMAIL PROTECTED]> wrote:
> > >
> > > > Howdy all,
> > > >
> > > > I'm trying to use HBase for the first time (I have plenty of experience
> > > > with relational databases, though), and I have a couple of questions
> > > > after reading The Book.
> > > >
> > > > I am a bit confused by the advice to reduce "the row size" in the HBase
> > > > book. It states that every cell value is accompanied by its coordinates
> > > > (row, column and timestamp). I'm just trying to be thorough: am I to
> > > > understand that the row key is stored and/or read for every cell value
> > > > in a record, or just once per column family in a record?
> > > >
> > > > I am intrigued by the rows-as-columns design described in the book at
> > > > http://hbase.apache.org/book.html#rowkey.design. To make a long story
> > > > short, I will end up with a table that stores event types and the number
> > > > of occurrences on each day. I would prefer to have the event description
> > > > as the row key and the dates on which it occurred as columns (up to 7300
> > > > for roughly 20 years).
> > > > However, the event description is a string of 0.1 to 2 KB, and if it is
> > > > stored with each cell value, I will need to use a surrogate (shorter)
> > > > value.
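For the rows-as-columns layout, the date qualifier can be kept tiny too. A minimal sketch, assuming a hypothetical base date of 2000-01-01: store each day as the number of days since the base date in two big-endian bytes. Two bytes cover 65,536 days (about 179 years), well over the 7300 needed, and because HBase orders qualifiers by their unsigned bytes, big-endian day numbers keep the columns in chronological order.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Encodes a calendar day as a 2-byte column qualifier (days since a base date). */
public final class DayQualifier {

    // Hypothetical base date; any fixed day on or before the first event works.
    static final LocalDate BASE = LocalDate.of(2000, 1, 1);

    static byte[] encode(LocalDate day) {
        long offset = ChronoUnit.DAYS.between(BASE, day);
        if (offset < 0 || offset > 0xFFFF) {
            throw new IllegalArgumentException("day outside 2-byte range: " + day);
        }
        // Big-endian, so unsigned byte order matches chronological order.
        return new byte[] {(byte) (offset >>> 8), (byte) offset};
    }

    static LocalDate decode(byte[] qualifier) {
        int offset = ((qualifier[0] & 0xFF) << 8) | (qualifier[1] & 0xFF);
        return BASE.plusDays(offset);
    }

    /** Unsigned lexicographic comparison, the order HBase uses for qualifiers. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2013, 9, 17);
        byte[] q = encode(day);
        System.out.println(day + " -> " + (q[0] & 0xFF) + "," + (q[1] & 0xFF) + " -> " + decode(q));
    }
}
```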
> > > >
> > > > Is there built-in functionality in HBase to generate (integer) surrogate
> > > > values that can be used in the row key, or do I need to hand-code it
> > > > from scratch?
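As far as I know HBase has no dedicated surrogate-key generator; the usual building blocks are an atomic counter cell (incremented with `incrementColumnValue` on the table client) to hand out IDs, plus a lookup table mapping description to ID and back. The sketch below models that pattern in memory with standard Java concurrency classes so it runs without a cluster. The class and method names are hypothetical, and a real HBase version would also have to handle two clients registering the same new description at the same time (for example with a conditional put).

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * In-memory model of a surrogate-key dictionary. The counter stands in for an
 * HBase counter cell and the maps for a two-way lookup table (assumptions).
 */
public final class SurrogateKeys {

    private final AtomicLong counter = new AtomicLong();
    private final Map<String, Long> idByDescription = new ConcurrentHashMap<>();
    private final Map<Long, String> descriptionById = new ConcurrentHashMap<>();

    /** Returns the existing ID for a description, or atomically assigns the next one. */
    long idFor(String description) {
        return idByDescription.computeIfAbsent(description, d -> {
            long id = counter.incrementAndGet();
            descriptionById.put(id, d);
            return id;
        });
    }

    /** Reverse lookup, e.g. to display the description for a row. */
    String descriptionFor(long id) {
        return descriptionById.get(id);
    }

    /** 8-byte big-endian row key for an ID. */
    static byte[] toRowKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    public static void main(String[] args) {
        SurrogateKeys keys = new SurrogateKeys();
        long a = keys.idFor("hypothetical event A");
        long b = keys.idFor("hypothetical event B");
        long again = keys.idFor("hypothetical event A");
        System.out.println(a + " " + b + " " + again); // 1 2 1
    }
}
```

If new IDs are mostly written in increasing order, sequential row keys can concentrate writes on a single region; reversing or salting the ID bytes is a common mitigation if that becomes a problem.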
> > > >
> > >
> >
>