HBase, mail # user - hbase schema design


Re: hbase schema design
Ted Yu 2013-09-17, 17:53
w.r.t. Data Block Encoding, you can find some performance numbers here:

https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337
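
To put rough numbers on why the repeated row key matters, here is a back-of-the-envelope sketch in Python. It assumes the classic KeyValue layout without tags (fixed-width length fields, timestamp and key type stored around the row, family, qualifier and value bytes) and ignores compression and block encoding. The 1-byte family name, 8-byte date qualifier and 8-byte counter value are made-up sizes for illustration; 2048 bytes stands in for the 2Kb description and 7300 is the column count from the original question.

```python
# Fixed per-cell overhead in bytes, assuming the classic KeyValue layout (no tags):
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate unencoded size of one cell; the row key is counted in every cell."""
    return KV_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def row_size(row_len, family_len, qualifier_len, value_len, cells):
    """Approximate unencoded size of a row made of `cells` equally sized cells."""
    return cells * cell_size(row_len, family_len, qualifier_len, value_len)


# Illustrative sizes only (assumptions, not measurements).
full_key = row_size(row_len=2048, family_len=1, qualifier_len=8, value_len=8, cells=7300)
short_key = row_size(row_len=8, family_len=1, qualifier_len=8, value_len=8, cells=7300)

print(f"2 KB row key:    {full_key:>12,} bytes")
print(f"8-byte surrogate: {short_key:>11,} bytes")
```

Under these assumptions nearly all of the bytes in the first case are copies of the row key, which is the redundancy that a prefix-style block encoding or a shorter surrogate key is meant to cut.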

On Tue, Sep 17, 2013 at 10:49 AM, Adrian CAPDEFIER <[EMAIL PROTECTED]> wrote:

> Thank you for confirming the rowkey is written for every cell value (I was
> referring to 6.3.2 indeed). I have looked into data block encoding, but I'm
> not sure that would help me (more so if I need to link this table to a
> separate table later on).
>
> I will look into the surrogate value option.
>
>
>
>
> On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > I guess you were referring to section 6.3.2
> >
> > bq. rowkey is stored and/ or read for every cell value
> >
> > The above is true.
> >
> > bq. the event description is a string of 0.1 to 2Kb
> >
> > You can enable Data Block encoding to reduce storage.
> >
> > Cheers
> >
> >
> >
> > On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER <[EMAIL PROTECTED]> wrote:
> >
> > > Howdy all,
> > >
> > > I'm trying to use hbase for the first time (plenty of other experience with
> > > RDBMS databases though), and I have a couple of questions after reading The
> > > Book.
> > >
> > > I am a bit confused by the advice to reduce "the row size" in the hbase
> > > book. It states that every cell value is accompanied by the coordinates
> > > (row, column and timestamp). I'm just trying to be thorough, so am I to
> > > understand that the rowkey is stored and/ or read for every cell value in a
> > > record or just once per column family in a record?
> > >
> > > I am intrigued by the rows as columns design as described in the book at
> > > http://hbase.apache.org/book.html#rowkey.design. To make a long story
> > > short, I will end up with a table to store event types and number of
> > > occurrences in each day. I would prefer to have the event description as
> > > the row key and the dates when it happened as columns - up to 7300 for
> > > roughly 20 years.
> > > However, the event description is a string of 0.1 to 2Kb and if it is
> > > stored for each cell value, I will need to use a surrogate (shorter) value.
> > >
> > > Is there built-in functionality to generate (integer) surrogate values in
> > > hbase that can be used for the rowkey, or does it need to be hand-coded
> > > from scratch?
> > >
> >
>
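
One hand-rolled way to get a shorter row key, sketched under assumptions: hash the event description to a fixed-length value and keep the mapping somewhere so the description can be recovered. The sketch is in Python, uses MD5 only as a convenient 16-byte digest, and uses a plain dict as a stand-in for a lookup table. The function names are hypothetical and nothing here is an hbase API.

```python
import hashlib

# Hypothetical stand-in for a lookup table: surrogate key -> full description.
lookup = {}


def surrogate_key(description: str) -> bytes:
    """Derive a fixed-length 16-byte key from an arbitrarily long description."""
    return hashlib.md5(description.encode("utf-8")).digest()


def register(description: str) -> bytes:
    """Record the mapping and return the surrogate, refusing silent collisions."""
    key = surrogate_key(description)
    existing = lookup.setdefault(key, description)
    if existing != description:
        raise ValueError("two different descriptions hashed to the same key")
    return key


# Example with a made-up event description.
key = register("disk failure reported by the monitoring agent")
print(len(key), lookup[key])
```

A hash keeps the key derivable from the description without a shared counter, at the cost of a 16-byte key rather than a small integer and the need to handle the (unlikely) collision case.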