Re: hbase schema design

Don't forget to look at this section for hbase schema design examples.

http://hbase.apache.org/book.html#schema.casestudies
 

On 9/17/13 1:52 PM, "Adrian CAPDEFIER" <[EMAIL PROTECTED]> wrote:

>Thanks for the tip. In the data warehousing world I used to call them
>surrogate keys - I wonder if there's any difference between the two.
>
>
>On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>
>> > Is there a built-in functionality to generate (integer) surrogate values
>> > in hbase that can be used on the rowkey or does it need to be hand-coded
>> > from scratch?
>>
>> There is no such functionality in HBase. What you are asking for is known
>> as dictionary compression: a unique 1-1 association between arbitrary
>> strings and numeric values.
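
A minimal sketch of the dictionary idea, in Python purely for illustration. The class name SurrogateKeyDictionary and its methods are made up for this example and are not part of HBase. It keeps the mapping in memory only; a real deployment would have to persist it (for example in a lookup table) and allocate ids atomically across clients. Packing the id as 8 big-endian bytes is an assumption about how the surrogate would be turned into a row key.

```python
import struct


class SurrogateKeyDictionary:
    """Hypothetical in-memory 1-1 mapping between strings and integer ids."""

    def __init__(self):
        self._id_by_text = {}   # string -> surrogate id
        self._text_by_id = []   # surrogate id -> string (index is the id)

    def encode(self, text):
        """Return the surrogate id for text, assigning the next id if new."""
        existing = self._id_by_text.get(text)
        if existing is not None:
            return existing
        new_id = len(self._text_by_id)
        self._id_by_text[text] = new_id
        self._text_by_id.append(text)
        return new_id

    def decode(self, surrogate_id):
        """Return the original string for a previously assigned id."""
        return self._text_by_id[surrogate_id]

    def row_key(self, text):
        """Pack the surrogate id as 8 big-endian bytes (assumed key format)."""
        return struct.pack(">q", self.encode(text))


if __name__ == "__main__":
    dictionary = SurrogateKeyDictionary()
    description = "disk controller reported a timeout " * 20
    key = dictionary.row_key(description)
    print(len(description.encode("utf-8")), "bytes of text ->", len(key), "byte key")
```

The same string always maps to the same id, and decode() recovers the description, so the long text only needs to be stored once in the dictionary rather than in every cell.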
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [EMAIL PROTECTED]
>>
>> ________________________________________
>> From: Ted Yu [[EMAIL PROTECTED]]
>> Sent: Tuesday, September 17, 2013 9:53 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: hbase schema design
>>
>> I guess you were referring to section 6.3.2
>>
>> bq. rowkey is stored and/ or read for every cell value
>>
>> The above is true.
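
To put rough numbers on that, here is a back-of-the-envelope calculation in Python using the figures from the original question (up to 7300 date columns per row, descriptions up to 2Kb). It assumes 2Kb means 2048 bytes and that the surrogate is 8 bytes long. It counts only the row key bytes repeated in each cell and ignores the rest of the per-cell overhead as well as any compression or block encoding, so treat it as a worst-case illustration rather than a measurement.

```python
# Figures taken from the question; the byte sizes are assumptions.
CELLS_PER_ROW = 7300        # one column per day for roughly 20 years
LONG_KEY_BYTES = 2048       # event description used directly as row key
SURROGATE_KEY_BYTES = 8     # hypothetical integer surrogate


def row_key_bytes_per_row(key_bytes, cells=CELLS_PER_ROW):
    """Bytes spent on the row key alone when it is repeated in every cell."""
    return key_bytes * cells


long_total = row_key_bytes_per_row(LONG_KEY_BYTES)
surrogate_total = row_key_bytes_per_row(SURROGATE_KEY_BYTES)

print("description as key:", long_total, "bytes per fully populated row")
print("surrogate as key:  ", surrogate_total, "bytes per fully populated row")
print("ratio:", long_total // surrogate_total)
```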
>>
>> bq. the event description is a string of 0.1 to 2Kb
>>
>> You can enable Data Block encoding to reduce storage.
>>
>> Cheers
>>
>>
>>
>> On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER <[EMAIL PROTECTED]> wrote:
>>
>> > Howdy all,
>> >
>> > I'm trying to use hbase for the first time (plenty of other experience
>> > with RDBMSs though), and I have a couple of questions after reading The
>> > Book.
>> >
>> > I am a bit confused by the advice to reduce "the row size" in the hbase
>> > book. It states that every cell value is accompanied by the coordinates
>> > (row, column and timestamp). I'm just trying to be thorough, so am I to
>> > understand that the rowkey is stored and/ or read for every cell value in
>> > a record or just once per column family in a record?
>> >
>> > I am intrigued by the rows as columns design as described in the book at
>> > http://hbase.apache.org/book.html#rowkey.design. To make a long story
>> > short, I will end up with a table to store event types and number of
>> > occurrences in each day. I would prefer to have the event description as
>> > the row key and the dates when it happened as columns - up to 7300 for
>> > roughly 20 years.
>> > However, the event description is a string of 0.1 to 2Kb and if it is
>> > stored for each cell value, I will need to use a surrogate (shorter)
>> > value.
>> >
>> > Is there a built-in functionality to generate (integer) surrogate values
>> > in hbase that can be used on the rowkey or does it need to be hand-coded
>> > from scratch?
>> >
>>