Re: Long row + column keys
Hi Anoop,

I agree - I am not so concerned about the savings on disk; rather, I am
thinking about the savings inside the block cache. I am not sure how stable
PrefixDeltaEncoding is or who else uses it. If it is not stable, are there
people using FastDiff encoding? It seems like any form of encoding scheme
would get us huge wins.
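If we do go this route, my understanding is that it would look something
like the sketch below with the 0.94 Java API - the table name "mytable" and
family "d" are just placeholders, and setEncodeOnDisk(true) is how I read
the option for applying the encoding on disk as well as in the block cache:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class EnableEncoding {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Placeholder family; FAST_DIFF could be swapped for DataBlockEncoding.PREFIX
        HColumnDescriptor hcd = new HColumnDescriptor("d");
        hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
        hcd.setEncodeOnDisk(true); // false would keep blocks encoded only in the cache
        admin.disableTable("mytable");       // placeholder table name
        admin.modifyColumn("mytable", hcd);
        admin.enableTable("mytable");
        admin.close();
      }
    }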

Thanks!
Varun

On Mon, Dec 3, 2012 at 8:23 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Varun
>                  It looks very clear that you need to use some sort of
> encoding scheme. PrefixDeltaEncoding would maybe be fine. You can also look
> at the other algorithms, like FastDiff, and see how much space each can
> save in your case. I also suggest you use the encoding for data on disk as
> well as in memory (block cache).
> >The total key size, as far as I know, would be 8 + 12 + 8 (timestamp) = 28 bytes
> In every KV that gets stored, the full size would be
> 4 (key length) + 4 (value length) + 2 (rowkey length) + 8 (rowkey) + 1 (cf
> length) + 12 (cf + qualifier) + 8 (timestamp) + 1 (type: Put/Delete...) +
> value (0 bytes? at least 1 byte, right?) = 40+ bytes...
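>
> A quick way to sanity check that arithmetic is to build a KeyValue with the
> 0.94 client API and print its serialized length - the row, cf and qualifier
> below are made-up stand-ins sized to match your 8 + 1 + 11 byte layout:
>
>     import org.apache.hadoop.hbase.KeyValue;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class KvSize {
>       public static void main(String[] args) {
>         byte[] row = Bytes.toBytes(1234L);    // 8-byte rowkey (stand-in)
>         byte[] cf = Bytes.toBytes("d");       // 1-byte column family
>         byte[] qual = new byte[11];           // cf + qualifier = 12 bytes total
>         byte[] value = new byte[0];           // empty value
>         KeyValue kv =
>             new KeyValue(row, cf, qual, 1L, KeyValue.Type.Put, value);
>         // getLength() covers the two length ints, the key, and the value
>         System.out.println(kv.getLength());   // prints 40 for a 0-byte value
>       }
>     }
>
> So with roughly 1000 columns per row, that is about 40 KB of KV overhead
> per row before any encoding.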
>
> Just making it clear for you :)
>
> -Anoop-
> ________________________________________
> From: Varun Sharma [[EMAIL PROTECTED]]
> Sent: Tuesday, December 04, 2012 2:36 AM
> To: Marcos Ortiz
> Cc: [EMAIL PROTECTED]
> Subject: Re: Long row + column keys
>
> Hi Marcos,
>
> Thanks for the links. We have gone through these and thought about the
> schema. My question is about whether using PrefixDeltaEncoding makes sense
> in our situation...
>
> Varun
>
> On Mon, Dec 3, 2012 at 12:36 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>
> > Regards, Varun.
> > I think that you should see Benoit Sigoure (@tsuna)'s talk called
> > "Lessons learned from OpenTSDB" at the last HBaseCon. [1]
> > He explains in great detail how to design your schema to obtain the best
> > performance from HBase.
> >
> > Other recommended talks are "HBase Internals" from Lars and "HBase
> > Schema Design" from Ian. [2][3]
> >
> > [1] http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
> > [2] http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final/
> > [3] http://www.slideshare.net/cloudera/5-h-base-schemahbasecon2012
> >
> >
> > On 12/03/2012 02:58 PM, Varun Sharma wrote:
> >
> >> Hi,
> >>
> >> I have a schema where the rows are 8 bytes long and the columns are 12
> >> bytes long (roughly 1000 columns per row). The value is 0 bytes. Is this
> >> going to be space inefficient in terms of HFile size (large index +
> >> blocks)? The total key size, as far as I know, would be 8 + 12 + 8
> >> (timestamp) = 28 bytes. I am using hbase 0.94.0 which has HFile v2.
> >>
> > Yes, like you said, HFile v2 is included in 0.94 and is in trunk right
> > now, but you should keep following the development of HBase, focused on
> > HBASE-5313 and HBASE-5521, because the development team is working on a
> > new file storage format called HFile v3, based on a columnar format
> > called Trevni, written for Avro by Doug Cutting. [4][5][6][7]
> >
> >
> > [4] https://issues.apache.org/jira/browse/HBASE-5313
> > [5] https://issues.apache.org/jira/browse/HBASE-5521
> > [6] https://github.com/cutting/trevni
> > [7] https://issues.apache.org/jira/browse/AVRO-806
> >
> >
> >
> >
> >> Also, should I be using one of the encoding techniques provided by HBase
> >> (like PrefixDeltaEncoding) to get the number of bytes down?
> >>
> > Read Cloudera's blog post called "HBase I/O - HFile" to see how Prefix