HBase >> mail # user >> HBase value design


Re: HBase value design
On our project we store nested record structures with 10-40 fields. We
decided to save on storage and write throughput by writing a serialized
Avro record as the value, with one byte placed before it to allow
versioning. We did this because every column is written together with its
row key, column family, column qualifier and timestamp, so your write
throughput can be severely impacted if you write each field as a separate
column, as Phoenix does.
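
A minimal sketch of the version-byte idea described above. To stay self-contained it packs the payload with Python's `struct` module as a stand-in for Avro, and the two layouts and the field names (`count_int`, `count_float`, `last_seen`) are hypothetical, not taken from the project discussed here.

```python
import struct

# Hypothetical layouts keyed by version byte (a stand-in for Avro schemas).
# Version 1 holds two fields; version 2 appends a 64-bit field.
LAYOUTS = {
    1: (">if", ("count_int", "count_float")),
    2: (">ifq", ("count_int", "count_float", "last_seen")),
}


def encode(version, record):
    """Return the cell value: one version byte followed by the packed record."""
    fmt, names = LAYOUTS[version]
    payload = struct.pack(fmt, *(record[name] for name in names))
    return bytes([version]) + payload


def decode(value):
    """Read the leading version byte and unpack the rest with that layout."""
    version = value[0]
    if version not in LAYOUTS:
        raise ValueError("unknown value version: %d" % version)
    fmt, names = LAYOUTS[version]
    return dict(zip(names, struct.unpack(fmt, value[1:])))
```

Because the reader dispatches on the first byte, values written with the old layout stay readable after a new field is introduced.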
We addressed the read side only partially: the coprocessor we wrote still
reads the entire record, since you can't read part of a value yet, and
sends back only the fields we need.
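
The read path above amounts to "decode everything, return a subset". A rough sketch of that projection step follows, in plain Python rather than the HBase coprocessor API. The fixed layout and field names are hypothetical and stand in for the real Avro schema.

```python
import struct

# Hypothetical fixed layout standing in for an Avro schema.
FIELDS = ("count_int", "count_float", "last_seen")
FORMAT = ">ifq"


def project(stored_value, wanted):
    """Decode the whole stored value, then keep only the requested fields."""
    record = dict(zip(FIELDS, struct.unpack(FORMAT, stored_value)))
    return {name: record[name] for name in wanted}
```

The full value is still read and decoded on the server side; the saving is only in what gets sent back to the client.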

In your case, if it's only two fields, I'm not sure I would bother; I would
simply use columns.
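
For a feel of the numbers in the two-field case, here is a back-of-the-envelope estimate based on the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier and value bytes). It ignores tags, data block encoding and compression, and the 16-byte row key is an assumption made only for illustration.

```python
# Fixed per-cell bytes: key length, value length, row length,
# family length, timestamp, key type.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row, family, qualifier, value_len):
    """Approximate unencoded size in bytes of one cell."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + value_len


row = b"r" * 16  # hypothetical 16-byte row key

# Design A: one column per member, 4-byte int and 4-byte float values.
two_columns = (cell_size(row, b"f", b"q1_countInt", 4)
               + cell_size(row, b"f", b"q1_countFloat", 4))

# Design B: one blob per qualifier, version byte + 8 bytes of payload.
one_blob = cell_size(row, b"f", b"q1", 1 + 8)

print(two_columns, one_blob)
```

With only two members the absolute difference per logical value is small; it grows with the number of fields, since every extra column repeats the row key, family, qualifier and timestamp.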

We have plans to open source the query layer but it will only happen in
2014 :)

On Thursday, November 28, 2013, Amit Sela wrote:

> I am using some sort of schema that allows me to expand my data blob if
> needed.
> However, I'm considering testing Phoenix (or maybe prestoDB once it gets an
> HBase connector) and I was wondering if the common practice is "simple
> type" values and not data blobs because I saw that Phoenix doesn't support
> data blob values.
>
> What does "If there is a possibility a new member would be added to the
> tuple" mean?
>
> Thanks.
>
>
>
> On Thu, Nov 28, 2013 at 5:22 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Amit:
> > In your example you use Writable for serialization.
> > In 0.96 and beyond, protobuf is used in place of Writable.
> >
> > If there is a possibility a new member would be added to the tuple,
> > consider using some scheme that allows the expansion.
> >
> > Please take a look at this as well:
> > HBASE-8089 Add type support
> >
> > Cheers
> >
> >
> > On Thu, Nov 28, 2013 at 5:17 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Amit,
> > >
> > > It all depends on your use case ;)
> > >
> > > If you always access countInt and countFloat together when you access a
> > > value, then put them together to avoid having to do 2 calls, a scan or
> > > a multiget. But if you never access them together, you might want to
> > > separate them to reduce RPC transfer, etc.
> > >
> > >
> > > JM
> > >
> > >
> > > 2013/11/28 Amit Sela <[EMAIL PROTECTED]>
> > >
> > > > There are a lot of discussions here regarding the row design but I
> > > > have a question about the value design:
> > > >
> > > > Say I have a table t1 with rows r1,r2...rn and family f.
> > > > I also have qualifiers q1,q2...,qm
> > > >
> > > > For each (ri,fi,qi) tuple I want to store a value vi that is a data
> > > > blob that implements Writable and has two members:
> > > > Integer countInt
> > > > Float countFloat
> > > >
> > > > Would you change the design so that I'll have 2m qualifiers, i.e.
> > > > q1_countInt, q1_countFloat, etc., with IntWritable and FloatWritable
> > > > values (respectively)? Or stay with the data blob?
> > > >
> > > > Thanks,
> > > >
> > > > Amit.
> > > >
> > >
> >
>