HBase, mail # user - Getting less write throughput due to more number of columns


Re: Getting less write throughput due to more number of columns
Ted Yu 2013-03-28, 14:35
Prefix compression would lower the cost of storing values in the rowkey.

It was inspired by schemas with long rowkeys and short values.

PREFIX and FAST_DIFF encodings are most often used.
Cheers
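
To make the saving concrete, here is a rough sketch of the idea behind PREFIX encoding (not HBase's actual block format, and the key names are made up): each sorted key after the first is stored as the length of the prefix it shares with the previous key plus only its differing suffix. FAST_DIFF additionally diffs timestamps and other fields.

```java
// Illustrative sketch only, not HBase's real on-disk format: PREFIX data
// block encoding stores, for each sorted key after the first, the length of
// the prefix shared with the previous key plus just the differing suffix.
public class PrefixSaving {

    // Number of leading bytes two keys have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int i = 0;
        while (i < a.length && i < b.length && a[i] == b[i]) i++;
        return i;
    }

    // Bytes needed under the simplified prefix scheme; one byte (a stand-in
    // for a varint) records the shared-prefix length per key.
    static long encodedBytes(byte[][] sortedKeys) {
        long total = sortedKeys[0].length;
        for (int i = 1; i < sortedKeys.length; i++) {
            int shared = commonPrefix(sortedKeys[i - 1], sortedKeys[i]);
            total += 1 + (sortedKeys[i].length - shared);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical keys: one rowkey, one family "d", 26 qualifiers.
        byte[][] keys = new byte[26][];
        long raw = 0;
        for (int c = 0; c < 26; c++) {
            keys[c] = String.format("user12345/d:col%02d", c + 1).getBytes();
            raw += keys[c].length;
        }
        System.out.println("raw key bytes:     " + raw);                // 442
        System.out.println("encoded key bytes: " + encodedBytes(keys)); // 69
    }
}
```

The encoding is set per column family; if I remember the shell syntax right, something like `alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}`.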

On Thu, Mar 28, 2013 at 7:26 AM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:

> Would prefix compression (https://issues.apache.org/jira/browse/HBASE-4676)
> improve this?
>
> This is an important question in terms of schema design. Given the choice
> of storing a value in a column vs. the rowkey, I would often want to store
> a value in the rowkey if I foresee it being used for constraining lookups,
> even if it is only a weak use case at the time of schema design. But if
> there is substantial overhead in keeping values in the rowkey vs. a column,
> then I would want to keep only the absolutely essential identifier in the
> rowkey. The overhead of storing values in the rowkey influences the choice
> of what to store there.
>
> On Mar 25, 2013, at 11:28 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>
> > When the number of columns (qualifiers) is larger, yes, it can impact
> > performance. In HBase, everything is stored as KeyValues (KVs). The key
> > is something like rowkey+cfname+columnname+TS...
> >
> > So when you have 26 cells in a put, many bytes are repeated in the keys
> > (one KV per column), so you end up transferring more data. Within the
> > memstore, more data (actual KV size) gets written, and so more frequent
> > flushes, etc.
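
To put rough numbers on that repetition, a back-of-the-envelope sketch, assuming a simplified key layout of rowkey + family + qualifier + 8-byte timestamp + 1-byte type (length fields and tags ignored), with made-up field sizes:

```java
// Back-of-the-envelope model of per-row key overhead in HBase, assuming a
// simplified KeyValue key of rowkey + family + qualifier + 8-byte timestamp
// + 1-byte type (length fields and tags ignored).
public class KvOverhead {

    // Total key bytes for one row written as `cells` cells: the full key,
    // including the rowkey, is repeated once per cell.
    static long keyBytesPerRow(int rowkeyLen, int familyLen, int qualifierLen, int cells) {
        long perCellKey = rowkeyLen + familyLen + qualifierLen + 8 + 1;
        return perCellKey * cells;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 16-byte rowkey, 1-byte family, 8-byte qualifiers.
        System.out.println("1 cell:   " + keyBytesPerRow(16, 1, 8, 1) + " key bytes");  // 34
        System.out.println("26 cells: " + keyBytesPerRow(16, 1, 8, 26) + " key bytes"); // 884
    }
}
```

Under these assumed sizes, the same 2 KB of value data carries roughly 850 extra key bytes per row through the RPC, memstore, and flush path in the 26-column layout, which is in line with the throughput gap described below.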
> >
> > Have a look at Intel Panthera Document Store impl.
> >
> > -Anoop-
> > ________________________________________
> > From: Ankit Jain [[EMAIL PROTECTED]]
> > Sent: Monday, March 25, 2013 10:19 PM
> > To: [EMAIL PROTECTED]
> > Subject: Getting less write throughput due to more number of columns
> >
> > Hi All,
> >
> > I am writing records into HBase. I ran a performance test on the
> > following two cases:
> >
> > Set 1: each input record contains 26 columns and the record size is 2 KB.
> >
> > Set 2: each input record contains 1 column and the record size is 2 KB.
> >
> > In the second case I am getting 8 MBps more throughput than in the first.
> >
> > Does the large number of columns have any impact on write performance,
> > and if yes, how can we overcome it?
> >
> > --
> > Thanks,
> > Ankit Jain
>
>