HBase user mailing list: Getting less write throughput due to more number of columns


Earlier messages in this thread:
Ankit Jain, 2013-03-25 16:49
Anoop Sam John, 2013-03-26 06:28
Pankaj Gupta, 2013-03-28 14:26
Re: Getting less write throughput due to more number of columns
Prefix compression would lower the cost of storing values in the rowkey.

It was inspired by the long-rowkey, short-value schema design.

The PREFIX and FAST_DIFF encodings are the most often used.
Cheers
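
A minimal sketch of turning one of these encodings on at table creation time, assuming a placeholder table "t1" with a single family "cf" and the HColumnDescriptor API from the 0.94/0.96 line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class CreateEncodedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // "t1" and "cf" are placeholder names for this sketch.
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t1"));
    HColumnDescriptor cf = new HColumnDescriptor("cf");

    // FAST_DIFF (or PREFIX) stores each cell's key as a delta against the
    // previous cell in the block, so the repeated rowkey/family/qualifier
    // bytes are not written out in full for every column.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    table.addFamily(cf);

    admin.createTable(table);
    admin.close();
  }
}

The same setting can be applied from the shell:
alter 't1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}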

On Thu, Mar 28, 2013 at 7:26 AM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:

> Would prefix compression (https://issues.apache.org/jira/browse/HBASE-4676)
> improve this?
>
> This is an important question in terms of schema design. Given the choice
> of storing a value in a column vs. the rowkey, I would often want to store
> the value in the rowkey if I foresee it being used for constraining lookups,
> even if that is only a weak use case at the time of schema design. But if
> there is substantial overhead in keeping values in the row vs. a column, then
> I would want to keep only the absolutely essential identifier in the row. The
> overhead of storing values in the rowkey influences the choice of what to
> store there.
>
> On Mar 25, 2013, at 11:28 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>
> > When the number of columns (qualifiers) is higher, yes, it can impact
> > performance. In HBase, everything is stored in terms of KVs, and the key of
> > each KV is something like rowkey + cfname + columnname + TS...
> >
> > So when you have 26 cells in a Put, there is one KV per column and many
> > bytes are repeated across the keys, so you end up transferring more data (a
> > concrete sketch follows the quoted thread below). Within the memstore, more
> > data (actual KV data size) gets written, which means more frequent
> > flushes, etc.
> >
> > Have a look at the Intel Panthera Document Store implementation.
> >
> > -Anoop-
> > ________________________________________
> > From: Ankit Jain [[EMAIL PROTECTED]]
> > Sent: Monday, March 25, 2013 10:19 PM
> > To: [EMAIL PROTECTED]
> > Subject: Getting less write throughput due to more number of columns
> >
> > Hi All,
> >
> > I am writing records into HBase. I ran a performance test on the
> > following two cases:
> >
> > Set 1: The input record contains 26 columns and the record size is 2 KB.
> >
> > Set 2: The input record contains 1 column and the record size is 2 KB.
> >
> > In the second case I am getting 8 MBps more throughput than in the first.
> >
> > Does the large number of columns have any impact on write performance, and
> > if yes, how can we overcome it?
> >
> > --
> > Thanks,
> > Ankit Jain
>
>
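
To make the per-cell key overhead described in the thread concrete, here is a small sketch using the 0.94/0.96-era Put API; the row key, family, qualifier names, and the roughly 2 KB record size are placeholders matching the numbers in the question. Both Puts carry about the same amount of user data, but the 26-column one is built from 26 KeyValues, each repeating the row key, family, qualifier, and timestamp in its key:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PerCellOverheadSketch {
  public static void main(String[] args) {
    byte[] row = Bytes.toBytes("row-0000000001");
    byte[] cf  = Bytes.toBytes("cf");

    // Case 1: the whole ~2 KB record in a single cell.
    Put single = new Put(row);
    single.add(cf, Bytes.toBytes("record"), new byte[2048]);

    // Case 2: the same ~2 KB spread over 26 cells. Each cell becomes its own
    // KeyValue whose key repeats row + family + qualifier + timestamp, so the
    // Put carries many extra key bytes even though the user data is the same.
    Put wide = new Put(row);
    for (int i = 0; i < 26; i++) {
      wide.add(cf, Bytes.toBytes("col" + i), new byte[2048 / 26]);
    }

    // heapSize() includes Java object overhead, so it is only a rough proxy
    // for the extra bytes on the wire, but the gap illustrates the per-cell cost.
    System.out.println("1 column  : heapSize = " + single.heapSize());
    System.out.println("26 columns: heapSize = " + wide.heapSize());
  }
}

The extra key bytes per cell are what show up as more data on the wire and faster-filling memstores, which is consistent with the lower throughput observed for the 26-column case.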