Re: HBase Writes With Large Number of Columns
From http://hbase.apache.org/book.html#hbase.rpc :

Optionally, Cells (KeyValues) can be passed outside of protobufs in
follow-behind Cell blocks (because we can’t protobuf megabytes of
KeyValues or Cells; see
https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#).
These CellBlocks are encoded and optionally compressed.

From IPCUtil, you should find this:

  ByteBuffer buildCellBlock(final Codec codec, final CompressionCodec compressor,
      final CellScanner cells)
Cheers
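
A minimal client-side sketch of enabling those CellBlock codecs, assuming the 0.95+ configuration keys hbase.client.rpc.codec and hbase.client.rpc.compressor (the class names shown are illustrative choices, not taken from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CellBlockCodecConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Codec used to encode Cells into CellBlocks on the RPC wire
    // (assumption: the 0.95+ client reads this key)
    conf.set("hbase.client.rpc.codec",
        "org.apache.hadoop.hbase.codec.KeyValueCodec");
    // Optionally compress the encoded CellBlocks as well
    conf.set("hbase.client.rpc.compressor",
        "org.apache.hadoop.io.compress.GzipCodec");
  }
}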

On Wed, Mar 27, 2013 at 3:28 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> CellBlock == KeyValue?
>
>
> On Thu, Mar 28, 2013 at 12:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > For 0.95 and beyond, HBaseClient is able to specify codec classes that
> > encode / compress CellBlock.
> > See the following in HBaseClient#Connection :
> >
> >       builder.setCellBlockCodecClass(this.codec.getClass().getCanonicalName());
> >
> >       if (this.compressor != null) {
> >         builder.setCellBlockCompressorClass(this.compressor.getClass().getCanonicalName());
> >       }
> > Cheers
> >
> > On Wed, Mar 27, 2013 at 2:52 PM, Asaf Mesika <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Correct me if I'm wrong, but I think the drop is expected, according
> > > to the following math:
> > >
> > > If you have a Put for a specific rowkey, and that rowkey weighs 100
> > > bytes, then if you have 20 columns you should add the following size
> > > to the combined size of the columns:
> > > 20 x (100 bytes) = 2000 bytes
> > > So the size of the Put sent to HBase should be:
> > > 1500 bytes (sum of all column qualifier sizes) + 20x100 (size of row key).
> > >
> > > I add this 20x100 since, for each column qualifier, the Put object
> > > adds another KeyValue member object, which duplicates the RowKey.
> > > See here (taken from Put.java, v0.94.3 I think):
> > >
> > >   public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
> > >     List<KeyValue> list = getKeyValueList(family);
> > >     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
> > >     list.add(kv);
> > >     familyMap.put(kv.getFamily(), list);
> > >     return this;
> > >   }
> > >
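To see that duplication concretely, here is a small sketch against the 0.94-era client API (the class name and row contents are hypothetical): every KeyValue produced by add() embeds the full row key, so the serialized length grows with it.

import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDuplicationDemo {
  public static void main(String[] args) {
    Put put = new Put(Bytes.toBytes("some-100-byte-row-key"));
    for (int i = 0; i < 20; i++) {
      // Each add() creates another KeyValue that repeats the row key
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q" + i), Bytes.toBytes("v"));
    }
    long total = 0;
    for (List<KeyValue> kvs : put.getFamilyMap().values()) {
      for (KeyValue kv : kvs) {
        total += kv.getLength();  // row key + family + qualifier + ts + value
      }
    }
    System.out.println("serialized KeyValue bytes: " + total);
  }
}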
> > > Each KeyValue also adds more information which should be taken into
> > > account per column qualifier:
> > > * KeyValue overhead - I think 2 longs
> > > * Column Family length
> > > * Timestamp - 1 long
> > >
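Folding those per-KeyValue costs into the earlier arithmetic gives a back-of-the-envelope estimate like the following sketch (all figures are the hypothetical ones from this thread; the 2-byte family and the overhead constants are assumptions):

public class PutSizeEstimate {
  public static void main(String[] args) {
    int columns = 20;
    int rowKeyLen = 100;           // bytes, per the example above
    int qualifiersTotal = 1500;    // bytes, summed over all qualifiers
    int familyLen = 2;             // e.g. a short family like "cf"
    int kvOverhead = 2 * 8 + 8;    // ~2 longs of KeyValue framing + timestamp
    long estimate = qualifiersTotal + columns * (rowKeyLen + familyLen + kvOverhead);
    System.out.println(estimate);  // 1500 + 20 * 126 = 4020 bytes
  }
}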
> > > I wrote a class to calculate a rough size of the List<Put> sent to
> > > HBase, so I can calculate the throughput:
> > >
> > > import java.util.List;
> > > import java.util.Map;
> > > import java.util.NavigableMap;
> > >
> > > import org.apache.hadoop.hbase.KeyValue;
> > > import org.apache.hadoop.hbase.client.Increment;
> > > import org.apache.hadoop.hbase.client.Put;
> > > import org.apache.hadoop.hbase.client.Row;
> > > import org.apache.hadoop.hbase.util.Bytes;
> > >
> > > public class HBaseUtils {
> > >
> > >     public static long getSize(List<? extends Row> actions) {
> > >         long size = 0;
> > >         for (Row row : actions) {
> > >             size += getSize(row);
> > >         }
> > >         return size;
> > >     }
> > >
> > >     public static long getSize(Row row) {
> > >         if (row instanceof Increment) {
> > >             return calcSizeIncrement((Increment) row);
> > >         } else if (row instanceof Put) {
> > >             return calcSizePut((Put) row);
> > >         } else {
> > >             throw new IllegalArgumentException(
> > >                 "Can't calculate size for Row type " + row.getClass());
> > >         }
> > >     }
> > >
> > >     private static long calcSizePut(Put put) {
> > >         long size = 0;
> > >         size += put.getRow().length;
> > >
> > >         Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
> > >         for (byte[] family : familyMap.keySet()) {
> > >             size += family.length;
> > >             List<KeyValue> kvs = familyMap.get(family);
> > >             for (KeyValue kv : kvs) {
> > >                 size += kv.getLength();
> > >             }
> > >         }
> > >         return size;
> > >     }
> > >
> > >     private static long calcSizeIncrement(Increment row) {
> > >         long size = 0;
> > >         size += row.getRow().length;
> > >         // (completion based on the 0.94 Increment API; an assumption,
> > >         //  mirroring calcSizePut: each family maps qualifiers to long deltas)
> > >         for (Map.Entry<byte[], NavigableMap<byte[], Long>> family :
> > >                 row.getFamilyMap().entrySet()) {
> > >             size += family.getKey().length;
> > >             for (byte[] qualifier : family.getValue().keySet()) {
> > >                 size += qualifier.length + Bytes.SIZEOF_LONG;
> > >             }
> > >         }
> > >         return size;
> > >     }
> > > }
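
And a usage sketch for the throughput calculation itself (hTable and puts are assumed to already exist; the timing approach is illustrative):

long start = System.nanoTime();
hTable.put(puts);                                    // List<Put> batch write
double seconds = (System.nanoTime() - start) / 1e9;
double mbPerSec = HBaseUtils.getSize(puts) / seconds / (1024 * 1024);
System.out.printf("write throughput: %.2f MB/s%n", mbPerSec);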